[ ] augment the ERG with the necessary mal-rules for better coverage
    before tweaking bare-np rule:
        NOTE: parsed 1933 / 2500 sentences, avg 16204k, time 421.17581s
        = 77.32% coverage, 168ms/sent
    after tweaking bare-np rule and adding robust-derived [SF comm] root:
        [ note: declarative sentences can still get through, for some reason --
          the ERG is not very helpful about this with the SF property.
          maybe the mood -> imperative property? ]
        NOTE: parsed 2473 / 2500 sentences, avg 29179k, time 843.70032s
        = 98.92% coverage, 337ms/sent
    ... using a root derived from root_robust_s instead:
        NOTE: parsed 2457 / 2500 sentences, avg 28547k, time 825.39240s
        = 98.28% coverage

+ what is the ERG ambiguity rate? >= 10 readings for 71% of inputs

- RCL producer needs to
  1. include (some) ERG ambiguity
  2. generate its own (ranked) ambiguity,
     e.g. when 'move' vs 'take' is unclear,
     e.g. when the preposition is unclear
  3. let the motion planner rule out some cases, take the first remaining one

[x] ERG doesn't have "square" as a measure noun
    "Move the green cube one square to the left." should be parsed like
    "Move the green cube one inch to the left."
    we *do* parse: "Move red cube one step forward."
    the direction clause gets detected as the loc_nonsp_rel, whose ARG2 is
    "one step", and forward_p_rel modifies "step"

[x] ERG doesn't have "left" as an adjective.
    "There are seventeen left penguins in that display."
    maybe it's genre-specific... but it seems clear that I should add it.

[x] ERG doesn't have "back" as an adjective, or "center"(?), or "front"(?)

[x] indicator(other) should just be dropped

[x] disallow named_rel ... "drop the red block" -> person named Red

[x] bad parse for 1169 coming from coordinating "put" with "is", inside the
    relative clause; could block by banning all readings with verbs with
    SF != comm
    - hypothesis: all relative clauses in the data are copula
    - same problem in 3369; hypothesis: accounts for most/all of the
      (sequence:) vs. (event:) problems
    + that was about half of those sequence-event errors, but there are more.

[x] fail to parse: "put the green block on the yellow block near the centre
    of the board"
    - problem: 'centre'??? doubt it...
    - hypothesis: accounts for most/all of the (event:) vs (sequence:) problems
    amazing... the sentence came up as [DIALECT br] and our root condition
    said [DIALECT us]. (now fixed)

[x] multiple colors

[x] instance references
    show up in MRS as: pron_rel(x)
    show up in RCL as: id(1) on the antecedent,
                       type(reference) reference-id(1) on the anaphor

[x] type references
    show up in MRS as: generic_entity_rel(x), card_rel(x, "1")
    show up in RCL as: id(1) on the antecedent,
                       type(type-reference) reference-id(1) on the anaphor

[x] reject readings with a PP modifier on 'take' clauses, or multiple PP
    modifiers on any clause

[x] try disabling NN compounding for better accuracy -- might help with
    'Move red cube one step forward.', where there's a funky compounding
    first reading... though we seem to have another way to discard that too
    (no direction clause)

[x] reject non-imperative readings -- either as a filter or an ERG
    root_informal_imperative

[x] allow (multiple) indicators and multiple colors
    for now, only translate the indicator ``single''... pass all other
    adjectives as indicators as-is

[x] sometimes, a generic_entity_rel can be an instance reference...
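The reference-handling rules above could be sketched roughly as follows. This is a hypothetical illustration (predicate names as plain strings, a made-up `classify_reference` helper), not the converter's actual code:

```python
# Hypothetical sketch of the reference rules noted above:
#   pron_rel(x)                        -> type(reference) reference-id(1)
#   generic_entity_rel(x), card_rel(x) -> type(type-reference) reference-id(1)
# The antecedent just gets id(1). Data shapes are illustrative only.

def classify_reference(preds):
    """preds: list of (predname, varname) pairs attached to one entity."""
    names = {p for p, _ in preds}
    if "pron_rel" in names:
        return "reference"
    if "generic_entity_rel" in names and "card_rel" in names:
        return "type-reference"
    return None  # not an anaphor
```

Note the caveat recorded above: a generic_entity_rel can sometimes be an instance reference, so a rule this simple will occasionally misfire.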
[x] 'closest to X' should come out as 'nearest X'
    mrs says close_a_to_rel(e,w,x), superl(e)
    250-ish instances of 'nearest' -- probably realized different ways on
    the surface; 178 of them are close_a_to, presumably all with superlative

[x] hundreds of cases where we mispredict 'move' vs 'drop'

[x] decode left_n_1_rel as (indicator: left) (type: region), also
    right_n_1_rel and top_n_1_rel and back_n_1_rel and center_n_1_rel

[x] split sentence-level coordinations
[x] classifier -> what action
[x] which x is the entity
[x] which label or event represents the destination (if relevant)

[x] translate 'x' variables into (entity:) clauses
    [x] pronoun handling
        instance reference "it" empirically always refers to the first
        entity (direct object of first verb)
        type reference "one" is rare, usually refers to the first entity
        (29 / 33 times in training data)
    [x] color classifier
    [x] type classifier
    [x] indicator classifier
    [x] cardinal classifier

[x] translate PPs into (spatial-relations:) clauses
    [x] classifier -> what relation; ARG2 is always(?) the entity
    [x] direction clauses
    [x] entity modifiers
    [x] translate ARG2s into an (entity:)

[x] try ERG's built-in robust barenp rule
    didn't help

evaluation???
    exact match possible but perhaps too crude / unlikely
    ... accuracies of each classifier

lots of classifiers to build --- all at once, and very simple?
    converter builds RCLs optionally without translating, and then the
    learner compares the output to the gold and defines a rewriter over all
    the string types
    - some will require special handling, e.g. ordinal/cardinal -- but
      perhaps most won't

destination detection -- required for 'move', illegal for 'take', and
usually present but optional for 'drop' -- can rule out readings with no
destination for 'move', and disprefer readings with no destination for
'drop'

[x] how do we know if a verb is 'move' or 'drop'?
    per verb, seq length, and seq position
    how many 'move', how many 'drop'?
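One plausible decision rule for the 'move' vs 'drop' question, given the sequence-length counts recorded just below (793 'move.dest' vs 50 'drop.dest' for 1-event sequences; all 678 2-event sequences are (take, drop.dest)). A hypothetical sketch with made-up names, not the system's actual code:

```python
# Hypothetical verb-choice heuristic suggested by the treebank counts:
# function name and arguments are illustrative only.

def choose_action(seq_len, seq_pos, has_destination):
    if seq_len == 2:
        # 2-element sequences are empirically always (take, drop.dest)
        return "take" if seq_pos == 0 else "drop"
    if has_destination:
        # 793:50 in favor of 'move' -- probably never issue 'drop.dest' here
        return "move"
    # no destination: 'take' outnumbers 'drop' 309:170
    return "take"
```

The no-destination case is the weakest: 309:170 is far from categorical, so a real system would want per-word evidence on top of this.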
    1322 sequences of length 1
        with destination:
            793 'move.dest'
            50 'drop.dest'
            probably should never issue 'drop.dest' for a 1-sequence?
            what would a good reason be?
        no destination:
            309 'take'
            170 'drop'
    678 sequences of length 2, all of which are (take, drop.dest)

[x] 179 cases where we fail to produce any output at all
    insufficient candidates
    13375 -- all but 3 / 252 analyses have (useless) compound_rels
        ... revised grammar has just 3 analyses, but don't like them
        because 'take' gets a PP
    13983 ... first 2000+ readings all have _leave_v_transfer_rel
    17839? ... no parses without compound_rel
    ... no parses for 'The all gray stack arose.' without compound_rel
    19988? all compound_rel
    7773? all compound_rel
    hyphenated color causes compound_rel ... lots of cases

[x] filter RCL hypotheses with the motion planner
    [x] move vs drop ambiguity
    [x] rule out entities that don't exist

[x] measure phrases
    65-ish instances of 'measure'
    relations: left, forward, backward
    come up as loc_nonsp_rel sometimes ... this is the measure NP business

[x] fix *relative* measure phrases
    15334 "move X 2 squares to the left of Y"
    ... measured spatial relations can be relative to something:
    (spatial-relation: (measure: ...) (relation: ...) (entity: ...))
    the entity is *optional* and assumed to be the object being modified
    if absent

[ ] use dan's hyphenated-color fix
    from 1308 to 1284 .. regression
    why regression?

[ ] try dan's 'all' lexeme in email

[ ] try trunk ERG

[x] kill off relative_mod_rel

[x] "left most" and "right most"
    ... in some cases there will be bad parses of these
    "the left most cyan block" -> superl(cyan)
    those superl_rel's are broken anyway... ARG1 unbound
    catch that, at least.

[x] left SIDE [ of x ] ... there are lots of these.
    significant component of the FAILs.
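The intended 'left/right SIDE of X' decoding could be sketched roughly as below. The real converter works over MRS predicates rather than surface strings, so this regex version (`SIDE_RE` and `decode_side` are made-up names) is purely illustrative; the examples that follow show the target mappings.

```python
import re

# Hypothetical surface-level sketch of decoding "on the <dir> side of
# <entity>" into an RCL spatial relation. Illustrative only: the actual
# system reads MRS, not raw text.

SIDE_RE = re.compile(r"on the (left|right) side of the (?:\w+ )*(\w+)")

def decode_side(phrase):
    m = SIDE_RE.search(phrase)
    if m:
        return "(relation: %s) (entity: %s)" % (m.group(1), m.group(2))
    return None  # not a side-phrase; fall through to other handlers
```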
    on the right side of the red cube -> (relation: right) (entity: cube)
    on the left side of the gray prism -> (relation: left) (entity: prism)
    on the left side of the red cube
    at right side -> within right region
    on the right side of the red cube
    on the left side of the gray prism
    on the left side of the red block
    from the right side -> within right region

[x] investigate 'within' vs 'above' preposition
    'from' region/corner -> within?
    'to' region/corner -> within?

[x] disable compound_rel in the grammar... it's swamping the top-10 limit

[x] increase ambiguity from ACE 10 -> 100?
    old ambiguity for 'train', with compound_rel enabled:
        count  readings
           17  0
          125  1
           25  2
          172  3
           32  4
            6  5
           88  6
           38  7
           34  8
           26  9
         1437  10
    new ambiguity, with compound_rel disabled, but -n 100:
        count  readings
           36  0
          156  1
           60  2
          253  3
           83  4
           31  5
           56  6
           15  7
           29  8
           34  9
          256  10-19
           95  20-29
           70  30-39
           50  40-49
          132  50-99
          644  100+

next steps:
- analyze more FAIL error cases
- look at the long error tail and find broad error groupings
- see what ambiguities can be added in the RCL-generation stage and
  resolved by the planner

there are 177 'FAIL' errors
there are 269 'spatial-relation' structural errors
those are mutually exclusive, and together account for:
    446 / 547 = 81.5% of remaining errors

looking at errors involving structure of spatial relations...

PP attached to "take" instead of its object:
    2599 "Take the X from on top of the Y and place it Z"
        ... 'from on top of': 'from' has ARG2 u19, but its ARG0 is modified
        by 'on'. yuck. quite rare... ignore it?
    19518 "Move the X from the (top of the) Y over the Z"
        attached "from Y" to "move" and "over Z" to Y, instead of "from Y"
        to X and "over Z" to "move" ...
        world-filtering doesn't fix this for some reason

bad relative clause:
    18266 "Place the X on the Y that is closer to the Z"
    18040 "Move the X on the Y that is closest to the Z"
        parsed "that is closer" as its own relative clause, with "to the Z"
        separated
    14450 "Pick up the X placed on top of the Y and move it on the Z"
        just dropped the "placed on top of the Y" altogether...

conclusion: some bad parses, some room for improved interpretation of
existing parses

    14109 "Place the red block next to the yellow block."
        incorrect gold annotation -- first time I've found that in this
        treebank! ... gold semantics is functionally correct in a sense,
        but definitely flawed
    15832 "pick yellow block and place it 2 steps right to green block"
        wrong parse? probably... 'to green block' is a PP modifier of
        'place', and '2 steps right' is a separate subordinated modifier
        clause
    00048 "move X on top of the Y in the corner nearest the Z"
        'nearest the Z' can attach to 'Y' or 'corner'; we guess differently
        than their annotation, but it turns out to mean the same thing
    2142 ... red and gray stack
        annotation shows it as red and WHITE stack
    2392 ... the light grey brick ... the dark grey brick ... the light
        grey tower ... annotation resolves these to white, gray, white
    3282 ... the grey block
        annotation: white
        are we expected to fix that with world knowledge? I guess so.
        systematic ambiguity gray -?> white?
    10659 ... indicator 'top' in the command is dropped in the translation;
        with it present, nothing is found.

    ... lots of 'near to', 'nearer', 'closer', etc. to work through
    'that is sat on top of' !!

[ ] "right to the green block", "just right to the...", "left to"...
    ! who are these annotators???

[x] allow measure NPs to modify Ns (danf)
    code from Dan to fix 'Put it 2 yards away.':
    correct the definition of `npadv_measnp_phrase' in syntax.tdl by
    changing the value of SYNSEM..HEAD.MOD..HEAD from `v_or_a' to
    `n_or_v_or_a'.
    - the extra ambiguity causes a slight regression, sadly

- disliking modified PPs experiment:
    these frequently cause trouble, but naively blocking them is a slight
    regression
    items that work without blocking and don't with blocking:
        2051 Drop the pyramid exactly down.
        7246 Move the yellow pyramid on top of the green block and place it
             on to the blue block.
        4700 Drop the pyramid exactly below.
        9533 place the red tetrahedron exactly down.
        6778 place the tetrahedron directly below
        9790 Take the blue block and place it on to green block.
        7352 Move the blue block and place it on to the green block.
        4213 Move the grey block on top of the red and green block and
             place it on to the single red block.
        9066 Move blue pyramid and place it on to the blue block.
        13081 Remove the blue block from above the yellow block, and place
              it on the top of the green and blue slab.
        3878 Place the red pyramid exactly down
        7967 Just drop the red pyramid exactly below
    two categories:
        exactly/directly down/below
        on to X / from above X
    in both cases, the modified PP has no ARG2?
    items that work with blocking and don't without blocking:
        16811 3488 10922 17607 3647 19542 18338

- oracle experiment:
    oracle over RCL produced from all of top 100 parses
    correct RCL present but not first for 28 items
    ... vs correct and first for 1854 items
    chances of reranking that correctly? zilch?

ANNOTATION ERRORS:
    19982 superfluous id(1) with no anaphor (semantically harmless)
    8215  superfluous id(1) with no anaphor (semantically harmless)
    13715 superfluous id(1) with no anaphor (semantically harmless)
    14109 destination mislabeled as modifier (semantically harmless)
    1670  cube-group instead of cube, scene 228
    12513 cube-group instead of cube, scene 232
    14005 cube-group instead of cube, scene 827
    17703 cube-group instead of cube, scene 228
    17545 cube-group instead of cube, scene 228

accomplishments:
[x] collapse {P the top of} into just {top}     from 211 to 302 ... nice. 15%.
[x] instance reference                          from 302 to 351
[x] multiple colors / indicators                from 351 to 383
[x] turquoise -> cyan                           from 383 to 402 ... nice. 20%.
[x] pink/purple -> magenta                      from 402 to 434
[x] 'corner' type [had 120 errors]              no change!
[x] top,right,rightmost,leftmost indicators     from 434 to 457
[x] allow spatial relations on entities         from 457 to 526 ... nice. 25%.
[x] 'block', 'space', 'square' measure lexemes  no change! (measure RCL not
                                                supported yet)
[x] 'left' lexeme and indicator                 from 526 to 558
[x] adjectival spatial relations: 'closest'     from 558 to 571
[x] block readings with too many PPs on verbs   from 571 to 602 ... nice. 30%.
[x] convert nominal left, top, etc to 'region'  from 602 to 676
[x] block readings with named_rel               from 676 to 689
[x] drop indicator(other)                       from 689 to 698
[x] drop indicator(light) and indicator(dark)   from 698 to 699
[x] _at_p -> within                             from 699 to 711 ... nice. 35%.
[x] add center, front, back lexemes/indicators  from 711 to 734
[x] let move,drop have no destination           from 734 to 898 ... nice. >40%.
[x] make 2-element sequences always take/drop   from 898 to 936 ... nice. 45%.
[x] disallow compound_rel                       from 936 to 947
[x] change 'drop.dest' to 'move' for some words from 947 to 1108 ... nice. 55%.
[x] center_n_of_rel -> region                   from 1108 to 1118
[x] near,nearest,next+to,beside prepositions    from 1118 to 1133
[x] generic_entity_rel references               from 1133 to 1163
[x] card_rel type-reference                     from 1163 to 1170
[x] disallow NP coordination                    from 1170 to 1195
[x] cube->cube-group when multiple colors       from 1195 to 1204 ... nice. 60%.
[x] modifier card_rel -> (cardinal: x)          from 1204 to 1214
[x] change all hyphens to spaces                from 1214 to 1238
[x] allow [SF prop] except 'put' and 'leave'    from 1238 to 1239
[x] spatial-relations from verb+PP rel-clauses  from 1239 to 1308 ... nice. 65%.
[x] block drop or move based on whether holding from 1308 to 1339
[x] repair 'P top of X' trick                   from 1339 to 1396
[x] deal with 'to' part of move_v_from-to_rel   from 1396 to 1403
    ... nice. 70%.
[x] deal with 'from' part of move_v_from-to_rel from 1403 to 1414
    ... 'dev' evaluation = 68%
[x] simple measure phrases (not 'to the left')  from 1414 to 1441
[x] measure 'to the left' phrases               from 1441 to 1453
[x] treat 'to the left/right of X' like top     from 1453 to 1461
[x] 'move X 3 steps to the left of Y'           from 1461 to 1464
[x] 'take' resultatives as modifiers            from 1464 to 1503 ... nice. 75%.
[x] 'top' top region -> front region            from 1503 to 1529
[x] broken superlatives, MWE for 'left most'    from 1529 to 1547
[x] filter entities with no realizations        from 1547 to 1572
[x] '_row_n_of_rel' -> region                   from 1572 to 1573
[x] block_n_1, triangle                         from 1573 to 1574
[x] near_a_to_rel                               from 1574 to 1608 ... nice. 80%.
[x] {robot_n_1_rel, you} -> robot               from 1608 to 1618
[x] nearest and closest as indicators           from 1618 to 1630
[x] forbid subord_rel                           from 1630 to 1640
[x] relative clauses with resultatives          from 1640 to 1643
[x] relocate sprel when coercing move->take     from 1643 to 1653
[x] off_p_rel -> 'above'                        from 1653 to 1655
[x] miscellaneous lexical handlers              from 1655 to 1670
[x] drop of_p modifier                          from 1670 to 1675
[x] drop unknown indicators and adjectives      from 1675 to 1687
[x] 'bottom' adjective grammar lexeme           from 1687 to 1760 ... nice. 85%.
[x] disable compound_rel in grammar, -n 100     from 1760 to 1764
[x] no loc_nonsp_rel with bare NP               from 1760 to 1764
[x] on the left side (of x)                     from 1764 to 1772
[x] to/from region/edge -> 'within'             from 1772 to 1782
[x] count resultatives in 'too many pp' check   from 1782 to 1789
[x] fix sequence handling bug                   from 1789 to 1789
[x] vocab: pile, board, ground, border          from 1789 to 1791
[x] forbid _far_a ARG1 e                        from 1791 to 1794
[x] forbid _square_v                            from 1794 to 1795
[x] 3 squares left of X                         from 1795 to 1796
[x] 3 squares above X                           from 1796 to 1798
[x] 3 squares in front of X                     from 1798 to 1800 ... nice. 90%.
    ... 'dev' evaluation = 446 / 500 = 89.2%
[x] allow resultative w/ 'take' (-> mod)        from 1800 to 1801
[x] 'sky blue', 'blue sky', 'block tower',
    'combination tower', 'block stack' MWE
    lexemes                                     from 1801 to 1833 ... 91.5%
[x] relax barenp rule to get 'block nearest X'  from 1833 to 1849
[x] allow/ignore 'just' verbal modifier         from 1849 to 1851 ... 92.5%
    ... 'dev' evaluation = 454 / 500 = 90.8%
[x] disallow appos_rel in the grammar           from 1851 to 1853
[x] kill off all 'move.nodest's                 from 1853 to 1854
[x] forbid in+order+to_x_rel                    from 1854 to 1858
[x] forbid PPs that are modified unless no OOP  from 1858 to 1865
    the unless case is e.g. 'directly below', also the common typo
    'on to X' or 'from above X'
[x] within(corner/region) for move_from-TO      from 1865 to 1869
[x] within(edge) -> above(edge)                 from 1869 to 1873 ... 93.5%
[x] no 'of the board' relative measures         from 1873 to 1874
    ... 'dev' evaluation = 461 / 500 = 92.2%
    ... 'train' combined with berkeley 1911 / 2000 = 95.5%
    ... 'dev' combined with berkeley 472 / 500 = 94.4%
- backoff to berkeley when mrs output contains '_rel':
    ... 'train' combined 1920 / 2000 = 96.0%
    ... 'dev' combined 474 / 500 = 94.8%
[x] improve statistical system: add 'above' when no relation in an sprel
    [ from 1670 = 83.5% to 1740 = 87% ]
    ... 'train' combined -> from 1920 to 1925
    ... 'dev' combined -> still 474
[x] improve stat system about 'cube' vs 'cube-group'
    [ from 1740 to 1808 = 90.4% ]
    ... 'train' combined -> from 1925 to 1933 ... 96.5%
    ... 'dev' combined -> 476 ... 95.2%
[x] used '-accurate' in stat parser
    [ from 1808 to 1816 = 90.5% ]
    ... 'train' combined -> 1935 (improvement)
    ... 'dev' combined -> 475 ... 95.0% (regression)
+ analyze how many gold outputs actually have ellipsis
  (use 'token:' data to guesstimate)
    6 items with ellipsis: 4674 19976 3974 6054 5655 21061
    -- exactly the same set identified in failcom.txt, so the system FAILs
       on all of them
    -- possibility to automatically detect them?
    -- likelihood of false-positives

- maybe: train a PCFG system to compare and fill in the gaps
  [ ] if so, be sure to make the MRS system FAIL whenever it's passing
      through untranslated symbols... (loc_nonsp_rel frequently, a few
      other preds here and there)
  - quite promising, in fact... script to-pst/pst.py builds phrase
    structure trees from RCL, then the Berkeley parser trains on those
    out-of-the-box and can produce a parse for any input, including
    'fooz bar bar bar baz'.
    very simple conversion back from PST to RCL ... to-pst/back.py
  [x] convert '-' to ' - ' and use NLTK tokenizer for input to parser
      (since the PST converter does that)
  [x] insert 'region' when no type present
  [x] add anaphora stuff
  results:
      training set 1670 / 2000 = 83.5%
      dev set 261 / 500 = 52.2%
      ... better than nothing, hehehe

+ start version control

--- OFFICIAL EVALUATION // March 21, 2014 around 4:15pm PDT ---
make eval  => 749 correct out of 909 = 82.4%
make beval => 741 correct out of 909 = 81.5%
make ceval => 841 correct out of 909 = 92.5%
what went wrong?
    => ERG coverage on train and dev was 99%, but on eval only 90%. ???
also supposed to eval without spatial planner:
make eval  => 730 correct out of 909 = 80.3%
make beval => 741 correct out of 909 = 81.5%
make ceval => 823 correct out of 909 = 90.5%

--- experiments for my own curiosity: munging input // May 9, 2014 ---
s/\t\.\.\+[ ]*/\t
make eval  => 789 correct out of 909 = 86.8% [ = +4.4% relative to official run ]
make beval => 741 correct out of 909 = 81.5% [ = +0.0% relative to official run ]
make ceval => 858 correct out of 909 = 94.4% [ = +1.9% relative to official run ]

--- experiments for my own curiosity, further munging input // June 30, 2014 ---
s/ cell / square /g
s/ cells / squares /g
make eval  => 828 correct out of 909 = 91.1% [ = +8.7% relative to official run ]
make beval => 743 correct out of 909 = 81.7% [ = +0.2% relative to official run ]
make ceval => 859 correct out of 909 = 94.5% [ = +2.0% relative to official run ]
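For reference, the RCL-to-PST direction handled by to-pst/pst.py (noted in the PCFG-backoff idea above) might look roughly like this. Hypothetical data shapes and function name; not the script's actual code:

```python
# Hypothetical sketch: RCL tags become nonterminal labels and their string
# values become leaf tokens, yielding a bracketed tree the Berkeley parser
# can train on. to-pst/back.py would invert this mapping.

def rcl_to_pst(node):
    """node: (tag, children), where each child is a sub-node or a token."""
    tag, children = node
    parts = []
    for child in children:
        if isinstance(child, str):
            parts.append(child)           # leaf token
        else:
            parts.append(rcl_to_pst(child))
    return "(%s %s)" % (tag, " ".join(parts))

rcl = ("event", [("action", ["move"]),
                 ("entity", [("color", ["red"]), ("type", ["cube"])])])
print(rcl_to_pst(rcl))
# -> (event (action move) (entity (color red) (type cube)))
```

Since the bracketing is lossless over tags and tokens, the PST-to-RCL inverse is a straightforward bracket parse, which matches the note above that the back-conversion is very simple.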