[ ] augment the ERG with the necessary mal-rules for better coverage
    before tweaking bare-np rule:
        NOTE: parsed 1933 / 2500 sentences, avg 16204k, time 421.17581s
        = 77.32% coverage, 168ms/sent
    after tweaking bare-np rule and adding robust-derived [SF comm] root:
        [ note: declarative sentences can still get through, for some reason --
          the ERG is not very helpful about this with the SF property.
          maybe the mood -> imperative property? ]
        NOTE: parsed 2473 / 2500 sentences, avg 29179k, time 843.70032s
        = 98.92% coverage, 337ms/sent
    ... using a root derived from root_robust_s instead:
        NOTE: parsed 2457 / 2500 sentences, avg 28547k, time 825.39240s
        = 98.28% coverage

+ what is the ERG ambiguity rate? >= 10 readings for 71% of inputs

- RCL producer needs to
  1. include (some) ERG ambiguity
  2. generate its own (ranked) ambiguity,
     e.g. when 'move' vs 'take' is unclear,
     e.g. when the preposition is unclear
  3. let the motion planner rule out some cases, take the first remaining one

[x] ERG doesn't have "square" as a measure noun
    "Move the green cube one square to the left." should be parsed like
    "Move the green cube one inch to the left."
    we *do* parse: "Move red cube one step forward."
    the direction clause gets detected as the loc_nonsp_rel, whose ARG2 is
    "one step", and forward_p_rel modifies "step"

[x] ERG doesn't have "left" as an adjective.
    "There are seventeen left penguins in that display."
    maybe it's genre-specific... but it seems clear that I should add it.

[x] ERG doesn't have "back" as an adjective, or "center"(?), or "front"(?)

[x] indicator(other) should just be dropped

[x] disallow named_rel ... "drop the red block" -> person named Red

[x] bad parse for 1169 coming from coordinating "put" with "is", inside the
    relative clause; could block by banning all readings with verbs with
    SF != comm
    - hypothesis: all relative clauses in the data are copula
    - same problem in 3369; hypothesis: accounts for most/all of the
      (sequence:) vs. (event:) problems
    + that was about half of those sequence-event errors, but there are more.

[x] fail to parse: "put the green block on the yellow block near the centre
    of the board"
    - problem: 'centre'??? doubt it...
    - hypothesis: accounts for most/all of the (event:) vs (sequence:) problems
    amazing... the sentence came up as [DIALECT br] and our root condition
    said [DIALECT us]. (now fixed)

[x] multiple colors

[x] instance references
    show up in MRS as: pron_rel(x)
    show up in RCL as: id(1) on the antecedent,
                       type(reference) reference-id(1) on the anaphor

[x] type references
    show up in MRS as: generic_entity_rel(x), card_rel(x, "1")
    show up in RCL as: id(1) on the antecedent,
                       type(type-reference) reference-id(1) on the anaphor

[x] reject readings with a PP modifier on 'take' clauses, or multiple PP
    modifiers on any clause

[x] try disabling NN compounding for better accuracy -- might help with
    'Move red cube one step forward.', where there's a funky compounding
    first reading... though we seem to have another way to discard that too
    (no direction clause)

[x] reject non-imperative readings -- either as a filter or an ERG
    root_informal_imperative

[x] allow (multiple) indicators and multiple colors
    for now, only translate the indicator ``single''... pass all other
    adjectives as indicators as-is

[x] sometimes, a generic_entity_rel can be an instance reference...
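The reference-handling rules above could be sketched roughly as follows. This is a hypothetical illustration (predicate names as plain strings, a made-up `classify_reference` helper), not the converter's actual code:

```python
# Hypothetical sketch of the reference rules noted above:
#   pron_rel(x)                        -> type(reference) reference-id(1)
#   generic_entity_rel(x), card_rel(x) -> type(type-reference) reference-id(1)
# The antecedent just gets id(1). Data shapes are illustrative only.

def classify_reference(preds):
    """preds: list of (predname, varname) pairs attached to one entity."""
    names = {p for p, _ in preds}
    if "pron_rel" in names:
        return "reference"
    if "generic_entity_rel" in names and "card_rel" in names:
        return "type-reference"
    return None  # not an anaphor
```

Note the caveat recorded above: a generic_entity_rel can sometimes be an instance reference, so a rule this simple will occasionally misfire.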
[x] 'closest to X' should come out as 'nearest X'
    mrs says close_a_to_rel(e,w,x), superl(e)
    250-ish instances of 'nearest' -- probably realized different ways on
    the surface; 178 of them are close_a_to, presumably all with superlative

[x] hundreds of cases where we mispredict 'move' vs 'drop'

[x] decode left_n_1_rel as (indicator: left) (type: region), also
    right_n_1_rel and top_n_1_rel and back_n_1_rel and center_n_1_rel

[x] split sentence-level coordinations
[x] classifier -> what action
[x] which x is the entity
[x] which label or event represents the destination (if relevant)

[x] translate 'x' variables into (entity:) clauses
    [x] pronoun handling
        instance reference "it" empirically always refers to the first
        entity (direct object of first verb)
        type reference "one" is rare, usually refers to the first entity
        (29 / 33 times in training data)
    [x] color classifier
    [x] type classifier
    [x] indicator classifier
    [x] cardinal classifier

[x] translate PPs into (spatial-relations:) clauses
    [x] classifier -> what relation; ARG2 is always(?) the entity
    [x] direction clauses
    [x] entity modifiers
    [x] translate ARG2s into an (entity:)

[x] try ERG's built-in robust barenp rule
    didn't help

evaluation???
    exact match possible but perhaps too crude / unlikely
    ... accuracies of each classifier

lots of classifiers to build --- all at once, and very simple?
    converter builds RCLs optionally without translating, and then the
    learner compares the output to the gold and defines a rewriter over all
    the string types
    - some will require special handling, e.g. ordinal/cardinal -- but
      perhaps most won't

destination detection -- required for 'move', illegal for 'take', and
usually present but optional for 'drop' -- can rule out readings with no
destination for 'move', and disprefer readings with no destination for
'drop'

[x] how do we know if a verb is 'move' or 'drop'?
    per verb, seq length, and seq position
    how many 'move', how many 'drop'?
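One plausible decision rule for the 'move' vs 'drop' question, given the sequence-length counts recorded just below (793 'move.dest' vs 50 'drop.dest' for 1-event sequences; all 678 2-event sequences are (take, drop.dest)). A hypothetical sketch with made-up names, not the system's actual code:

```python
# Hypothetical verb-choice heuristic suggested by the treebank counts:
# function name and arguments are illustrative only.

def choose_action(seq_len, seq_pos, has_destination):
    if seq_len == 2:
        # 2-element sequences are empirically always (take, drop.dest)
        return "take" if seq_pos == 0 else "drop"
    if has_destination:
        # 793:50 in favor of 'move' -- probably never issue 'drop.dest' here
        return "move"
    # no destination: 'take' outnumbers 'drop' 309:170
    return "take"
```

The no-destination case is the weakest: 309:170 is far from categorical, so a real system would want per-word evidence on top of this.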
    1322 sequences of length 1
        with destination:
            793 'move.dest'
            50 'drop.dest'
            probably should never issue 'drop.dest' for a 1-sequence?
            what would a good reason be?
        no destination:
            309 'take'
            170 'drop'
    678 sequences of length 2, all of which are (take, drop.dest)

[x] 179 cases where we fail to produce any output at all
    insufficient candidates
    13375 -- all but 3 / 252 analyses have (useless) compound_rels
        ... revised grammar has just 3 analyses, but don't like them
        because 'take' gets a PP
    13983 ... first 2000+ readings all have _leave_v_transfer_rel
    17839? ... no parses without compound_rel
    ... no parses for 'The all gray stack arose.' without compound_rel
    19988? all compound_rel
    7773? all compound_rel
    hyphenated color causes compound_rel ... lots of cases

[x] filter RCL hypotheses with the motion planner
    [x] move vs drop ambiguity
    [x] rule out entities that don't exist

[x] measure phrases
    65-ish instances of 'measure'
    relations: left, forward, backward
    come up as loc_nonsp_rel sometimes ... this is the measure NP business

[x] fix *relative* measure phrases
    15334 "move X 2 squares to the left of Y"
    ... measured spatial relations can be relative to something:
    (spatial-relation: (measure: ...) (relation: ...) (entity: ...))
    the entity is *optional* and assumed to be the object being modified
    if absent

[ ] use dan's hyphenated-color fix
    from 1308 to 1284 .. regression
    why regression?

[ ] try dan's 'all' lexeme in email

[ ] try trunk ERG

[x] kill off relative_mod_rel

[x] "left most" and "right most"
    ... in some cases there will be bad parses of these
    "the left most cyan block" -> superl(cyan)
    those superl_rel's are broken anyway... ARG1 unbound
    catch that, at least.

[x] left SIDE [ of x ] ... there are lots of these.
    significant component of the FAILs.
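The intended 'left/right SIDE of X' decoding could be sketched roughly as below. The real converter works over MRS predicates rather than surface strings, so this regex version (`SIDE_RE` and `decode_side` are made-up names) is purely illustrative; the examples that follow show the target mappings.

```python
import re

# Hypothetical surface-level sketch of decoding "on the <dir> side of
# <entity>" into an RCL spatial relation. Illustrative only: the actual
# system reads MRS, not raw text.

SIDE_RE = re.compile(r"on the (left|right) side of the (?:\w+ )*(\w+)")

def decode_side(phrase):
    m = SIDE_RE.search(phrase)
    if m:
        return "(relation: %s) (entity: %s)" % (m.group(1), m.group(2))
    return None  # not a side-phrase; fall through to other handlers
```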
    on the right side of the red cube -> (relation: right) (entity: cube)
    on the left side of the gray prism -> (relation: left) (entity: prism)
    on the left side of the red cube
    at right side -> within right region
    on the right side of the red cube
    on the left side of the gray prism
    on the left side of the red block
    from the right side -> within right region

[x] investigate 'within' vs 'above' preposition
    'from' region/corner -> within?
    'to' region/corner -> within?

[x] disable compound_rel in the grammar... it's swamping the top-10 limit

[x] increase ambiguity from ACE 10 -> 100?
    old ambiguity for 'train', with compound_rel enabled:
        count  readings
           17  0
          125  1
           25  2
          172  3
           32  4
            6  5
           88  6
           38  7
           34  8
           26  9
         1437  10
    new ambiguity, with compound_rel disabled, but -n 100:
        count  readings
           36  0
          156  1
           60  2
          253  3
           83  4
           31  5
           56  6
           15  7
           29  8
           34  9
          256  10-19
           95  20-29
           70  30-39
           50  40-49
          132  50-99
          644  100+

next steps:
- analyze more FAIL error cases
- look at the long error tail and find broad error groupings
- see what ambiguities can be added in the RCL-generation stage and
  resolved by the planner

there are 177 'FAIL' errors
there are 269 'spatial-relation' structural errors
those are mutually exclusive, and together account for:
    446 / 547 = 81.5% of remaining errors

looking at errors involving structure of spatial relations...

PP attached to "take" instead of its object:
    2599 "Take the X from on top of the Y and place it Z"
        ... 'from on top of': 'from' has ARG2 u19, but its ARG0 is modified
        by 'on'. yuck. quite rare... ignore it?
    19518 "Move the X from the (top of the) Y over the Z"
        attached "from Y" to "move" and "over Z" to Y, instead of "from Y"
        to X and "over Z" to "move" ...
        world-filtering doesn't fix this for some reason

bad relative clause:
    18266 "Place the X on the Y that is closer to the Z"
    18040 "Move the X on the Y that is closest to the Z"
        parsed "that is closer" as its own relative clause, with "to the Z"
        separated
    14450 "Pick up the X placed on top of the Y and move it on the Z"
        just dropped the "placed on top of the Y" altogether...

conclusion: some bad parses, some room for improved interpretation of
existing parses

    14109 "Place the red block next to the yellow block."
        incorrect gold annotation -- first time I've found that in this
        treebank! ... gold semantics is functionally correct in a sense,
        but definitely flawed
    15832 "pick yellow block and place it 2 steps right to green block"
        wrong parse? probably... 'to green block' is a PP modifier of
        'place', and '2 steps right' is a separate subordinated modifier
        clause
    00048 "move X on top of the Y in the corner nearest the Z"
        'nearest the Z' can attach to 'Y' or 'corner'; we guess differently
        than their annotation, but it turns out to mean the same thing
    2142 ... red and gray stack
        annotation shows it as red and WHITE stack
    2392 ... the light grey brick ... the dark grey brick ... the light
        grey tower ... annotation resolves these to white, gray, white
    3282 ... the grey block
        annotation: white
        are we expected to fix that with world knowledge? I guess so.
        systematic ambiguity gray -?> white?
    10659 ... indicator 'top' in the command is dropped in the translation;
        with it present, nothing is found.

    ... lots of 'near to', 'nearer', 'closer', etc. to work through
    'that is sat on top of' !!

[ ] "right to the green block", "just right to the...", "left to"...
    ! who are these annotators???

[x] allow measure NPs to modify Ns (danf)
    code from Dan to fix 'Put it 2 yards away.':
    correct the definition of `npadv_measnp_phrase' in syntax.tdl by
    changing the value of SYNSEM..HEAD.MOD..HEAD from `v_or_a' to
    `n_or_v_or_a'.
    - the extra ambiguity causes a slight regression, sadly

- disliking modified PPs experiment:
    these frequently cause trouble, but naively blocking them is a slight
    regression
    items that work without blocking and don't with blocking:
        2051 Drop the pyramid exactly down.
        7246 Move the yellow pyramid on top of the green block and place it
             on to the blue block.
        4700 Drop the pyramid exactly below.
        9533 place the red tetrahedron exactly down.
        6778 place the tetrahedron directly below
        9790 Take the blue block and place it on to green block.
        7352 Move the blue block and place it on to the green block.
        4213 Move the grey block on top of the red and green block and
             place it on to the single red block.
        9066 Move blue pyramid and place it on to the blue block.
        13081 Remove the blue block from above the yellow block, and place
              it on the top of the green and blue slab.
        3878 Place the red pyramid exactly down
        7967 Just drop the red pyramid exactly below
    two categories:
        exactly/directly down/below
        on to X / from above X
    in both cases, the modified PP has no ARG2?
    items that work with blocking and don't without blocking:
        16811 3488 10922 17607 3647 19542 18338

- oracle experiment:
    oracle over RCL produced from all of top 100 parses
    correct RCL present but not first for 28 items
    ... vs correct and first for 1854 items
    chances of reranking that correctly? zilch?

ANNOTATION ERRORS:
    19982 superfluous id(1) with no anaphor (semantically harmless)
    8215  superfluous id(1) with no anaphor (semantically harmless)
    13715 superfluous id(1) with no anaphor (semantically harmless)
    14109 destination mislabeled as modifier (semantically harmless)
    1670  cube-group instead of cube, scene 228
    12513 cube-group instead of cube, scene 232
    14005 cube-group instead of cube, scene 827
    17703 cube-group instead of cube, scene 228
    17545 cube-group instead of cube, scene 228

accomplishments:
[x] collapse {P the top of} into just {top}     from 211 to 302 ... nice. 15%.
[x] instance reference                          from 302 to 351
[x] multiple colors / indicators                from 351 to 383
[x] turquoise -> cyan                           from 383 to 402 ... nice. 20%.
[x] pink/purple -> magenta                      from 402 to 434
[x] 'corner' type [had 120 errors]              no change!
[x] top,right,rightmost,leftmost indicators     from 434 to 457
[x] allow spatial relations on entities         from 457 to 526 ... nice. 25%.
[x] 'block', 'space', 'square' measure lexemes  no change! (measure RCL not
                                                supported yet)
[x] 'left' lexeme and indicator                 from 526 to 558
[x] adjectival spatial relations: 'closest'     from 558 to 571
[x] block readings with too many PPs on verbs   from 571 to 602 ... nice. 30%.
[x] convert nominal left, top, etc to 'region'  from 602 to 676
[x] block readings with named_rel               from 676 to 689
[x] drop indicator(other)                       from 689 to 698
[x] drop indicator(light) and indicator(dark)   from 698 to 699
[x] _at_p -> within                             from 699 to 711 ... nice. 35%.
[x] add center, front, back lexemes/indicators  from 711 to 734
[x] let move,drop have no destination           from 734 to 898 ... nice. >40%.
[x] make 2-element sequences always take/drop   from 898 to 936 ... nice. 45%.
[x] disallow compound_rel                       from 936 to 947
[x] change 'drop.dest' to 'move' for some words from 947 to 1108 ... nice. 55%.
[x] center_n_of_rel -> region                   from 1108 to 1118
[x] near,nearest,next+to,beside prepositions    from 1118 to 1133
[x] generic_entity_rel references               from 1133 to 1163
[x] card_rel type-reference                     from 1163 to 1170
[x] disallow NP coordination                    from 1170 to 1195
[x] cube->cube-group when multiple colors       from 1195 to 1204 ... nice. 60%.
[x] modifier card_rel -> (cardinal: x)          from 1204 to 1214
[x] change all hyphens to spaces                from 1214 to 1238
[x] allow [SF prop] except 'put' and 'leave'    from 1238 to 1239
[x] spatial-relations from verb+PP rel-clauses  from 1239 to 1308 ... nice. 65%.
[x] block drop or move based on whether holding from 1308 to 1339
[x] repair 'P top of X' trick                   from 1339 to 1396
[x] deal with 'to' part of move_v_from-to_rel   from 1396 to 1403
    ... nice. 70%.
[x] deal with 'from' part of move_v_from-to_rel from 1403 to 1414
    ... 'dev' evaluation = 68%
[x] simple measure phrases (not 'to the left')  from 1414 to 1441
[x] measure 'to the left' phrases               from 1441 to 1453
[x] treat 'to the left/right of X' like top     from 1453 to 1461
[x] 'move X 3 steps to the left of Y'           from 1461 to 1464
[x] 'take' resultatives as modifiers            from 1464 to 1503 ... nice. 75%.
[x] 'top' top region -> front region            from 1503 to 1529
[x] broken superlatives, MWE for 'left most'    from 1529 to 1547
[x] filter entities with no realizations        from 1547 to 1572
[x] '_row_n_of_rel' -> region                   from 1572 to 1573
[x] block_n_1, triangle                         from 1573 to 1574
[x] near_a_to_rel                               from 1574 to 1608 ... nice. 80%.
[x] {robot_n_1_rel, you} -> robot               from 1608 to 1618
[x] nearest and closest as indicators           from 1618 to 1630
[x] forbid subord_rel                           from 1630 to 1640
[x] relative clauses with resultatives          from 1640 to 1643
[x] relocate sprel when coercing move->take     from 1643 to 1653
[x] off_p_rel -> 'above'                        from 1653 to 1655
[x] miscellaneous lexical handlers              from 1655 to 1670
[x] drop of_p modifier                          from 1670 to 1675
[x] drop unknown indicators and adjectives      from 1675 to 1687
[x] 'bottom' adjective grammar lexeme           from 1687 to 1760 ... nice. 85%.
[x] disable compound_rel in grammar, -n 100     from 1760 to 1764
[x] no loc_nonsp_rel with bare NP               from 1760 to 1764
[x] on the left side (of x)                     from 1764 to 1772
[x] to/from region/edge -> 'within'             from 1772 to 1782
[x] count resultatives in 'too many pp' check   from 1782 to 1789
[x] fix sequence handling bug                   from 1789 to 1789
[x] vocab: pile, board, ground, border          from 1789 to 1791
[x] forbid _far_a ARG1 e                        from 1791 to 1794
[x] forbid _square_v                            from 1794 to 1795
[x] 3 squares left of X                         from 1795 to 1796
[x] 3 squares above X                           from 1796 to 1798
[x] 3 squares in front of X                     from 1798 to 1800 ... nice. 90%.
    ... 'dev' evaluation = 446 / 500 = 89.2%
[x] allow resultative w/ 'take' (-> mod)        from 1800 to 1801
[x] 'sky blue', 'blue sky', 'block tower',
    'combination tower', 'block stack' MWE
    lexemes                                     from 1801 to 1833 ... 91.5%
[x] relax barenp rule to get 'block nearest X'  from 1833 to 1849
[x] allow/ignore 'just' verbal modifier         from 1849 to 1851 ... 92.5%
    ... 'dev' evaluation = 454 / 500 = 90.8%
[x] disallow appos_rel in the grammar           from 1851 to 1853
[x] kill off all 'move.nodest's                 from 1853 to 1854
[x] forbid in+order+to_x_rel                    from 1854 to 1858
[x] forbid PPs that are modified unless no OOP  from 1858 to 1865
    the unless case is e.g. 'directly below', also the common typo
    'on to X' or 'from above X'
[x] within(corner/region) for move_from-TO      from 1865 to 1869
[x] within(edge) -> above(edge)                 from 1869 to 1873 ... 93.5%
[x] no 'of the board' relative measures         from 1873 to 1874
    ... 'dev' evaluation = 461 / 500 = 92.2%
    ... 'train' combined with berkeley 1911 / 2000 = 95.5%
    ... 'dev' combined with berkeley 472 / 500 = 94.4%
- backoff to berkeley when mrs output contains '_rel':
    ... 'train' combined 1920 / 2000 = 96.0%
    ... 'dev' combined 474 / 500 = 94.8%
[x] improve statistical system: add 'above' when no relation in an sprel
    [ from 1670 = 83.5% to 1740 = 87% ]
    ... 'train' combined -> from 1920 to 1925
    ... 'dev' combined -> still 474
[x] improve stat system about 'cube' vs 'cube-group'
    [ from 1740 to 1808 = 90.4% ]
    ... 'train' combined -> from 1925 to 1933 ... 96.5%
    ... 'dev' combined -> 476 ... 95.2%
[x] used '-accurate' in stat parser
    [ from 1808 to 1816 = 90.5% ]
    ... 'train' combined -> 1935 (improvement)
    ... 'dev' combined -> 475 ... 95.0% (regression)
+ analyze how many gold outputs actually have ellipsis
  (use 'token:' data to guesstimate)
    6 items with ellipsis: 4674 19976 3974 6054 5655 21061
    -- exactly the same set identified in failcom.txt, so the system FAILs
       on all of them
    -- possibility to automatically detect them?
    -- likelihood of false-positives

- maybe: train a PCFG system to compare and fill in the gaps
  [ ] if so, be sure to make the MRS system FAIL whenever it's passing
      through untranslated symbols... (loc_nonsp_rel frequently, a few
      other preds here and there)
  - quite promising, in fact... script to-pst/pst.py builds phrase
    structure trees from RCL, then the Berkeley parser trains on those
    out-of-the-box and can produce a parse for any input, including
    'fooz bar bar bar baz'.
    very simple conversion back from PST to RCL ... to-pst/back.py
  [x] convert '-' to ' - ' and use NLTK tokenizer for input to parser
      (since the PST converter does that)
  [x] insert 'region' when no type present
  [x] add anaphora stuff
  results:
      training set 1670 / 2000 = 83.5%
      dev set 261 / 500 = 52.2%
      ... better than nothing, hehehe

+ start version control

--- OFFICIAL EVALUATION // March 21, 2014 around 4:15pm PDT ---
make eval  => 749 correct out of 909 = 82.4%
make beval => 741 correct out of 909 = 81.5%
make ceval => 841 correct out of 909 = 92.5%
what went wrong?
    => ERG coverage on train and dev was 99%, but on eval only 90%. ???
also supposed to eval without spatial planner:
make eval  => 730 correct out of 909 = 80.3%
make beval => 741 correct out of 909 = 81.5%
make ceval => 823 correct out of 909 = 90.5%

--- experiments for my own curiosity: munging input // May 9, 2014 ---
s/\t\.\.\+[ ]*/\t
make eval  => 789 correct out of 909 = 86.8% [ = +4.4% relative to official run ]
make beval => 741 correct out of 909 = 81.5% [ = +0.0% relative to official run ]
make ceval => 858 correct out of 909 = 94.4% [ = +1.9% relative to official run ]

--- experiments for my own curiosity, further munging input // June 30, 2014 ---
s/ cell / square /g
s/ cells / squares /g
make eval  => 828 correct out of 909 = 91.1% [ = +8.7% relative to official run ]
make beval => 743 correct out of 909 = 81.7% [ = +0.2% relative to official run ]
make ceval => 859 correct out of 909 = 94.5% [ = +2.0% relative to official run ]
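For reference, the RCL-to-PST direction handled by to-pst/pst.py (noted in the PCFG-backoff idea above) might look roughly like this. Hypothetical data shapes and function name; not the script's actual code:

```python
# Hypothetical sketch: RCL tags become nonterminal labels and their string
# values become leaf tokens, yielding a bracketed tree the Berkeley parser
# can train on. to-pst/back.py would invert this mapping.

def rcl_to_pst(node):
    """node: (tag, children), where each child is a sub-node or a token."""
    tag, children = node
    parts = []
    for child in children:
        if isinstance(child, str):
            parts.append(child)           # leaf token
        else:
            parts.append(rcl_to_pst(child))
    return "(%s %s)" % (tag, " ".join(parts))

rcl = ("event", [("action", ["move"]),
                 ("entity", [("color", ["red"]), ("type", ["cube"])])])
print(rcl_to_pst(rcl))
# -> (event (action move) (entity (color red) (type cube)))
```

Since the bracketing is lossless over tags and tokens, the PST-to-RCL inverse is a straightforward bracket parse, which matches the note above that the back-conversion is very simple.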