TODO:

- bug(?): when a rule is promoted from regular lexical to orthographemic by
  finding an entry in the irregulars table, we should also add a free
  "pass-through" regular orthographemic pattern, so that the rule will still
  match things that aren't in the irregulars table

fully automated regression testing
    both parsing and generation
    ERG: mrs, csli, hike, maybe something else too
    HaG: hausa.items
    GG: mrs, babel

berthold wants
    - support for more than one POS tag in YY mode
    - *updating* an ace tsdb profile -- doesn't work, apparently?
    - LUI support
        -generator trees
        +full chart
        -partial chart
        -simple MRS

ideas:
    - if the config.tdl file doesn't exist, we print some silly errors
    - if the user swaps the -G and -g options, we overwrite their config.tdl file... we shouldn't do that.
    - properly support lettersets with nonascii characters
    - fix recursive labeling to check for cycles or empty SLASH difflists
    - [mode to?] make output more script-friendly
    - automatic time profile of a grammar
        parsing: pretty good, but add
            - orthographemic analysis
            - each rule for lexical parsing
            - MRS extraction
            - idiom checking
        generation:
            + fixup rule application
                + each rule
            - semantic index lookup
            - trigger rule application
                - each rule
            - main generation
            - unpacking
            - MRS extraction
            - subsumption checking
            - idiom checking?

DONE:

unreleased
    - trigger rule DAG specialization mechanism
    - rebuilt HCONS matching in transfer rules
0.9.6pre1
    - "cleanup" transfer rules, which apply right after unpack() calls extract_mrs()
0.9.5
    - one more TDL case sensitivity...
    - Version.lsp reading is more robust
0.9.4
    - TDL operators are now case insensitive in their LHS
    - bug fix: generation derivation trees were inflecting token strings in the wrong order
    - bug fix: copy_mrs() wasn't copying the ->dg field
    - new --generation-server=LANG option (watches ~/tmp/.transfer.USER.LANG and generates from it)
    - LUI support for displaying realizations (click for tree, with active nodes)
0.9.4pre4
    - fflush for generator script consumption
    - don't crash when EPs have no LBL
    - derivation trees from generation have proper token structures and orthographies
    - new option --show-realization-trees, which prints derivation trees for generation results
    - new option --show-realization-mrses, which prints MRSes for generation results
    - support for labeling SLASHes, with configuration options recursive-label-path-in-label and recursive-label-path-in-sign
    - support for special :c command to show last parse chart in LUI
    - slightly better communication with LUI (e.g. actually notice when LUI exits, suppress some needless noise)
    - report an error when attempting to parse and no REPP is loaded, unless in YY mode
    - new --report-trees option for sending labelled trees to TSDB (or stdout even)
0.9.4pre3
    - YY mode
    - don't output a properties clause for MRS variables that have no properties
0.9.4pre2
    - fix MRS characterization output format (thanks Berthold)
    - new optional post-generation token mapping stage, for visually fixing up generated strings
0.9.4pre1
    - bug fixes in fixup
    - moving towards configurability for transfer-only grammars
0.9.3
    - basic version of output-enabled transfer rules, used for mrs fixup for generation
    - changed post-generation subsumption test to operate on *external* MRSes
0.9.3pre2
    - bigger TDL buffer: JaCY's QC skeleton needed it
    - allow lexemes to have STEM parts that aren't strings
0.9.3pre1
    - better parsing of MRS string constants
    - config option to specify whether LTOP is extracted or invented
    - fix QC-from-instance loader
    - changes that speed up lattice mapping a lot in some cases
    - profiling mode "-i", for parsing
    - improved filtering of orthographemic rule chain hypotheses
0.9.2
    - fix TDL reader/dagify to not conflate coreferences with the same name in different :+ addenda
    - fix a unicode bug in token mapping
    - fixed an obscure bug wherein GLB types could have incorrect constraints
    - don't load token mapping rules when token mapping is disabled
    - support more than 256 features
    - support ^ and $ in token mapping positional constraints
    - support spaces showing up in more unexpected places in TDL syntax
    - the path to the label within parse-node instances is now configurable (LNAME feature)
    - semantic indexing now pays attention to the lex-rels-path and rule-rels-path configurations
    - quickcheck can be loaded from a PET QC instance
    - new configuration options for limiting the number of orthographemic rules to apply
    - new configuration option to specify how much room to preallocate in the freezer.
    - new configuration option to specify what file[s] to load irregulars forms from
    - new configuration option to specify a suffix for rule names given in irregulars tables
0.9.1
    - remove dependence on #include
    - change ~/logon/ to ${LOGONROOT} in Makefile
0.9
-----

- potentially use POS tagger to prune lexical ambiguity
    - but POS tagger makes mistakes...

- preprocessor characterization problem:
    ... hole for 'n'
    ... hole for '''
    ... hole for 't'
    ... lost 'n'=>4
    filling hole 't' with 'n'
    ... lost '''=>5
    ... lost 't'=>6
    `(.) +n't' -> `\1n't' yields ` I don't know. '

+ debug: I believe there are 15 747 724 136 275 002 577 605 653 961 181 555 468 044 717 914 527 116 709 366 231 425 076 185 631 031 296 296 protons in the universe and the same number of electrons.
    + currently, that results in a "too much RAM" error. that seems reasonable.

delete deleted daughters from daughters of rule, since they can't ever be present in unified-in daughters

figure out why unpacking is so much slower than PET (sometimes 10x slower!)
    - mrs extraction is the big bottleneck
        - approach one: memoization; most bits and pieces of MRS are reused many times
            - tried this on mrs_var's with good results, should also do ep's and hcon's
        - type names perhaps should be pointers to types, since much time is spent looking up and comparing types
    - maybe some other aspects of unpacking are slow? not clear yet.

learn something from head-corner strategy

ignore punctuation-chars for jacy

try to prove/analyze interesting things at grammar compile time
    - *intelligently* auto pick quickcheck paths
        - make quickcheck used for packing ignore packing restrictor
        - and the shameless hack(tm) in qc.c with 0-1-lists
    - e.g. the INFLECTD is monotonic, '-' becomes '+'
        - this would help rule out applying inflectional rules like non3sg_fin_verb when 3sg_fin_verb is on the orth-agenda
        - might be lots of other such features we could find
    - auto pick packing restrictor
    - for each type of rule, precompute segments of the rule which are non-reentrant with ARGS[k]
        - then on filling in ARGS[k], copy() can know to structure-share without looking at those segments

optimize lexicon storage; maybe use provided lexdb schema

maybe support lexdb

make STEM actually be updated by ortho rules; apparently some grammars depend on it.
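
sketch for the "type names perhaps should be pointers to types" note under the unpacking item: one lightweight way to get pointer comparison, short of storing pointers to the real type structures, is to intern each name once. This is only an illustration under assumptions. intern_type_name(), same_type_name(), and the table layout are hypothetical and are not taken from the ACE sources.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not ACE's actual data structures: keep one canonical
 * copy of every type name so that later comparisons are pointer comparisons
 * instead of strcmp() calls. */

#define INTERN_BUCKETS 1024	/* fixed bucket count; chains absorb collisions */

struct interned {
	struct interned *next;	/* next entry in the same hash bucket */
	char name[];		/* canonical copy of the name (C99 flexible array) */
};

static struct interned *intern_table[INTERN_BUCKETS];

/* 32-bit FNV-1a hash over the bytes of a NUL-terminated string. */
static uint32_t hash_name(const char *s)
{
	uint32_t h = 2166136261u;
	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 16777619u;
	}
	return h;
}

/* Return the canonical pointer for `name`, creating it on first use.
 * Two calls with equal strings always return the same pointer. */
const char *intern_type_name(const char *name)
{
	uint32_t b = hash_name(name) % INTERN_BUCKETS;
	struct interned *e;
	size_t len;

	/* Slow path happens once per lookup: walk the chain for this bucket. */
	for (e = intern_table[b]; e; e = e->next)
		if (strcmp(e->name, name) == 0)
			return e->name;

	/* Not seen before: copy the name and push it on the bucket's chain. */
	len = strlen(name) + 1;
	e = malloc(sizeof *e + len);
	assert(e != NULL);
	memcpy(e->name, name, len);
	e->next = intern_table[b];
	intern_table[b] = e;
	return e->name;
}

/* Once both names are interned, equality is a single pointer comparison. */
int same_type_name(const char *a, const char *b)
{
	return a == b;
}
```

the string comparison cost is paid once, when a name enters the table; code that compares types many times during MRS extraction would then only compare pointers. entries are never freed here, which assumes the set of type names is fixed after grammar load.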