TODO:

- potentially use POS tagger to prune lexical ambiguity
  - but the POS tagger makes mistakes...
- preprocessor characterization problem:
    ... hole for 'n'
    ... hole for '''
    ... hole for 't'
    ... lost 'n' => 4
    filling hole 't' with 'n'
    ... lost ''' => 5
    ... lost 't' => 6
  `(.) +n't' -> `\1n't'  yields  ` I don't know. '
- have a unicode.c file with wide-char and mbs routines in it
- debug: I believe there are 15 747 724 136 275 002 577 605 653 961 181 555 468 044 717 914 527 116 709 366 231 425 076 185 631 031 296 296 protons in the universe and the same number of electrons.
- delete deleted daughters from the daughters of a rule, since they can't ever be present in unified-in daughters
- figure out why unpacking is so much slower than PET (sometimes 10x slower!)
  - MRS extraction is the big bottleneck
    - approach one: memoization; most bits and pieces of an MRS are reused many times
      - tried this on mrs_var's with good results; should also do ep's and hcon's
  - type names perhaps should be pointers to types, since much time is spent looking up and comparing types
  - maybe some other aspects of unpacking are slow? not clear yet.
- learn something from the head-corner strategy
[x] freeze qc settings into the grammar file
  - cool: we compile them as a dynamic library at grammar load time, and then dlopen() it at parse time.
- ignore punctuation chars for jacy
- try to prove/analyze interesting things at grammar compile time:
  - *intelligently* auto-pick quickcheck paths
  - make the quickcheck used for packing ignore the packing restrictor
    - and the shameless hack(tm) in qc.c with 0-1-lists
  - e.g. INFLECTD is monotonic: '-' becomes '+'
    - this would help rule out applying inflectional rules like non3sg_fin_verb when 3sg_fin_verb is on the orth-agenda
    - there might be lots of other such features we could find
  - auto-pick the packing restrictor
  - for each type of rule, precompute the segments of the rule which are non-reentrant with ARGS[k]
    - then, on filling in ARGS[k], copy() can know to structure-share without looking at those segments
- optimize lexicon storage; maybe use the provided lexdb schema
- maybe support lexdb
- make STEM actually be updated by ortho rules; apparently some grammars depend on it.
[x] make unpacking support 2+-ary rules
  - the latest ERG versions have several 4-ary rules and one 5-ary!
- loading: use an instance to define configuration paths?
- worthwhile experiment: check to be sure we're actually doing maxent correctly!
  - we weren't, quite: we were ignoring scores for surface strings
  - we now get the gold tree for CSLI 79.2% of the time
  - we get the gold tree for the first 70 trees in WS01 67.1% of the time
  - looks like it's working properly.
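The preprocessor characterization trace above (holes for inserted characters, "lost" input characters) is essentially a character-level alignment problem: each output position should point back to the input position it came from, with inserted characters marked as holes. A minimal sketch under that reading; the function name and encoding are hypothetical, not the actual preprocessor code:

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Hypothetical sketch: rewrite "...nt" -> "...n't" while maintaining a
 * character alignment from output positions back to input positions.
 * Inserted characters get alignment -1 (a "hole"); input characters
 * that are dropped simply never appear as an alignment target ("lost"). */
int rewrite_with_alignment(const char *in, char *out, int *align)
{
    int o = 0;
    for (int i = 0; in[i]; i++) {
        /* insert an apostrophe before the 't' of a word-final "nt" */
        if (in[i] == 'n' && in[i+1] == 't' &&
            (in[i+2] == '\0' || in[i+2] == ' ')) {
            out[o] = 'n';  align[o++] = i;
            out[o] = '\''; align[o++] = -1;   /* hole: no input source */
            out[o] = 't';  align[o++] = i + 1;
            i++;                              /* consumed the 't' too  */
        } else {
            out[o] = in[i];
            align[o++] = i;
        }
    }
    out[o] = '\0';
    return o;
}
```

For the input `I dont know` this yields `I don't know`, with `align[5] == -1` marking the inserted apostrophe; downstream characterization can then map every surface token back to raw-input offsets.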
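The memoization idea noted for MRS extraction can be read as hash-consing: identical variables are interned once and shared, so building them is done at most once and comparing them reduces to pointer comparison. A toy illustration with made-up structures (`mvar` and `intern_var` are not the actual names; the real mrs_var presumably carries more than a sort and an id):

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Illustrative hash-consing of MRS variables: the same (sort, id)
 * pair always yields the same shared object. */
struct mvar { char sort[8]; int id; struct mvar *next; };

#define NBUCKETS 1024
static struct mvar *buckets[NBUCKETS];

struct mvar *intern_var(const char *sort, int id)
{
    unsigned h = (unsigned)id;
    for (const char *p = sort; *p; p++)
        h = h * 31u + (unsigned char)*p;
    h %= NBUCKETS;
    /* reuse an existing variable if one matches */
    for (struct mvar *v = buckets[h]; v; v = v->next)
        if (v->id == id && !strcmp(v->sort, sort)) return v;
    /* otherwise build it once and remember it */
    struct mvar *v = malloc(sizeof *v);
    strncpy(v->sort, sort, sizeof v->sort - 1);
    v->sort[sizeof v->sort - 1] = '\0';
    v->id = id;
    v->next = buckets[h];
    buckets[h] = v;
    return v;
}
```

With this, `intern_var("e", 2) == intern_var("e", 2)` holds, so equality checks during unpacking are a single pointer comparison; the same pattern would extend to ep's and hcon's, as the note suggests.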
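The "freeze qc settings" trick above can be sketched concretely: emit generated C for the quickcheck tests at grammar load time, shell out to the C compiler to build a shared object, and dlopen() it at parse time. Everything here is illustrative scaffolding; the file paths, the `qc_path_count` symbol, and the return value are invented for the example:

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

/* Generate a trivial "qc settings" translation unit, compile it into a
 * shared library, load it back in, and call a symbol from it. */
int generate_and_load(void)
{
    FILE *f = fopen("/tmp/qc_gen.c", "w");
    if (!f) return -1;
    fprintf(f, "int qc_path_count(void) { return 12; }\n");
    fclose(f);
    /* grammar load time: compile the generated code */
    if (system("cc -shared -fPIC -o /tmp/qc_gen.so /tmp/qc_gen.c"))
        return -1;
    /* parse time: dlopen() the result and resolve the symbol */
    void *lib = dlopen("/tmp/qc_gen.so", RTLD_NOW);
    if (!lib) return -1;
    int (*qc_path_count)(void) = (int (*)(void))dlsym(lib, "qc_path_count");
    if (!qc_path_count) { dlclose(lib); return -1; }
    int n = qc_path_count();
    dlclose(lib);
    return n;
}
```

On older glibc this needs `-ldl` at link time; the payoff is that the quickcheck tests run as fully compiled code with the paths baked in, rather than as an interpreted path list.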
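The INFLECTD observation can be turned into a cheap pre-unification filter: since the feature only ever moves from '-' to '+', a rule daughter requiring '-' can be pruned whenever the candidate edge already carries '+', without attempting unification. A schematic version, with an invented encoding of feature values:

```c
#include <assert.h>

/* Values of a monotonic boolean feature like INFLECTD: '-' may become
 * '+' through rule application, but never the reverse. */
enum feat_val { FEAT_MINUS, FEAT_PLUS };

/* Quick filter: can an edge with this feature value ever satisfy a
 * rule daughter wanting the given value?  Because the feature is
 * monotonic, a '+' edge can never go back to satisfying '-'. */
int rule_daughter_compatible(enum feat_val edge_val, enum feat_val rule_wants)
{
    return !(edge_val == FEAT_PLUS && rule_wants == FEAT_MINUS);
}
```

This is exactly the shape of the non3sg_fin_verb example: once an edge on the orth-agenda has INFLECTD +, any inflectional rule still demanding '-' is filtered out up front, and the same check would apply to any other feature proved monotonic at grammar compile time.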