BUG: surprising .mem features output:

(13797136) [1 (0) n_-_pn-unk_le "Settings\Administrator\" +FROM "] -0.997524 {0 0 0 0} [0 0]

first and more serious bug is that the backslashes are there unescaped
- failed to escape FORMs as they went into quoted feature leaf strings

second bug is that the +FROM from the token structures shows up...
- insufficient escaping when the edge record contains [ +FORM "Settings\Administrator\" +FROM "37" ... ], i.e. the backslashes are part of the content

... arguably they should be escaped as soon as they go inside double quotes, i.e. in ace's build_token_dg() (a minimal escaping sketch is at the end of these notes)

plan:

load parse forests from tsdb profiles
convert to feature forests
- unfold so feature contexts are local to 'and' nodes
- ungrandparented, initially

ws01 edge relation on disk takes 230MB
reading the edge relation takes ~8s and 2GB RAM
ws01 parse forests take 1GB(?) of RAM and ~7s (on top of the existing 8s, for a 15s total) to load
ws01 ungrandparented feature forests take 7GB(?) of RAM and ~45s (+15s = 60s total) to load
ws01-ws12 might take 90GB of RAM for feature forests

train maxent using clever math
distribute forest data over multiple processes/nodes

master process:
- use mela to drive the optimizer
- update() -- send current lambda to the nodes, initiate computation of gradient and log-likelihood; collect results; add the regularization term
- gradient() and objective() just return the values

slave processes:
- connect to the master process, load up a tsdb profile, convert it to a feature forest
- loop: wait for lambdas, do the math, send back results

notes:
- could split different tsdb profiles across different processes, coordinated somehow
- need to use the existing wescience.mem to get a *real* top-1 rate on ws01-ws12 and on ws13, and output a new .mem file
  - but the existing wescience.mem is GP[2]! not comparable.

maxent math:

- highest entropy subject to E(f) = E~(f)
- equivalently, highest training likelihood, given an exponential model

L = sum_i log P(Y_i | X_i ; lambda)
dL/dlambda = sum_i [ (dP(Y_i | X_i ; lambda)/dlambda) / P(Y_i | X_i ; lambda) ]

P(Y_i | X_i ; lambda) = exp(lambda dot F_i,gold) Z^{-1}
Z = sum_j exp(lambda dot F_i,j)

dP/dlambda = F_i,gold P(Y_i | X_i ; lambda) - exp(lambda dot F_i,gold) Z^{-2} dZ/dlambda
           = F_i,gold P(Y_i | X_i ; lambda) - P(Y_i | X_i ; lambda) Z^{-1} dZ/dlambda
dZ/dlambda = sum_j F_i,j exp(lambda dot F_i,j)
dP/dlambda = P(Y_i | X_i ; lambda) [ F_i,gold - E(f_i | X_i ; lambda) ]

dL/dlambda = sum_i [ F_i,gold - E(f_i | X_i ; lambda) ]
(F_i,gold: empirical feature values; E(f_i | X_i ; lambda): model expectation)

need to efficiently compute
E(f_i | X_i ; lambda) = (1/Z) dZ/dlambda = (1/Z) sum_j F_i,j exp(lambda dot F_i,j)

i.e. need to compute:
- Z = sum_j 1 * exp(lambda dot F_i,j)
- sum_j F_i,j * exp(lambda dot F_i,j)

dropping the _i's (i.e. which sentence) for convenience, need a way to compute
sum_j g(j) exp(lambda dot F_j)
for certain classes of g (namely g(j) = 1 and g(j) = F_j),
bearing in mind that F_j is a feature *vector* ... but we can consider each dimension independently if necessary.

the easy variant, g(j) = 1:
Z = sum_j exp(lambda dot F_j)
- when combining several branches with an OR node, the outer sum over unpackings splits, so the whole term sums
- when combining parts of a tree with an AND node, the lambda dot F_j's add, so the inner exp(lambda dot F_j)'s multiply
- and since the choices at nested ORs are independent, products like (z1+z2+z3) * (z4+z5+z6) combine nicely, so the whole term multiplies
... that's how the unpacking probability calculation in ACE works.
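to make that concrete, a minimal sketch of the inside/Z computation over a packed AND/OR forest. the struct layout and function names here are hypothetical, not ACE's actual internals, and scores are kept in the linear domain for readability (the log-domain version below is what you'd actually want):

#include <math.h>

struct or_node;

struct and_node {
    double local_score;       /* lambda dot (features local to this AND) */
    int nkids;
    struct or_node **kids;    /* daughter OR nodes */
};

struct or_node {
    int done;                 /* packed forests are DAGs: memoize shared nodes */
    double inside;
    int nalts;
    struct and_node **alts;   /* the packed alternatives */
};

static double and_inside(struct and_node *a);

/* at an OR node the outer sum over unpackings splits,
   so the alternatives' inside scores just add */
static double or_inside(struct or_node *o)
{
    if (o->done) return o->inside;
    double z = 0;
    for (int i = 0; i < o->nalts; i++)
        z += and_inside(o->alts[i]);
    o->done = 1;
    return o->inside = z;
}

/* at an AND node the lambda dot F terms add, so the exps multiply;
   independent choices at the daughter ORs multiply out like
   (z1+z2+z3) * (z4+z5+z6) */
static double and_inside(struct and_node *a)
{
    double z = exp(a->local_score);
    for (int i = 0; i < a->nkids; i++)
        z *= or_inside(a->kids[i]);
    return z;
}

/* Z for the sentence -- the normalizer in P(Y | X ; lambda) -- is
   then just or_inside(root) */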
how about the harder variant, g(j) = F_j?

g(j) is a sum of values contributed by the different AND nodes, so

sum_j g(j) exp(lambda dot F_j)   (a sum over all unpackings)
= sum_{n an AND node contributing v to g(j)} v * sum_{all unpackings using n} exp(lambda dot F_j)

so, need to be able to compute, for any given AND node n:
sum_{all unpackings using n} exp(lambda dot F_j)

... this is where the inside/outside thing comes in. we get to assume the local unpackings of n can all fit into any context above n:

sum_{all unpackings using n} exp(lambda dot F_j)
= (sum over local unpackings of n of exp(lambda dot local_features))
  * (sum over all trees containing n, modulo what happens inside n, of exp(lambda dot features_outside_n))

the former term is the "inside" score for n
the latter term is the "outside" score for n

already saw how to compute inside scores. for outside scores, start at the top:
- the top-level OR has outside score 1
- an AND's outside score is the sum of the outside scores of the ORs it appears in
- an OR's outside score is the sum of the (product of outside and inside scores) of the ANDs it appears in, divided by its own inside score

could also use all-readings scores; call it Z (= inside * outside):
- for the top-level OR, Z = its inside score
- an AND's Z score is the sum of the (Z score divided by inside score) of the ORs it appears in, multiplied by its own inside score
- an OR's Z score is the sum of the Z scores of the ANDs it appears in
... that's how the readings counter in FFTB works.

inside and Z are both sums of exp(sum of weights) terms; likely to get small. store as logs?

abstracting away from the exp(lambda dot F_j)'s...
- some function f(t) = a property of an unpacked reading t
- want to sum f(t) over all readings that use node n
- an OR's score is the sum of the scores of all the ANDs it appears in

after working all this out by myself, I checked the Miyao and Tsujii paper, and I got it right :-) yay.
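a sketch of that outside pass and the expectation it buys, following the recurrences above. assumptions, beyond the hypothetical structs (again not ACE's or FFTB's actual code): the inside pass has already filled in log_inside, scores are stored as logs per the "store as logs?" note, and a combined topological ordering of the nodes (every node after all of its parents) is available:

#include <math.h>

struct or_node;

struct and_node {
    double log_inside, log_outside;        /* log-domain scores */
    int nkids;  struct or_node **kids;
    int nfeats; int *feat; double *fval;   /* local feature ids and counts */
};

struct or_node {
    double log_inside, log_outside;
    int nalts;  struct and_node **alts;
};

/* one entry per node, in topological order: parents before daughters */
struct node {
    int is_and;
    union { struct and_node *a; struct or_node *o; } u;
};

/* numerically stable log(exp(x) + exp(y)) */
static double logaddexp(double x, double y)
{
    if (x == -HUGE_VAL) return y;
    if (y == -HUGE_VAL) return x;
    return x > y ? x + log1p(exp(y - x)) : y + log1p(exp(x - y));
}

/* log_outside starts at -HUGE_VAL everywhere except the root OR,
   where it starts at 0 (outside score 1).  because every node comes
   after all of its parents in the ordering, its outside score is
   complete by the time it pushes contributions downward. */
static void outside_pass(struct node *topo, int nnodes)
{
    for (int i = 0; i < nnodes; i++) {
        if (!topo[i].is_and) {
            struct or_node *o = topo[i].u.o;
            /* an AND sums the outside scores of the ORs it appears in */
            for (int j = 0; j < o->nalts; j++) {
                struct and_node *a = o->alts[j];
                a->log_outside = logaddexp(a->log_outside, o->log_outside);
            }
        } else {
            struct and_node *a = topo[i].u.a;
            /* an OR sums outside*inside of the ANDs it appears in,
               divided by its own inside score */
            for (int k = 0; k < a->nkids; k++) {
                struct or_node *d = a->kids[k];
                d->log_outside = logaddexp(d->log_outside,
                    a->log_outside + a->log_inside - d->log_inside);
            }
        }
    }
}

/* sum over all unpackings using AND node a of exp(lambda dot F) is
   exp(log_inside + log_outside); divided by Z (log_z is the root
   OR's log_inside) it becomes the probability mass of the readings
   using a, which weights a's local feature counts in
   E(f | X ; lambda).  the per-sentence gradient contribution is then
   F_gold[k] - E[k]. */
static void accumulate_expectation(struct and_node *a, double log_z, double *E)
{
    double w = exp(a->log_inside + a->log_outside - log_z);
    for (int i = 0; i < a->nfeats; i++)
        E[a->feat[i]] += a->fval[i] * w;
}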
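and, returning to the escaping bug at the top of these notes: the fix amounts to backslash-escaping backslashes and double quotes the moment a string goes inside double quotes, i.e. in build_token_dg() or wherever the quoted feature leaf strings are built. a minimal sketch of that escaping (hypothetical helper, not ACE's actual code; assumes the caller sized out at 2*strlen(in)+1):

/* escape '\' and '"' so a value survives inside a
   double-quoted feature leaf string */
static void escape_for_quotes(const char *in, char *out)
{
    char *p = out;
    for (; *in; in++) {
        if (*in == '\\' || *in == '"')
            *p++ = '\\';
        *p++ = *in;
    }
    *p = '\0';
}

with this, the FORM above would come out as "Settings\\Administrator\\", so a trailing backslash can no longer swallow the closing quote.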