CSAW -- PCFG approximation of HPSG grammars, with semantics
This is a very brief description of work in progress. If you need to use this as part of your research, please check with Woodley Packard about the most appropriate way to cite it.
The csaw tool builds on the work of several other researchers, including Yi Zhang and Stephan Oepen.
The basic idea is: extract a PCFG from an HPSG-parsed corpus, use ordinary PCFG techniques to derive approximate HPSG derivations for unseen sentences, and then use robust unification to derive MRS structures from those approximate derivations.
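The first two steps of that idea can be sketched on a toy scale. The example below is a minimal illustration, not csaw's implementation: the treebank, the category labels (`S`, `NP`, `D`, `N`, `V`) and the function names are all invented for this sketch, the grammar is restricted to binary and lexical rules, and the robust unification step is not shown. It estimates rule probabilities by relative frequency from a few derivation trees and then uses Viterbi CKY to find the most probable tree for a sentence that is not in the treebank.

```python
import math
from collections import defaultdict

# Toy derivation trees as nested tuples: (label, child, ...); leaves are words.
# Labels are hypothetical and are not ERG rule names.
TREEBANK = [
    ("S", ("NP", ("D", "the"), ("N", "dog")), ("V", "slept")),
    ("S", ("NP", ("D", "the"), ("N", "cat")), ("V", "slept")),
    ("S", ("NP", ("D", "a"), ("N", "dog")), ("V", "barked")),
]

def count_rules(tree, counts):
    """Count one rule per tree node: label -> tuple of child labels or words."""
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[label][rhs] += 1
    for child in children:
        if not isinstance(child, str):
            count_rules(child, counts)

def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    counts = defaultdict(lambda: defaultdict(int))
    for tree in treebank:
        count_rules(tree, counts)
    return {
        lhs: {rhs: n / sum(rules.values()) for rhs, n in rules.items()}
        for lhs, rules in counts.items()
    }

def viterbi_parse(words, pcfg, root="S"):
    """Return (log probability, tree) for the best parse, or None if there is none.

    Assumes every rule is lexical (one word on the right-hand side) or binary
    (two labels on the right-hand side), as in the toy treebank above.
    """
    n = len(words)
    chart = defaultdict(dict)  # chart[i, j][label] = (log probability, tree)
    for i, word in enumerate(words):
        for lhs, rules in pcfg.items():
            p = rules.get((word,))
            if p:
                chart[i, i + 1][lhs] = (math.log(p), (lhs, word))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for lhs, rules in pcfg.items():
                    for rhs, p in rules.items():
                        if len(rhs) != 2:
                            continue
                        left = chart[i, k].get(rhs[0])
                        right = chart[k, j].get(rhs[1])
                        if left is None or right is None:
                            continue
                        score = math.log(p) + left[0] + right[0]
                        best = chart[i, j].get(lhs)
                        if best is None or score > best[0]:
                            chart[i, j][lhs] = (score, (lhs, left[1], right[1]))
    return chart[0, n].get(root)

pcfg = estimate_pcfg(TREEBANK)
print(viterbi_parse("a cat barked".split(), pcfg))
```

The sentence "a cat barked" never occurs in the toy treebank, but every rule needed to analyse it does, so the PCFG still assigns it a tree. A PCFG read off real HPSG derivations overgenerates in the same way, which is what makes this kind of approximation usable on inputs the original grammar rejects.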
To try it on a Linux x86-64 system with approximations of the ERG, download and decompress the appropriate files from here. You will also need the file erg-1214-x86-64-0.9.25.dat from the ACE distribution (here). If you don't have the LOGON tree installed, or are using a non-English grammar, you will need to set an environment variable:
Next, you can try parsing. The all-treebanks-gp0.pcfg model is small and fast but less accurate; the ww-1214-gp2.pcfg model is large and slow but more accurate. The purpose of csaw is to support parsing of extragrammatical inputs, and the degree of extragrammaticality supported depends on the PCFG used.
$ echo "The the quietly dog slept." | ./csaw erg-1214-x86-64-0.9.25.dat all-treebanks-gp0.pcfg -f
loaded 11 tokens ; chart_size = 7 = 1+biggest 'to'
SENT: The the quietly dog slept.
[ LTOP: h0
INDEX: e2 [ e SF: prop ]
RELS: < [ unknown_rel<0:26> LBL: h1 ARG0: e2 ARG: e4 [ e SF: prop TENSE: past MOOD: indicative ] ]
[ _the_q_rel<0:3> LBL: h5 ARG0: e4 RSTR: h6 BODY: h7 ]
[ _the_q_rel<4:7> LBL: h8 ARG0: e4 RSTR: h9 BODY: h10 ]
[ "_quiet_a_1_rel"<8:15> LBL: h11 ARG0: e12 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: e13 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ]
[ "_dog_v_1_rel"<16:19> LBL: h11 ARG0: e13 ARG1: i14 ARG2: i15 ]
[ "_sleep_v_1_rel"<20:26> LBL: h11 ARG0: e4 ARG1: i16 ] >
HCONS: < h0 qeq h1 h6 qeq h17 h9 qeq h11 > ] ; (2 np_frg_c -59.857639 0 5 (3 sp-hd_n_c .......
NOTE: 1 readings
NOTE: tsdb parse: (:total . 14) (:treal . 14) (:tcpu . 14) (:others . 1602972)
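To read the MRS in the output above, it helps to map each predicate back to the text it came from. In this sample, the `<from:to>` suffix on each predicate lines up with character offsets into the input sentence. The sketch below pulls out the predicates and their spans with a regular expression and slices the sentence accordingly. The function name and the regular expression are assumptions made for this illustration, tuned to the sample shown here; this is not a general MRS reader, and the variable properties have been trimmed from the copied lines for brevity.

```python
import re

SENTENCE = "The the quietly dog slept."

# RELS lines copied from the sample output, with variable properties trimmed.
MRS_RELS = '''
RELS: < [ unknown_rel<0:26> LBL: h1 ARG0: e2 ARG: e4 ]
[ _the_q_rel<0:3> LBL: h5 ARG0: e4 RSTR: h6 BODY: h7 ]
[ _the_q_rel<4:7> LBL: h8 ARG0: e4 RSTR: h9 BODY: h10 ]
[ "_quiet_a_1_rel"<8:15> LBL: h11 ARG0: e12 ARG1: e13 ]
[ "_dog_v_1_rel"<16:19> LBL: h11 ARG0: e13 ARG1: i14 ARG2: i15 ]
[ "_sleep_v_1_rel"<20:26> LBL: h11 ARG0: e4 ARG1: i16 ] >
'''

# A predicate name (optionally double-quoted) followed directly by <from:to>.
PRED_SPAN = re.compile(r'\[ "?([^\s"<]+)"?<(\d+):(\d+)>')

def predicate_spans(mrs_text, sentence):
    """Pair each predicate with the substring its <from:to> span selects."""
    return [
        (pred, sentence[int(start):int(end)])
        for pred, start, end in PRED_SPAN.findall(mrs_text)
    ]

for pred, text in predicate_spans(MRS_RELS, SENTENCE):
    print(f"{pred:18} {text!r}")
```

Read this way, the two `_the_q_rel` predicates come from the doubled determiner, and `unknown_rel` spans the whole input.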
Yi Zhang and Hans-Ulrich Krieger. 2011. Large-scale corpus-driven PCFG approximation of an HPSG. In Proceedings of the 12th International Conference on Parsing Technologies. Association for Computational Linguistics.
Yi Zhang, Stephan Oepen, Rebecca Dridan, Dan Flickinger, and Hans-Ulrich Krieger. In prep. Robust parsing, meaning composition, and evaluation: Integrating grammar approximation, default unification, and elementary semantic dependencies. Accessed Feb-01-2017 from: http://www.mn.uio.no/ifi/english/people/aca/oe/robustness.pdf