Baseline vs. best model performance at recovering MRS predications and their arguments, aggregated by predication type, with selected individual predicates listed separately.

MATCH indicates how often the gold predication itself was found in the result analysis.

ARG1, ARG2, etc. indicate how often the value of that argument also matched, given that the gold predication was found in the result analysis. An argument value counts as matched when the gold EP it points to and the result EP it points to (each identified by having that value as its ARG0) form a matching gold/result pair.
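The two measures can be sketched in Python. This is a minimal sketch under stated assumptions, not the evaluation code behind the table. The `EP` class, the `score` function, the rule that a gold EP is "found" when the result contains an EP with the same predicate and character span, and the use of a substring such as `"_v_"` to select a predication type are all hypothetical choices made for illustration. The sketch shows why argument values are compared through ARG0: variable names differ between the gold and result analyses, so a value is compared via the EPs that carry it as their ARG0 rather than by its name.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EP:
    """Simplified elementary predication (hypothetical representation)."""
    pred: str                                  # predicate name, e.g. "_bark_v_1"
    span: tuple[int, int]                      # (cfrom, cto) character span
    arg0: str                                  # intrinsic variable, e.g. "e2"
    args: dict = field(default_factory=dict)   # other roles, e.g. {"ARG1": "x3"}


def ep_key(ep):
    # Assumption: a gold EP "is found" in the result if an EP with the same
    # predicate and character span exists there.
    return (ep.pred, ep.span)


def arg0_index(eps):
    # Map each variable to the keys of every EP that has it as ARG0
    # (several EPs can share an ARG0, e.g. a noun and its quantifier).
    index = {}
    for ep in eps:
        index.setdefault(ep.arg0, set()).add(ep_key(ep))
    return index


def score(gold, result, type_tag):
    """MATCH and per-role accuracy for gold EPs whose predicate contains
    type_tag (e.g. "_v_"). Role scores are conditional on MATCH."""
    result_by_key = {ep_key(ep): ep for ep in result}
    gold_idx, result_idx = arg0_index(gold), arg0_index(result)
    hits, totals = Counter(), Counter()
    for g in gold:
        if type_tag not in g.pred:
            continue
        totals["MATCH"] += 1
        r = result_by_key.get(ep_key(g))
        if r is None:
            continue  # unmatched EPs do not count toward the role scores
        hits["MATCH"] += 1
        for role, g_val in g.args.items():
            totals[role] += 1
            r_val = r.args.get(role)
            # The argument matches if the EPs it points to (via ARG0)
            # form a matching pair across the two analyses.
            if r_val is not None and (
                gold_idx.get(g_val, set()) & result_idx.get(r_val, set())
            ):
                hits[role] += 1
    return {k: hits[k] / totals[k] for k in totals}


gold = [
    EP("_dog_n_1", (4, 7), "x3"),
    EP("_bark_v_1", (8, 14), "e2", {"ARG1": "x3"}),
]
# Same analysis with different variable names: both scores are 1.0.
result = [
    EP("_dog_n_1", (4, 7), "x9"),
    EP("_bark_v_1", (8, 14), "e8", {"ARG1": "x9"}),
]
print(score(gold, result, "_v_"))  # {'MATCH': 1.0, 'ARG1': 1.0}
```

Because role totals are only incremented after a successful match, an ARG score can be high even when MATCH is low; the two columns should be read together.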

aggregate	baseline	best
exact tree match	40.3%	47.5%
_v_ . ARG1	91.50%	93.21%
_v_ . ARG2	93.08%	94.70%
_v_ . ARG3	90.35%	90.95%
_v_ . MATCH	96.39%	96.70%
_p_ . ARG1	80.03%	83.99%
_p_ . ARG2	94.35%	95.62%
_p_ . MATCH	93.94%	94.67%
_n_ . ARG1	93.81%	95.14%
_n_ . MATCH	98.33%	98.41%
_a_ . ARG1	93.37%	95.08%
_a_ . MATCH	97.51%	97.65%
. MATCH	92.09%	93.31%
_in_p_rel . ARG1	78.06%	80.53%
_in_p_rel . ARG2	95.95%	96.40%
_for_p_rel . ARG1	78.84%	82.60%
_for_p_rel . ARG2	93.09%	94.74%
_of_p_rel . ARG1	97.00%	97.16%
_of_p_rel . ARG2	94.26%	95.49%
_and_c_rel . L-INDEX	84.98%	88.45%
_and_c_rel . R-INDEX	88.96%	90.02%
compound_rel . ARG1	99.80%	99.79%
compound_rel . ARG2	95.70%	96.68%