Case-wise Results Comparison
Metric guide:

- Higher is better: Coverage, Accuracy, Precision, Recall, F1 (↑)
- Lower is better: Div_D (standard deviation of degree), Time (↓)
- Closer to Ground Truth (GT) is better: Size, n_edges, Avg_D, Productivity
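
Note: bold values in the tables mark the best result among the three methods (UnMerged, Merged, LLM), judged by the metric directions listed above.

For detailed definitions of the metrics, see the appendix of the paper (https://arxiv.org/pdf/2507.04070).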
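
The direction rules above can be applied mechanically to any row of the tables below. The following is a minimal sketch, not the authors' evaluation code: the function name `best_methods` and the three sets are hypothetical, Recall is assumed to be higher-is-better because the tables mark it with ↑, and a closer-to-GT metric with no GT value (shown as a dash) is treated as undecidable.

```python
import math

# Metric directions as stated in the metric guide (Recall is an assumption
# taken from the up-arrow on the Recall rows of the tables).
HIGHER_IS_BETTER = {"Coverage", "Accuracy", "Precision", "Recall", "F1"}
LOWER_IS_BETTER = {"Div_D", "Time"}
CLOSER_TO_GT = {"Size", "n_edges", "Avg_D", "Productivity"}


def best_methods(metric, scores, gt=None):
    """Return the method names with the best score for one table row.

    scores maps a method name to its value; gt is the Ground Truth value,
    or None when the table shows no GT for that row. Ties return every
    tied method.
    """
    if metric in HIGHER_IS_BETTER:
        cost = lambda v: -v
    elif metric in LOWER_IS_BETTER:
        cost = lambda v: v
    elif metric in CLOSER_TO_GT:
        if gt is None:
            return []  # no reference value, so no winner can be decided
        cost = lambda v: abs(v - gt)
    else:
        raise ValueError(f"unknown metric: {metric}")

    lowest = min(cost(v) for v in scores.values())
    return [
        name
        for name, value in scores.items()
        if math.isclose(cost(value), lowest, abs_tol=1e-9)
    ]


# Rows copied from the C1 table.
print(best_methods("Coverage", {"UnMerged": 0.927, "Merged": 1.000, "LLM": 0.744}))
print(best_methods("Div_D", {"UnMerged": 1.242, "Merged": 1.449, "LLM": 3.213}))
print(best_methods("Size", {"UnMerged": 182, "Merged": 201, "LLM": 151}, gt=201))
```

Returning a list rather than a single name keeps ties visible, which matters for cases such as C2 where all three methods report identical scores on most rows.
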
C1: Supplementary Adverbs
| Metric | UnMerged | Merged | LLM | GT |
|---|---|---|---|---|
| Size | 182 | 201 | 151 | 201 |
| n_edges | 17 | 19 | 17 | 19 |
| Coverage (↑) | 0.927 | 1.000 | 0.744 | 1.000 |
| Productivity | 0.016 | 0.009 | 0.001 | — |
| Div_D (↓) | 1.242 | 1.449 | 3.213 | 1.560 |
| Avg_D | 1.889 | 2.111 | 1.889 | 2.111 |
| Accuracy (↑) | 0.963 | 0.975 | 0.827 | — |
| Precision (↑) | 0.882 | 0.895 | 0.235 | — |
| Recall (↑) | 0.790 | 0.895 | 0.211 | — |
| F1 (↑) | 0.833 | 0.895 | 0.222 | — |
| Time (s) (↓) | 0.062 | 0.082 | 357.000 | — |
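
As a sanity check on the C1 table, the F1 row can be recomputed from the Precision and Recall rows. This sketch assumes F1 is the usual harmonic mean of precision and recall; the exact definition used by the authors is in the paper's appendix, and the helper name `f1_score` is hypothetical. Because the table values are rounded to three decimals, the comparison allows a tolerance of 0.001.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Precision, Recall, reported F1) copied from the C1 table.
c1_rows = {
    "UnMerged": (0.882, 0.790, 0.833),
    "Merged": (0.895, 0.895, 0.895),
    "LLM": (0.235, 0.211, 0.222),
}

for method, (precision, recall, reported) in c1_rows.items():
    recomputed = f1_score(precision, recall)
    # Inputs are rounded, so only agreement within 0.001 is expected.
    consistent = abs(recomputed - reported) < 1e-3
    print(f"{method}: reported={reported:.3f} recomputed={recomputed:.3f} consistent={consistent}")
```
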
C2: EAT verbs
| Metric | UnMerged | Merged | LLM | GT |
|---|---|---|---|---|
| Size | 209 | 209 | 209 | 209 |
| n_edges | 16 | 16 | 16 | 16 |
| Coverage (↑) | 1.000 | 1.000 | 1.000 | 1.000 |
| Productivity | 0.006 | 0.006 | 0.009 | — |
| Div_D (↓) | 1.641 | 1.641 | 1.450 | 1.711 |
| Avg_D | 1.882 | 1.882 | 1.882 | 1.882 |
| Accuracy (↑) | 0.958 | 0.958 | 0.958 | — |
| Precision (↑) | 0.813 | 0.813 | 0.813 | — |
| Recall (↑) | 0.813 | 0.813 | 0.813 | — |
| F1 (↑) | 0.813 | 0.813 | 0.813 | — |
| Time (s) (↓) | 2.223 | 2.225 | 552.000 | — |
C3: Ditransitive constructions
| Metric | UnMerged | Merged | LLM | GT |
|---|---|---|---|---|
| Size | 208 | 212 | 303 | 226 |
| n_edges | 29 | 30 | 51 | 33 |
| Coverage (↑) | 0.966 | 1.000 | 0.655 | 0.966 |
| Productivity | — | — | — | — |
| Div_D (↓) | 1.236 | 1.211 | 1.029 | 1.454 |
| Avg_D | 1.933 | 2.000 | 3.000 | 1.941 |
| Accuracy (↑) | 0.945 | 0.939 | 0.924 | — |
| Precision (↑) | 0.514 | 0.475 | 0.392 | — |
| Recall (↑) | 0.546 | 0.576 | 0.606 | — |
| F1 (↑) | 0.529 | 0.521 | 0.476 | — |
| Time (s) (↓) | 14.502 | 14.505 | 203.000 | — |
C4: And 和
| Metric | UnMerged | Merged | LLM | GT |
|---|---|---|---|---|
| Size | 261 | 261 | 263 | 259 |
| n_edges | 27 | 27 | 28 | 29 |
| Coverage (↑) | 1.000 | 1.000 | 1.000 | 0.927 |
| Productivity | — | — | — | — |
| Div_D (↓) | 2.987 | 2.987 | 4.234 | 1.509 |
| Avg_D | 1.929 | 1.929 | 1.931 | 2.000 |
| Accuracy (↑) | 0.926 | 0.926 | 0.912 | — |
| Precision (↑) | 0.464 | 0.464 | 0.357 | — |
| Recall (↑) | 0.448 | 0.448 | 0.345 | — |
| F1 (↑) | 0.456 | 0.456 | 0.351 | — |
| Time (s) (↓) | 4.568 | 4.570 | 307.000 | — |
C5: What 什么
| Metric | UnMerged | Merged | LLM | GT |
|---|---|---|---|---|
| Size | 329 | 443 | 659 | 462 |
| n_edges | 19 | 27 | 47 | 29 |
| Coverage (↑) | 0.659 | 1.000 | 1.000 | 0.932 |
| Productivity | — | — | — | — |
| Div_D (↓) | 0.943 | 1.552 | 2.002 | 2.142 |
| Avg_D | 1.900 | 2.700 | 4.700 | 2.900 |
| Accuracy (↑) | 0.840 | 0.820 | 0.790 | — |
| Precision (↑) | 0.421 | 0.370 | 0.362 | — |
| Recall (↑) | 0.276 | 0.345 | 0.586 | — |
| F1 (↑) | 0.333 | 0.357 | 0.447 | — |
| Time (s) (↓) | 6.674 | 6.789 | 443.000 | — |
C6: Quantifier 量词
| Metric | UnMerged | Merged | LLM | GT |
|---|---|---|---|---|
| Size | 426 | 426 | 426 | 425 |
| n_edges | 13 | 13 | 13 | 13 |
| Coverage (↑) | 1.000 | 1.000 | 1.000 | 0.989 |
| Productivity | 0.067 | 0.067 | 0.011 | — |
| Div_D (↓) | 1.407 | 1.407 | 3.090 | 1.059 |
| Avg_D | 1.857 | 1.857 | 1.857 | 1.857 |
| Accuracy (↑) | 0.857 | 0.857 | 0.816 | — |
| Precision (↑) | 0.462 | 0.462 | 0.308 | — |
| Recall (↑) | 0.462 | 0.462 | 0.308 | — |
| F1 (↑) | 0.462 | 0.462 | 0.308 | — |
| Time (s) (↓) | 1.930 | 1.932 | 316.000 | — |
C7: Make 做
| Metric | UnMerged | Merged | LLM | GT |
|---|---|---|---|---|
| Size | 51 | 51 | 91 | 68 |
| n_edges | 6 | 6 | 12 | 7 |
| Coverage (↑) | 1.000 | 1.000 | 0.353 | 1.000 |
| Productivity | 0.531 | 0.531 | 0.051 | — |
| Div_D (↓) | 0.881 | 0.881 | 1.500 | 1.090 |
| Avg_D | 1.714 | 1.714 | 3.000 | 1.750 |
| Accuracy (↑) | 0.656 | 0.656 | 0.531 | — |
| Precision (↑) | 0.300 | 0.300 | 0.167 | — |
| Recall (↑) | 0.429 | 0.429 | 0.286 | — |
| F1 (↑) | 0.353 | 0.353 | 0.211 | — |
| Time (s) (↓) | 0.099 | 0.100 | 308.000 | — |
C8: Hit 打
| Metric | UnMerged | Merged | LLM | GT |
|---|---|---|---|---|
| Size | 141 | 141 | 189 | 135 |
| n_edges | 18 | 18 | 33 | 18 |
| Coverage (↑) | 1.000 | 1.000 | 1.000 | 0.643 |
| Productivity | 0.000 | 0.000 | 0.000 | — |
| Div_D (↓) | 2.337 | 2.337 | 1.141 | 2.292 |
| Avg_D | 1.895 | 1.895 | 3.474 | 1.895 |
| Accuracy (↑) | 0.900 | 0.900 | 0.762 | — |
| Precision (↑) | 0.500 | 0.500 | 0.121 | — |
| Recall (↑) | 0.500 | 0.500 | 0.222 | — |
| F1 (↑) | 0.500 | 0.500 | 0.157 | — |
| Time (s) (↓) | 2.374 | 2.375 | 701.000 | — |
C9: Come 来
| Metric | UnMerged | Merged | LLM | GT |
|---|---|---|---|---|
| Size | 51 | 51 | 75 | 51 |
| n_edges | 7 | 7 | 10 | 7 |
| Coverage (↑) | 1.000 | 1.000 | 1.000 | 1.000 |
| Productivity | 0.298 | 0.298 | 0.167 | — |
| Div_D (↓) | 0.829 | 0.829 | 1.581 | 1.392 |
| Avg_D | 1.750 | 1.750 | 2.500 | 1.750 |
| Accuracy (↑) | 0.813 | 0.813 | 0.656 | — |
| Precision (↑) | 0.571 | 0.571 | 0.300 | — |
| Recall (↑) | 0.571 | 0.571 | 0.429 | — |
| F1 (↑) | 0.571 | 0.571 | 0.353 | — |
| Time (s) (↓) | 0.024 | 0.025 | 346.000 | — |
C10: Tree/Wood/Forest
| Metric | UnMerged | Merged | LLM | GT |
|---|---|---|---|---|
| Size | 7 | 7 | 7 | 7 |
| n_edges | 4 | 4 | 4 | 4 |
| Coverage (↑) | 1.000 | 1.000 | 1.000 | 1.000 |
| Productivity | 1.444 | 1.444 | 1.182 | — |
| Div_D (↓) | 0.490 | 0.490 | 0.800 | 0.800 |
| Avg_D | 1.600 | 1.600 | 1.600 | 1.600 |
| Accuracy (↑) | 0.840 | 0.840 | 1.000 | — |
| Precision (↑) | 0.750 | 0.750 | 1.000 | — |
| Recall (↑) | 0.750 | 0.750 | 1.000 | — |
| F1 (↑) | 0.750 | 0.750 | 1.000 | — |
| Time (s) (↓) | 0.076 | 0.077 | 263.000 | — |