Case-wise Results Comparison

Evaluation metrics:

  • Higher is better (↑): Coverage, Accuracy, Precision, Recall, F1
  • Lower is better (↓): Div_D (standard deviation of degree), Time
  • Closer to the Ground Truth (GT) is better: Size, n_edges, Avg_D, Productivity

Note: bold values in the tables mark the best result among the three methods (UnMerged, Merged, LLM), judged by the metric directions listed above.
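
The selection rule described in this note can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function name `best_methods` and the three direction sets are hypothetical, and Recall is placed in the higher-is-better group because the tables mark it with (↑). Ties are returned together, since several cases below have identical scores across methods.

```python
# Metric directions as given in the legend; the set names are assumptions.
HIGHER_IS_BETTER = {"Coverage", "Accuracy", "Precision", "Recall", "F1"}
LOWER_IS_BETTER = {"Div_D", "Time"}
CLOSEST_TO_GT = {"Size", "n_edges", "Avg_D", "Productivity"}


def best_methods(metric, scores, gt=None):
    """Return the methods with the best value for `metric`.

    `scores` maps a method name to its value; `gt` is the Ground Truth
    value and is only needed for the closer-to-GT metrics.
    """
    if metric in HIGHER_IS_BETTER:
        cost = lambda v: -v            # larger value means smaller cost
    elif metric in LOWER_IS_BETTER:
        cost = lambda v: v             # smaller value means smaller cost
    elif metric in CLOSEST_TO_GT:
        if gt is None:
            raise ValueError(f"{metric} needs a Ground Truth value")
        cost = lambda v: abs(v - gt)   # distance to GT is the cost
    else:
        raise KeyError(f"unknown metric: {metric}")

    lowest = min(cost(v) for v in scores.values())
    # Keep every method that ties for the lowest cost.
    return [m for m, v in scores.items() if cost(v) == lowest]


# Values copied from the C1 table.
print(best_methods("Size", {"UnMerged": 182, "Merged": 201, "LLM": 151}, gt=201))
# -> ['Merged']
print(best_methods("Time", {"UnMerged": 0.062, "Merged": 0.082, "LLM": 357.0}))
# -> ['UnMerged']
```

Note that Div_D is treated as lower-is-better here, following the legend, even though the tables also report a GT value for it.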

For detailed definitions of the metrics, see the appendix of the paper (https://arxiv.org/pdf/2507.04070).
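
As a quick sanity check on the reported numbers, the F1 rows can be recomputed from the Precision and Recall rows. The sketch below assumes the standard definition of F1 as the harmonic mean of Precision and Recall; that definition is an assumption here and is not quoted from the paper's appendix. The helper name `f1_score` is hypothetical, and the inputs are the rounded C1 values from the table below.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Precision, Recall, reported F1) for case C1, copied from the table.
c1_rows = {
    "UnMerged": (0.882, 0.790, 0.833),
    "Merged": (0.895, 0.895, 0.895),
    "LLM": (0.235, 0.211, 0.222),
}

for method, (precision, recall, reported) in c1_rows.items():
    recomputed = f1_score(precision, recall)
    # Inputs are rounded to three decimals, so allow a small tolerance.
    consistent = abs(recomputed - reported) < 1e-3
    print(f"{method}: recomputed={recomputed:.3f} reported={reported:.3f} consistent={consistent}")
```

Because the table entries are already rounded, small differences in the last decimal place are expected and do not indicate an error.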


C1: Supplementary Adverbs

Metric UnMerged Merged LLM GT
Size 182 201 151 201
n_edges 17 19 17 19
Coverage (↑) 0.927 1.000 0.744 1.000
Productivity 0.016 0.009 0.001
Div_D (↓) 1.242 1.449 3.213 1.560
Avg_D 1.889 2.111 1.889 2.111
Accuracy (↑) 0.963 0.975 0.827
Precision (↑) 0.882 0.895 0.235
Recall (↑) 0.790 0.895 0.211
F1 (↑) 0.833 0.895 0.222
Time (s) (↓) 0.062 0.082 357.000

C2: EAT verbs

Metric UnMerged Merged LLM GT
Size 209 209 209 209
n_edges 16 16 16 16
Coverage (↑) 1.000 1.000 1.000 1.000
Productivity 0.006 0.006 0.009
Div_D (↓) 1.641 1.641 1.450 1.711
Avg_D 1.882 1.882 1.882 1.882
Accuracy (↑) 0.958 0.958 0.958
Precision (↑) 0.813 0.813 0.813
Recall (↑) 0.813 0.813 0.813
F1 (↑) 0.813 0.813 0.813
Time (s) (↓) 2.223 2.225 552.000

C3: Ditransitive constructions

Metric UnMerged Merged LLM GT
Size 208 212 303 226
n_edges 29 30 51 33
Coverage (↑) 0.966 1.000 0.655 0.966
Productivity
Div_D (↓) 1.236 1.211 1.029 1.454
Avg_D 1.933 2.000 3.000 1.941
Accuracy (↑) 0.945 0.939 0.924
Precision (↑) 0.514 0.475 0.392
Recall (↑) 0.546 0.576 0.606
F1 (↑) 0.529 0.521 0.476
Time (s) (↓) 14.502 14.505 203.000

C4: And 和

Metric UnMerged Merged LLM GT
Size 261 261 263 259
n_edges 27 27 28 29
Coverage (↑) 1.000 1.000 1.000 0.927
Productivity
Div_D (↓) 2.987 2.987 4.234 1.509
Avg_D 1.929 1.929 1.931 2.000
Accuracy (↑) 0.926 0.926 0.912
Precision (↑) 0.464 0.464 0.357
Recall (↑) 0.448 0.448 0.345
F1 (↑) 0.456 0.456 0.351
Time (s) (↓) 4.568 4.570 307.000

C5: What 什么

Metric UnMerged Merged LLM GT
Size 329 443 659 462
n_edges 19 27 47 29
Coverage (↑) 0.659 1.000 1.000 0.932
Productivity
Div_D (↓) 0.943 1.552 2.002 2.142
Avg_D 1.900 2.700 4.700 2.900
Accuracy (↑) 0.840 0.820 0.790
Precision (↑) 0.421 0.370 0.362
Recall (↑) 0.276 0.345 0.586
F1 (↑) 0.333 0.357 0.447
Time (s) (↓) 6.674 6.789 443.000

C6: Quantifier 量词

Metric UnMerged Merged LLM GT
Size 426 426 426 425
n_edges 13 13 13 13
Coverage (↑) 1.000 1.000 1.000 0.989
Productivity 0.067 0.067 0.011
Div_D (↓) 1.407 1.407 3.090 1.059
Avg_D 1.857 1.857 1.857 1.857
Accuracy (↑) 0.857 0.857 0.816
Precision (↑) 0.462 0.462 0.308
Recall (↑) 0.462 0.462 0.308
F1 (↑) 0.462 0.462 0.308
Time (s) (↓) 1.930 1.932 316.000

C7: Make 做

Metric UnMerged Merged LLM GT
Size 51 51 91 68
n_edges 6 6 12 7
Coverage (↑) 1.000 1.000 0.353 1.000
Productivity 0.531 0.531 0.051
Div_D (↓) 0.881 0.881 1.500 1.090
Avg_D 1.714 1.714 3.000 1.750
Accuracy (↑) 0.656 0.656 0.531
Precision (↑) 0.300 0.300 0.167
Recall (↑) 0.429 0.429 0.286
F1 (↑) 0.353 0.353 0.211
Time (s) (↓) 0.099 0.100 308.000

C8: Hit 打

Metric UnMerged Merged LLM GT
Size 141 141 189 135
n_edges 18 18 33 18
Coverage (↑) 1.000 1.000 1.000 0.643
Productivity 0.000 0.000 0.000
Div_D (↓) 2.337 2.337 1.141 2.292
Avg_D 1.895 1.895 3.474 1.895
Accuracy (↑) 0.900 0.900 0.762
Precision (↑) 0.500 0.500 0.121
Recall (↑) 0.500 0.500 0.222
F1 (↑) 0.500 0.500 0.157
Time (s) (↓) 2.374 2.375 701.000

C9: Come 来

Metric UnMerged Merged LLM GT
Size 51 51 75 51
n_edges 7 7 10 7
Coverage (↑) 1.000 1.000 1.000 1.000
Productivity 0.298 0.298 0.167
Div_D (↓) 0.829 0.829 1.581 1.392
Avg_D 1.750 1.750 2.500 1.750
Accuracy (↑) 0.813 0.813 0.656
Precision (↑) 0.571 0.571 0.300
Recall (↑) 0.571 0.571 0.429
F1 (↑) 0.571 0.571 0.353
Time (s) (↓) 0.024 0.025 346.000

C10: Tree/Wood/Forest

Metric UnMerged Merged LLM GT
Size 7 7 7 7
n_edges 4 4 4 4
Coverage (↑) 1.000 1.000 1.000 1.000
Productivity 1.444 1.444 1.182
Div_D (↓) 0.490 0.490 0.800 0.800
Avg_D 1.600 1.600 1.600 1.600
Accuracy (↑) 0.840 0.840 1.000
Precision (↑) 0.750 0.750 1.000
Recall (↑) 0.750 0.750 1.000
F1 (↑) 0.750 0.750 1.000
Time (s) (↓) 0.076 0.077 263.000
Last Updated: 2/28/2026, 1:17:18 PM