How to actually run something on this bench? #36

Closed
dantetemplar opened this issue Mar 7, 2025 · 1 comment

@dantetemplar

I followed the instructions to install the dependencies. What should I do next? I tried running pdf_validation.py, but what does its output mean?

###### Process:  end2end_quick_match
Downloading builder script: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5.94k/5.94k [00:00<00:00, 11.2MB/s]
Downloading extra modules: 4.07kB [00:00, 6.57MB/s]                                                                                                                                                                
Downloading extra modules: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.34k/3.34k [00:00<00:00, 10.3MB/s]
Downloading builder script: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.02k/7.02k [00:00<00:00, 20.7MB/s]
[nltk_data] Downloading package wordnet to /home/jupyter/nltk_data...
[nltk_data] Downloading package punkt_tab to
[nltk_data]     /home/jupyter/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package omw-1.4 to /home/jupyter/nltk_data...
【text_block】
Edit_dist:
------------  --------
ALL_page_avg  0.356126
------------  --------
====================================================================================================
BLEU:
---  --------
all  0.269879
---  --------
====================================================================================================
METEOR:
---  --------
all  0.110879
---  --------
====================================================================================================
----Anno Attribute---------------
Edit_dist:
--------------------------------------  ---------
text_background: multi_colored          0.360388
text_background: single_colored         0.340771
text_background: white                  0.531482
text_language: text_en_ch_mixed         0.318496
text_language: text_english             0.0934457
text_language: text_simplified_chinese  0.734819
text_rotate: horizontal                 0.75
text_rotate: normal                     0.512192
--------------------------------------  ---------
====================================================================================================
sample_count:
--------------------------------------  ---
text_background: multi_colored           16
text_background: single_colored          10
text_background: white                  208
text_language: text_en_ch_mixed           7
text_language: text_english              76
text_language: text_simplified_chinese  149
text_rotate: horizontal                   2
text_rotate: normal                     230
--------------------------------------  ---
====================================================================================================
Edit_dist:
--------------------------------  ----------
ALL                               0.356126
None                              0.58769
colorful_backgroud                0.0645602
data_source: PPT2PDF              0.00378788
data_source: academic_literature  0.123349
data_source: book                 0.430496
data_source: colorful_textbook    0.490101
data_source: exam_paper           0.0387346
data_source: magazine             0.362045
data_source: newspaper            0.562052
data_source: note                 0.581583
data_source: research_report      0.612982
fuzzy_scan                        0.490101
language: en_ch_mixed             0.337711
language: english                 0.0726445
language: simplified_chinese      0.556404
layout: 1andmore_column           0.340281
layout: double_column             0.552618
layout: other_layout              0.741007
layout: single_column             0.212567
layout: three_column              0.124104
--------------------------------  ----------
====================================================================================================
【display_formula】
Edit_dist:
------------  --------
ALL_page_avg  0.492899
------------  --------
====================================================================================================
----Anno Attribute---------------
Edit_dist:
-------------------  --------
formula_type: print  0.419252
-------------------  --------
====================================================================================================
sample_count:
-------------------  --
formula_type: print  17
-------------------  --
====================================================================================================
Edit_dist:
--------------------------------  --------
ALL                               0.492899
None                              0.435644
colorful_backgroud                0.550154
data_source: academic_literature  0.435644
data_source: exam_paper           0.550154
language: english                 0.492899
layout: single_column             0.492899
--------------------------------  --------
====================================================================================================
【table】
TEDS:
---  --------
all  0.783813
---  --------
====================================================================================================
TEDS_structure_only:
---  --------
all  0.911589
---  --------
====================================================================================================
Edit_dist:
------------  --------
ALL_page_avg  0.202719
------------  --------
====================================================================================================
----Anno Attribute---------------
Edit_dist:
----------------------------------  --------
include_background: False           0.241423
include_background: True            0.163629
include_equation: False             0.2225
include_equation: True              0.122632
include_photo: False                0.202526
language: table_en                  0.256255
language: table_simplified_chinese  0.179499
line: fewer_line                    0.174262
line: full_line                     0.244128
line: less_line                     0.177442
table_layout: horizontal            0.202526
with_span: False                    0.238261
with_span: True                     0.166791
with_structured_text: False         0.202526
----------------------------------  --------
====================================================================================================
TEDS:
----------------------------------  --------
include_background: False           0.774105
include_background: True            0.79352
include_equation: False             0.771271
include_equation: True              0.833977
include_photo: False                0.783813
language: table_en                  0.858176
language: table_simplified_chinese  0.751942
line: fewer_line                    0.762137
line: full_line                     0.819911
line: less_line                     0.747795
table_layout: horizontal            0.783813
with_span: False                    0.734281
with_span: True                     0.833344
with_structured_text: False         0.783813
----------------------------------  --------
====================================================================================================
TEDS_structure_only:
----------------------------------  --------
include_background: False           0.907024
include_background: True            0.916153
include_equation: False             0.89667
include_equation: True              0.971264
include_photo: False                0.911589
language: table_en                  0.980843
language: table_simplified_chinese  0.881908
line: fewer_line                    0.928188
line: full_line                     0.928922
line: less_line                     0.759259
table_layout: horizontal            0.911589
with_span: False                    0.920635
with_span: True                     0.902542
with_structured_text: False         0.911589
----------------------------------  --------
====================================================================================================
sample_count:
----------------------------------  --
include_background: False            5
include_background: True             5
include_equation: False              8
include_equation: True               2
include_photo: False                10
language: table_en                   3
language: table_simplified_chinese   7
line: fewer_line                     5
line: full_line                      4
line: less_line                      1
table_layout: horizontal            10
with_span: False                     5
with_span: True                      5
with_structured_text: False         10
----------------------------------  --
====================================================================================================
Edit_dist:
--------------------------------  ----------
ALL                               0.202719
colorful_backgroud                0.0837313
data_source: academic_literature  0.217742
data_source: book                 0.177442
data_source: colorful_textbook    0.00746269
data_source: exam_paper           0.16
data_source: magazine             0.0275229
data_source: note                 0.404524
data_source: research_report      0.212628
fuzzy_scan                        0.00746269
language: en_ch_mixed             0.543561
language: english                 0.128402
language: simplified_chinese      0.179141
layout: 1andmore_column           0.167004
layout: double_column             0.217742
layout: other_layout              0.0275229
layout: single_column             0.24904
--------------------------------  ----------
====================================================================================================
TEDS:
--------------------------------  --------
ALL                               0.800119
colorful_backgroud                0.940929
data_source: academic_literature  0.702065
data_source: book                 0.747795
data_source: colorful_textbook    0.999505
data_source: exam_paper           0.882353
data_source: magazine             0.965889
data_source: note                 0.698893
data_source: research_report      0.752838
fuzzy_scan                        0.999505
language: en_ch_mixed             0.872959
language: english                 0.861308
language: simplified_chinese      0.748837
layout: 1andmore_column           0.759705
layout: double_column             0.702065
layout: other_layout              0.965889
layout: single_column             0.802741
--------------------------------  --------
====================================================================================================
TEDS_structure_only:
--------------------------------  --------
ALL                               0.914552
colorful_backgroud                0.941176
data_source: academic_literature  0.942529
data_source: book                 0.759259
data_source: colorful_textbook    1
data_source: exam_paper           0.882353
data_source: magazine             1
data_source: note                 0.916667
data_source: research_report      0.906746
fuzzy_scan                        1
language: en_ch_mixed             1
language: english                 0.941627
language: simplified_chinese      0.881217
layout: 1andmore_column           0.883637
layout: double_column             0.942529
layout: other_layout              1
layout: single_column             0.904233
--------------------------------  --------
====================================================================================================
【reading_order】
Edit_dist:
------------  --------
ALL_page_avg  0.216996
------------  --------
====================================================================================================
----Anno Attribute---------------
sample_count:

====================================================================================================
Edit_dist:
--------------------------------  ---------
ALL                               0.216996
None                              0.455182
colorful_backgroud                0.025641
data_source: PPT2PDF              0
data_source: academic_literature  0.209559
data_source: book                 0.25
data_source: colorful_textbook    0.285714
data_source: exam_paper           0.0769231
data_source: magazine             0.1
data_source: newspaper            0.5
data_source: note                 0.0384615
data_source: research_report      0.492308
fuzzy_scan                        0.285714
language: en_ch_mixed             0.0769231
language: english                 0.081852
language: simplified_chinese      0.325604
layout: 1andmore_column           0.376923
layout: double_column             0.299107
layout: other_layout              0.6
layout: single_column             0.0839618
layout: three_column              0
--------------------------------  ---------
====================================================================================================
(py38) jupyter@instance-20250308-023555:~/OmniDocBench$ 
@ouyanglinke
Collaborator

Use this Jupyter notebook to extract the results and organize them into a tabular format.
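
If the notebook is not at hand, the console report pasted in the question can also be scraped directly. The following is only a minimal sketch, assuming the report keeps the layout shown in this issue: a `【section】` line, then a `Metric:` line, then two-column rows separated by at least two spaces. `parse_report` is a hypothetical helper written for illustration; it is not part of OmniDocBench and is not how the notebook works.

```python
import re

# Patterns for the three kinds of lines that matter in the printed report.
SECTION = re.compile(r"^【(.+)】$")        # e.g. 【text_block】
METRIC = re.compile(r"^(\w+):$")           # e.g. Edit_dist:
ROW = re.compile(r"^(.*\S)\s{2,}(\S+)$")   # e.g. "ALL_page_avg  0.356126"


def parse_report(text):
    """Return (section, metric, label, value) tuples from the printed report."""
    rows = []
    section = metric = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "-=":
            continue  # blank lines, table rules, and "====" separators
        m = SECTION.match(line)
        if m:
            section, metric = m.group(1), None
            continue
        m = METRIC.match(line)
        if m:
            metric = m.group(1)
            continue
        m = ROW.match(line)
        if m and section and metric:
            try:
                value = float(m.group(2))
            except ValueError:
                continue  # not a numeric row, e.g. a stray log line
            rows.append((section, metric, m.group(1), value))
    return rows


# A short excerpt of the output from this issue.
sample = """\
【text_block】
Edit_dist:
------------  --------
ALL_page_avg  0.356126
------------  --------
====================
【table】
TEDS:
---  --------
all  0.783813
---  --------
"""

for row in parse_report(sample):
    print(*row, sep="\t")
```

Each tuple can then be written out with the `csv` module or loaded into a spreadsheet. Rows under `sample_count:` come back as floats as well, so convert them to `int` if the counts matter.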

@dantetemplar closed this as not planned on Mar 10, 2025