This repository uses Donut (Document Understanding Transformer) to extract information from scanned images in the TNI-AU dataset. The document types are:
• Decree_On_Implementation.
• Military_Education_Certificate.
• Rank_Promotion_Degree.
Model accuracy by document type:
- Rank_Promotion_Degree: 4 samples, Tree Edit Distance (TED) based accuracy score: 0.7174657534246576, F1 score: 0.8181818181818182
- Decree_on_implementation: 9 samples, Tree Edit Distance (TED) based accuracy score: 0.7151542241291141, F1 score: 0.4727272727272727
- Military_Education_Certificate: 3 samples, Tree Edit Distance (TED) based accuracy score: 0.23286052009456262, F1 score: 0.24
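For readers unfamiliar with the F1 score reported above, the following is a minimal sketch of a field-level F1 over predicted and ground-truth JSON outputs. It assumes the common approach of flattening each nested output into (key path, value) pairs and counting exact matches; it is not necessarily the exact evaluator used in this repository. The function names `flatten` and `field_f1` and the sample field names are hypothetical. The TED-based score is not shown because it needs a tree edit distance implementation.

```python
from collections import Counter


def flatten(node, prefix=""):
    """Flatten nested dicts/lists into a list of (key_path, value) pairs."""
    if isinstance(node, dict):
        pairs = []
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            pairs.extend(flatten(value, path))
        return pairs
    if isinstance(node, list):
        pairs = []
        for item in node:
            pairs.extend(flatten(item, prefix))
        return pairs
    # Leaf value: compare as a whitespace-trimmed string.
    return [(prefix, str(node).strip())]


def field_f1(predictions, answers):
    """Micro-averaged F1 over exact (key_path, value) matches."""
    tp = fp = fn = 0
    for pred, gold in zip(predictions, answers):
        pred_counts = Counter(flatten(pred))
        gold_counts = Counter(flatten(gold))
        # Multiset intersection counts fields that match exactly.
        matched = sum((pred_counts & gold_counts).values())
        tp += matched
        fp += sum(pred_counts.values()) - matched
        fn += sum(gold_counts.values()) - matched
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical field names, for illustration only.
pred = [{"name": "A", "rank": "Major"}]
gold = [{"name": "A", "rank": "Colonel"}]
score = field_f1(pred, gold)  # one match, one wrong field
```

In this example one of the two fields matches, so there is one true positive, one false positive, and one false negative, giving an F1 of 0.5. Because the score is micro-averaged over all fields, documents with many fields weigh more than documents with few.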
All Data Files:
- Decree_on_implementation: https://drive.google.com/drive/folders/1jh_jNPfFVOPW6Ot35wjbz_pWDWZSuhjk?usp=drive_link
- Military_Education_Certificate: https://drive.google.com/drive/folders/1raqa4bw5fjpPminS5MSPhrj9rlrjqDMI?usp=drive_link
- Rank_Promotion_Degree: https://drive.google.com/drive/folders/1VhikcON_tmP9Xr3-pI05_LGveLi_rjbd?usp=drive_link