Merge pull request #434 from flairNLP/add-paper-to-readme
Reference paper in README.md and add `cite` section
MaxDall authored Apr 21, 2024
2 parents f9816f6 + 87cd731 commit 97cc5e7
Showing 1 changed file with 30 additions and 1 deletion.
31 changes: 30 additions & 1 deletion README.md
@@ -18,7 +18,7 @@ Developed at <a href="https://www.informatik.hu-berlin.de/en/forschung-en/gebiet
<div align="center">
<hr>

[Quick Start](#quick-start) | [Tutorials](#tutorials) | [News Sources](/docs/supported_publishers.md)
[Quick Start](#quick-start) | [Tutorials](#tutorials) | [News Sources](/docs/supported_publishers.md) | [Paper](https://arxiv.org/abs/2403.15279)

</div>

@@ -143,6 +143,35 @@ You can find the publishers currently supported [**here**](/docs/supported_publi

Also: **Adding a new publisher is easy - consider contributing to the project!**

## Evaluation benchmark

Check out our evaluation [benchmark](https://github.com/dobbersc/fundus-evaluation).

| **Scraper** | **Precision** | **Recall** | **F1-Score** |
|-------------|---------------------------|---------------------------|---------------------------|
| [Fundus](https://github.com/flairNLP/fundus) | **99.89**<sub>±0.57</sub> | 96.75<sub>±12.75</sub> | **97.69**<sub>±9.75</sub> |
| [Trafilatura](https://github.com/adbar/trafilatura) | 90.54<sub>±18.86</sub> | 93.23<sub>±23.81</sub> | 89.81<sub>±23.69</sub> |
| [BTE](https://github.com/dobbersc/fundus-evaluation/blob/master/src/fundus_evaluation/scrapers/bte.py) | 81.09<sub>±19.41</sub> | **98.23**<sub>±8.61</sub> | 87.14<sub>±15.48</sub> |
| [jusText](https://github.com/miso-belica/jusText) | 86.51<sub>±18.92</sub> | 90.23<sub>±20.61</sub> | 86.96<sub>±19.76</sub> |
| [news-please](https://github.com/fhamborg/news-please) | 92.26<sub>±12.40</sub> | 86.38<sub>±27.59</sub> | 85.81<sub>±23.29</sub> |
| [BoilerNet](https://github.com/dobbersc/fundus-evaluation/tree/master/src/fundus_evaluation/scrapers/boilernet) | 84.73<sub>±20.82</sub> | 90.66<sub>±21.05</sub> | 85.77<sub>±20.28</sub> |
| [Boilerpipe](https://github.com/kohlschutter/boilerpipe) | 82.89<sub>±20.65</sub> | 82.11<sub>±29.99</sub> | 79.90<sub>±25.86</sub> |

## Cite

Please cite the following [paper](https://arxiv.org/abs/2403.15279) when using Fundus or building upon our work:

```bibtex
@misc{dallabetta2024fundus,
title={Fundus: A Simple-to-Use News Scraper Optimized for High Quality Extractions},
author={Max Dallabetta and Conrad Dobberstein and Adrian Breiding and Alan Akbik},
year={2024},
eprint={2403.15279},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```

## Contact

Please email your questions or comments to [**Max Dallabetta**](mailto:[email protected]?subject=[GitHub]%20Fundus)