Support arxiv html papers #209

Open
dai-shuo opened this issue Nov 3, 2024 · 0 comments

dai-shuo commented Nov 3, 2024

arXiv provides a static HTML version of most papers, generated with LaTeXML. The HTML is well structured, with rich ltx_xxxx CSS class names, so parsing these pages should be very fast and yield precise information.
It would be great to support arXiv HTML parsing, either as a much faster branch or as a strong hint for the pipeline.
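
To illustrate the idea, here is a minimal sketch of what such a fast path might look like, using only Python's standard-library `html.parser`. It is not tied to this project's code: the names `LtxExtractor` and `parse_arxiv_html` are hypothetical, the sample markup is hand-written, and the class names `ltx_title_document` and `ltx_title_section` are assumed to match what LaTeXML emits for the paper title and section headings, so they should be checked against real arXiv HTML pages.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; skipped so depth tracking stays balanced.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input", "wbr"}


class LtxExtractor(HTMLParser):
    """Collects the document title and section headings by ltx_* class name.

    Hypothetical helper for illustration; the class names are assumptions.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = None
        self.sections = []
        self._target = None  # "title" or "section" while capturing text
        self._depth = 0      # nesting depth inside the captured element
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._target is not None:
            self._depth += 1  # nested element inside a heading
            return
        classes = (dict(attrs).get("class") or "").split()
        if "ltx_title_document" in classes:
            self._begin("title")
        elif "ltx_title_section" in classes:
            self._begin("section")

    def _begin(self, target):
        self._target, self._depth, self._buf = target, 1, []

    def handle_endtag(self, tag):
        if self._target is None or tag in VOID_TAGS:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace in the captured heading text.
            text = " ".join("".join(self._buf).split())
            if self._target == "title":
                self.title = text
            else:
                self.sections.append(text)
            self._target = None

    def handle_data(self, data):
        if self._target is not None:
            self._buf.append(data)


def parse_arxiv_html(html):
    """Return the title and section headings found in a LaTeXML-style page."""
    parser = LtxExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "sections": parser.sections}


# Hand-written sample that imitates LaTeXML output (an assumption, not real data).
SAMPLE = """
<article class="ltx_document">
  <h1 class="ltx_title ltx_title_document">A Sample Paper</h1>
  <section class="ltx_section">
    <h2 class="ltx_title ltx_title_section"><span class="ltx_tag">1 </span>Introduction</h2>
    <div class="ltx_para"><p class="ltx_p">Body text.</p></div>
  </section>
  <section class="ltx_section">
    <h2 class="ltx_title ltx_title_section"><span class="ltx_tag">2 </span>Method</h2>
  </section>
</article>
"""

result = parse_arxiv_html(SAMPLE)
print(result["title"])
print(result["sections"])
```

Because the structure is carried by class names rather than layout, a single pass over the markup is enough here; the same pattern could presumably be extended to abstracts, equations, or bibliography entries, or used only as a hint for the existing pipeline.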
