Support arxiv html papers #209

dai-shuo · 2024-11-03T01:43:34Z

arXiv provides static HTML versions of most papers, generated with LaTeXML. The HTML is well structured, with rich ltx_xxxx CSS class names marking each element. Parsing these pages should be lightning fast and give very precise information.
It would be cool to support arXiv HTML parsing, either as a much faster branch or as a strong hint for the pipeline.
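To illustrate why the ltx_ classes make extraction cheap, here is a minimal sketch that uses only Python's standard-library html.parser. It collects the text of every element carrying a given class. The class names in the sample (ltx_title_document, ltx_abstract, ltx_title_section, ltx_p) are assumed from typical LaTeXML output and should be checked against real arXiv pages. The helper names LtxClassExtractor and extract are hypothetical, not part of any existing pipeline. The sketch also assumes well-formed markup, meaning every non-void element is explicitly closed.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not affect depth tracking.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class LtxClassExtractor(HTMLParser):
    """Hypothetical helper: collect the text of every outermost element
    whose class attribute contains target_class (e.g. an assumed ltx_* name)."""

    def __init__(self, target_class):
        super().__init__()
        self.target = target_class
        self.depth = 0      # nesting depth inside the current match (0 = outside)
        self.buf = []       # text fragments of the element being collected
        self.results = []   # whitespace-normalized text, one entry per match

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self.depth:      # already inside a match: just track nesting
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target in classes:
            self.depth = 1
            self.buf = []

    def handle_endtag(self, tag):
        if tag in VOID or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:  # closed the matched element: store its text
            self.results.append(" ".join("".join(self.buf).split()))

    def handle_data(self, data):
        if self.depth:
            self.buf.append(data)


def extract(html, target_class):
    parser = LtxClassExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Toy page imitating (not copied from) LaTeXML-style markup.
SAMPLE = """
<article class="ltx_document">
  <h1 class="ltx_title ltx_title_document">A Toy Paper</h1>
  <div class="ltx_abstract">
    <h6 class="ltx_title ltx_title_abstract">Abstract</h6>
    <p class="ltx_p">We study <em>fast</em> parsing.<br> More text.</p>
  </div>
  <section class="ltx_section">
    <h2 class="ltx_title ltx_title_section">1 Introduction</h2>
    <div class="ltx_para"><p class="ltx_p">Intro text.</p></div>
  </section>
</article>
"""

if __name__ == "__main__":
    print(extract(SAMPLE, "ltx_title_section"))  # ['1 Introduction']
```

Because each element is identified by its class rather than by layout heuristics, one linear pass over the page is enough. A real integration would still need to handle math, figures, and references, which this sketch ignores.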
