LimeSoup is a Python package for parsing HTML or XML papers from different publishers. The parsed output can be used to populate a database.
Full Usage:
import json

from LimeSoup import (
    ACSSoup,
    AIPSoup,
    APSSoup,
    ECSSoup,
    ElsevierSoup,
    IOPSoup,
    NatureSoup,
    RSCSoup,
    SpringerSoup,
    WileySoup,
)

# Path to the HTML or XML file to parse (placeholder; replace with your own)
article = 'path/to/article.html'

with open(article, 'r', encoding='utf-8') as f:
    html_str = f.read()

# Choose the parser that matches the article's publisher
data = ECSSoup.parse(html_str)

with open('file_test.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, sort_keys=True, indent=4, ensure_ascii=False)
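As one way to feed a database with parsed output, the sketch below stores a parsed article as a JSON blob in SQLite using only the standard library. The `papers` table, the `store_paper` and `load_paper` helpers, and the keys in the example dictionary are assumptions for illustration. The structure returned by `parse` is not described here, so the record is stored whole rather than split into columns.

```python
import json
import sqlite3


def store_paper(conn, source_file, data):
    """Insert one parsed article into a hypothetical `papers` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS papers ("
        "source_file TEXT PRIMARY KEY, content TEXT NOT NULL)"
    )
    # Serialize the whole record; no particular keys are assumed.
    conn.execute(
        "INSERT OR REPLACE INTO papers (source_file, content) VALUES (?, ?)",
        (source_file, json.dumps(data, sort_keys=True, ensure_ascii=False)),
    )
    conn.commit()


def load_paper(conn, source_file):
    """Return the stored record for source_file, or None if absent."""
    row = conn.execute(
        "SELECT content FROM papers WHERE source_file = ?", (source_file,)
    ).fetchone()
    return json.loads(row[0]) if row else None


# Stand-in for the output of a parser such as ECSSoup.parse(html_str).
example_data = {"title": "Example article", "sections": []}

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
store_paper(conn, "article.html", example_data)
restored = load_paper(conn, "article.html")
```

Keying rows on the source file name makes re-running the parser on the same file overwrite the earlier record instead of duplicating it.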
Currently, we have implemented the following parsers:
- ECS: The Electrochemical Society
- RSC: The Royal Society of Chemistry
- Elsevier
- Nature Publishing Group
- Springer
- Wiley
- ACS: American Chemical Society
- APS: American Physical Society
- IOP Publishing
- AIP: American Institute of Physics
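When articles from several of the publishers above are processed together, choosing the parser can be done with a lookup table. The following is a minimal sketch, not part of LimeSoup itself: `parse_article`, the registry keys, and the `_DemoSoup` stand-in are hypothetical names, and the only assumption about the real parsers is that each exposes `parse(html_str)` as in the usage example.

```python
def parse_article(parsers, publisher, html_str):
    """Look up the parser registered for `publisher` and run it."""
    try:
        parser = parsers[publisher]
    except KeyError:
        known = ", ".join(sorted(parsers))
        raise ValueError(
            f"No parser for {publisher!r}; known publishers: {known}"
        ) from None
    return parser.parse(html_str)


class _DemoSoup:
    """Stand-in so the sketch runs without LimeSoup installed."""

    @staticmethod
    def parse(html_str):
        return {"length": len(html_str)}


# With LimeSoup installed, the registry would instead hold the imported
# parsers, e.g. {"ECS": ECSSoup, "RSC": RSCSoup, "Elsevier": ElsevierSoup}.
parsers = {"Demo": _DemoSoup}
result = parse_article(parsers, "Demo", "<html></html>")
```

Raising an explicit error for an unknown publisher avoids silently running an article through a parser written for a different publisher's markup.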
For more details, please refer to the wiki pages.
For the history of changes, please see the change logs.
These genius people have contributed to LimeSoup:
- Tiago Botari
- Ziqin Rong
- Vahe Tshitoyan
- Nicolas Mingione
- Jason Madeano
- Haoyan Huo
- Tanjin He
- Zach Jensen
- Alex van Grootel
- Edward Kim
- Haihao Liu
- Zheren Wang
If you are planning to use LimeSoup in your work, please consider citing the following paper:
- Kononova et al., "Text-mined dataset of inorganic materials synthesis recipes", Scientific Data 6 (1), 1-11 (2019), doi:10.1038/s41597-019-0224-1