A tool to detect whether numerals present in Financial Texts are in-claim or out-of-claim

sohomghosh/FiNCAT_Financial_Numeral_Claim_Analysis_Tool

FiNCAT: Financial Numeral Claim Analysis Tool

A tool to detect whether numerals in financial texts are in-claim or out-of-claim. It was accepted at FinWeb@TheWebConf-2022 (formerly ACM-WWW) (CORE rank: A*) (pre-print).

Architecture

[Figure: FiNCAT architecture diagram]

How to use?

Use it directly from HuggingFace Spaces or Google Colab

The API is available here.

To re-train or re-use the tool locally, refer to requirements.txt for the versions of the Python libraries used while developing this tool.
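One way to check a local environment against those versions is sketched below. It assumes that requirements.txt pins packages with lines of the form `name==version`, which this README does not confirm. The helper names `parse_pins` and `find_mismatches` are hypothetical and are not part of FiNCAT.

```python
from importlib import metadata


def parse_pins(lines):
    """Return {package: version} for lines of the form `name==version`.

    Assumption: the requirements file uses exact `==` pins; lines in any
    other form are skipped.
    """
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (pinned, installed)} for every package that differs."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Possible usage, run from the repository root:
#     with open("requirements.txt") as fh:
#         print(find_mismatches(parse_pins(fh)))
```

An empty result would mean every pinned package is installed at the pinned version. An installed value of `None` marks a package that is missing.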


Training
For training, execute the FiNCAT_training.ipynb notebook in the training folder. It needs fincat_utils.py from the main folder and the embeddings/labels stored in the training folder as .csv files. X_train_df.zip must be unzipped to obtain X_train_df.csv. You can obtain the raw data from here.
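The unzip step can be done from Python with the standard library. The sketch below assumes that X_train_df.zip sits in the training folder and that the CSV should be extracted next to it. The helper name `extract_training_csv` is hypothetical and is not part of the repository.

```python
import zipfile
from pathlib import Path


def extract_training_csv(zip_path, out_dir):
    """Extract every .csv member of zip_path into out_dir and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".csv"):
                # extract() returns the path of the file it created
                extracted.append(Path(archive.extract(name, path=out_dir)))
    return extracted


# Possible usage, run from the repository root (paths are assumptions):
#     extract_training_csv("training/X_train_df.zip", "training")
```

Non-CSV members of the archive, if there are any, are left untouched.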


Using the tool locally
To use the tool locally, you do not need to train it, because the model artifacts are already provided. Simply execute the FiNCAT_tool_enhanced_UI.ipynb notebook. More details are provided in the tools folder.

FiNCAT (with enhanced UI)

[Figure: screenshot of FiNCAT with the enhanced UI]

FiNCAT Video Demonstration (on YouTube)

Video Demonstration

References

This tool has been built using Google Colab and Gradio. It has been hosted using 🤗 HuggingFace Spaces.

Tool citation:

@inproceedings{ghosh-fiNCAT,
    title = "FiNCAT: Financial Numeral Claim Analysis Tool",
    author = "Sohom Ghosh and Sudip Kumar Naskar",
    year = "2022",
    booktitle = "Companion Proceedings of the Web Conference 2022 (WWW ’22 Companion)",
    url = "https://arxiv.org/abs/2202.00631",
    doi = "10.1145/3487553.3524635"
}
@article{fincat2,
title = {FiNCAT-2: An enhanced Financial Numeral Claim Analysis Tool},
journal = {Software Impacts},
volume = {},
pages = {},
year = {2022},
issn = {2665-9638},
doi = {10.1016/j.simpa.2022.100288},
url = {https://www.sciencedirect.com/science/article/pii/S2665963822000367},
author = {Sohom Ghosh and Sudip Kumar Naskar},
}

Dataset and shared task citation:

@inproceedings{finum3,
  title={Overview of the NTCIR-16 FinNum-3 Task: Investor’s and Manager’s Fine-grained Claim Detection},
  author={Chen, Chung-Chi and Huang, Hen-Hsen and Huang, Yu-Lieh and Takamura, Hiroya and Chen, Hsin-Hsi},
  booktitle={Proceedings of the 16th NTCIR Conference on Evaluation of Information Access Technologies, Tokyo, Japan},
  year={2022}
}
@inproceedings{numclaim,
author = {Chen, Chung-Chi and Huang, Hen-Hsen and Chen, Hsin-Hsi},
title = {NumClaim: Investor's Fine-Grained Claim Detection},
year = {2020},
isbn = {9781450368599},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3340531.3412100},
booktitle = {Proceedings of the 29th ACM International Conference on Information \& Knowledge Management},
pages = {1973--1976},
numpages = {4}
}

Blog by Arushi Prakash

NOTE:
This tool is released under the MIT license.
The embeddings and labels are released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.