Code quality checks | Status |
---|---|
CodeFactor | |
CircleCI | |
Codecov | |
- This repo is a Python package that interfaces with Elasticsearch to make uploading and querying data simple
- Basic usage instructions are given below
- A sample use case is illustrated at the end of this README
- To run Elasticsearch locally with Docker:
sudo docker pull elasticsearch:7.5.2
docker run -d --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:7.5.2
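The container can take a little while to start accepting requests. A minimal sketch of a readiness check, assuming the node is published on `http://localhost:9200` as in the `docker run` command above; the helper names `fetch_json` and `wait_for_elasticsearch` are hypothetical and not part of this package:

```python
import json
import time
import urllib.request


def fetch_json(url, timeout=2.0):
    """GET a URL and decode the JSON body (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_elasticsearch(url="http://localhost:9200", retries=30,
                           delay=1.0, fetch=fetch_json):
    """Poll the root endpoint until it answers.

    Returns the reported version string, or None if the node never
    answered within the given number of retries.
    """
    for _ in range(retries):
        try:
            info = fetch(url)
        except (OSError, ValueError):
            # Connection refused/reset or an incomplete body: not ready yet.
            time.sleep(delay)
            continue
        # The root endpoint reports the node version under version.number.
        return info.get("version", {}).get("number")
    return None


# Example usage (requires the container to be running):
# print(wait_for_elasticsearch())
```

The `fetch` parameter exists only so the polling logic can be exercised without a live node.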
- To upload a pandas DataFrame to Elasticsearch:
import pandas as pd
from elasticsearch_simple_client.uploader import Uploader
uploader = Uploader()
df = pd.read_csv("example/descriptions_with_categories.csv")
uploader.post_df(df)
- To search the uploaded data:
import pandas as pd
from elasticsearch_with_python_poc.searcher import Searcher
searcher = Searcher()
result = searcher.execute_search(musts=["exact match with some fuzziness"],
                                 shoulds=["less exact matches allowed"])
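How `Searcher` translates `musts` and `shoulds` into a request is not shown here. As a rough illustration only, the sketch below builds a standard Elasticsearch `bool` query in which every `must` clause is a fuzzy `match` and every `should` clause is an optional plain `match`. The function name `build_bool_query`, the field name `description`, and the `AUTO` fuzziness setting are all assumptions, not the package's confirmed behaviour:

```python
import json


def build_bool_query(musts, shoulds=(), field="description", fuzziness="AUTO"):
    """Build an Elasticsearch bool query body (hypothetical helper).

    must clauses are required and allow fuzzy matching;
    should clauses are optional and only raise the relevance score.
    """
    must_clauses = [
        {"match": {field: {"query": text, "fuzziness": fuzziness}}}
        for text in musts
    ]
    should_clauses = [
        {"match": {field: {"query": text}}}
        for text in shoulds
    ]
    return {"query": {"bool": {"must": must_clauses, "should": should_clauses}}}


body = build_bool_query(
    musts=["exact match with some fuzziness"],
    shoulds=["less exact matches allowed"],
)
print(json.dumps(body, indent=2))
```

A body of this shape could be sent to the `_search` endpoint of an index; documents must satisfy every `must` clause, while matching `should` clauses only affects ranking.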
- To clear an Elasticsearch index, run this on the command line:
curl -X DELETE "localhost:9200/<index_name>"
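The same request can be issued from Python using only the standard library. This is a sketch, assuming the node listens on `http://localhost:9200`; the helper name `build_delete_index_request` and the index name `my_index` are hypothetical:

```python
import urllib.request
from urllib.parse import quote


def build_delete_index_request(index_name, host="http://localhost:9200"):
    """Build (but do not send) a DELETE request for one index."""
    # Guard against wiping every index by accident.
    if not index_name or index_name in ("_all", "*"):
        raise ValueError("refusing to delete without a specific index name")
    url = f"{host}/{quote(index_name, safe='')}"
    return urllib.request.Request(url, method="DELETE")


request = build_delete_index_request("my_index")
print(request.get_method(), request.full_url)

# To actually send it (requires a running node and an existing index):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Deleting an index is irreversible, which is why the sketch separates building the request from sending it.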
- This blog post describes a use case for this repo in automating the categorisation of bank account transactions
- An example notebook is included in this repo; it contains the code and results shown in the blog post above
- The use case in this example is to categorise account transactions by their description (e.g. the description "ANSTRUTHER FISH BAR AND ANSTRUTHER GBR" should be categorised as "EAT OUT")
- Some anonymised data used in the notebook can be found in the example folder
- Once transactions with known categories are added to Elasticsearch, its fuzzy string matching can predict the category of a new transaction by looking up the known transaction whose description best matches the new one
- Machine learning models can also be trained on transactions with known categories, but this example shows that fuzzy string matching alone can sometimes be a simpler and more elegant solution
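The lookup idea can be sketched without Elasticsearch at all. The example below is not how this repo or Elasticsearch scores matches; it swaps in `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure. The function `predict_category` and the second known transaction are hypothetical and exist only for illustration:

```python
from difflib import SequenceMatcher


def predict_category(description, known):
    """Return the category of the known description most similar to the new one.

    `known` maps a description to its category. Similarity here is
    difflib's ratio, used as a stand-in for Elasticsearch's fuzzy scoring.
    """
    def similarity(candidate):
        return SequenceMatcher(None, description.upper(), candidate.upper()).ratio()

    best_match = max(known, key=similarity)
    return known[best_match]


known_transactions = {
    # Taken from the example above.
    "ANSTRUTHER FISH BAR AND ANSTRUTHER GBR": "EAT OUT",
    # Hypothetical entry, added only so there is something to choose between.
    "TESCO STORES 1234 EDINBURGH GBR": "GROCERIES",
}

print(predict_category("ANSTRUTHER FISH BAR GBR", known_transactions))
```

A scan over every known description is fine for a handful of rows; an index such as Elasticsearch is what makes the same lookup practical over a large transaction history.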