chilledgeek/elasticsearch-simple-client

Elasticsearch simple client


Background

  • This package provides a simple Python interface to Elasticsearch for uploading and querying data.
  • Basic usage instructions are given below.
  • A sample use case is illustrated at the end of this README.

How to use

Starting Elasticsearch with Docker

  • sudo docker pull elasticsearch:7.5.2
  • docker run -d --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:7.5.2
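Before uploading anything, it can help to confirm that the container is answering on port 9200. The following is a minimal sketch using only the Python standard library; the function name `elasticsearch_is_up` is hypothetical and not part of this package, and the URL assumes the port mapping from the `docker run` command above.

```python
import http.client
import json
import urllib.error
import urllib.request


def elasticsearch_is_up(base_url="http://localhost:9200", timeout=2.0):
    """Return True if a GET on the root endpoint returns JSON with a "version" key.

    Hypothetical helper, not part of elasticsearch_simple_client.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            info = json.load(response)
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, malformed reply or non-JSON body.
        return False
    return "version" in info


if __name__ == "__main__":
    print("Elasticsearch reachable:", elasticsearch_is_up())
```

The container can take a little while to start, so a `False` result right after `docker run` may simply mean the node is still booting.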

Loading from a csv

import pandas as pd
from elasticsearch_simple_client.uploader import Uploader

uploader = Uploader()
df = pd.read_csv("example/descriptions_with_categories.csv")

uploader.post_df(df)
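How `post_df` sends the rows is not described in this README. For orientation only, the sketch below shows the newline-delimited JSON body that Elasticsearch's `_bulk` endpoint expects when indexing records. The helper name `build_bulk_body`, the index name `transactions` and the column names `description` and `category` are all assumptions made for illustration.

```python
import json


def build_bulk_body(rows, index_name):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    Each document is preceded by an action line; every line ends with a newline.
    Hypothetical helper, not necessarily what Uploader.post_df does.
    """
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(row))
    return "".join(line + "\n" for line in lines)


# Assumed column names; with pandas, rows could come from df.to_dict(orient="records").
rows = [
    {"description": "ANSTRUTHER FISH BAR AND ANSTRUTHER GBR", "category": "EAT OUT"},
]
body = build_bulk_body(rows, "transactions")
print(body, end="")
```

Such a body would be sent with a POST to `/_bulk` using the `Content-Type: application/x-ndjson` header.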

Searching data

import pandas as pd

from elasticsearch_simple_client.searcher import Searcher

searcher = Searcher()

result = searcher.execute_search(musts=["exact match with some fuzziness"], 
                                 shoulds=["less exact matches allowed"])
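The `musts` and `shoulds` arguments read like the `must` and `should` clauses of an Elasticsearch `bool` query. The README does not show the query that `Searcher` actually builds, so the following is only a sketch of how such a query could be expressed with fuzzy `match` clauses. The function name `build_bool_query` and the field name `description` are assumptions.

```python
def build_bool_query(field, musts=(), shoulds=(), fuzziness="AUTO"):
    """Build a bool query body: "must" terms are required, "should" terms boost the score.

    Hypothetical helper, not necessarily the query built by Searcher.execute_search.
    """

    def match_clause(text):
        # A match query with fuzziness tolerates small spelling differences.
        return {"match": {field: {"query": text, "fuzziness": fuzziness}}}

    return {
        "query": {
            "bool": {
                "must": [match_clause(text) for text in musts],
                "should": [match_clause(text) for text in shoulds],
            }
        }
    }


query = build_bool_query(
    "description",  # assumed field name
    musts=["exact match with some fuzziness"],
    shoulds=["less exact matches allowed"],
)
```

The resulting dictionary has the shape accepted as the JSON body of a request to an index's `_search` endpoint.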

Deleting data (not implemented by this python repo)
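Since this package does not cover deletion, data can be removed by calling the Elasticsearch REST API directly. An HTTP `DELETE` on an index URL removes that index and all its documents. The sketch below only builds the request so that nothing is deleted by accident; the helper name `build_delete_index_request` and the index name `transactions` are assumptions, as the README does not state which index the uploader writes to.

```python
import urllib.parse
import urllib.request


def build_delete_index_request(index_name, base_url="http://localhost:9200"):
    """Build (but do not send) an HTTP DELETE request for a whole index.

    Hypothetical helper, not part of elasticsearch_simple_client.
    """
    url = f"{base_url.rstrip('/')}/{urllib.parse.quote(index_name, safe='')}"
    return urllib.request.Request(url, method="DELETE")


request = build_delete_index_request("transactions")  # assumed index name
print(request.get_method(), request.full_url)

# To actually delete the index (irreversible), send the request:
# urllib.request.urlopen(request)
```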

Sample Application (categorising new short text descriptions)

  • This blog post describes a use case for this repo in automating the categorisation of bank account transactions
  • An example notebook is included in this repo, which contains code and results that are in the blog post above
  • The use case in this example is to categorise account transactions based on description (e.g. the description "ANSTRUTHER FISH BAR AND ANSTRUTHER GBR" should be categorised as "EAT OUT")
  • Some anonymised data used in the notebook can be found in the example folder
  • Once transactions with known categories are added to Elasticsearch, its fuzzy string lookup can be used to predict the category of a new transaction by finding the known transaction whose description best matches the new one
  • While machine learning models could also be trained on the transactions with known categories, this example demonstrates that fuzzy string matching on its own can sometimes be a simpler yet elegant solution
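The lookup idea described above can be illustrated without a running cluster. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for Elasticsearch's fuzzy scoring, so the similarity scores differ from what Elasticsearch would return. The function name `predict_category` and the second labelled row are made up for illustration.

```python
from difflib import SequenceMatcher


def predict_category(new_description, labelled):
    """Return the category of the labelled description most similar to the new one.

    Stand-in for the Elasticsearch lookup: similarity here is difflib's ratio.
    """

    def similarity(known_description):
        return SequenceMatcher(
            None, new_description.upper(), known_description.upper()
        ).ratio()

    _, best_category = max(labelled, key=lambda pair: similarity(pair[0]))
    return best_category


labelled = [
    ("ANSTRUTHER FISH BAR AND ANSTRUTHER GBR", "EAT OUT"),
    ("MONTHLY RAIL TICKET", "TRAVEL"),  # hypothetical row for illustration
]
print(predict_category("ANSTRUTHER FISH BAR", labelled))
```

With Elasticsearch the same step would be a search against the indexed transactions, taking the category of the top hit.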

About

Exploration of using elasticsearch with python
