About The Project

This project is an entity normalisation engine developed for the Vector AI recruitment process. It supports entity normalisation for the following types of entities:

  • Companies, businesses;
  • Products, objects;
  • Locations, cities, countries;
  • Serial numbers;
  • Street addresses.

The model takes as input a stream of strings, each belonging to one of the classes above. No context is provided for any entity.

For the first three types of entities, the model normalises each input to a suitable Wikipedia article. Because serial numbers and street addresses are unique to each entity, they are instead normalised by the string similarity of the input entities, measured with the Levenshtein distance.
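As a rough illustration of the Levenshtein-based step, the sketch below computes the edit distance with the standard dynamic-programming recurrence and then maps each incoming string to the first earlier string that is close enough. The function names (`levenshtein`, `normalise_stream`), the case-insensitive comparison, and the `max_ratio` threshold of 0.3 are assumptions made for this example; they are not taken from `entity_norm.py`, which may group entities differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def normalise_stream(entities, max_ratio=0.3):
    """Map each string to the first earlier string within max_ratio.

    max_ratio is a hypothetical threshold: the edit distance divided by
    the length of the longer string must not exceed it.
    """
    representatives = []
    result = []
    for entity in entities:
        key = entity.lower().strip()
        match = None
        for rep in representatives:
            rep_key = rep.lower().strip()
            longest = max(len(key), len(rep_key)) or 1
            if levenshtein(key, rep_key) / longest <= max_ratio:
                match = rep
                break
        if match is None:
            # No close representative yet: this string starts a new group.
            representatives.append(entity)
            match = entity
        result.append(match)
    return result
```

With these assumptions, `normalise_stream(["XYZ-1234", "XYZ-1235", "ABC-9999"])` maps the second serial number onto the first and leaves the third as its own group. Dividing by the longer string's length keeps the threshold comparable between short serial numbers and long street addresses.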

The model accepts entities in any language supported by the Google Translate API.

Getting Started

To set up this project:

  1. Clone the GitHub repo:
     git clone https://github.com/jleguina0/entity-normalization.git
  2. Create a suitable virtual environment and install the dependencies:

    • With conda:
      cd entity-normalization
      conda env create -f environment.yml
      conda activate entity-norm37
    • Or else, create a virtual environment with Python 3.7 and run:
      pip install -r requirements.txt
  3. Run the normalization engine with some predefined examples in various languages:

    python entity_norm.py

Contact

Javier Leguina Peral - [email protected]

Project Link: https://github.com/jleguina0/entity-normalization