diff --git a/_sources/experiments/index.rst.txt b/_sources/experiments/index.rst.txt
index a1deead1..b08d24f4 100644
--- a/_sources/experiments/index.rst.txt
+++ b/_sources/experiments/index.rst.txt
@@ -6,7 +6,7 @@ Follow these steps to reproduce the experiments in our paper.
 1. Obtain the external resources
 --------------------------------
 
-Follow the instructions in the ":doc:`/getting-started/resources`" page in the documentation
+Follow the instructions in the ":doc:`resources`" page in the documentation
 to obtain the resources required for running the experiments.
 
 2. Preparing the data
@@ -17,7 +17,7 @@ run the following command from the ``./experiments/`` folder:
 
 .. code-block:: bash
 
-   $ python ./prepare_data.py -p ../resources
+   $ python ./prepare_data.py
 
 This script takes care of downloading the LwM and HIPE datasets and formats
 them as needed for the experiments.
@@ -30,7 +30,7 @@ folder:
 
 .. code-block:: bash
 
-   $ python ./toponym_resolution.py -p ../resources
+   $ python ./toponym_resolution.py
 
 This script runs all the different scenarios reported in the experiments in
 the paper.
diff --git a/_sources/getting-started/complete-tour.rst.txt b/_sources/getting-started/complete-tour.rst.txt
index f0d88224..bee8401a 100644
--- a/_sources/getting-started/complete-tour.rst.txt
+++ b/_sources/getting-started/complete-tour.rst.txt
@@ -47,9 +47,7 @@ To instantiate the default T-Res pipeline, do:
 
     from geoparser import pipeline
 
-    geoparser = pipeline.Pipeline(resources_path="../resources/")
-
-.. note:: You should update the resources path argument to reflect your set up.
+    geoparser = pipeline.Pipeline()
 
 You can also instantiate a pipeline using a customised Recogniser, Ranker and Linker.
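The default pipeline above can be exercised end to end with a short sketch. This assumes the T-Res package and its resources are installed locally and that ``Pipeline.run_text`` is the end-to-end entry point, as shown in the Complete Tour; the ``resolve`` wrapper is a hypothetical helper added here for illustration.

```python
def resolve(text):
    """Resolve toponyms in ``text`` with a default T-Res pipeline.

    The import is deferred because it requires the T-Res package and its
    resources to be installed locally.
    """
    from geoparser import pipeline

    # Default pipeline: default Recogniser, Ranker and Linker.
    geoparser = pipeline.Pipeline()
    return geoparser.run_text(text)

# Usage (requires a working local installation with resources):
#   for mention in resolve("A remarkable case of rattening has just occurred "
#                          "in the borough of Sheffield."):
#       print(mention)
```

Each returned item describes one resolved mention; the exact keys depend on the configured Linker.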
To see the different options, refer to the sections on instantiating @@ -605,7 +603,7 @@ and ``levenshtein`` respectively), instantiate it as follows, changing the myranker = ranking.Ranker( method="perfectmatch", # or "partialmatch" or "levenshtein" - resources_path="resources/", + resources_path="resources/wikidata/", ) Note that ``resources_path`` should contain the path to the directory @@ -670,7 +668,7 @@ The Ranker can then be instantiated as follows: myranker = ranking.Ranker( # Generic Ranker parameters: method="deezymatch", - resources_path="resources/", + resources_path="resources/wikidata/", # Parameters to create the string pair dataset: strvar_parameters=dict(), # Parameters to train, load and use a DeezyMatch model: @@ -759,7 +757,7 @@ The Ranker can then be instantiated as follows: myranker = ranking.Ranker( # Generic Ranker parameters: method="deezymatch", - resources_path="resources/", + resources_path="resources/wikidata/", # Parameters to create the string pair dataset: strvar_parameters={ "ocr_threshold": 60, @@ -1058,7 +1056,7 @@ of the Linker method. .. code-block:: python - mylinker.load_resources() + mylinker.linking_resources = mylinker.load_resources() .. note:: diff --git a/_sources/getting-started/resources.rst.txt b/_sources/getting-started/resources.rst.txt index 8745d4c8..6b72addc 100644 --- a/_sources/getting-started/resources.rst.txt +++ b/_sources/getting-started/resources.rst.txt @@ -561,9 +561,6 @@ for the mentioned resources that are required in order to run the pipeline. :: T-Res/ - ├── t-res/ - │ ├── geoparser/ - │ └── utils/ ├── app/ ├── evaluation/ ├── examples/ @@ -574,6 +571,7 @@ for the mentioned resources that are required in order to run the pipeline. │ ├── linking_df_split.tsv [*?] │ ├── ner_fine_dev.json [*+?] │ └── ner_fine_train.json [*+?] + ├── geoparser/ ├── resources/ │ ├── deezymatch/ │ │ └── data/ @@ -588,7 +586,8 @@ for the mentioned resources that are required in order to run the pipeline. 
│   ├── mentions_to_wikidata.json [*]
 │   ├── wikidata_gazetteer.csv [*]
 │   └── wikidata_to_mentions_normalized.json [*]
-└── tests/
+├── tests/
+└── utils/
 
 A question mark (``?``) is used to indicate resources which are only required
 for some approaches (for example, the ``rel_db/embeddings_database.db`` file
diff --git a/_sources/reference/geoparser/linker.rst.txt b/_sources/reference/geoparser/linker.rst.txt
index e6bb8091..26ee9990 100644
--- a/_sources/reference/geoparser/linker.rst.txt
+++ b/_sources/reference/geoparser/linker.rst.txt
@@ -1,8 +1,8 @@
-``t_res.geoparser.linking.Linker``
+``geoparser.linking.Linker``
 ============================
 
-.. autoclass:: t_res.geoparser.linking.Linker
+.. autoclass:: geoparser.linking.Linker
    :members:
    :undoc-members:
 
-.. autoattribute:: t_res.geoparser.linking.RANDOM_SEED
\ No newline at end of file
+.. autoattribute:: geoparser.linking.RANDOM_SEED
\ No newline at end of file
diff --git a/_sources/reference/geoparser/pipeline.rst.txt b/_sources/reference/geoparser/pipeline.rst.txt
index 95e68b45..392610ae 100644
--- a/_sources/reference/geoparser/pipeline.rst.txt
+++ b/_sources/reference/geoparser/pipeline.rst.txt
@@ -1,6 +1,6 @@
-``t_res.geoparser.pipeline.Pipeline``
+``geoparser.pipeline.Pipeline``
 ===============================
 
-.. autoclass:: t_res.geoparser.pipeline.Pipeline
+.. autoclass:: geoparser.pipeline.Pipeline
    :members:
    :undoc-members:
diff --git a/_sources/reference/geoparser/ranker.rst.txt b/_sources/reference/geoparser/ranker.rst.txt
index c31dd884..659f928a 100644
--- a/_sources/reference/geoparser/ranker.rst.txt
+++ b/_sources/reference/geoparser/ranker.rst.txt
@@ -1,6 +1,6 @@
-``t_res.geoparser.ranking. Ranker``
+``geoparser.ranking.Ranker``
 =============================
 
-.. autoclass:: t_res.geoparser.ranking.Ranker
+..
autoclass:: geoparser.ranking.Ranker :members: :undoc-members: diff --git a/_sources/reference/geoparser/recogniser.rst.txt b/_sources/reference/geoparser/recogniser.rst.txt index d437b140..5b4543ca 100644 --- a/_sources/reference/geoparser/recogniser.rst.txt +++ b/_sources/reference/geoparser/recogniser.rst.txt @@ -1,6 +1,6 @@ -``t_res.geoparser.recogniser.Recogniser`` +``geoparser.recogniser.Recogniser`` =================================== -.. autoclass:: t_res.geoparser.recogniser.Recogniser +.. autoclass:: geoparser.recogniser.Recogniser :members: :undoc-members: diff --git a/_sources/reference/utils/deezy_processing.rst.txt b/_sources/reference/utils/deezy_processing.rst.txt index aa80e247..6f9ae76f 100644 --- a/_sources/reference/utils/deezy_processing.rst.txt +++ b/_sources/reference/utils/deezy_processing.rst.txt @@ -1,10 +1,10 @@ -``t_res.utils.deezy_processing`` module +``utils.deezy_processing`` module ================================= -.. autofunction:: t_res.utils.deezy_processing.obtain_matches +.. autofunction:: utils.deezy_processing.obtain_matches -.. autofunction:: t_res.utils.deezy_processing.create_training_set +.. autofunction:: utils.deezy_processing.create_training_set -.. autofunction:: t_res.utils.deezy_processing.train_deezy_model +.. autofunction:: utils.deezy_processing.train_deezy_model -.. autofunction:: t_res.utils.deezy_processing.generate_candidates \ No newline at end of file +.. autofunction:: utils.deezy_processing.generate_candidates \ No newline at end of file diff --git a/_sources/reference/utils/get_data.rst.txt b/_sources/reference/utils/get_data.rst.txt index c3016cf1..f3edecb1 100644 --- a/_sources/reference/utils/get_data.rst.txt +++ b/_sources/reference/utils/get_data.rst.txt @@ -1,6 +1,6 @@ -``t_res.utils.get_data`` module +``utils.get_data`` module ========================= -.. autofunction:: t_res.utils.get_data.download_lwm_data +.. autofunction:: utils.get_data.download_lwm_data -.. 
autofunction:: t_res.utils.get_data.download_hipe_data \ No newline at end of file +.. autofunction:: utils.get_data.download_hipe_data \ No newline at end of file diff --git a/_sources/reference/utils/ner.rst.txt b/_sources/reference/utils/ner.rst.txt index d8d3dc0b..363f5484 100644 --- a/_sources/reference/utils/ner.rst.txt +++ b/_sources/reference/utils/ner.rst.txt @@ -1,18 +1,18 @@ -``t_res.utils.ner`` module +``utils.ner`` module ==================== -.. autofunction:: t_res.utils.ner.training_tokenize_and_align_labels +.. autofunction:: utils.ner.training_tokenize_and_align_labels -.. autofunction:: t_res.utils.ner.collect_named_entities +.. autofunction:: utils.ner.collect_named_entities -.. autofunction:: t_res.utils.ner.aggregate_mentions +.. autofunction:: utils.ner.aggregate_mentions -.. autofunction:: t_res.utils.ner.fix_capitalization +.. autofunction:: utils.ner.fix_capitalization -.. autofunction:: t_res.utils.ner.fix_hyphens +.. autofunction:: utils.ner.fix_hyphens -.. autofunction:: t_res.utils.ner.fix_nested +.. autofunction:: utils.ner.fix_nested -.. autofunction:: t_res.utils.ner.fix_startEntity +.. autofunction:: utils.ner.fix_startEntity -.. autofunction:: t_res.utils.ner.aggregate_entities \ No newline at end of file +.. autofunction:: utils.ner.aggregate_entities \ No newline at end of file diff --git a/_sources/reference/utils/preprocess_data.rst.txt b/_sources/reference/utils/preprocess_data.rst.txt index 73c73d8b..938773a5 100644 --- a/_sources/reference/utils/preprocess_data.rst.txt +++ b/_sources/reference/utils/preprocess_data.rst.txt @@ -1,20 +1,20 @@ -``t_res.utils.preprocess_data`` module +``utils.preprocess_data`` module ================================ -.. automodule:: t_res.utils.preprocess_data +.. automodule:: utils.preprocess_data -.. autofunction:: t_res.utils.preprocess_data.turn_wikipedia2wikidata +.. autofunction:: utils.preprocess_data.turn_wikipedia2wikidata -.. 
autofunction:: t_res.utils.preprocess_data.reconstruct_sentences +.. autofunction:: utils.preprocess_data.reconstruct_sentences -.. autofunction:: t_res.utils.preprocess_data.process_lwm_for_ner +.. autofunction:: utils.preprocess_data.process_lwm_for_ner -.. autofunction:: t_res.utils.preprocess_data.process_lwm_for_linking +.. autofunction:: utils.preprocess_data.process_lwm_for_linking -.. autofunction:: t_res.utils.preprocess_data.aggregate_hipe_entities +.. autofunction:: utils.preprocess_data.aggregate_hipe_entities -.. autofunction:: t_res.utils.preprocess_data.process_hipe_for_linking +.. autofunction:: utils.preprocess_data.process_hipe_for_linking -.. autofunction:: t_res.utils.preprocess_data.process_tsv +.. autofunction:: utils.preprocess_data.process_tsv -.. autofunction:: t_res.utils.preprocess_data.fine_to_coarse \ No newline at end of file +.. autofunction:: utils.preprocess_data.fine_to_coarse \ No newline at end of file diff --git a/_sources/reference/utils/process_data.rst.txt b/_sources/reference/utils/process_data.rst.txt index 25798b04..1f5f2066 100644 --- a/_sources/reference/utils/process_data.rst.txt +++ b/_sources/reference/utils/process_data.rst.txt @@ -1,20 +1,20 @@ -``t_res.utils.process_data`` module +``utils.process_data`` module ============================= -.. autofunction:: t_res.utils.process_data.eval_with_exception +.. autofunction:: utils.process_data.eval_with_exception -.. autofunction:: t_res.utils.process_data.prepare_sents +.. autofunction:: utils.process_data.prepare_sents -.. autofunction:: t_res.utils.process_data.align_gold +.. autofunction:: utils.process_data.align_gold -.. autofunction:: t_res.utils.process_data.postprocess_predictions +.. autofunction:: utils.process_data.postprocess_predictions -.. autofunction:: t_res.utils.process_data.ner_and_process +.. autofunction:: utils.process_data.ner_and_process -.. autofunction:: t_res.utils.process_data.update_with_linking +.. 
autofunction:: utils.process_data.update_with_linking -.. autofunction:: t_res.utils.process_data.update_with_skyline +.. autofunction:: utils.process_data.update_with_skyline -.. autofunction:: t_res.utils.process_data.prepare_storing_links +.. autofunction:: utils.process_data.prepare_storing_links -.. autofunction:: t_res.utils.process_data.store_for_scorer +.. autofunction:: utils.process_data.store_for_scorer diff --git a/_sources/reference/utils/process_wikipedia.rst.txt b/_sources/reference/utils/process_wikipedia.rst.txt index 807ef9ee..69f7e686 100644 --- a/_sources/reference/utils/process_wikipedia.rst.txt +++ b/_sources/reference/utils/process_wikipedia.rst.txt @@ -1,8 +1,8 @@ -``t_res.utils.process_wikipedia`` module +``utils.process_wikipedia`` module ================================== -.. autofunction:: t_res.utils.process_wikipedia.make_wikilinks_consistent +.. autofunction:: utils.process_wikipedia.make_wikilinks_consistent -.. autofunction:: t_res.utils.process_wikipedia.make_wikipedia2wikidata_consisent +.. autofunction:: utils.process_wikipedia.make_wikipedia2wikidata_consisent -.. autofunction:: t_res.utils.process_wikipedia.title_to_id \ No newline at end of file +.. autofunction:: utils.process_wikipedia.title_to_id \ No newline at end of file diff --git a/_sources/reference/utils/rel/entity_disambiguation.rst.txt b/_sources/reference/utils/rel/entity_disambiguation.rst.txt index 9a690635..1ace598e 100644 --- a/_sources/reference/utils/rel/entity_disambiguation.rst.txt +++ b/_sources/reference/utils/rel/entity_disambiguation.rst.txt @@ -1,8 +1,8 @@ -``t_res.utils.REL.entity_disambiguation`` module +``utils.REL.entity_disambiguation`` module ========================================== -.. autoclass:: t_res.utils.REL.entity_disambiguation.EntityDisambiguation +.. autoclass:: utils.REL.entity_disambiguation.EntityDisambiguation :members: :undoc-members: -.. 
autoattribute:: t_res.utils.REL.entity_disambiguation.RANDOM_SEED \ No newline at end of file +.. autoattribute:: utils.REL.entity_disambiguation.RANDOM_SEED \ No newline at end of file diff --git a/_sources/reference/utils/rel/mulrel_ranker.rst.txt b/_sources/reference/utils/rel/mulrel_ranker.rst.txt index a7352632..7e4e77ea 100644 --- a/_sources/reference/utils/rel/mulrel_ranker.rst.txt +++ b/_sources/reference/utils/rel/mulrel_ranker.rst.txt @@ -1,10 +1,10 @@ -``t_res.utils.REL.mulrel_ranker`` module +``utils.REL.mulrel_ranker`` module ================================== -.. autoclass:: t_res.utils.REL.mulrel_ranker.PreRank +.. autoclass:: utils.REL.mulrel_ranker.PreRank :members: :undoc-members: -.. autoclass:: t_res.utils.REL.mulrel_ranker.MulRelRanker +.. autoclass:: utils.REL.mulrel_ranker.MulRelRanker :members: :undoc-members: diff --git a/_sources/reference/utils/rel/utils.rst.txt b/_sources/reference/utils/rel/utils.rst.txt index 1597f964..74641788 100644 --- a/_sources/reference/utils/rel/utils.rst.txt +++ b/_sources/reference/utils/rel/utils.rst.txt @@ -1,10 +1,10 @@ -``t_res.utils.REL.t_res.utils`` module +``utils.REL.utils`` module ========================== -.. autofunction:: t_res.utils.REL.t_res.utils.flatten_list_of_lists +.. autofunction:: utils.REL.utils.flatten_list_of_lists -.. autofunction:: t_res.utils.REL.t_res.utils.make_equal_len +.. autofunction:: utils.REL.utils.make_equal_len -.. autofunction:: t_res.utils.REL.t_res.utils.is_important_word +.. autofunction:: utils.REL.utils.is_important_word -.. autoattribute:: t_res.utils.REL.t_res.utils.STOPWORDS \ No newline at end of file +.. 
autoattribute:: utils.REL.utils.STOPWORDS \ No newline at end of file diff --git a/_sources/reference/utils/rel/vocabulary.rst.txt b/_sources/reference/utils/rel/vocabulary.rst.txt index 3516423d..5ab8da92 100644 --- a/_sources/reference/utils/rel/vocabulary.rst.txt +++ b/_sources/reference/utils/rel/vocabulary.rst.txt @@ -1,6 +1,6 @@ -``t_res.utils.REL.vocabulary`` module +``utils.REL.vocabulary`` module =============================== -.. autoclass:: t_res.utils.REL.vocabulary.Vocabulary +.. autoclass:: utils.REL.vocabulary.Vocabulary :members: :undoc-members: diff --git a/_sources/reference/utils/rel_e2e.rst.txt b/_sources/reference/utils/rel_e2e.rst.txt index 64c3130a..7145d30c 100644 --- a/_sources/reference/utils/rel_e2e.rst.txt +++ b/_sources/reference/utils/rel_e2e.rst.txt @@ -1,16 +1,16 @@ -``t_res.utils.rel_e2e`` module +``utils.rel_e2e`` module ======================== -.. autofunction:: t_res.utils.rel_e2e.rel_end_to_end +.. autofunction:: utils.rel_e2e.rel_end_to_end -.. autofunction:: t_res.utils.rel_e2e.get_rel_from_api +.. autofunction:: utils.rel_e2e.get_rel_from_api -.. autofunction:: t_res.utils.rel_e2e.match_wikipedia_to_wikidata +.. autofunction:: utils.rel_e2e.match_wikipedia_to_wikidata -.. autofunction:: t_res.utils.rel_e2e.match_ent +.. autofunction:: utils.rel_e2e.match_ent -.. autofunction:: t_res.utils.rel_e2e.postprocess_rel +.. autofunction:: utils.rel_e2e.postprocess_rel -.. autofunction:: t_res.utils.rel_e2e.store_rel +.. autofunction:: utils.rel_e2e.store_rel -.. autofunction:: t_res.utils.rel_e2e.run_rel_experiments \ No newline at end of file +.. 
autofunction:: utils.rel_e2e.run_rel_experiments
\ No newline at end of file
diff --git a/_sources/reference/utils/rel_utils.rst.txt b/_sources/reference/utils/rel_utils.rst.txt
index 0ce4cb52..d3fb3638 100644
--- a/_sources/reference/utils/rel_utils.rst.txt
+++ b/_sources/reference/utils/rel_utils.rst.txt
@@ -1,14 +1,14 @@
-``t_res.utils.rel_utils`` module
+``utils.rel_utils`` module
 ==========================
 
-.. autofunction:: t_res.utils.rel_utils.get_db_emb
+.. autofunction:: utils.rel_utils.get_db_emb
 
-.. autofunction:: t_res.utils.rel_utils.eval_with_exception
+.. autofunction:: utils.rel_utils.eval_with_exception
 
-.. autofunction:: t_res.utils.rel_utils.prepare_initial_data
+.. autofunction:: utils.rel_utils.prepare_initial_data
 
-.. autofunction:: t_res.utils.rel_utils.rank_candidates
+.. autofunction:: utils.rel_utils.rank_candidates
 
-.. autofunction:: t_res.utils.rel_utils.add_publication
+.. autofunction:: utils.rel_utils.add_publication
 
-.. autofunction:: t_res.utils.rel_utils.prepare_rel_trainset
\ No newline at end of file
+.. autofunction:: utils.rel_utils.prepare_rel_trainset
\ No newline at end of file
diff --git a/_sources/t-res-api/index.rst.txt b/_sources/t-res-api/index.rst.txt
index e5e36fc4..6ebb6775 100644
--- a/_sources/t-res-api/index.rst.txt
+++ b/_sources/t-res-api/index.rst.txt
@@ -1,11 +1,22 @@
-====================
-Running T-Res as API
-====================
+=======================
+Deploying the T-Res API
+=======================
 
-TODO
+T-Res can also be deployed as a `FastAPI <https://fastapi.tiangolo.com/>`_ application via `Docker <https://www.docker.com/>`_,
+allowing remote users to access your T-Res pipeline rather than running their own local installation.
+
+The API consists of the following files:
+
+* ``app/app_template.py``
+* ``app/configs/<config_name>.py``
+* ``app/template.Dockerfile``
+* ``docker-compose.yml``
+
+Example configuration files, which can be adapted to your needs, are provided in this repository.
 
 ..
toctree::
    :maxdepth: 2
    :caption: Table of contents:
 
-   installation
\ No newline at end of file
+   installation
+   usage
\ No newline at end of file
diff --git a/_sources/t-res-api/installation.rst.txt b/_sources/t-res-api/installation.rst.txt
index 40fdc472..435db162 100644
--- a/_sources/t-res-api/installation.rst.txt
+++ b/_sources/t-res-api/installation.rst.txt
@@ -1,5 +1,85 @@
 =======================
-Installing T-Res as API
+Deploying the T-Res API
 =======================
 
-TODO
\ No newline at end of file
+The T-Res API can be deployed either as a standalone Docker container,
+or via Docker Compose to deploy multiple configurations of the pipeline simultaneously behind
+a reverse proxy (`traefik <https://traefik.io/>`_).
+
+Docker and Docker Compose should be installed on your server according to
+the `official installation guide <https://docs.docker.com/engine/install/>`_
+before proceeding with the following steps to build and deploy the containers.
+
+..
+   A bash script ``builder.sh`` has been included in the repository to conveniently (re-)deploy the example API:
+
+   .. code-block:: bash
+
+      ./builder.sh
+
+
+1. Building the container
+-------------------------
+
+To build a Docker image for the app using the default configuration provided (``t-res_deezy_reldisamb-wpubl-wmtops.py``),
+run the following bash commands from the root of the repository:
+
+.. code-block:: bash
+
+   export CONTAINER_NAME=t-res_deezy_reldisamb-wpubl-wmtops
+   sudo -E docker build -f app/template.Dockerfile --no-cache --build-arg APP_NAME=${CONTAINER_NAME} -t ${CONTAINER_NAME}_image .
+
+2. Deploying the container
+--------------------------
+
+The Docker image built in step 1 can then be deployed by running the following command, provided the required
+resources are available according to the :doc:`Resources and directory structure </getting-started/resources>` section.
+
+..
code-block:: bash
+
+   sudo docker run -p 80:80 -it \
+      -v ${HOME}/T-Res/resources/:/app/resources/ \
+      -v ${HOME}/T-Res/geoparser/:/app/geoparser/ \
+      -v ${HOME}/T-Res/utils/:/app/utils/ \
+      -v ${HOME}/T-Res/preprocessing/:/app/preprocessing/ \
+      -v ${HOME}/T-Res/experiments/:/app/experiments/ \
+      -v ${HOME}/T-Res/app/configs/:/app/configs/ \
+      ${CONTAINER_NAME}_image:latest
+
+
+3. Deploying multiple containers via Docker Compose
+---------------------------------------------------
+
+To deploy the example configuration behind a traefik load-balancing server:
+
+.. code-block:: bash
+
+   HOST_URL=<your_host> sudo -E docker-compose up -d
+
+4. Configuring your deployment
+------------------------------
+
+1. Add your T-Res pipeline configuration file to the ``app/configs`` directory. This file should instantiate the ``Recogniser``, ``Linker``, and ``Ranker`` to be used in your pipeline and store them in a dictionary called ``CONFIG``, which is then imported and used by the app.
+2. Optionally, add or edit endpoints or app behaviour in the ``app/app_template.py`` file.
+3. Build your Docker container as in step 1, setting the ``CONTAINER_NAME`` environment variable to your new configuration's name.
+4. Add a section to the ``docker-compose.yml``, updating the service name, image and labels as follows:
+
+   ..
code-block:: yaml
+
+      <container_name>:
+        image: <container_name>_image:latest
+        restart: always
+        expose:
+          - 80
+        volumes:
+          - ${HOME}/T-Res/resources/:/app/resources/
+          - ${HOME}/T-Res/geoparser/:/app/geoparser/
+          - ${HOME}/T-Res/utils/:/app/utils/
+          - ${HOME}/T-Res/preprocessing/:/app/preprocessing/
+          - ${HOME}/T-Res/experiments/:/app/experiments/
+        labels:
+          - traefik.enable=true
+          - traefik.http.services.<container_name>.loadbalancer.server.port=80
+          - traefik.http.routers.<container_name>_router.service=<container_name>
+          - traefik.http.routers.<container_name>_router.rule=Host(`<your_host>`, `0.0.0.0`) && PathPrefix(`/v2/t-res_<config_name>`)
+          - traefik.http.middlewares.test-stripprefix-rwop.stripprefix.prefixes=/v2/t-res_<config_name>
+          - traefik.http.routers.<container_name>_router.middlewares=test-stripprefix-rwop
+        command: ["poetry", "run", "uvicorn", "app:app", "--proxy-headers", "--host", "0.0.0.0", "--port", "80", "--root-path", "/v2/t-res_deezy_reldisamb-wpubl-wmtops"]
+
diff --git a/_sources/t-res-api/usage.rst.txt b/_sources/t-res-api/usage.rst.txt
new file mode 100644
index 00000000..ba940e15
--- /dev/null
+++ b/_sources/t-res-api/usage.rst.txt
@@ -0,0 +1,18 @@
+=======================
+Using the T-Res API
+=======================
+
+If you deploy the T-Res API according to the steps in the previous section,
+it should now be available on your server as an HTTP API
+(be sure to expose the correct ports; by default, the app is deployed to port 8000).
+Automatically generated, interactive documentation (created by `Swagger`) is available at the ``/docs`` endpoint.
+
+The following example shows how to query the API via curl to resolve the toponyms in a single sentence:
+
+.. code-block:: bash
+
+   curl -X GET http://20.0.184.45:8000/v2/t-res_deezy_reldisamb-wpubl-wmtops/toponym_resolution \
+      -H "Content-Type: application/json" \
+      -d '{"text": "Harvey, from London;Thomas and Elizabeth, Barnett.", "place": "Manchester", "place_wqid": "Q18125"}'
+
+See the ``app/api_usage.ipynb`` notebook for more examples of how to use the API's various endpoints via Python.
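The curl call above can also be reproduced from Python with the standard library alone. This is a minimal sketch, assuming the same endpoint and JSON body as the curl example; the ``API_ROOT`` value and the ``build_request`` helper are illustrative and must be adjusted to your own deployment's host, port, and route prefix.

```python
import json
from urllib.request import Request, urlopen

# Assumption: replace with your deployment's host, port and route prefix.
API_ROOT = "http://localhost:8000/v2/t-res_deezy_reldisamb-wpubl-wmtops"

def build_request(text, place, place_wqid):
    """Build the JSON request mirroring the curl example above."""
    payload = json.dumps(
        {"text": text, "place": place, "place_wqid": place_wqid}
    ).encode("utf-8")
    return Request(
        f"{API_ROOT}/toponym_resolution",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="GET",  # the example endpoint reads a JSON body on a GET request
    )

# To actually query a running deployment:
#   req = build_request("Harvey, from London;Thomas and Elizabeth, Barnett.",
#                       "Manchester", "Q18125")
#   with urlopen(req) as resp:
#       print(json.load(resp))
```

Sending a body with a GET request is unusual but matches the curl example; if your deployment rejects it, check the ``/docs`` endpoint for the expected method.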
- var useLineCont = !!lineContinuationChar - var useHereDoc = !!hereDocDelim - - // create regexp to capture prompt and remaining line - if (isRegexp) { - regexp = new RegExp('^(' + copybuttonPromptText + ')(.*)') - } else { - regexp = new RegExp('^(' + escapeRegExp(copybuttonPromptText) + ')(.*)') - } - - const outputLines = []; - var promptFound = false; - var gotLineCont = false; - var gotHereDoc = false; - const lineGotPrompt = []; - for (const line of textContent.split('\n')) { - match = line.match(regexp) - if (match || gotLineCont || gotHereDoc) { - promptFound = regexp.test(line) - lineGotPrompt.push(promptFound) - if (removePrompts && promptFound) { - outputLines.push(match[2]) - } else { - outputLines.push(line) - } - gotLineCont = line.endsWith(lineContinuationChar) & useLineCont - if (line.includes(hereDocDelim) & useHereDoc) - gotHereDoc = !gotHereDoc - } else if (!onlyCopyPromptLines) { - outputLines.push(line) - } else if (copyEmptyLines && line.trim() === '') { - outputLines.push(line) - } - } - - // If no lines with the prompt were found then just use original lines - if (lineGotPrompt.some(v => v === true)) { - textContent = outputLines.join('\n'); - } - - // Remove a trailing newline to avoid auto-running when pasting - if (textContent.endsWith("\n")) { - textContent = textContent.slice(0, -1) - } - return textContent -} diff --git a/experiments/index.html b/experiments/index.html index 8132fc13..6691f6d4 100644 --- a/experiments/index.html +++ b/experiments/index.html @@ -7,7 +7,6 @@ Experiments and evaluation — T-Res 0.1.0 documentation - @@ -16,12 +15,10 @@ - - - + @@ -43,7 +40,7 @@ @@ -139,13 +136,9 @@

1. Instantiate the Pipeline
from geoparser import pipeline
 
-geoparser = pipeline.Pipeline(resources_path="../resources/")
+geoparser = pipeline.Pipeline()
 
-Note
-You should update the resources path argument to reflect your set up.
You can also instantiate a pipeline using a customised Recogniser, Ranker and Linker. To see the different options, refer to the sections on instantiating each of them: Recogniser, Ranker @@ -650,7 +643,7 @@
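The composition pattern behind a customised pipeline can be sketched with self-contained stand-ins. Everything below (ToyRecogniser, ToyRanker, ToyPipeline, the data, and the method names) is invented for illustration and is not the T-Res API:

```python
# Toy sketch of the composition pattern: a pipeline accepting
# pre-configured recogniser / ranker / linker components.

class ToyRecogniser:
    def recognise(self, sentence):
        # Pretend every capitalised multi-letter token is a toponym mention.
        return [t for t in sentence.split() if len(t) > 1 and t.istitle()]

class ToyRanker:
    def __init__(self, altnames):
        self.altnames = altnames  # mention -> candidate Wikidata IDs

    def rank(self, mention):
        return self.altnames.get(mention, [])

class ToyLinker:
    def link(self, candidates):
        # Trivially pick the first candidate, if any.
        return candidates[0] if candidates else None

class ToyPipeline:
    def __init__(self, myner, myranker, mylinker):
        self.myner, self.myranker, self.mylinker = myner, myranker, mylinker

    def run_sentence(self, sentence):
        return {m: self.mylinker.link(self.myranker.rank(m))
                for m in self.myner.recognise(sentence)}

pipe = ToyPipeline(
    myner=ToyRecogniser(),
    myranker=ToyRanker({"London": ["Q84", "Q92561"]}),
    mylinker=ToyLinker(),
)
print(pipe.run_sentence("A fire broke out in London yesterday"))
```

The real `pipeline.Pipeline` takes the same three keyword arguments (`myner`, `myranker`, `mylinker`), each an already-configured component.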

1.1. Perfectmatch, partialmatch, and levenshtein

 myranker = ranking.Ranker(
     method="perfectmatch",  # or "partialmatch" or "levenshtein"
-    resources_path="resources/",
+    resources_path="resources/wikidata/",
 )
@@ -706,7 +699,7 @@
Option 1. Train a DeezyMatch model from scratch, given an existing string pa…

 myranker = ranking.Ranker(
     # Generic Ranker parameters:
     method="deezymatch",
-    resources_path="resources/",
+    resources_path="resources/wikidata/",
     # Parameters to create the string pair dataset:
     strvar_parameters=dict(),
     # Parameters to train, load and use a DeezyMatch model:
@@ -792,7 +785,7 @@
Option 2. Train a DeezyMatch model from scratch, including generating a stri…

 myranker = ranking.Ranker(
     # Generic Ranker parameters:
     method="deezymatch",
-    resources_path="resources/",
+    resources_path="resources/wikidata/",
     # Parameters to create the string pair dataset:
     strvar_parameters={
         "ocr_threshold": 60,
@@ -1071,7 +1064,7 @@

1.2. reldisamb

2. Load the resources

The following line of code loads the resources required by the Linker, regardless of the Linker method.

-
mylinker.load_resources()
+
mylinker.linking_resources = mylinker.load_resources()
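In this formulation, load_resources() returns the resources dictionary rather than setting it as a side effect, so the caller assigns the return value to the linking_resources attribute. A minimal toy illustrating the return-and-assign pattern (the class body and data are invented; only the attribute and method names mirror the docs):

```python
# Toy illustration of the return-and-assign pattern used by load_resources():
# the loader builds and *returns* a dictionary; the caller stores it.

class ToyLinker:
    def __init__(self):
        self.linking_resources = {}  # empty until explicitly loaded

    def load_resources(self):
        # In T-Res this reads large files from the resources directory;
        # here we just fabricate a tiny mapping.
        return {"mentions_to_wikidata": {"London": "Q84"}}

mylinker = ToyLinker()
mylinker.linking_resources = mylinker.load_resources()
print(mylinker.linking_resources["mentions_to_wikidata"]["London"])
```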
 
diff --git a/getting-started/index.html b/getting-started/index.html index 929dd991..6a57dba4 100644 --- a/getting-started/index.html +++ b/getting-started/index.html @@ -7,7 +7,6 @@ Getting started — T-Res 0.1.0 documentation - @@ -16,8 +15,6 @@ - - @@ -49,7 +46,7 @@
  • Reference
  • -
  • Running T-Res as API
  • +
  • Deploying the T-Res API
  • Experiments and evaluation
  • diff --git a/getting-started/installation.html b/getting-started/installation.html index fee4468e..515de65d 100644 --- a/getting-started/installation.html +++ b/getting-started/installation.html @@ -7,7 +7,6 @@ Installing T-Res — T-Res 0.1.0 documentation - @@ -16,8 +15,6 @@ - - @@ -57,7 +54,7 @@
  • diff --git a/getting-started/resources.html b/getting-started/resources.html index dae2d699..a4071bf4 100644 --- a/getting-started/resources.html +++ b/getting-started/resources.html @@ -7,7 +7,6 @@ Resources and directory structure — T-Res 0.1.0 documentation - @@ -16,8 +15,6 @@ - - @@ -71,7 +68,7 @@
  • @@ -588,9 +585,6 @@

    Summary of resources and directory structure
    T-Res/
    -├── t-res/
    -│   ├── geoparser/
    -│   └── utils/
     ├── app/
     ├── evaluation/
     ├── examples/
    @@ -601,6 +595,7 @@ 

(?) is used to indicate resources which are only required …
diff --git a/index.html b/index.html
T-Res: A Toponym Resolution Pipeline for Digitised Historical Newspapers — T-Res 0.1.0 documentation

    T-Res: A Toponym Resolution Pipeline for Digitised Historical Newspapers
  • @@ -101,7 +98,7 @@

    geoparser

    diff --git a/reference/geoparser/linker.html b/reference/geoparser/linker.html index a15ac6bd..7d3a21ea 100644 --- a/reference/geoparser/linker.html +++ b/reference/geoparser/linker.html @@ -4,10 +4,9 @@ - t_res.geoparser.linking.Linker — T-Res 0.1.0 documentation + geoparser.linking.Linker — T-Res 0.1.0 documentation - @@ -16,12 +15,10 @@ - - - + @@ -45,16 +42,16 @@
  • @@ -74,7 +71,7 @@
  • @@ -84,11 +81,11 @@
    -
    -

    t_res.geoparser.linking.Linker

    +
    +

    geoparser.linking.Linker

    -
    -class t_res.geoparser.linking.Linker(method: Literal['mostpopular', 'reldisamb', 'bydistance'], resources_path: str, experiments_path: Optional[str] = '../experiments', linking_resources: Optional[dict] = {}, overwrite_training: Optional[bool] = False, rel_params: Optional[dict] = None)
    +
    +class geoparser.linking.Linker(method: Literal['mostpopular', 'reldisamb', 'bydistance'], resources_path: str, linking_resources: Optional[dict] = {}, overwrite_training: Optional[bool] = False, rel_params: Optional[dict] = {'data_path': '../experiments/outputs/data/lwm/', 'db_embeddings': None, 'default_publname': 'United Kingdom', 'default_publwqid': 'Q145', 'do_test': False, 'model_path': '../resources/models/disambiguation/', 'training_split': 'originalsplit', 'with_publication': True, 'without_microtoponyms': True})

    The Linker class provides methods for entity linking, which is the task of associating mentions in text with their corresponding entities in a knowledge base.

    @@ -97,9 +94,7 @@

    • method (Literal["mostpopular", "reldisamb", "bydistance"]) – The linking method to use.

    • -
    • resources_path (str) – The path to the linking resources.

    • -
    • experiments_path (str, optional) – The path to the experiments -directory. Default is “../experiments/”.

    • +
    • resources_path (str, optional) – The path to the linking resources.

    • linking_resources (dict, optional) – Dictionary containing the necessary linking resources. Defaults to dict() (an empty dictionary).

    • @@ -115,8 +110,7 @@


      Example:

      linker = Linker(
         method="mostpopular",
      -  resources_path="/path/to/resources/",
      -  experiments_path="/path/to/experiments/",
      +  resources_path="/path/to/linking/resources/",
         linking_resources={},
         overwrite_training=True,
         rel_params={"with_publication": True, "do_test": True}
      @@ -136,7 +130,6 @@ 

mylinker = linking.Linker(
    method="reldisamb",
    resources_path="../resources/",
-   experiments_path="../experiments/",
    linking_resources=dict(),
    rel_params={
        "model_path": "../resources/models/disambiguation/",
@@ -171,8 +164,8 @@


      -
      -by_distance(dict_mention: dict, origin_wqid: Optional[str] = '') Tuple[str, float, dict]
      +
      +by_distance(dict_mention: dict, origin_wqid: Optional[str] = '') Tuple[str, float, dict]

      Select candidate based on distance to the place of publication.
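A hedged sketch of distance-based selection: assuming each candidate carries (lat, lon) coordinates and the origin is the place of publication, pick the candidate with the smallest great-circle distance. The coordinates and the helper names below are illustrative; the real method's inputs differ:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points, in kilometres.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def by_distance_sketch(candidates, origin):
    # candidates: {wikidata_id: (lat, lon)}; origin: (lat, lon) of publication.
    return min(candidates, key=lambda q: haversine_km(*candidates[q], *origin))

candidates = {"Q84": (51.51, -0.13),     # London, UK
              "Q6346": (42.98, -81.25)}  # London, Ontario
print(by_distance_sketch(candidates, origin=(53.0, -2.0)))  # a UK publication
```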

      Parameters
      @@ -205,8 +198,8 @@


      -
      -load_resources() dict
      +
      +load_resources() dict

      Loads the linking resources.

      Returns
      @@ -223,8 +216,8 @@


      Select most popular candidate, given Wikipedia’s in-link structure.
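As an illustration of popularity-based selection, the sketch below picks the candidate with the highest in-link count and reports a normalised score. The counts are invented; the real method reads them from the linking resources:

```python
# Hypothetical in-link counts per Wikidata ID (values are invented).
inlink_counts = {"Q84": 98230, "Q6346": 1412, "Q92561": 253}

def most_popular_sketch(candidates, counts):
    # Winner is the candidate with the most in-links; the score is its
    # share of all in-links among the candidates.
    best = max(candidates, key=lambda q: counts.get(q, 0))
    total = sum(counts.get(q, 0) for q in candidates) or 1
    return best, counts.get(best, 0) / total

print(most_popular_sketch(["Q84", "Q6346"], inlink_counts))
```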

      Parameters
      @@ -251,8 +244,8 @@


      -
      -run(dict_mention: dict) Tuple[str, float, dict]
      +
      +run(dict_mention: dict) Tuple[str, float, dict]

      Executes the linking process based on the specified unsupervised method.
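The dispatch itself can be pictured as a lookup from the configured method literal to a handler. The handlers here are stubs, not the real most_popular/by_distance implementations:

```python
# Sketch of dispatching on the method literal, mirroring how run()
# delegates to the method-specific linking functions (stubbed here).
def make_linker(method):
    handlers = {
        "mostpopular": lambda m: ("most_popular", m["mention"]),
        "bydistance": lambda m: ("by_distance", m["mention"]),
    }
    if method not in handlers:
        raise ValueError(f"Unknown linking method: {method}")
    return handlers[method]

run = make_linker("mostpopular")
print(run({"mention": "London"}))
```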

      @@ -263,13 +256,13 @@


      The result of the linking process. For details, see below:

• If the method provided when initialising the Linker() object was "mostpopular", see most_popular().
• If the method provided when initialising the Linker() object was "bydistance", see by_distance().

      @@ -280,13 +273,13 @@


      -
      -train_load_model(myranker: t_res.geoparser.ranking.Ranker, split: Optional[str] = 'originalsplit') t_res.utils.REL.entity_disambiguation.EntityDisambiguation
      +
      +train_load_model(myranker: geoparser.ranking.Ranker, split: Optional[str] = 'originalsplit') utils.REL.entity_disambiguation.EntityDisambiguation

      Trains or loads the entity disambiguation model.

      Parameters
      @@ -332,8 +325,8 @@


      -
      -linking.RANDOM_SEED = 42
      +
      +linking.RANDOM_SEED = 42

    @@ -343,7 +336,7 @@



    diff --git a/reference/geoparser/pipeline.html b/reference/geoparser/pipeline.html index 77c47b2c..ab7cc4cd 100644 --- a/reference/geoparser/pipeline.html +++ b/reference/geoparser/pipeline.html @@ -4,10 +4,9 @@ - t_res.geoparser.pipeline.Pipeline — T-Res 0.1.0 documentation + geoparser.pipeline.Pipeline — T-Res 0.1.0 documentation - @@ -16,13 +15,11 @@ - - - + @@ -45,16 +42,16 @@
  • @@ -74,7 +71,7 @@
  • @@ -84,30 +81,27 @@
    -
    -

    t_res.geoparser.pipeline.Pipeline

    +
    +

    geoparser.pipeline.Pipeline

    -
    -class t_res.geoparser.pipeline.Pipeline(myner: Optional[t_res.geoparser.recogniser.Recogniser] = None, myranker: Optional[t_res.geoparser.ranking.Ranker] = None, mylinker: Optional[t_res.geoparser.linking.Linker] = None, resources_path: Optional[str] = None, experiments_path: Optional[str] = None)
    +
    +class geoparser.pipeline.Pipeline(myner: Optional[geoparser.recogniser.Recogniser] = None, myranker: Optional[geoparser.ranking.Ranker] = None, mylinker: Optional[geoparser.linking.Linker] = None)

    Represents a pipeline for processing a text using natural language processing, including Named Entity Recognition (NER), Ranking, and Linking, to geoparse any entities in the text.

    Parameters
      -
    • myner (recogniser.Recogniser, optional) – The NER (Named Entity Recogniser) object to use in the pipeline. If None, a default Recogniser will be instantiated. For the default settings, see Notes below.

    • -
    • myranker (ranking.Ranker, optional) – The Ranker object to use in the pipeline. If None, the default Ranker will be instantiated. For the default settings, see Notes below.

    • -
    • mylinker (linking.Linker, optional) – The Linker object to use in the pipeline. If None, the default Linker will be instantiated. For the default settings, see Notes below.

    • -
    • resources_path (str, optional) – The path to your resources directory.

    • -
    • experiments_path (str, optional) – The path to the experiments directory. -Default is “../experiments”.

    @@ -140,7 +134,7 @@

  • The default settings for the Ranker:

    ranking.Ranker(
         method="perfectmatch",
    -    resources_path=resources_path,
    +    resources_path="../resources/wikidata/",
     )
     
    @@ -148,7 +142,7 @@

  • The default settings for the Linker:

    linking.Linker(
         method="mostpopular",
    -    resources_path=resources_path,
    +    resources_path="../resources/",
     )
     
    @@ -156,13 +150,13 @@


  • -
    -format_prediction(mention, sentence: str, wk_cands: Optional[dict] = None, context: Optional[Tuple[str, str]] = ('', ''), sent_idx: Optional[int] = 0, place: Optional[str] = '', place_wqid: Optional[str] = '') dict
    +
    +format_prediction(mention, sentence: str, wk_cands: Optional[dict] = None, context: Optional[Tuple[str, str]] = ('', ''), sent_idx: Optional[int] = 0, place: Optional[str] = '', place_wqid: Optional[str] = '') dict
    -
    -run_candidate_selection(document_dataset: List[dict]) dict
    +
    +run_candidate_selection(document_dataset: List[dict]) dict

    Performs candidate selection on already identified toponyms, resulting from the run_text_recognition method. Given a list of dictionaries corresponding to mentions, this method @@ -207,8 +201,8 @@


    -
    -run_disambiguation(dataset, wk_cands, place: Optional[str] = '', place_wqid: Optional[str] = '')
    +
    +run_disambiguation(dataset, wk_cands, place: Optional[str] = '', place_wqid: Optional[str] = '')

    Performs entity disambiguation given a list of already identified toponyms and selected candidates.

    @@ -262,8 +256,8 @@


    -
    -run_sentence(sentence: str, sent_idx: Optional[int] = 0, context: Optional[Tuple[str, str]] = ('', ''), place: Optional[str] = '', place_wqid: Optional[str] = '', postprocess_output: Optional[bool] = True, without_microtoponyms: Optional[bool] = False) List[dict]
    +
    +run_sentence(sentence: str, sent_idx: Optional[int] = 0, context: Optional[Tuple[str, str]] = ('', ''), place: Optional[str] = '', place_wqid: Optional[str] = '', postprocess_output: Optional[bool] = True, without_microtoponyms: Optional[bool] = False) List[dict]

    Runs the pipeline on a single sentence.

    Parameters
    @@ -335,13 +329,13 @@


    -
    -run_sentence_recognition(sentence) List[dict]
    +
    +run_sentence_recognition(sentence) List[dict]
    -
    -run_text(text: str, place: Optional[str] = '', place_wqid: Optional[str] = '', postprocess_output: Optional[bool] = True) List[dict]
    +
    +run_text(text: str, place: Optional[str] = '', place_wqid: Optional[str] = '', postprocess_output: Optional[bool] = True) List[dict]

    Runs the pipeline on a text document.
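The described flow (split the text into sentences, run the per-sentence step on each, aggregate the results) can be sketched as follows. The sentence splitter and the stub recogniser are naive stand-ins, not T-Res internals:

```python
import re

def run_text_sketch(text, run_sentence):
    # Naive sentence split, then run the per-sentence step on each piece,
    # carrying the sentence index through (mirrors run_text -> run_sentence).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    results = []
    for idx, sentence in enumerate(sentences):
        results.extend(run_sentence(sentence, sent_idx=idx))
    return results

# Stub per-sentence step: report capitalised, multi-letter tokens as mentions.
def fake_run_sentence(sentence, sent_idx=0):
    return [{"mention": t.strip(".,"), "sent_idx": sent_idx}
            for t in sentence.split()
            if len(t) > 1 and t[0].isupper() and t[1:].islower()]

out = run_text_sketch("A fire broke out in Sheffield. It spread fast.",
                      fake_run_sentence)
print(out)
```

Note how the naive stub also reports sentence-initial words such as "It"; the real recogniser is a trained NER model and does not have this problem.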

    Parameters
    @@ -405,19 +399,19 @@

… sentences, then finds relevant candidates and ranks them, and finally links them to the Wikidata ID.

This method runs the run_sentence() method for each of the document’s sentences. The without_microtoponyms keyword, passed to run_sentence, comes from the rel_params parameter of the Linker (passed when initialising the Pipeline() object). See geoparser.linking.Linker for instructions on how to set that up.

    -
    -run_text_recognition(text: str, place: Optional[str] = '', place_wqid: Optional[str] = '') List[dict]
    +
    +run_text_recognition(text: str, place: Optional[str] = '', place_wqid: Optional[str] = '') List[dict]

    Runs the NER on a text document and returns the recognised entities in the format required by future steps: candidate selection and entity disambiguation.

    @@ -482,7 +476,7 @@


    diff --git a/reference/geoparser/ranker.html b/reference/geoparser/ranker.html index c18b7f4d..db89c4c2 100644 --- a/reference/geoparser/ranker.html +++ b/reference/geoparser/ranker.html @@ -4,10 +4,9 @@ - t_res.geoparser.ranking. Ranker — T-Res 0.1.0 documentation + geoparser.ranking. Ranker — T-Res 0.1.0 documentation - @@ -16,13 +15,11 @@ - - - - + + @@ -45,16 +42,16 @@
  • @@ -74,7 +71,7 @@
  • @@ -84,11 +81,11 @@
    -
    -

    t_res.geoparser.ranking. Ranker

    +
    +

    geoparser.ranking. Ranker

    -
    -class t_res.geoparser.ranking.Ranker(method: Literal['perfectmatch', 'partialmatch', 'levenshtein', 'deezymatch'], resources_path: str, mentions_to_wikidata: Optional[dict] = {}, wikidata_to_mentions: Optional[dict] = {}, strvar_parameters: Optional[dict] = None, deezy_parameters: Optional[dict] = None, already_collected_cands: Optional[dict] = None)
    +
    +class geoparser.ranking.Ranker(method: Literal['perfectmatch', 'partialmatch', 'levenshtein', 'deezymatch'], resources_path: str, mentions_to_wikidata: Optional[dict] = {}, wikidata_to_mentions: Optional[dict] = {}, strvar_parameters: Optional[dict] = {'max_len': 15, 'min_len': 5, 'ocr_threshold': 60, 'overwrite_dataset': False, 'top_threshold': 85, 'w2v_ocr_model': 'w2v_*_news', 'w2v_ocr_path': '/home/runner/work/T-Res/T-Res/docs/resources/models/w2v'}, deezy_parameters: Optional[dict] = {'dm_cands': 'wkdtalts', 'dm_model': 'w2v_ocr', 'dm_output': 'deezymatch_on_the_fly', 'dm_path': '/home/runner/work/T-Res/T-Res/docs/resources/deezymatch', 'do_test': False, 'num_candidates': 1, 'overwrite_training': False, 'ranking_metric': 'faiss', 'selection_threshold': 50, 'verbose': False}, already_collected_cands: Optional[dict] = {})

    The Ranker class implements a system for candidate selection through string variation ranking. It provides methods to select candidates based on different matching approaches, such as perfect match, partial match, Levenshtein distance, @@ -103,11 +100,11 @@

  • mentions_to_wikidata (dict, optional) – An empty dictionary which will store the mapping between mentions and Wikidata IDs, which will be loaded through the -load_resources() method.

  • +load_resources() method.

  • wikidata_to_mentions (dict, optional) – An empty dictionary which will store the mapping between Wikidata IDs and mentions, which will be loaded through the -load_resources() method.

  • +load_resources() method.

  • strvar_parameters (dict, optional) – Dictionary of string variation parameters required to create a DeezyMatch training dataset. For the default settings, see Notes below.

  • @@ -127,7 +124,7 @@


    >>> # Load resources
    ->>> ranker.load_resources()
    +>>> ranker.mentions_to_wikidata = ranker.load_resources()
     
    >>> # Train the ranker (if applicable)
    @@ -189,8 +186,8 @@ 


    -
    -check_if_contained(query: str, row: pandas.core.series.Series) float
    +
    +check_if_contained(query: str, row: pandas.core.series.Series) float

    Returns the amount of overlap, if a mention is contained within a row in the dataset.
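One possible reading of such an overlap score, sketched with plain strings (the real method works on a pandas row of altnames): if the query is contained in an altname, score it by the fraction of the altname it covers:

```python
def check_if_contained_sketch(query, altname):
    # If the query is a substring of the altname, return the fraction of
    # the altname covered by the query; otherwise None.
    if query in altname:
        return len(query) / len(altname)
    return None

print(check_if_contained_sketch("Sheffield", "City of Sheffield"))
```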

    @@ -222,8 +219,8 @@


    -
    -damlev_dist(query: str, row: pandas.core.series.Series) float
    +
    +damlev_dist(query: str, row: pandas.core.series.Series) float

    Calculate the Damerau-Levenshtein distance between a mention and a row in the dataset.
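For illustration, here is a plain-Python optimal string alignment variant of Damerau-Levenshtein (the real method delegates to a library and operates on a dataset row):

```python
def damlev_sketch(a, b):
    # Optimal string alignment distance: Levenshtein edits plus the
    # transposition of adjacent characters.
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

print(damlev_sketch("Shefrield", "Sheffield"))
```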

    @@ -263,8 +260,8 @@


    -
    -deezy_on_the_fly(queries: List[str]) Tuple[dict, dict]
    +
    +deezy_on_the_fly(queries: List[str]) Tuple[dict, dict]

    Perform DeezyMatch (a deep neural network approach to fuzzy string matching) on-the-fly for a list of given mentions (queries).
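DeezyMatch itself is a trained neural model; as a rough self-contained stand-in, difflib similarity can play the same role of scoring string variations against the knowledge-base altnames (the threshold and names below are invented):

```python
import difflib

def fuzzy_candidates_sketch(queries, altnames, threshold=0.8):
    # For each query, keep the altnames whose similarity ratio clears the
    # threshold (difflib stands in for the DeezyMatch model here).
    out = {}
    for q in queries:
        scores = {alt: difflib.SequenceMatcher(None, q.lower(), alt.lower()).ratio()
                  for alt in altnames}
        out[q] = {alt: round(s, 2) for alt, s in scores.items() if s >= threshold}
    return out

altnames = ["London", "Sheffield", "Shefford"]
print(fuzzy_candidates_sketch(["Shefrield"], altnames))
```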

    @@ -292,7 +289,7 @@


    Example

    >>> ranker = Ranker(...)
    ->>> ranker.load_resources()
    +>>> ranker.mentions_to_wikidata = ranker.load_resources()
     >>> queries = ['London', 'Shefrield']
     >>> candidates, already_collected = ranker.deezy_on_the_fly(queries)
     >>> print(candidates)
    @@ -309,14 +306,14 @@ 

… process for that query. For the remaining queries, it uses the DeezyMatch model to generate candidates and ranks them based on the specified ranking metric and selection threshold, provided when initialising the Ranker() object.

    -
    -find_candidates(mentions: List[dict]) Tuple[dict, dict]
    +
    +find_candidates(mentions: List[dict]) Tuple[dict, dict]

    Find candidates for the given mentions using the selected ranking method.

    @@ -365,7 +362,7 @@

… mention using the selected ranking method. It first extracts the queries from the mentions and then calls the appropriate method based on the ranking method chosen when initialising the Ranker() object.

    The method returns a dictionary that maps each original mention to a sub-dictionary containing the mention variations as keys and their corresponding Wikidata match scores as values.

    @@ -376,8 +373,8 @@


    -
    -load_resources() dict
    +
    +load_resources() dict

    Load the ranker resources.

    Returns
    @@ -396,7 +393,7 @@


This method loads the mentions-to-wikidata and wikidata-to-mentions dictionaries from the resources directory, specified when initialising the Ranker(). They are required for performing candidate selection and ranking.

    It filters the dictionaries to remove noise and updates the class attributes accordingly.

    @@ -407,8 +404,8 @@


    -
    -partial_match(queries: List[str], damlev: bool) Tuple[dict, dict]
    +
    +partial_match(queries: List[str], damlev: bool) Tuple[dict, dict]

    Perform partial matching for a list of given mentions (queries).

    Parameters
    @@ -459,8 +456,8 @@


    -
    -perfect_match(queries: List[str]) Tuple[dict, dict]
    +
    +perfect_match(queries: List[str]) Tuple[dict, dict]

    Perform perfect matching between a provided list of mentions (queries) and the altnames in the knowledge base.
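The mechanics of perfect matching amount to an exact lookup against the altname index. This sketch mirrors the output shape shown in the docs' examples; the index contents are invented:

```python
def perfect_match_sketch(queries, mentions_to_wikidata):
    # Exact lookup against the altname index: a hit returns the query as
    # its own variation with score 1.0, a miss returns an empty dict.
    candidates = {}
    for query in queries:
        candidates[query] = {query: 1.0} if query in mentions_to_wikidata else {}
    return candidates

index = {"London": {"Q84": 0.97}, "Bologna": {"Q1891": 0.98}}
print(perfect_match_sketch(["London", "Barcelona"], index))
```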

    @@ -497,8 +494,8 @@


    -
    -run(queries: List[str]) Tuple[dict, dict]
    +
    +run(queries: List[str]) Tuple[dict, dict]

    Run the appropriate ranking method based on the specified method.

    Parameters
    @@ -516,7 +513,7 @@


    Example

    >>> myranker = Ranker(method="perfectmatch", ...)
    ->>> myranker.load_resources()
    +>>> myranker.mentions_to_wikidata = myranker.load_resources()
     >>> queries = ['London', 'Barcelona', 'Bologna']
     >>> candidates, already_collected = myranker.run(queries)
     >>> print(candidates)
    @@ -529,13 +526,13 @@ 


    Note

This method executes the appropriate ranking method, based on the method parameter selected when initialising the Ranker() object.

    It delegates the execution to the corresponding method:

See the documentation of those methods for more details about their processing of the provided mentions (queries).

    @@ -543,15 +540,15 @@


    -
    -train() None
    +
    +train() None

Training a DeezyMatch model. The training will be skipped if the model already exists and the overwrite_training key in the deezy_parameters passed when initialising the Ranker() object is set to False. The training will be run in test mode if the do_test key in the deezy_parameters is set to True.

    Returns

    None.

    @@ -567,8 +564,8 @@
