This repository showcases common usage of Deep Search for document conversion as well as data and knowledge exploration.
Each example starts by defining its input parameters. This is supported by Pydantic Settings, allowing automated loading from a `.env` file or from environment variables. Furthermore, access to Deep Search is based on profiles. Unless otherwise configured, the profile used is the active one.
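For orientation, the following is a minimal sketch of that pattern. The `ExampleSettings` class and its fields are hypothetical (each example defines its own), and the connection helper assumes the toolkit's `CpsApi.from_env()`, which picks up the active profile unless configured otherwise.

```python
# Minimal sketch: load example parameters with Pydantic Settings and
# connect using the active Deep Search profile.
# NOTE: ExampleSettings and its fields are hypothetical placeholders.
from pydantic_settings import BaseSettings, SettingsConfigDict

from deepsearch.cps.client.api import CpsApi


class ExampleSettings(BaseSettings):
    # Values are read from environment variables, falling back to a .env file.
    model_config = SettingsConfigDict(env_file=".env")

    proj_key: str = ""             # hypothetical: Deep Search project key
    input_pdf: str = "sample.pdf"  # hypothetical: path to an input document


settings = ExampleSettings()

# Connect with the active profile (configured via the deepsearch CLI).
api = CpsApi.from_env()
```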
|    | Name | Description |
|----|------|-------------|
| 1. | Convert documents quick start | Full example on programmatic document conversion |
| 2. | Convert documents with custom settings | Full example on programmatic document conversion with custom conversion settings |
| 3. | Visualize bounding boxes | Visualize the bounding boxes of the text elements |
| 4. | Extract figures from documents | Given a PDF file, extract the figures |
| 5. | Extract tables | Given a PDF file, extract the tables |
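For a feel of what the conversion examples above do, here is a condensed sketch of the quick-start flow. The project key and file paths are placeholders; the notebook remains the authoritative reference.

```python
# Condensed sketch of the document-conversion quick start.
# PROJ_KEY and the paths below are placeholders.
import deepsearch as ds
from deepsearch.cps.client.api import CpsApi

api = CpsApi.from_env()
PROJ_KEY = "..."  # your Deep Search project key

# Submit a PDF for conversion and wait for completion.
documents = ds.convert_documents(
    api=api,
    proj_key=PROJ_KEY,
    source_path="sample.pdf",
    progress_bar=True,
)

# Download the converted output (JSON) to a local directory.
documents.download_all(result_dir="./converted")
```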
|    | Name | Description |
|----|------|-------------|
| 1. | NLP on documents* | A few quick examples of applying NLP models on documents (e.g. extracting key terms) |
| 2. | Reference Parsing* | Examples of parsing references from documents |
| 3. | Material Extraction* | Examples of extracting materials from documents |
This section showcases examples that query data processed via Deep Search.
|    | Name | Description |
|----|------|-------------|
| 1. | Data query quick start | Example of listing data collections, searching in one or more document collections, and using `source` for projection |
| 2. | Chemistry search queries | Search the chemistry databases for known molecules |
| 3. | Chemistry and patent searches via PatCID | Explore the chemistry databases using substructure and similarity searches, and navigate to the worldwide patents that reference the molecules |
| 4. | Snippets and aggregations in data queries | Extract snippets in search queries and leverage aggregations for exploratory analysis |
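As an illustration of the query flow used in these examples, the sketch below searches one collection and projects a few fields via `source`. The collection coordinates, the search string, and the result layout are assumptions based on the data-query notebooks and may differ between toolkit versions.

```python
# Sketch of a data query against a Deep Search collection.
# The index name and search string are examples only.
from deepsearch.cps.client.api import CpsApi
from deepsearch.cps.client.components.elastic import ElasticDataCollectionSource
from deepsearch.cps.queries import DataQuery

api = CpsApi.from_env()

# Coordinates of a data collection (index name is an assumption).
collection = ElasticDataCollectionSource(elastic_id="default", index_key="arxiv-abstract")

# Search the collection, projecting only selected fields via `source`.
query = DataQuery(
    '"semantic segmentation"',
    source=["description.title"],
    coordinates=collection,
)

# Iterate over paginated results; the result layout follows the
# data-query notebooks and may vary across toolkit versions.
cursor = api.queries.run_paginated_query(query)
for result_page in cursor:
    for row in result_page.outputs["data_outputs"]:
        print(row["_source"]["description"]["title"])
```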
This section showcases examples of semantic capabilities for question answering (Q&A) using retrieval-augmented generation (RAG).
|    | Name | Description |
|----|------|-------------|
| 1. | QA quick start | Get started with semantic ingestion, RAG, and retrieval. |
| 2. | QA deep dive | Explore advanced RAG and semantic retrieval capabilities. |
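At a high level, these examples follow the standard retrieve-then-generate pattern sketched below. Here `retrieve_passages` and `generate_answer` are hypothetical stand-ins for the toolkit's semantic retrieval and LLM calls; see the notebooks for the actual API.

```python
# Schematic RAG flow; retrieve_passages() and generate_answer() are
# hypothetical placeholders shown only to illustrate the pattern.
from typing import List


def retrieve_passages(question: str, k: int = 5) -> List[str]:
    """Hypothetical: return the k passages most relevant to the question
    from a semantically indexed document collection."""
    raise NotImplementedError


def generate_answer(question: str, context: str) -> str:
    """Hypothetical: ask an LLM to answer the question, grounded in the
    retrieved context."""
    raise NotImplementedError


def answer(question: str) -> str:
    passages = retrieve_passages(question)  # retrieval step
    context = "\n\n".join(passages)         # grounding context
    return generate_answer(question, context)  # generation step
```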
This section showcases examples for bringing your own documents, CSV data, NLP models, and more.
|    | Name | Description |
|----|------|-------------|
| 1. | Bring your own PDF | Upload your own PDF documents, search on them, and export the results as JSON files. |
| 2. | Bring your own converted documents | Upload your documents already formatted as JSON. |
| 3. | Bring your own DataFrame | Bring your own DataFrame from CSV, XLSX, etc., and explore the content in a knowledge graph |
This section showcases examples for managing index item attachments and metadata.
|    | Name | Description |
|----|------|-------------|
| 1. | Manage attachments | Manage index item attachments |
This section showcases examples related to the use of knowledge graphs (KGs) in Deep Search.
|    | Name | Description |
|----|------|-------------|
| 1. | Using Deep Search KGs with PyTorch Geometric | Download knowledge graphs from Deep Search and import them into PyTorch Geometric. |
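As a generic illustration of the final import step, the snippet below builds a PyTorch Geometric `Data` object from an edge list. The edge list itself and its two-column format are assumptions; downloading the KG from Deep Search is covered in the example notebook.

```python
# Generic sketch: turn a downloaded edge list into a PyTorch Geometric graph.
# The edge list below is a placeholder; the actual KG download is covered
# in the example itself.
import torch
from torch_geometric.data import Data

# Hypothetical edge list: each pair is (source_id, target_id).
edges = [
    (0, 1),
    (1, 2),
    (2, 0),
]

# PyG expects edges as a [2, num_edges] index tensor.
edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()

# One feature per node (here just a constant placeholder feature).
x = torch.ones((3, 1), dtype=torch.float)

graph = Data(x=x, edge_index=edge_index)
print(graph)  # Data(x=[3, 1], edge_index=[2, 3])
```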
This section showcases examples of integrating Deep Search with other tools and utilities.
|    | Name | Description |
|----|------|-------------|
| 1. | Annotations on argilla.io | Use argilla.io for annotating the content of documents. |
The examples contained in this catalog depend on the `deepsearch-toolkit`, as well as on other modules needed by the individual showcases (e.g. `pandas`, `matplotlib`, `rdkit`, etc.). Please refer to the Poetry `pyproject.toml` or to `requirements.txt` for the complete list.
Python dependencies are installed with:

```console
pip install -r requirements.txt
```
Additionally, some examples rely on system packages. When this is the case, the README of the individual example contains more details on which packages are required. The auxiliary file `apt.txt` lists all such packages for Debian-based OSes. They can be installed with:

```console
xargs sudo apt-get install < apt.txt
```
Note that some examples require dependencies that are not available on Windows. Those examples are flagged with an asterisk (*) in the index above.
The Deep Search Toolkit codebase is released under the MIT license.
For individual model usage, please refer to the model licenses found in the original packages.