Overview

A Python-based command-line tool that extracts structured information about characters from story text files using embeddings and natural language processing.

Features

Character Information Extraction:
- Character name
- Story title
- Character summary
- Character relationships
- Character type/role
Batch Processing: Handle multiple story files at once
Smart Caching: Store and reuse computed embeddings
Fast Search: Efficient similarity search using FAISS
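
The README does not say how Smart Caching is implemented. As one possible illustration, the sketch below keys each cached embedding by a hash of the text, so unchanged input is not embedded twice. The names `content_key`, `cached_embedding` and `fake_embed`, and the one-JSON-file-per-text layout, are hypothetical and not taken from story_processor.py; `fake_embed` only stands in for a real embedding model.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List


def content_key(text: str) -> str:
    """Return a stable cache key derived from the text itself."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def cached_embedding(
    text: str,
    cache_dir: Path,
    embed: Callable[[str], List[float]],
) -> List[float]:
    """Load the embedding for `text` from disk, computing it only on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{content_key(text)}.json"
    if cache_file.exists():
        # Cache hit: reuse the stored vector.
        return json.loads(cache_file.read_text(encoding="utf-8"))
    # Cache miss: compute once, then persist for later runs.
    vector = embed(text)
    cache_file.write_text(json.dumps(vector), encoding="utf-8")
    return vector


def fake_embed(text: str) -> List[float]:
    """Hypothetical stand-in for a real model: two toy features."""
    return [float(len(text)), float(text.count(" "))]
```

Keying on content rather than on the file name means a renamed file still hits the cache, while an edited file is recomputed.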

Prerequisites

Python 3.8+
pip package manager
(Optional) GPU for faster processing

Installation

Clone the repository:

git clone https://github.com/avdhoottt/LangChainAssignment.git
cd LangChainAssignment

Install dependencies:

pip install -r requirements.txt

Usage

Computing Embeddings

Process your story files and compute embeddings:

python story_processor.py compute-embeddings story1.txt story2.txt story3.txt
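
Before embedding, a story is usually split into smaller pieces. The README attributes text splitting to LangChain but gives no settings, so the following is only a sketch of the general idea: a plain character-window splitter with overlap, written with the standard library as a stand-in. The function name `split_with_overlap` and the default sizes are assumptions.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of embedding some text twice.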

Getting Character Information

Retrieve information about a specific character:

python story_processor.py get-character-info "Marya Vassilyevna"

Example Output

{
  "name": "Marya Vassilyevna",
  "storyTitle": "The Schoolmistress",
  "summary": "A dedicated schoolteacher who has been working for thirteen years, dealing with the challenges of rural education and a lonely life.",
  "relations": [
    {
      "name": "Hanov",
      "relation": "Acquaintance who she encounters on her journey"
    },
    {
      "name": "Semyon",
      "relation": "Her driver who guides her through difficult weather"
    }
  ],
  "characterType": "Protagonist"
}
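
For callers that consume this output from another program, a minimal sketch of loading it into typed objects is shown here. The field names come from the example above; the `Relation` and `CharacterInfo` classes and the `parse_character_info` helper are hypothetical and not part of the tool.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Relation:
    name: str
    relation: str


@dataclass
class CharacterInfo:
    name: str
    story_title: str
    summary: str
    relations: List[Relation]
    character_type: str


def parse_character_info(raw: str) -> CharacterInfo:
    """Parse the tool's JSON output; raises KeyError if a field is missing."""
    data = json.loads(raw)
    return CharacterInfo(
        name=data["name"],
        story_title=data["storyTitle"],
        summary=data["summary"],
        relations=[Relation(**item) for item in data["relations"]],
        character_type=data["characterType"],
    )
```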

Technical Components

Sentence Transformers: For text embeddings
FAISS: For efficient similarity search
LangChain: For text splitting and document handling
Click: For CLI interface
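
To make the similarity-search step concrete, here is a brute-force sketch in plain Python that ranks stored vectors by cosine similarity to a query. It is not how the tool calls FAISS, and the README does not state which index type is used; it only illustrates the kind of ranking a vector index provides. The names `cosine` and `top_k` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k vectors most similar to query."""
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

This scan is linear in the number of stored vectors, which is fine for a handful of stories; a dedicated index matters as the collection grows.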

Error Handling

The tool handles the following error conditions:

Missing files
Character not found
Invalid file formats
Memory issues
Processing errors
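
How story_processor.py reports these conditions is not shown in the README. The following sketch illustrates one common pattern for the first three: validate early and raise a dedicated exception with a readable message. The exception classes, the `read_story` and `lookup_character` helpers, and the assumption that only `.txt` files are accepted are all hypothetical.

```python
from pathlib import Path
from typing import Dict


class StoryError(Exception):
    """Base class for user-facing errors (hypothetical)."""


class CharacterNotFoundError(StoryError):
    """Raised when no stored entry matches the requested character."""


def read_story(path: str) -> str:
    """Read a story file, turning common failures into StoryError."""
    p = Path(path)
    if not p.is_file():
        raise StoryError(f"Story file not found: {path}")
    # Assumption: only plain-text .txt stories are supported.
    if p.suffix.lower() != ".txt":
        raise StoryError(f"Unsupported file format: {p.suffix or '(none)'}")
    try:
        return p.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        raise StoryError(f"Could not decode {path} as UTF-8") from exc


def lookup_character(name: str, known: Dict[str, dict]) -> dict:
    """Return the stored record for `name` or raise CharacterNotFoundError."""
    try:
        return known[name]
    except KeyError:
        raise CharacterNotFoundError(f"Character not found: {name}") from None
```

A command-line entry point can then catch `StoryError` in one place, print the message, and exit with a non-zero status.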

Requirements File

transformers==4.36.2
click>=8.0.0
numpy>=1.24.0
torch
faiss-cpu>=1.7.4
huggingface-hub>=0.19.0
sentence-transformers>=2.2.0
langchain>=0.1.0
tokenizers
safetensors
packaging
filelock
regex
tqdm
requests
pyyaml
typing-extensions>=4.0.0

Author

Avdhoot Fulsundar (@avdhoottt)

