DocSort

DocSort is a small side project: a Flet-based application that streamlines document digitization and organization. Key features:

Automatic document scanning with corner detection
OCR (Optical Character Recognition) for text extraction
Smart categorization into structured folders
Google Drive integration
Natural language search capabilities through RAG (Retrieval-Augmented Generation)
Customizable document categories
All ML models run offline, so no data leaves your machine (except through the Google Drive API)
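The natural language search first retrieves the documents most relevant to a query, and only then generates an answer. This README does not describe how DocSort embeds and stores its documents. The following is therefore only a minimal sketch of the retrieval step, and it rests on a loud assumption: plain term-frequency vectors stand in for a learned embedding model, and OCR'd texts are ranked by cosine similarity. The functions `embed`, `cosine` and `retrieve` and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in for a learned embedding (assumption): lowercase term frequencies."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: dict[str, str], k: int = 3) -> list[str]:
    """Return the names of the k documents whose text is most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        documents,
        key=lambda name: cosine(query_vec, embed(documents[name])),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical OCR output keyed by file name.
docs = {
    "invoice_2024.pdf": "Invoice from ACME GmbH for office chairs, total 420 EUR",
    "rental_contract.pdf": "Rental contract for the flat, monthly rent 900 EUR",
    "insurance.pdf": "Health insurance policy confirmation",
}
print(retrieve("how much was the ACME invoice", docs, k=1))  # → ['invoice_2024.pdf']
```

In a full RAG pipeline, the texts of the top-ranked documents would then go to a language model as context. A real embedding model also matches synonyms and paraphrases, which this keyword-based stand-in cannot.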

Perfect for managing personal or business documents while maintaining searchable digital records.

Currently supported languages: English, German

Installation

Create and activate the conda environment:

conda env create -f environment.yaml
conda activate docsort

Set up the offline models:

python3 scripts/setup_models

Set up a Google Drive API project:
- Visit the Google Cloud Console
- Create a new project or select an existing one
- Enable the Google Drive API for your project
- Navigate to Credentials
- Create an OAuth 2.0 Client ID (select Desktop application)
- Download the JSON file
- Rename it to client_secrets.json
- Place the file in your project root directory
Important: Google initially places new projects in testing mode. While a project is in testing, only users who have been added manually as test users can use the app through the Google Drive API.
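Before running the app, you may want to check that the downloaded credentials file is the right kind. The following is a hypothetical helper; `check_client_secrets` and `validate_client_config` are not part of DocSort. It assumes the usual layout of a Desktop-app OAuth client file from Google, where the credentials sit under a top-level `installed` key.

```python
import json
from pathlib import Path

# Fields a Desktop-app OAuth client file is expected to carry under "installed".
REQUIRED_KEYS = ("client_id", "client_secret", "auth_uri", "token_uri")


def validate_client_config(data: object) -> list[str]:
    """Return problems found in parsed client-secrets JSON (empty list if none)."""
    client = data.get("installed") if isinstance(data, dict) else None
    if not isinstance(client, dict):
        # Web-application clients use a top-level "web" key instead.
        return ['missing "installed" section: was the client created as a Desktop application?']
    return [f'missing "{key}"' for key in REQUIRED_KEYS if not client.get(key)]


def check_client_secrets(path: Path = Path("client_secrets.json")) -> list[str]:
    """Load the file (default: client_secrets.json in the current directory) and validate it."""
    if not path.is_file():
        return [f"{path} not found: place it in the project root"]
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    return validate_client_config(data)


if __name__ == "__main__":
    problems = check_client_secrets()
    print("\n".join(problems) if problems else "client_secrets.json looks OK")
```

Run it from the project root. An empty result only means the file has the expected shape. It does not confirm that the Drive API is enabled or that your account is registered as a test user.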

Start the application:

flet run

Roadmap

✅ Completed Features

Corner detection on document images
Image quality enhancement filters
OCR text extraction
Automated file categorization
Company detection
Save to Drive in smart folder structure
Setup page
Offline inference with no external APIs (except Google Drive access)
Vector-based search functionality
Manual upload and search service synchronization
Folder explorer
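Corner detection typically ends with four points that feed into a perspective transform, and that transform expects the points in a fixed order. This README does not show DocSort's own detection code. Below is a minimal, dependency-free sketch of one common ordering heuristic (not necessarily the one DocSort uses), plus a hypothetical helper `target_size` that estimates the size of the flattened page. It assumes image coordinates, with y growing downwards.

```python
import math

Point = tuple[float, float]


def order_corners(points: list[Point]) -> list[Point]:
    """Order four (x, y) corners as top-left, top-right, bottom-right, bottom-left."""
    if len(points) != 4:
        raise ValueError("expected exactly four corner points")
    top_left = min(points, key=lambda p: p[0] + p[1])      # smallest x + y
    bottom_right = max(points, key=lambda p: p[0] + p[1])  # largest x + y
    top_right = min(points, key=lambda p: p[1] - p[0])     # smallest y - x
    bottom_left = max(points, key=lambda p: p[1] - p[0])   # largest y - x
    return [top_left, top_right, bottom_right, bottom_left]


def target_size(corners: list[Point]) -> tuple[int, int]:
    """Hypothetical helper: output width/height from the longer of each pair of opposite edges."""
    tl, tr, br, bl = corners
    width = max(math.dist(tl, tr), math.dist(bl, br))
    height = max(math.dist(tl, bl), math.dist(tr, br))
    return round(width), round(height)


# Corners of a slightly skewed page, in arbitrary detection order.
detected = [(410, 590), (30, 40), (20, 560), (400, 20)]
corners = order_corners(detected)
print(corners)               # → [(30, 40), (400, 20), (410, 590), (20, 560)]
print(target_size(corners))  # → (391, 570)
```

OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective`, for example, take ordered corners and a target size like these to produce a top-down scan. The sum/difference heuristic can break down for pages rotated by about 45°, where two corners tie on x + y or y - x.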

🚧 In Development

Document Scanning
- Camera integration
- Multi-page document support
Smart Organization
- Suggest a new category when "Other" is selected
Multi-platform support
- Build the application for Android and iOS

Known Issues

File upload on the web returns None for the file path (this appears to be a Flet issue)

Alexanderstaehle/DocSort