TextRactor - File Content Merger

TextRactor is a Python-based tool that extracts the text content of files in various formats and combines it into a single output file. Given a path, it processes that file, or every supported file in that directory.

Features

Multi-format file support:
- Office files (DOCX, XLSX, XLS, PPTX)
- OpenDocument formats (ODT, ODS, ODP)
- PDF files
- Text-based files (TXT, CSV, LOG, MD)
- Source code files
- Configuration files
Advanced file processing features:
- Automatic character encoding detection
- Binary file check
- Maximum file size limit
- MIME type detection
- Error handling and logging
- Customizable output file
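
The checks listed above (size limit, binary check, encoding detection, MIME type detection) can be sketched roughly as follows. This is a minimal illustration, not TextRactor's actual code: the function names are hypothetical, and encoding detection is approximated by trying a short list of candidate encodings with the standard library instead of calling chardet, which the project installs.

```python
import mimetypes
import os

# Hypothetical helpers for illustration; TextRactor's own code may differ.
# Assumption: a short candidate list stands in for chardet-based detection.
CANDIDATE_ENCODINGS = ("utf-8", "latin-1")


def looks_binary(sample: bytes) -> bool:
    """Treat a chunk as binary if it contains a NUL byte."""
    return b"\x00" in sample


def guess_encoding(data: bytes) -> str:
    """Return the first candidate encoding that decodes the data cleanly."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            data.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    return "latin-1"  # latin-1 accepts any byte sequence


def guess_mime(path: str):
    """Guess the MIME type from the file name alone (None if unknown)."""
    return mimetypes.guess_type(path)[0]


def read_text_file(path: str, max_file_size_mb: float = 5.0):
    """Return the decoded text of a file, or None if it should be skipped."""
    if os.path.getsize(path) > max_file_size_mb * 1024 * 1024:
        return None  # larger than the configured limit
    with open(path, "rb") as handle:
        data = handle.read()
    if looks_binary(data[:1024]):
        return None  # binary content is not merged
    return data.decode(guess_encoding(data))
```

Returning None for skipped files keeps the caller simple: it can log the skip and move on to the next file.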

Installation

pip install python-docx openpyxl python-pptx xlrd odfpy pdfminer.six chardet

Usage

from processor.file_processor import FileProcessor

processor = FileProcessor(
    max_file_size_mb=5.0,  # Maximum file size (MB)
    output_file="content.txt"  # Output file name
)

processor.process("path/to/file/or/directory")
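
To give a feel for what a call like the one above might do for plain-text files, here is a rough sketch that accepts a file or a directory and appends each matching file to one output file. It is an assumption about the general approach, not the real `FileProcessor`: `merge_text_files`, `iter_candidate_paths`, and the separator format are hypothetical, and only text-based extensions are handled.

```python
import os

# Hypothetical subset of the supported types: plain-text extensions only.
TEXT_EXTENSIONS = {".txt", ".csv", ".log", ".md"}


def iter_candidate_paths(target: str):
    """Yield the target itself if it is a file, else every file below it."""
    if os.path.isfile(target):
        yield target
        return
    for folder, dirs, names in os.walk(target):
        dirs.sort()  # visit subfolders in a stable order
        for name in sorted(names):
            yield os.path.join(folder, name)


def merge_text_files(target: str, output_file: str) -> int:
    """Write the content of each matching file to output_file; return the count."""
    output_path = os.path.abspath(output_file)
    merged = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for path in iter_candidate_paths(target):
            if os.path.abspath(path) == output_path:
                continue  # never merge the output file into itself
            if os.path.splitext(path)[1].lower() not in TEXT_EXTENSIONS:
                continue  # unsupported extension in this sketch
            with open(path, "r", encoding="utf-8", errors="replace") as src:
                out.write(f"===== {path} =====\n")
                out.write(src.read())
                out.write("\n")
            merged += 1
    return merged
```

The output path is compared against each candidate so that an output file placed inside the scanned directory is not read back into itself.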

Running from the command line:

python src/main.py

Supported File Types

Office: .docx, .xlsx, .xls, .pptx
OpenDocument: .odt, .ods, .odp
PDF: .pdf
Text: .txt, .csv, .log, .md
Code: .py, .java, .cpp, .js, etc.
Web: .html, .css, .json, .xml
Configuration: .ini, .conf, .cfg
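
As an illustration, the table above can be expressed as a lookup from extension to category. The names `SUPPORTED_TYPES` and `categorize` are hypothetical, and the "Code" entry contains only the four extensions named above, so the real tool likely recognizes more.

```python
import os

# Categories copied from the list above. "Code" is incomplete by design:
# the README names only these four extensions and then says "etc.".
SUPPORTED_TYPES = {
    "Office": {".docx", ".xlsx", ".xls", ".pptx"},
    "OpenDocument": {".odt", ".ods", ".odp"},
    "PDF": {".pdf"},
    "Text": {".txt", ".csv", ".log", ".md"},
    "Code": {".py", ".java", ".cpp", ".js"},
    "Web": {".html", ".css", ".json", ".xml"},
    "Configuration": {".ini", ".conf", ".cfg"},
}


def categorize(path: str):
    """Return the category for a path's extension, or None if unsupported."""
    extension = os.path.splitext(path)[1].lower()
    for category, extensions in SUPPORTED_TYPES.items():
        if extension in extensions:
            return category
    return None
```

Lowercasing the extension means a file such as `Report.DOCX` is still treated as an Office file.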

Contributing

1. Fork the project
2. Create your feature branch (git checkout -b feature/NewFeature)
3. Commit your changes (git commit -m 'Added new feature')
4. Push to the branch (git push origin feature/NewFeature)
5. Open a Pull Request
