
Namba-ir/Sokhan


Sokhan | Persian NLP Framework

Sokhan is a fast, lightweight framework for Persian natural language processing (NLP). It aims to give developers and researchers accurate, high-performance text processing with full support for the Persian language, and it is suitable for personal, research, and commercial projects.

Video > Voice > Image > Last IT Text

Install on Windows

pip install Sokhan

Install on Linux/macOS

python3 -m pip install Sokhan


Take it easy, just do it:

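from sokhan.core.normalize import Sokhan

# Create a normalizer instance (named so it does not shadow the Sokhan class)
normalizer = Sokhan()
# Persian sample text: "I am your first message. Welcome to the world of Sokhan!"
text = 'من اولین پیام شما هستم به دنیای سخن خوش آمدید !'

normalizer(text)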

CLI
-------------
'من اولین پیام شما هستم به دنیای سخن خوش آمدید !'
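
The sample above comes back unchanged because it is already clean. To illustrate what a Persian normalizer typically does, here is a minimal sketch in plain Python. It is an assumption for illustration only, not Sokhan's implementation: the function name `basic_normalize` and the two rules it applies (mapping Arabic yeh and kaf to their Persian forms, and collapsing whitespace) are hypothetical choices.

```python
import re

# Arabic code points that are commonly replaced by their Persian equivalents.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064A": "\u06CC",  # Arabic yeh -> Persian yeh
    "\u0643": "\u06A9",  # Arabic kaf -> Persian kaf
})


def basic_normalize(text: str) -> str:
    """Illustrative normalizer (hypothetical, NOT Sokhan's actual code)."""
    text = text.translate(ARABIC_TO_PERSIAN)   # unify letter variants
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace runs


# The input below contains an Arabic yeh and a run of spaces.
print(basic_normalize("سلام   دنيا"))
```

Real normalizers usually add many more rules (digits, half-spaces, diacritics); this sketch only shows the general shape.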

Important

Fast, optimized, No Free, no GPU or heavy processing needed.

Languages:

  • Persian support
  • English support

Features:

  • CPU/GPU support (GPU not required)
  • Super-fast processing / turbo-fast processing
  • Configuration with .json files
  • C-based core (for speed)
  • Web API
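
As an illustration of the .json configuration feature, the sketch below merges user settings over defaults. Sokhan's real configuration schema is not documented here, so the key names (`language`, `remove_urls`, `remove_emoji`) and the `load_config` helper are assumptions made up for this example.

```python
import json

# Hypothetical defaults; these key names are illustrative, not Sokhan's schema.
DEFAULTS = {"language": "fa", "remove_urls": True, "remove_emoji": False}


def load_config(raw_json: str) -> dict:
    """Merge user settings from a JSON string over the defaults."""
    user = json.loads(raw_json)
    unknown = set(user) - set(DEFAULTS)
    if unknown:
        # Fail loudly on typos instead of silently ignoring them.
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return {**DEFAULTS, **user}


config = load_config('{"remove_emoji": true}')
print(config)
```

Rejecting unknown keys is a design choice that catches misspelled options early; a more permissive loader could warn instead.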

Framework Modules:

  • Normalizer
  • Informal Normalizer
  • Summarizer
  • Twitter Normalizer
  • Instagram Normalizer
  • YouTube Normalizer
  • Telegram Normalizer
  • WhatsApp Normalizer
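
To give a feel for what the platform-specific normalizers might handle, here is a rough sketch of social-media text cleaning using only the standard library. It is not taken from Sokhan: the function `clean_social_text` and its rules (drop URLs and @mentions, keep hashtag words) are assumptions for illustration.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")


def clean_social_text(text: str) -> str:
    """Hypothetical cleaner for social-media posts (not Sokhan's API)."""
    text = URL_RE.sub(" ", text)        # remove links
    text = MENTION_RE.sub(" ", text)    # remove @mentions
    # Keep the hashtag's words, turning underscores into spaces.
    text = HASHTAG_RE.sub(lambda m: m.group(1).replace("_", " "), text)
    return " ".join(text.split())       # collapse leftover whitespace


print(clean_social_text("@ali see https://example.com #good_day now"))
```

In Python 3, `\w` matches Unicode letters, so the same patterns apply to Persian hashtags and mentions.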

Sokhan vs. AI

In this test, the AI models run on graphics cards of at least H100 class, while the Sokhan library runs on minimal hardware resources.

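| Lang/Module   | Sokhan*    | GPTv4 | DeepSeek | Gemini | Gama | Grok v3 | Shiraz |
|---------------|------------|-------|----------|--------|------|---------|--------|
| Count Keyword | 0.00000001 | No    | No       | No     | No   | No      | Yes    |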

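Sokhan language benchmark

| Lang/Module     | Sokhan*     | NLTK | Fast Text | SpaCy | Regex | Scikit-Learn | Hazm |
|-----------------|-------------|------|-----------|-------|-------|--------------|------|
| Persian support | Yes         | No   | No        | No    | No    | No           | Yes  |
| English support | Yes         | No   | No        | No    | No    | No           | No   |
| Arabic support  | Coming soon | No   | No        | No    | No    | No           | No   |
| Russian support | Coming soon | No   | No        | No    | No    | No           | No   |
| French support  | Coming soon | No   | No        | No    | No    | No           | No   |
| Italian support | Coming soon | No   | No        | No    | No    | No           | No   |
| Spanish support | Coming soon | No   | No        | No    | No    | No           | No   |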

Speed Benchmark:

  • CPU: 2 cores | No GPU | RAM: 8 GB | Python 3.10
  • 10 GB dataset of comments on Ronaldo.

| Feature/Module     | Sokhan*       | NLTK | Fast Text | SpaCy | Regex | Scikit-Learn | Hazm |
|--------------------|---------------|------|-----------|-------|-------|--------------|------|
| Normalize          | 0.00000016123 | -    | -         | -     | -     | -            | -    |
| Informal normalize | 0.00000016123 | -    | -         | -     | -     | -            | -    |
| Summarize          | 0.00000016123 | -    | -         | -     | -     | -            | -    |
| Clean text         | 0.00000016123 | -    | -         | -     | -     | -            | -    |
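
The README does not state how the timings above were measured. For readers who want to collect comparable numbers on their own hardware, here is a minimal timing sketch based on `time.perf_counter`. The helper `seconds_per_call` is a hypothetical name, and `str.strip` stands in for whichever normalizer you want to measure.

```python
import time


def seconds_per_call(func, samples, repeats=3):
    """Return the best average wall-clock time per call over `repeats` runs."""
    if not samples:
        raise ValueError("samples must not be empty")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for sample in samples:
            func(sample)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(samples))
    return best


# Stand-in workload: replace str.strip with the callable under test.
comments = ["  some   noisy comment  "] * 10_000
print(f"{seconds_per_call(str.strip, comments):.11f} s per call")
```

Taking the best of several runs reduces noise from other processes; reporting the unit and the sample size alongside the number makes results easier to compare.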

Dataset Benchmark:

| Dataset/Module | Sokhan* | NLTK | Fast Text | SpaCy | Regex | Scikit-Learn | Hazm |
|----------------|---------|------|-----------|-------|-------|--------------|------|
| Instagram      | Yes     | -    | -         | -     | -     | -            | -    |
| Twitter        | Yes     | -    | -         | -     | -     | -            | -    |
| YouTube        | Yes     | -    | -         | -     | -     | -            | -    |
| Telegram       | Yes     | -    | -         | -     | -     | -            | -    |
| Bale.io        | Yes     | -    | -         | -     | -     | -            | -    |
| Eitaa          | Yes     | -    | -         | -     | -     | -            | -    |
| TikTok         | Yes     | -    | -         | -     | -     | -            | -    |
| Rubika         | Yes     | -    | -         | -     | -     | -            | -    |
| Hamshahri      | Yes     | -    | -         | -     | -     | -            | -    |
| Forbes         | Yes     | -    | -         | -     | -     | -            | -    |

Requirements:

  • Python 3.10
  • Cython

Suitable for:

  • Researchers
  • Students
  • Business product developers
  • Startup projects