This is an Arabic stemming algorithm written in the Snowball framework language. It offers light stemming and text normalization.
It depends on snowball and the snowball-data Arabic vocabulary (voc), which you can download automatically using:
$ make download
- Install the Python requirements:
$ sudo pip install -r requirements.txt
Alternatively, download the dependencies manually by:
- extracting snowball into the root folder
{Root}/snowball
- extracting snowball-data/arabic/voc.txt.gz into
{Root}/test_data/voc.txt
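
For the manual route, the vocabulary archive has to be decompressed rather than just copied. The following is a minimal Python sketch of that step, using only the standard library. The function name `extract_voc` is hypothetical and is not part of this repository; the two paths you pass in are the ones listed above.

```python
import gzip
import shutil
from pathlib import Path


def extract_voc(archive: str, target: str) -> Path:
    """Decompress a gzip-compressed vocabulary file to `target`.

    `extract_voc` is an illustrative helper, not something the Makefile provides.
    """
    target_path = Path(target)
    # Create the destination folder (e.g. test_data/) if it does not exist yet.
    target_path.parent.mkdir(parents=True, exist_ok=True)
    # Stream the decompressed bytes so a large vocabulary is never held in memory at once.
    with gzip.open(archive, "rb") as src, open(target_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target_path
```

Run from the root folder, `extract_voc("snowball-data/arabic/voc.txt.gz", "test_data/voc.txt")` should leave the plain-text vocabulary where the tests expect it, assuming the archive was downloaded to that relative path.
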
- Build the light stemmer:
$ make build
- Build the root-based stemmer:
$ make build_root_based_stemmer
- Run the light stemmer (input word, then its stem):
$ make run
الطالب
طالب
- Run the root-based stemmer (input word, then its root):
$ make run_root
الطالب
طلب
The tests are configured to run against the snowball-data Arabic sample.
- time:
$ make time
- grouping effect:
$ make grouping
- all:
$ make test
- Test SAS against the golden Arabic corpus:
$ make test_arabicstemmer
- Test the ISRI stemmer against the golden Arabic corpus:
$ make test_isri
- Distribute the light stemmer to the available languages:
$ make dist
- Distribute the root-based stemmer to the available languages:
$ make dist_rooter
Snowball Arabic (Stemmer & Rooter) Results
Word | Stem | Root |
---|---|---|
طفل | طفل | طفل |
اطفال | اطفال | طفل |
الاطفال | اطفال | طفل |
اطفالكم | اطفال | طفل |
فأطفالكم | اطفال | طفل |
اطفالهم | اطفال | طفل |
والاطفال | اطفال | طفل |
فاطفالهم | اطفال | طفل |
وطفل | طفل | طفل |
الطفولة | طفول | طفل |
والطفلتين | طفل | طفل |
طفلتان | طفل | طفل |
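
One way to read the table is in terms of grouping: how many distinct outputs the surface forms collapse into. The sketch below counts those groups for the rows above. It is only an illustration built on the table data, and it assumes that "grouping" means counting distinct stems or roots; it does not describe what `make grouping` actually runs. The helper `group_by` is hypothetical.

```python
from collections import defaultdict

# (word, stem, root) rows copied from the results table above.
ROWS = [
    ("طفل", "طفل", "طفل"),
    ("اطفال", "اطفال", "طفل"),
    ("الاطفال", "اطفال", "طفل"),
    ("اطفالكم", "اطفال", "طفل"),
    ("فأطفالكم", "اطفال", "طفل"),
    ("اطفالهم", "اطفال", "طفل"),
    ("والاطفال", "اطفال", "طفل"),
    ("فاطفالهم", "اطفال", "طفل"),
    ("وطفل", "طفل", "طفل"),
    ("الطفولة", "طفول", "طفل"),
    ("والطفلتين", "طفل", "طفل"),
    ("طفلتان", "طفل", "طفل"),
]


def group_by(rows, column):
    """Map each distinct value in `column` (1 = stem, 2 = root) to its source words."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[0])
    return dict(groups)


stem_groups = group_by(ROWS, 1)
root_groups = group_by(ROWS, 2)
print(len(ROWS), "words ->", len(stem_groups), "stem groups,", len(root_groups), "root group(s)")
# prints: 12 words -> 3 stem groups, 1 root group(s)
```

For this sample, the light stemmer keeps the singular, plural, and abstract-noun forms apart (three stems), while the root-based stemmer merges all twelve words under a single root.
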