Official repository for the open-source text dataset for neural machine translation (NMT) of local languages in West Africa (EWE Corpus). The Yodi model will be implemented here afterward.
Note: This repository will evolve into the official repository for the Yodi model, once the necessary data is gathered.
• Develop a machine translation text and speech dataset for local languages in West Africa (EWE Corpus).
-> Develop the Yodi model from this dataset and measure its accuracy and performance on text-to-text translation.
Remark: Accurate, labeled data from online sources or from local written documents is necessary for machine sentence translation.
We have transformed and analyzed two Ewe-English dictionaries: KABDICT525 and EWEDICT995. These dictionaries are now available as JSON files that can be loaded from Python for easy integration into your projects.
The transformed dictionaries are located in the Dictionaries folder:
Dictionaries/kabdict525.json
Dictionaries/ewedict995.json
To use these dictionaries in your Python scripts, load them as follows:
import json

# Load KABDICT525
with open('Dictionaries/kabdict525.json', 'r', encoding='utf-8') as f:
    kabdict = json.load(f)

# Load EWEDICT995
with open('Dictionaries/ewedict995.json', 'r', encoding='utf-8') as f:
    ewedict = json.load(f)

# Example usage: look up an entry, with a fallback if it is missing
print(kabdict.get('word', 'Word not found'))
print(ewedict.get('word', 'Word not found'))
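If you want to query both dictionaries at once, the loading steps above could be wrapped in small helpers. This is only a sketch: the names `load_dictionary` and `lookup` are hypothetical and not part of this repository, and it assumes each JSON file holds a single top-level object mapping a word to its entry, as the `.get(...)` calls above suggest.

```python
import json


def load_dictionary(path):
    """Load one dictionary file.

    Assumption: the file contains a single JSON object (word -> entry).
    """
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    return data


def lookup(word, *dictionaries, default="Word not found"):
    """Return the first entry found for `word`.

    Each dictionary is tried in order, first with the word as given,
    then with its lowercase form. `default` is returned if nothing matches.
    """
    for dictionary in dictionaries:
        for key in (word, word.lower()):
            if key in dictionary:
                return dictionary[key]
    return default
```

With these helpers, `lookup('word', load_dictionary('Dictionaries/kabdict525.json'), load_dictionary('Dictionaries/ewedict995.json'))` would check KABDICT525 first and fall back to EWEDICT995. Whether a lowercase fallback is appropriate depends on how the keys are actually stored in the files.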
Feel free to share your analytics in the discussions! Instructions are available in project_contributions_instructions.txt.
Please register at https://sites.google.com/umbaji.org/yodi/home to help build the largest NMT text dataset for West Africa.