The EN-AR Translator uses the Transformer architecture to translate English into Arabic. The project is built on Django and uses TensorFlow and NumPy for its model operations.
For this project to function correctly, ensure the following are installed:
- Django
- TensorFlow
- NumPy
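As a quick sanity check before running anything, the short script below reports which of the listed prerequisites cannot be found in the current Python environment. It is a minimal sketch: the helper name `missing_packages` is hypothetical and not part of this repository, and the import names `django`, `tensorflow`, and `numpy` are the standard module names for these packages.

```python
import importlib.util

# Display name -> importable module name for each listed prerequisite.
REQUIRED = {"Django": "django", "TensorFlow": "tensorflow", "NumPy": "numpy"}


def missing_packages(required=REQUIRED):
    """Return the display names of packages that cannot be located."""
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites found.")
```

Anything reported as missing can usually be installed with pip before continuing.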
- Clone the repository:
  git clone https://github.com/ALLIA12/EN-AR-Translator.git
- Navigate to the project directory:
  cd EN-AR-Translator
- Clean the data:
  Before training, the dataset needs cleaning. Run the cleanData notebook to preprocess the dataset:
  jupyter notebook cleanData.ipynb
  Note: The dataset file needs to be named CCMatrix v1- EN to AR Dataset.tmx
- Training the model:
  Once the data is clean, use the EN-AR notebook to train the model:
  jupyter notebook En-AR.ipynb
  Note: If you change the model size or parameters, make sure to update the application.py file accordingly.
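For the data-cleaning step above, it can help to see what reading a TMX file involves. The sketch below is not the code in the cleanData notebook; it is an illustrative, standard-library-only reader that assumes the usual TMX layout (`tu` units containing `tuv` elements tagged with `xml:lang`, each holding a `seg`). The function name `iter_tmx_pairs` and the sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Hello   world</seg></tuv>
    <tuv xml:lang="ar"><seg>مرحبا بالعالم</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Only English</seg></tuv></tu>
</body></tmx>"""


def iter_tmx_pairs(source, src_lang="en", tgt_lang="ar"):
    """Yield (source, target) sentence pairs from a TMX file path or file object."""
    # iterparse streams the file, which matters for large corpora.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        segments = {}
        for tuv in elem.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            lang = lang.split("-")[0]  # treat "en-US" as "en"
            seg = tuv.find("seg")
            if seg is not None:
                # Collapse runs of whitespace as a basic cleaning step.
                text = " ".join("".join(seg.itertext()).split())
                if text:
                    segments[lang] = text
        if src_lang in segments and tgt_lang in segments:
            yield segments[src_lang], segments[tgt_lang]
        elem.clear()  # free memory used by the processed unit


if __name__ == "__main__":
    stream = io.BytesIO(SAMPLE_TMX.encode("utf-8"))
    for en, ar in iter_tmx_pairs(stream):
        print(en, "|", ar)
```

Units that lack either language are skipped, so the second sample unit produces no pair.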
After training the model:
- Navigate to the project's root directory.
- Start the Django server:
  python manage.py runserver
- Access the application via your preferred web browser at:
  http://127.0.0.1:8000/
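To confirm that the development server started by the steps above is reachable without opening a browser, a small check like the following can be used. It is a sketch using only the standard library; the helper name `server_is_up` is hypothetical, and the default URL is the one given above.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://127.0.0.1:8000/", timeout=3.0):
    """Return True if the URL answers with a 2xx or 3xx HTTP status."""
    # An empty ProxyHandler bypasses any system proxy for this local check.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, and HTTP error statuses.
        return False


if __name__ == "__main__":
    print("Server reachable:", server_is_up())
```

A result of `False` usually means the server is not running or is listening on a different port.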
The heart of this translator is the Transformer architecture, which processes sequences with attention instead of recurrence and is widely used for sequence-to-sequence tasks such as translation. [1]
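The core operation of the Transformer is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. The NumPy sketch below illustrates that formula only; it is not taken from this project's notebook or application.py, and the function name and the boolean mask convention (True means "may attend") are assumptions made for the example.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v, mask=None):
    """Compute softmax(q @ k^T / sqrt(d_k)) @ v.

    q: (..., len_q, d_k), k: (..., len_k, d_k), v: (..., len_k, d_v).
    mask: optional boolean array broadcastable to (..., len_q, len_k);
          positions set to False receive (almost) zero weight.
    """
    d_k = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    # Subtract the row maximum for a numerically stable softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(1, 2, 4))  # 2 query positions, d_k = 4
    k = rng.normal(size=(1, 5, 4))  # 5 key positions
    v = rng.normal(size=(1, 5, 3))  # d_v = 3
    output, weights = scaled_dot_product_attention(q, k, v)
    print(output.shape, weights.shape)
```

Each row of `weights` sums to 1, so every output position is a weighted average of the value vectors.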
The model was trained in two phases on the CCMatrix English-to-Arabic dataset [2]:
Training accuracy and loss metrics for the first 30 epochs:
For fine-tuning, the model was trained for an additional 15 epochs. The accuracy and loss during this phase: