YoungNMT is a young but loosely coupled, flexible and scalable neural machine translation system. It is designed to let researchers and developers realize their ideas quickly without changing the original system.
2020.10.10
Version 0.1.0 has some known bugs (they do not affect normal use of YoungNMT):
- an exception when loading user-defined HOCON files;
- an exception when logging from the BLEU scorer.
- Full Documentation
- Requirements
- Installation
- Arguments
- Quickstart
- Models and Configurations
- Citation
Required
It is recommended that users configure and install the following dependency packages themselves:
The following dependency packages will be installed automatically during system installation. If there are errors, please configure them manually.
Optional
- NCCL is used to train models on NVIDIA GPUs.
- apex is used to train models with mixed precision.
- pynvml is used to manage and monitor NVIDIA GPUs.
Three different installation methods are shown below:
- Install YoungNMT from PyPI:
pip install YoungNMT
- Install YoungNMT from sources:
git clone https://github.com/Jason-Young-AI/YoungNMT.git
cd YoungNMT
python setup.py install
- Develop YoungNMT locally:
git clone https://github.com/Jason-Young-AI/YoungNMT.git
cd YoungNMT
python setup.py build develop
In YoungNMT, we built a module that wraps pyhocon and parses HOCON-style files to obtain the system's arguments.
HOCON (Human-Optimized Config Object Notation) is a superset of JSON, so YoungNMT can load arguments from *.json or pure HOCON files.
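Because every JSON document is also valid HOCON, a plain JSON file written with Python's standard library can serve as a user config. The sketch below is only an illustration: the keys `model`, `hidden_size` and `batch_size`, the helper `write_json_config` and the file name `my_config.json` are hypothetical and are not YoungNMT's actual argument names.

```python
import json
import tempfile
from pathlib import Path


def write_json_config(path, arguments):
    """Write `arguments` as JSON; any JSON file is also valid HOCON."""
    path = Path(path)
    path.write_text(json.dumps(arguments, indent=2), encoding="utf-8")
    return path


# Hypothetical argument names, for illustration only.
example_arguments = {
    "model": {"hidden_size": 512},
    "batch_size": 4096,
}

with tempfile.TemporaryDirectory() as directory:
    config_path = write_json_config(Path(directory) / "my_config.json", example_arguments)
    # Reading the file back confirms that it is well-formed JSON.
    loaded_arguments = json.loads(config_path.read_text(encoding="utf-8"))
```

A file produced this way could then be handed to a YoungNMT command through its `-l` option, provided the keys match the arguments the system expects.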
After installation, the commands ynmt-preprocess, ynmt-train and ynmt-test can be executed directly, and system arguments will be loaded from the default HOCON files.
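One way to confirm that the three commands are available after installation is to look them up on the PATH. This is a minimal sketch using only the standard library; the helper name `find_commands` is an assumption made for illustration and is not part of YoungNMT.

```python
import shutil

YNMT_COMMANDS = ("ynmt-preprocess", "ynmt-train", "ynmt-test")


def find_commands(names=YNMT_COMMANDS):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, location in find_commands().items():
        print(f"{name}: {location or 'not found on PATH'}")
```

If a command is reported as not found, the environment in which YoungNMT was installed is probably not the active one.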
Save Arguments
ynmt-preprocess -s {path to save args} -t {json|yaml|properties|hocon}
Load Arguments
ynmt-preprocess -l {user's config file}
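When these calls are scripted, it can help to build the argument lists in one place and reject an unsupported file type early. The sketch below assumes only the `-s`, `-t` and `-l` options shown above for ynmt-preprocess; the function names and the file name `my_args.json` are hypothetical.

```python
SAVE_FORMATS = ("json", "yaml", "properties", "hocon")


def save_arguments_command(path, file_format):
    """Build the argv that saves ynmt-preprocess arguments to `path`."""
    if file_format not in SAVE_FORMATS:
        raise ValueError(
            f"unsupported format {file_format!r}; expected one of {SAVE_FORMATS}"
        )
    return ["ynmt-preprocess", "-s", str(path), "-t", file_format]


def load_arguments_command(config_file):
    """Build the argv that loads arguments from a user's config file."""
    return ["ynmt-preprocess", "-l", str(config_file)]


save_argv = save_arguments_command("my_args.json", "json")
load_argv = load_arguments_command("my_args.json")
# Either list can be passed to subprocess.run(argv, check=True) to execute it.
```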
See Full Documentation for more details.
Here is an example of the WMT16 English-to-Romanian experiment.
Step 0. Preliminaries
- Download the English-Romania corpora directory from OneDrive;
- Download the English-Romania configuration file from YoungNMT-configs.
Step 1. Dataset preparation
unzip -d Corpora English-Romania.zip
mkdir Datasets
ynmt-preprocess -l wmt16_en-ro_config/main.hocon
Step 2. Train the model on 4 GPUs
mkdir -p Checkpoints/WMT16_En-Ro
CUDA_VISIBLE_DEVICES=0,1,2,3 ynmt-train -l wmt16_en-ro_config/main.hocon
Step 3. Test the model using 1 GPU
mkdir -p Outputs/WMT16_En-Ro
CUDA_VISIBLE_DEVICES=0 ynmt-test -l wmt16_en-ro_config/main.hocon
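Steps 1 to 3 can also be driven from a single Python script, which is convenient when the experiment is repeated. The following is a sketch under stated assumptions, not part of YoungNMT: it reuses only the commands, directories and GPU lists shown above, it leaves out the unzip step, and `run_pipeline` and its `dry_run` flag are hypothetical names. With `dry_run=True` nothing is executed and the planned commands are only listed.

```python
import os
import subprocess

CONFIG = "wmt16_en-ro_config/main.hocon"

# (directory to create, command, GPU ids or None) for Steps 1, 2 and 3.
STEPS = [
    ("Datasets", "ynmt-preprocess", None),
    ("Checkpoints/WMT16_En-Ro", "ynmt-train", "0,1,2,3"),
    ("Outputs/WMT16_En-Ro", "ynmt-test", "0"),
]


def run_pipeline(config=CONFIG, dry_run=True):
    """Run (or, with dry_run=True, only list) Steps 1-3 in order."""
    planned = []
    for directory, command, gpus in STEPS:
        argv = [command, "-l", config]
        planned.append((directory, gpus, argv))
        if dry_run:
            continue
        os.makedirs(directory, exist_ok=True)  # behaves like mkdir -p
        env = dict(os.environ)
        if gpus is not None:
            env["CUDA_VISIBLE_DEVICES"] = gpus
        subprocess.run(argv, env=env, check=True)  # stop at the first failure
    return planned


for directory, gpus, argv in run_pipeline(dry_run=True):
    print(directory, gpus, " ".join(argv))
```

Calling `run_pipeline(dry_run=False)` would execute the three commands in sequence and raise an error as soon as one of them fails.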
We provide pre-trained models and their configurations for several tasks. Please refer to YoungNMT-configs.