Name		Name	Last commit message	Last commit date
parent directory ..
data		data
mbart50-predictions		mbart50-predictions
README.md		README.md
fairseq.sh		fairseq.sh
setup_bart50.sh		setup_bart50.sh
train_bart50.sh		train_bart50.sh

README.md

Machine Translation

This folder contains information and some scripts about the trained models for Machine translation task.

Dataset

Download and extract data from the v1.0 release of mukayese-datasets.

Dataset Format

All of the translated sentences in the dataset are formatted as follows:

{
    "sentence" : "I'm going to talk today about energy and climate.",
    "translation" : "Bugun enerji ve iklim hakkında konuşacağım."
}

Baseline Models

For Convolutional Sequence to Sequence Learning (Fairseq) and Transformer, first prepare the datasets using the provided preparation scripts, then run your experiments from fairseq.sh. For mBART50 you can use experiments in train_bart50.sh.

For example, to run the baseline model for the English-Turkish translation on WMT16, run the following command:

CUDA_VISIBLE_DEVICES=0 fairseq-train \
    data-bin/wmt16-en-tr \
    --arch transformer_iwslt_de_en --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.3 --weight-decay 0.0001 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 \
    --eval-bleu \
    --eval-bleu-args '{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}' \
    --eval-bleu-detok moses \
    --eval-bleu-remove-bpe \
    --eval-bleu-print-samples \
    --best-checkpoint-metric bleu --maximize-best-checkpoint-metric

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

machine-translation

machine-translation

README.md

Machine Translation

Dataset

Dataset Format

Baseline Models

Files

machine-translation

Directory actions

More options

Directory actions

More options

Latest commit

History

machine-translation

Folders and files

parent directory

README.md

Machine Translation

Dataset

Dataset Format

Baseline Models