This repository has been archived by the owner on Jan 27, 2024. It is now read-only.

Translate #6

Closed
Parkchanjun opened this issue Jan 20, 2020 · 3 comments

Comments

@Parkchanjun

Is there an example of how to "translate" using an APE model?

@AlyShmahell

I'm using

! python ./OpenNMT-APE/translate.py -model $(ls | grep -e 'ape-model_step' | tail -1) -src mt_test.txt -output ape.txt -replace_unk -verbose -gpu 0

But I'm seeing a lot of artifacts, even at 97% validation accuracy. Words come out split into pieces separated by spaces, and every piece after the first starts with ##. For example:

This is an example

would become

Th ##is is a ##n ex ##am ##ple

@goncalomcorreia
Collaborator

Here's a bash script that runs translation, removes the WordPiece markers from the output, and scores the result with BLEU:

# Generic things
SOURCE=en
TARGET=de
LANGPAIR=${SOURCE}-${TARGET}
DATA= #/path/to/your/data
MODELS= #/path/to/your/models
ONMT=~/OpenNMT-APE
MODEL_NAME= # trained_model_file_name

PRED_SUFFIX=pred

DATA_TYPE=dev
BATCH_SIZE=5

# Call the OpenNMT-py script
python ${ONMT}/translate.py \
        -model ${MODELS}/${MODEL_NAME} \
        -src ${DATA}/${DATA_TYPE}.srcmt \
        -tgt ${DATA}/${DATA_TYPE}.mt \
        -output ${DATA}/${DATA_TYPE}.${PRED_SUFFIX}.unprocessed \
        -beam_size 8 \
        -min_length 2 \
        -batch_size ${BATCH_SIZE} \
        -length_penalty avg \
        -gpu 0

cat ${DATA}/${DATA_TYPE}.${PRED_SUFFIX}.unprocessed | sed 's/ \#\#//g' > ${DATA}/${DATA_TYPE}.${PRED_SUFFIX}

~/mosesdecoder/scripts/generic/multi-bleu.perl ${DATA}/${DATA_TYPE}.pe < ${DATA}/${DATA_TYPE}.${PRED_SUFFIX} | head -n 1

@goncalomcorreia
Collaborator

@AlyShmahell ## is the WordPiece continuation marker. The script above includes a post-processing line, run after translation, that removes these markers. Here it is again:

cat ${DATA}/${DATA_TYPE}.${PRED_SUFFIX}.unprocessed | sed 's/ \#\#//g' > ${DATA}/${DATA_TYPE}.${PRED_SUFFIX}
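For anyone applying this to the ape.txt output from the translate.py command earlier in the thread, here is a minimal sketch that wraps the same sed expression in a shell function and checks it against the example sentence. The function name detok_wordpiece is hypothetical and not part of OpenNMT-APE; the ape.txt and ape.detok.txt file names are assumptions taken from that earlier command. The unescaped form s/ ##//g is used because # has no special meaning inside a sed regular expression.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rejoin WordPiece pieces by deleting every " ##"
# (a space followed by the continuation marker), like the sed line above.
detok_wordpiece() {
    sed 's/ ##//g'
}

# Example sentence from the earlier comment.
echo 'Th ##is is a ##n ex ##am ##ple' | detok_wordpiece
# prints: This is an example

# Assumed usage on the output file from the earlier translate.py command:
# detok_wordpiece < ape.txt > ape.detok.txt
```

This assumes every piece boundary in the model output is written as a space followed by ##; a token that legitimately begins with ## would also be merged into the preceding word.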
