overlaps without test files? #14

theamato · 2023-01-02T19:23:00Z

Perhaps a silly question, but in the demo it seems like you create the files with the overlapping sentences from the dev and the test files. In my case, I just have a parallel corpus of a few Arabic and English texts that I want to align, and I don't have any dev or test files because the corpus is so small. Do I need them to align the files, or is there some way around it?

thompsonb · 2023-01-02T20:07:53Z

You only need dev/test files if you want to measure sentence alignment performance (e.g. F1). In your case you should just need to run sentence segmentation on your data to get one sentence per line, then compute and embed the overlaps that vecalign needs, then run vecalign.
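For anyone unsure what "overlaps" means here: as I understand it, they are concatenations of a few consecutive sentences taken from the same files you want to align, so no dev or test split is involved. Below is a minimal sketch of that idea in plain Python. It is not vecalign's own overlap script; the function name `make_overlaps` and the parameter `max_size` are made up for illustration, and the real tool may handle details such as padding or length limits differently.

```python
def make_overlaps(sentences, max_size=4):
    """Return the sorted, de-duplicated concatenations of 1..max_size
    consecutive sentences (hypothetical helper, for illustration only)."""
    overlaps = set()
    for size in range(1, max_size + 1):
        # Slide a window of `size` sentences over the document.
        for start in range(len(sentences) - size + 1):
            overlaps.add(" ".join(sentences[start:start + size]))
    return sorted(overlaps)


if __name__ == "__main__":
    # One sentence per line, as produced by sentence segmentation.
    doc = ["First sentence.", "Second sentence.", "Third sentence."]
    for line in make_overlaps(doc, max_size=2):
        print(line)
```

You would run something like this once on the Arabic side and once on the English side, embed each resulting list of overlaps, and pass the embeddings to the aligner together with the original sentence-per-line files.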
