overlaps without test files? #14

Open
theamato opened this issue Jan 2, 2023 · 1 comment

Comments

@theamato

theamato commented Jan 2, 2023

Perhaps a silly question, but in the demo it seems that you create the files with the overlapping sentences from the dev and test files. In my case, I just have a parallel corpus of a few Arabic and English texts that I want to align, and because the corpus is so small I don't have any dev or test files. Do I need them to align the files, or is there some way around it?

@thompsonb
Owner

You only need dev/test files if you want to measure sentence alignment performance (e.g. F1). In your case, you should just need to run sentence segmentation on your data to get one sentence per line, compute and embed the overlaps that vecalign needs, and then run vecalign.
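
To make "overlaps" concrete, here is a rough Python sketch of the idea: every run of 1 to N consecutive sentences from a one-sentence-per-line file, joined into a single string so it can be embedded. This is only an illustration, not vecalign's own overlap script, which may handle details such as padding and truncation differently. The function name `consecutive_overlaps` and the sample sentences are made up for this example.

```python
from typing import List


def consecutive_overlaps(sentences: List[str], max_size: int) -> List[str]:
    """Return unique concatenations of 1..max_size consecutive sentences.

    Hypothetical helper for illustration only; it is not part of vecalign.
    """
    seen = set()
    out = []
    for size in range(1, max_size + 1):
        # Slide a window of `size` sentences over the document.
        for start in range(len(sentences) - size + 1):
            span = " ".join(sentences[start:start + size])
            if span not in seen:  # keep each distinct string once
                seen.add(span)
                out.append(span)
    return out


# Made-up input: one already-segmented sentence per list element.
sentences = ["Hello there.", "How are you?", "Fine."]
overlaps = consecutive_overlaps(sentences, 2)
for line in overlaps:
    print(line)
```

You would build one such set for the Arabic side and one for the English side, with no dev/test split involved, then embed both and pass them to vecalign together with the segmented files.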
