Investigate possible issue with text alignment for deuterocanonical books and unusual versifications #478
Comments
The issue identified was that the mapping between DAN and SUS was incorrect because of a bug in the preprocess code. To really verify that all books work, there are a few main points where this could mess up:
Where should we make tests for this?
Then, use ParallelCorpus level testing to verify the following:
@Enkidu93, what do you think?
I'd be curious for Damien's thoughts, but I wouldn't worry too much about incorrect .vrs files. As the saying goes, 'garbage in, garbage out'. I don't think we can take responsibility for this, and undefined behavior is appropriate behavior when bad data is provided. This is something that's skirting the edge of the purview of Serval/Machine. If we do anything at all about this, I'd suggest just a quick validation of the .vrs file (if that's possible) and throwing a meaningful error if it's just plain invalid. We simply can't account for all the possible bad .vrs files out there, and furthermore, I'm not sure how often this is an issue: it seems like Paratext should have an automatic check for this or something (?).
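To make the "quick validation" idea concrete, here is a minimal sketch in Python. It is not code from Serval or Machine. It assumes a simplified subset of the .vrs format: `#` comment lines, verse-count lines such as `XXA 1:10 2:5`, and mapping lines such as `XXA 1:11 = XXA 2:1`. Real files may contain other line types that this sketch would reject, and the function name `validate_vrs` is hypothetical.

```python
import re

# Assumed, simplified line shapes (not a full .vrs grammar):
#   verse counts:  BOOK chapter:lastVerse chapter:lastVerse ...
#   mappings:      BOOK c:v[-v] = BOOK c:v[-v]
BOOK = r"[A-Z0-9]{3}"
COUNT_LINE = re.compile(rf"^{BOOK}(\s+\d+:\d+)+$")
MAPPING_LINE = re.compile(
    rf"^{BOOK}\s+\d+:\d+(-\d+)?\s*=\s*{BOOK}\s+\d+:\d+(-\d+)?$"
)


def validate_vrs(text: str) -> None:
    """Raise ValueError naming the first line that matches no known shape."""
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Blank lines and comments carry no versification data.
        if not line or line.startswith("#"):
            continue
        if COUNT_LINE.match(line) or MAPPING_LINE.match(line):
            continue
        raise ValueError(f"Invalid .vrs content on line {line_no}: {raw!r}")
```

A check like this only catches files that are syntactically malformed; it says nothing about whether the verse counts or mappings are actually right for the project.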
What exactly was the bug?
This should be less of an issue, right, since we're not doing aligning here so much as just (potentially) changing versification? Where are you imagining an error like this would creep in?
Yes, we should do this - at least for preprocessing. I'm not sure a test exists that specifically covers this.
Like I said above, I don't know if these are things we really can/should test.
I think these are all in scope, and hopefully, Mudi has covered some of these in his tests so far (?).
Yep, I agree.
Regarding incorrect vrs files, there might be some simpler cases that we could handle automatically. For example, if the USFM contains a verse that isn't specified in the vrs file, we could automatically add it.
I have written tests for the following:
I intend to write the following tests:
Mapping between chapters works.
I also intend to verify that there are tests that check whether the target references are used for the preprocess and USFM update. If there are none, I will write them.
@ddaspit - how would we add the verse if it isn't in the VRS file? Would we just assume there is no mapping? What if there is a mapping? Should we follow some logic like "if there is no mapping used in this specific book, then add the verse"? Also, I am assuming that in general we want to be graceful and forgiving with bad VRS and USFM files. There will be other places in Serval that analyze these issues and tell the user to fix up their stuff, but I believe that this would be a bad place to do it - I believe people are expecting it to do a "best guess" job and not crash.
The versification defines the number of chapters for each book and the number of verses for each chapter. If the verse number for a particular chapter in the USFM exceeds the number of verses for that chapter in the versification, then we could update the number of verses in the versification.
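A minimal sketch of that idea in Python, for discussion only. It does not use the Machine API: the versification is modeled as a plain dictionary from book ID to a list of last-verse numbers (one per chapter), and `extend_versification`, the `XXA` book ID, and the verse counts are all hypothetical. Mappings are deliberately left untouched here.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical model: book ID -> last verse number of each chapter (index 0 = chapter 1).
Versification = Dict[str, List[int]]
VerseRef = Tuple[str, int, int]  # (book, chapter, verse) as seen in the USFM


def extend_versification(
    versification: Versification, usfm_verses: Iterable[VerseRef]
) -> List[Tuple[str, int, int, int]]:
    """Raise per-chapter verse counts so every USFM verse fits.

    Returns a list of (book, chapter, old_count, new_count) changes.
    Books or chapters missing from the versification are skipped, since
    the proposal above only covers extra verses in known chapters.
    """
    changes = []
    for book, chapter, verse in usfm_verses:
        chapters = versification.get(book)
        if chapters is None or not 1 <= chapter <= len(chapters):
            continue  # unknown book or chapter: leave for other handling
        old_count = chapters[chapter - 1]
        if verse > old_count:
            chapters[chapter - 1] = verse
            changes.append((book, chapter, old_count, verse))
    return changes


# Example with made-up counts: chapter 1 grows from 10 to 12 verses.
example = {"XXA": [10, 5]}
example_changes = extend_versification(example, [("XXA", 1, 12), ("XXA", 2, 5)])
```

Returning the list of changes would let a caller log or surface what was adjusted rather than silently altering the versification.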
And, I am assuming, extend any mapping from one book/chapter to another. That sounds feasible.
@johnml1135 I'm not sure what you mean by
The vrs file contains:
What if there are extra verses NAM 1:15, NAM 1:16, NAM 2:14 and NAM 2:16? How should they be handled?
See here for more information: #471 (and here - #442 (comment))
Let's add some tests to verify that we are handling ALL of the deuterocanonical books properly (same amount of testing as for the standard English versification books).