po2json: Make sure that --fuzzy and --removeuntranslated can be used together. #4257
I made a mistake, will fix :) |
Not sure how to fix/handle/interpret the failing flake8 test. Also, I would love to add a unit test for these changes, but I haven't figured out yet how to run the tests |
- Your changes break several po2json tests, can you please look at it?
- The flake8 issues can be addressed more easily by running the black code formatter, see http://docs.translatehouse.org/projects/translate-toolkit/en/latest/developers/styleguide.html
- Testing is documented at http://docs.translatehouse.org/projects/translate-toolkit/en/latest/developers/testing.html
@nijel However, I don't think the test cases reflect the expected behaviour, so I changed them too. This might be radical so allow me to explain the logic used, given that we start with po input:
And json file:
I adapted the following tests:
include_fuzzy=false and remove_untranslated=false
This was the expected output:
As both bar and baz are fuzzy, they should be removed if include_fuzzy is false:
include_fuzzy=true and remove_untranslated=true
This was the expected output:
As bar is fuzzy and translated, imo it should also be included:
In closing:
These changes seem logical to me; however, searching the issue queue I see they have caused confusion to others in the past, some examples:
Outside the scope of this change, however, I also think the code could be simplified a lot by adding an "allow_fuzzy=False" parameter to the global is_translated function, or, even better, by removing the fuzzy check from that function altogether (because it's an assumption that is only clear when you read the source code). "Not translated" and "fuzzy" mean two completely different things to me, but then again there may be conceptual or historical reasons behind all this that I'm completely missing. |
This depends on how you interpret "is translated". Right now it means "the translation is completed", which definitely should exclude fuzzy (and other needing review states, as in xliff). I don't think it's reasonable to change the semantics as that could easily silently break third-party code. So, the unit is either untranslated, fuzzy or translated (or approved, but that's out of scope here). |
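To make the three states concrete, here is a minimal sketch. It is not the translate-toolkit code: `Unit`, `State` and `classify` are hypothetical names made up for illustration, and treating an empty target as untranslated even when the fuzzy flag is set is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    UNTRANSLATED = "untranslated"
    FUZZY = "fuzzy"
    TRANSLATED = "translated"


@dataclass
class Unit:
    # Hypothetical stand-in for a PO unit, not the toolkit's own class.
    source: str
    target: str = ""
    fuzzy: bool = False


def classify(unit: Unit) -> State:
    """Map a unit to exactly one of the three states."""
    if not unit.target.strip():
        # Assumption: no target text means untranslated, fuzzy flag or not.
        return State.UNTRANSLATED
    if unit.fuzzy:
        return State.FUZZY
    return State.TRANSLATED
```

With this split, "is translated" in the current sense is only true for `State.TRANSLATED`, while a converter option such as --fuzzy decides separately what to do with `State.FUZZY`.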
There's not really any indication that it means that a translation is completed, the comments in code say:
As for the semantics, the GNU documentation more or less says that a fuzzy entry is a translated entry for most purposes; only from the translator's viewpoint is it one that needs additional review. In any case, the changes in my PR are made with the idea in mind that a fuzzy translation is a translation. Any comments on that? |
I have no clue about the original intent behind this behavior, I'm just describing how it behaves. Most things in the translate-toolkit are modelled around XLIFF, so that might also be the reason for this behavior. Another issue with
The question is whether they should be removed (what you implement) or the source used instead (what the current implementation does).
|
I understand. So how do we continue from here? In my use case, I need to make sure no source strings end up in the localized json, as the strings are used with speech synthesis and a source string would cause, for example, an English word to be spoken with a French voice. Not very good ;) If I don't include the source string, the application will automatically fall back to the English word and speak it with an English voice. That's fine. I'd be happy to perform additional work on this PR if needed. I think my use case is not that uncommon, so a solution would benefit others, but I'm going to need some guidance on the approach to take. Perhaps additional parameters should be added? |
In the end the "correct" behavior for the first case depends on usage of the resulting file - whether all keys should be present there or not. And I think For fuzzy inclusion, I think the behavior you've made makes more sense - when you explicitly specify Don't get me wrong, I just don't want to merge a change which will end up breaking existing workflows, and this one easily can do that. The perfect solution would be to have a single logic used by all the converters (see #3573). |
Do any of the other converters use some sort of parameter to disable the inclusion of source strings? If so, I can add that as well (either here or in another PR) |
I think |
Currently, when you combine the options --fuzzy and --removeuntranslated, the fuzzy translations are also removed.
Both options work fine on their own; the problem lies in the check itself, which currently ignores the fuzzy parameter and skips any fuzzy strings when --removeuntranslated is passed.
The following PR will improve the logic in the "skip" check, to make sure that:
or
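The combination described in this PR can be sketched roughly as follows. This is an illustration only and not the actual po2json code: `should_skip` and `build_output` are hypothetical helpers, the sample units are invented, and the sketch covers only the case where --removeuntranslated is given.

```python
def should_skip(target: str, is_fuzzy: bool, include_fuzzy: bool) -> bool:
    """Decide whether a unit is left out when --removeuntranslated is set.

    Hypothetical helper: the real check lives in the converter and may differ.
    """
    if not target.strip():
        # No translation at all: always dropped.
        return True
    if is_fuzzy and not include_fuzzy:
        # Fuzzy translation, but --fuzzy was not requested.
        return True
    return False


def build_output(units, include_fuzzy: bool) -> dict:
    """Build a key -> translation mapping from (key, target, is_fuzzy) tuples."""
    return {
        key: target
        for key, target, is_fuzzy in units
        if not should_skip(target, is_fuzzy, include_fuzzy)
    }


# Invented sample data: one finished, one fuzzy, one empty fuzzy unit.
SAMPLE = [("foo", "oof", False), ("bar", "rab", True), ("baz", "", True)]

print(build_output(SAMPLE, include_fuzzy=True))   # {'foo': 'oof', 'bar': 'rab'}
print(build_output(SAMPLE, include_fuzzy=False))  # {'foo': 'oof'}
```

The point of the sketch is that the fuzzy flag is consulted inside the skip check, so a fuzzy unit that has a translation survives when both options are given, and no source string is ever written in place of a missing translation.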