Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xgettext: Avoid collisions when splitting .pot files #243

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

cip999
Copy link
Contributor

@cip999 cip999 commented Dec 13, 2024

Hey, I've been looking into #185 and its predecessor #67. While thinking of ways to implement the "reconstruction" of the catalog from smaller pieces, I noticed that the code doesn't handle well books where several part titles and/or chapters have the same name (or, more precisely, are equivalent up to slugging).

For example, consider the following SUMMARY.md:

# Summary

# Part Foo

[Chapter 1](src/chapter-1.md)

# Part Bar

[Chapter 2](src/chapter-2.md)

# Part Foo

[Chapter 3](src/chapter-3.md)

With a depth of 1, the preprocessor generates two POT files foo.pot and bar.pot (beside summary.pot), and the contents of the two Part Foo's (including Chapters 1 and 3) are merged. Since those are unrelated, if not for having the same title, the expected behavior would be to put them in separate files. Note that the titles don't have to be exactly equal, for example the same would happen with Part Foo 1 and Part Foo 2 (since numbers are discarded).

This PR should fix this issue. I hope doc comments are explicative enough, otherwise feel free to request clarifications. :)

Copy link

google-cla bot commented Dec 13, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@cip999
Copy link
Contributor Author

cip999 commented Dec 13, 2024

Ah, the PR is somwhat large so let me know if you want me to split it into 2-3 commits.

@codecov-commenter
Copy link

codecov-commenter commented Dec 13, 2024

Codecov Report

Attention: Patch coverage is 97.61905% with 5 lines in your changes missing coverage. Please review.

Project coverage is 85.20%. Comparing base (b5510c9) to head (6eb5ce8).

Files with missing lines Patch % Lines
i18n-helpers/src/xgettext.rs 97.60% 1 Missing and 2 partials ⚠️
i18n-helpers/src/xgettext/paths.rs 97.64% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #243      +/-   ##
==========================================
+ Coverage   84.87%   85.20%   +0.33%     
==========================================
  Files          15       16       +1     
  Lines        3366     3455      +89     
  Branches     3366     3455      +89     
==========================================
+ Hits         2857     2944      +87     
  Misses        422      422              
- Partials       87       89       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

PR google#77 added the first half of a functionality requested in google#67, i.e. being
able to split the catalog across a tree of POT files. However, there are
edge cases where the generated files can have identical path relative to
the po/ directory, with the effect of collapsing messages for unrelated
parts or chapters in the same POT fie.

This commit handles deduplication through a "black box" struct
`UniquePathBuilder`, and adds a doctest and a unit test for collision
scenarios. The new version is backward-compatible for cases that
are collision-free in the old version.

Contextually, simplify the recursion logic in `get_subcontent_for_chapter`.
catalogs.insert(summary_destination, catalog);
let mut unique_path_builder = UniquePathBuilder::new(depth);
unique_path_builder.maybe_push("summary");
unique_path_builder.maybe_push("summary");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a duplicate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually intended, to keep the previous behavior: for depth 2, the summary is stored within summary/summary.pot (so the first push is for the directory, the second for the file name). For depth 1 instead this results in a top-level summary.pot.

None | Some(&0) => self.current_path.clone(),
Some(&cnt) => {
let mut file_name = self.current_path.file_name().unwrap().to_os_string();
file_name.push(format!("_{}", cnt));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer - over _ in filenames, where possible. So could we do it like this instead:

Suggested change
file_name.push(format!("_{}", cnt));
file_name.push(format!("-{cnt}"));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, changed!

use std::collections::HashMap;
use std::path::PathBuf;

pub struct UniquePathBuilder {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this idea, very elegant!

Two small things: could the maybe_push and maybe_pop methods be called just push and pop? I don't think why the maybe_ prefix is necessary since the push/pop calls have a real effect on the internal state, even if it might not be visible in the resulting string right away.

Another thing, could you attach the big doc string on the impl block below to the struct here? That way the struct is nicely document.

A final small thing: please move the inline paths module to it's own file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All done, thanks for the suggestions. 🙂

@cip999 cip999 requested a review from mgeisler January 14, 2025 20:11
@kdarkhan
Copy link
Collaborator

@cip999, could you rebase this PR now? There are some conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants