Produced zipapp is not reproducible due to _bootstrap file ordering #265

Open
timothyg-stripe opened this issue Nov 18, 2024 · 0 comments · May be fixed by #269


👋 We are using shiv in reproducible mode, but we found that the output is not idempotent due to this block of code that adds the _bootstrap directory:

shiv/src/shiv/builder.py

Lines 165 to 177 in a353d10

    bootstrap_target = Path("_bootstrap")
    for path, name in iter_package_files(bootstrap):
        data = path.read_bytes()
        write_to_zipapp(
            archive,
            str(bootstrap_target / name),
            data,
            zipinfo_datetime,
            compression,
            stat=path.stat(),
        )

In particular, `iter_package_files(bootstrap)` eventually calls `Path.iterdir()`, which yields directory entries in arbitrary order, so the order in which files are written to the archive can vary between builds.
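
To illustrate the ordering problem described above, here is a minimal sketch of a directory walk that sorts the entries returned by `Path.iterdir()` before yielding them. The helper name `iter_files_sorted` and its `prefix` parameter are hypothetical and are not part of shiv; the actual `iter_package_files` may be structured differently.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_files_sorted(root: Path, prefix: str = "") -> Iterator[Tuple[Path, str]]:
    """Yield (path, archive_name) pairs for every file under root in a stable order.

    Hypothetical helper for illustration only; not shiv's implementation.
    """
    # Path.iterdir() yields entries in arbitrary order, so sort by entry name
    # at every directory level to make the traversal deterministic.
    for entry in sorted(root.iterdir(), key=lambda p: p.name):
        name = f"{prefix}{entry.name}"
        if entry.is_dir():
            # Recurse, carrying the relative name with a forward slash,
            # which is the separator used inside zip archives.
            yield from iter_files_sorted(entry, prefix=f"{name}/")
        elif entry.is_file():
            yield entry, name
```

With a walk like this, the caller's loop would write archive members in the same order on every run, regardless of the order in which the filesystem returns directory entries.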

lilaliu-stripe added a commit to lilaliu-stripe/shiv that referenced this issue Jan 15, 2025
### Summary
Fixed non-deterministic file ordering in the `iter_package_files` function by adding explicit sorting of package files. This change ensures consistent behavior across builds and prevents issues caused by unordered file iteration.

### Changes Made
- Applied `sorted()` to file listings in both Python versions (<3.11 and >=3.11).
- Guaranteed that files are processed in a consistent, lexicographic order.

### Why This Fix?
- Prevents non-deterministic builds caused by unordered file iteration.
- Improves build reproducibility and enables better caching in CI/CD pipelines.
- Simplifies code by enforcing consistent ordering within the function, reducing the need for manual sorting in callers.

### Testing
- Rebuilt the project multiple times and verified that output files are now consistent across builds.
- Compared generated artifacts (e.g., `.json` files) to confirm deterministic ordering.
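
A check along the lines of the testing steps above could be sketched as follows. This is an assumption-laden illustration, not the test used in the commit: `build_archive`, `digest`, and `FIXED_DATETIME` are hypothetical names, and the archive is built directly with the standard-library `zipfile` module rather than through shiv.

```python
import hashlib
import io
import zipfile
from typing import Dict

# Hypothetical fixed timestamp; the zip format cannot store dates before 1980.
FIXED_DATETIME = (1980, 1, 1, 0, 0, 0)


def build_archive(files: Dict[str, bytes]) -> bytes:
    """Write files into an in-memory zip with sorted names and a fixed timestamp."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Sorting the names makes the member order independent of input order.
        for name in sorted(files):
            info = zipfile.ZipInfo(name, date_time=FIXED_DATETIME)
            archive.writestr(info, files[name])
    return buffer.getvalue()


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the archive bytes."""
    return hashlib.sha256(data).hexdigest()
```

Building the same set of files twice, even when they are supplied in a different order, should give equal digests; a mismatch would point to a remaining source of non-determinism such as ordering or timestamps.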

### Impact
- No functional changes to how files are processed, only improves determinism.
- Safer and more reliable builds, especially in CI environments.

Closes linkedin#265