PEP 763: Limiting deletions on PyPI #4080

Merged
merged 36 commits into from
Oct 28, 2024
Changes from 20 commits
6176ded
drafting deletion PEP
woodruffw Oct 8, 2024
7db6359
draft: add Alexis as author
woodruffw Oct 16, 2024
ba5490a
Update download count
DarkaMaul Oct 17, 2024
27ab348
Update peps/pep-9999.rst
woodruffw Oct 17, 2024
15db84b
Update peps/pep-9999.rst
woodruffw Oct 17, 2024
4190787
Update peps/pep-9999.rst
woodruffw Oct 17, 2024
54af5b7
cite Catch-22
woodruffw Oct 17, 2024
a817cbb
use a more Python-specific stat
woodruffw Oct 17, 2024
1b1b646
Apply suggestions from code review
woodruffw Oct 17, 2024
ebb5456
codecov example
woodruffw Oct 17, 2024
600fa37
Merge branch 'ww/deletion-pep' of github.com:trail-of-forks/peps into…
DarkaMaul Oct 17, 2024
df34ddf
add pre-release exception
woodruffw Oct 17, 2024
e994629
fix grammar
woodruffw Oct 17, 2024
f2e3f21
Revert collapsing changes
DarkaMaul Oct 17, 2024
9f3a2d2
small tweaks
woodruffw Oct 24, 2024
0f72405
add implementation deets
woodruffw Oct 24, 2024
5bed2ce
Update pep-9999.rst
woodruffw Oct 24, 2024
689bfa5
PEP: rename to PEP 763
woodruffw Oct 24, 2024
ecccdc0
CODEOWNERS: record PEP 763
woodruffw Oct 24, 2024
f7a9e74
PEP 763: fix backticks
woodruffw Oct 24, 2024
2272ebb
Apply suggestions from code review
woodruffw Oct 25, 2024
afbc32d
Merge branch 'main' into ww/deletion-pep
woodruffw Oct 25, 2024
1ebea39
Apply suggestions from code review
woodruffw Oct 25, 2024
46e6b23
PEP 763: fix blockquote
woodruffw Oct 25, 2024
5e28051
CODEOWNERS: reorder
woodruffw Oct 28, 2024
79115ed
PEP 763: cleanup, remove licensing bits
woodruffw Oct 28, 2024
a85925c
PEP 763: use release consistently
woodruffw Oct 28, 2024
aad73dd
PEP 763: subsection
woodruffw Oct 28, 2024
d451f3f
Apply suggestions from code review
woodruffw Oct 28, 2024
962c6f4
Update peps/pep-0763.rst
woodruffw Oct 28, 2024
449a9d9
PEP 763: remove redundant justification
woodruffw Oct 28, 2024
d7204a7
PEP 763: positive security
woodruffw Oct 28, 2024
ed4733a
PEP 763: add banner
woodruffw Oct 28, 2024
250c9c3
PEP 763: elaborate on security
woodruffw Oct 28, 2024
a800aaa
Merge branch 'main' into ww/deletion-pep
woodruffw Oct 28, 2024
1f3dbcc
Update peps/pep-0763.rst
woodruffw Oct 28, 2024
2 changes: 2 additions & 0 deletions .github/CODEOWNERS
@@ -638,6 +638,8 @@ peps/pep-0757.rst @vstinner
peps/pep-0758.rst @pablogsal @brettcannon
peps/pep-0759.rst @warsaw
# ...
peps/pep-0763.rst @dstufft
# ...
peps/pep-0789.rst @njsmith
# ...
peps/pep-0801.rst @warsaw
275 changes: 275 additions & 0 deletions peps/pep-0763.rst
@@ -0,0 +1,275 @@
PEP: 763
Title: Limiting deletions on PyPI
Author: William Woodruff <[email protected]>,
Alexis Challande <[email protected]>
Sponsor: Donald Stufft <[email protected]>
PEP-Delegate: Donald Stufft <[email protected]>
Discussions-To: TODO
Status: Draft
Type: Standards Track
Topic: Packaging
Created: 07-Oct-2024
Post-History: `09-Jul-2022 <https://discuss.python.org/t/stop-allowing-deleting-things-from-pypi/17227>`__,
`01-Oct-2024 <https://discuss.python.org/t/pre-pep-limiting-deletions-on-pypi/66351>`__

Abstract
========

This PEP proposes limiting user-level deletions of projects (including release
and file-level deletions) on PyPI, in favor of the "yanking" mechanism
adopted by PyPI with :pep:`592`.

In particular, the PEP proposes a time-based deletion criterion via
which projects (and their releases and files) are protected from *deletion*
after an initial grace period of 72 hours following their creation.
An exception to this criterion is made for releases and files that are
marked with :ref:`pre-release specifiers <packaging:version-specifiers>`,
which will remain deletable at all times.

This PEP does not propose changes to administrator-initiated deletions, e.g.
for moderation or security purposes.

Rationale and Motivation
========================

As observed in :pep:`592`, user-level deletion of projects on PyPI
enables a `Catch-22 <https://www.merriam-webster.com/dictionary/catch-22>`_
of dependency breakage:

Whenever a project detects that a particular release on PyPI might be
broken, they oftentimes will want to prevent further users from
inadvertently using that version. However, the obvious solution of
deleting the existing file from a repository will break users who have
followed best practices and pinned to a specific version of the project.

This leaves projects in a catch-22 situation where new projects may be pulling
down this known broken version, but if they do anything to prevent that they’ll
break projects that are already using it.
Comment on lines +40 to +42
Contributor

Suggested change
This leaves projects in a catch-22 situation where new projects may be pulling
down this known broken version, but if they do anything to prevent that they’ll
break projects that are already using it.
This leaves a project author in a catch-22 situation where other projects may be pulling
down this known broken version, but if the project author does anything to prevent that, it
breaks projects where the dependency has been pinned for use.

Contributor Author

Same as https://github.com/python/peps/pull/4080/files#r1817037637 -- I like this rephrasing but I don't know whether it's kosher to rewrite a direct quote 😅


On a technical level, the problem of deletion is mitigated by
"yanking," also specified in :pep:`592`. However, deletions continue to be
allowed on PyPI, and have caused multiple notable disruptions to the Python
ecosystem over the intervening years:

* July 2022: `atomicwrites <https://pypi.org/project/atomicwrites/>`_
was `deleted by its maintainer <https://github.com/untitaker/python-atomicwrites/issues/61>`_
in an attempt to remove the project's "critical" designation, without the
maintainer realizing that project deletion would also delete all previously
uploaded versions.

The project was subsequently restored with the maintainer's consent,
but at the cost of manual administrator action and extensive downstream
breakage to projects like `pytest <https://github.com/pytest-dev/pytest/issues/10114>`_.
As of October 2024, atomicwrites is archived but still has
around `4.5 million monthly downloads from PyPI <https://pypistats.org/packages/atomicwrites>`_.

* April 2023: `codecov <https://pypi.org/project/codecov/>`_ was deleted by
its maintainers after a long deprecation period. This caused extensive
breakage for many of Codecov's CI/CD users, who were unaware of the
deprecation period due to limited observability of deprecation warnings
within CI/CD logs.

The project was
`subsequently re-created <https://about.codecov.io/blog/message-regarding-the-pypi-package/>`_
by its maintainers, with a new release published to compensate for the deleted releases
(which were not restored), meaning that any pinned installations remained
broken. As of October 2024, this single release remains the only release on
PyPI and has around
`1.5 million monthly downloads <https://pypistats.org/packages/codecov>`_.

* June 2023: `python-sonarqube-api <https://pypi.org/project/python-sonarqube-api/>`_
deleted all released versions prior to 2.0.2, a version that both introduced
breaking changes *and* was accompanied by a paid "Professional Edition".

The project's maintainer subsequently
`deleted conversations <https://discuss.python.org/t/stop-allowing-deleting-things-from-pypi/17227/114>`_
and force-pushed over the tag history for ``python-sonarqube-api``'s source repository,
impeding efforts by the community to compare changes between versions
and obtain previous, permissively-licensed versions.

* June 2024: `PySimpleGUI <https://pypi.org/project/PySimpleGUI/>`_ changed
licenses from LGPL to a commercial license, and deleted
`nearly all previous versions <https://discuss.python.org/t/48790/27>`_.
This resulted in widespread disruption for users, who (prior
to the relicensing) were downloading PySimpleGUI
approximately 25,000 times a day. This deletion also impeded efforts
by the community to provide "continuity" projects, as the
permissively-licensed artifacts were no longer available for reference
and re-uploading.

In addition to their disruptive effect on downstreams, deletions
also have deleterious effects on PyPI's sustainability as well as the overall
security of the ecosystem:

* Deletions raise the baseline support load for PyPI's administrators and
moderators, as users mistakenly file support requests believing that PyPI
is broken, or that the administrators themselves have removed the
project.

* Deletions impair incident response and external analysis, making it
difficult to distinguish "good faith" maintainer behavior from malicious
post-exploitation track-covering.

The size and interdependency of the Python ecosystem continue to grow,
meaning that future deletions of projects can be reasonably assumed to
be *at least as* disruptive as the deletions sampled above.

Given the above, this PEP concludes that the availability of "hard" deletions
versus "soft" deletions (i.e. yanking) now presents more of a risk and detriment
to the Python ecosystem than a benefit.

Specification
=============

This PEP identifies 3 different types of deletable objects:

1. **Files**, which are individual project distributions (such as source
distributions or wheels).

Example: ``requests-2.32.3-py3-none-any.whl``.

2. **Releases**, which contain one or more files that share the same version.

Example: `requests v2.32.3 <https://pypi.org/project/requests/2.32.3/>`_.

3. **Projects**, which contain one or more releases.

Example: `requests <https://pypi.org/project/requests>`_.

This PEP proposes the following *deletion eligibility rules*:

* A **file** is considered deletable if and only if it was uploaded to
PyPI less than 72 hours before the current time, **or** if it
has a :ref:`pre-release specifier <packaging:version-specifiers>`.
* A **release** is considered deletable if and only if all of its
constituent files are deletable.
* A **project** is considered deletable if and only if all of its
constituent releases are deletable.

These rules are intentionally "telescoping": they allow new projects to be
deleted entirely, and allow old projects to delete new files or releases,
but do not allow old projects to delete old files or releases.
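
The eligibility rules above can be sketched in Python. This is a minimal
illustration, not PyPI's implementation: the ``File``, ``Release``, and
``Project`` classes, the ``is_prerelease`` flag, and the function names are
all hypothetical. Only the 72-hour grace period comes from this PEP.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(hours=72)  # grace period stated in this PEP


@dataclass
class File:
    filename: str
    uploaded_at: datetime
    # Hypothetical flag: assumed to be derived from the file's version elsewhere.
    is_prerelease: bool = False


@dataclass
class Release:
    version: str
    files: list[File] = field(default_factory=list)


@dataclass
class Project:
    name: str
    releases: list[Release] = field(default_factory=list)


def file_deletable(f: File, now: datetime) -> bool:
    # Deletable while inside the grace period, or at any time if a pre-release.
    return f.is_prerelease or (now - f.uploaded_at) < GRACE_PERIOD


def release_deletable(r: Release, now: datetime) -> bool:
    # Deletable only if every constituent file is deletable.
    return all(file_deletable(f, now) for f in r.files)


def project_deletable(p: Project, now: datetime) -> bool:
    # Deletable only if every constituent release is deletable.
    return all(release_deletable(r, now) for r in p.releases)


now = datetime(2024, 10, 28, tzinfo=timezone.utc)
old = File("demo-1.0-py3-none-any.whl", now - timedelta(days=30))
new = File("demo-1.1-py3-none-any.whl", now - timedelta(hours=1))
demo = Project("demo", [Release("1.0", [old]), Release("1.1", [new])])
```

In this sketch the old project ``demo`` can still delete its new ``1.1``
release, but neither the ``1.0`` release nor the project as a whole. One
consequence of using ``all()`` is that an empty release or project is treated
as deletable; that is a property of the sketch, not something this PEP states.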

This is intended to strike a balance between competing interests: brand new
projects are unlikely to have significant community uptake and thus pose a
minimal disruptive risk, while established projects (of any size)
are more likely to have a "tail" of adopted versions. Their downstream users
are not necessarily equipped to address the sudden deletion
of a release, file, or the whole project.
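
The pre-release exception in the eligibility rules requires deciding whether a
version is a pre-release. The following is a simplified sketch of such a
check, under two assumptions: it accepts only versions in canonical
(normalized) form, using a pattern adapted from the one given in the version
specifiers specification, and it treats development releases (``.devN``) as
pre-releases, matching the ``is_prerelease`` property of the ``packaging``
library's ``Version`` class. The function name is hypothetical, and a real
implementation would more likely use ``packaging`` directly.

```python
import re

# Canonical version pattern, with named groups added for the pre-release
# and development-release segments.
_CANONICAL_VERSION = re.compile(
    r"^([1-9][0-9]*!)?"                      # optional epoch
    r"(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"   # release segment
    r"(?P<pre>(a|b|rc)(0|[1-9][0-9]*))?"     # pre-release segment
    r"(\.post(0|[1-9][0-9]*))?"              # post-release segment
    r"(?P<dev>\.dev(0|[1-9][0-9]*))?$"       # development-release segment
)


def is_prerelease(version: str) -> bool:
    """Return True if a canonical version string denotes a pre-release."""
    match = _CANONICAL_VERSION.match(version)
    if match is None:
        raise ValueError(f"not a canonical version: {version!r}")
    # Assumption: development releases count as pre-releases here.
    return match.group("pre") is not None or match.group("dev") is not None
```

Under this sketch, ``3.0.0rc1`` and ``1.0.dev3`` would remain deletable at all
times, while ``2.32.3`` and ``1.0.post1`` would be subject to the 72-hour
window.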

Implementation
==============

This PEP's implementation primarily concerns aspects of PyPI that are not
standardized or subject to standardization, such as the web interface and
signed-in user operations. As a result, this section describes its
implementation in behavioral terms.

Changes
-------

* Per the eligibility rules above, PyPI will reject web interface requests
(using an appropriate HTTP response code of its choosing) for
file, release, or project deletion if the respective object is not
eligible for deletion.
* PyPI will amend its web interface to indicate a file/release/project's
deletion ineligibility, e.g. by styling the relevant UI elements as "inactive"
and making relevant buttons/forms unclickable.
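
The two behavioral changes above can be illustrated with a short sketch. It is
not based on PyPI's codebase: the function names are hypothetical, and the
``403 Forbidden`` status is an assumption, since this PEP leaves the choice of
response code to PyPI.

```python
from http import HTTPStatus

# Assumption: the PEP only requires "an appropriate HTTP response code".
INELIGIBLE_STATUS = HTTPStatus.FORBIDDEN

DELETABLE_KINDS = {"file", "release", "project"}


def deletion_response(kind: str, eligible: bool) -> tuple[int, str]:
    """Return a (status code, message) pair for a deletion request."""
    if kind not in DELETABLE_KINDS:
        raise ValueError(f"unknown object kind: {kind!r}")
    if not eligible:
        # Reject the request without deleting anything.
        return INELIGIBLE_STATUS.value, f"This {kind} is not eligible for deletion."
    return HTTPStatus.OK.value, f"The {kind} was deleted."


def delete_button_attrs(eligible: bool) -> dict[str, bool]:
    """Return attributes for rendering the deletion control in the web UI."""
    # An ineligible object gets a disabled ("inactive") control.
    return {"disabled": not eligible}
```

The ``eligible`` argument is assumed to be computed from the deletion
eligibility rules in the Specification section.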

Security Implications
=====================

This PEP does not identify any positive or negative security implications
associated with the proposed approach.
Member

Given that we discuss the security implications of permitting deletions above, I would expect the security implication to be that this PEP makes things better (but also see my comment above).

Member

A possible negative implication is that it may be harder to stop users from downloading a file that has security issues, right?

Contributor Author

Thanks! Yeah, I need to qualify this: the positive implication above was that external users of the index (e.g. companies) benefit from limiting deletion because it makes post-compromise triage and remediation simpler, since the attacker can't remove their trail as easily. I wasn't sure whether to put that under this section though, since it's an indirect/fuzzy security implication, but I'll work it in for further discussion.

> A possible negative implication is that it may be harder to stop users from downloading a file that has security issues, right?

@JelleZijlstra Could you elaborate a bit on what you're thinking here? This PEP doesn't stipulate any changes to PyPI admins' abilities to delete things, so the existing malware reporting and quarantining flows remain in place. So from my perspective this doesn't make it any harder to halt downloads of malicious packages.

OTOH, maybe you're thinking of packages that aren't malicious but just have security issues? For those I think the pre-existing "yank" mechanism is good enough (and is what people already use in practice), but I could add in some qualifications there if you think the PEP would benefit from it!

Member

Currently, if a package maintainer discovers that there is malware in a package they released, they can delete the package. With this PEP's proposal, they'd have to instead wait for PyPI admins to get to it, which may take longer.

Example scenario: I am a maintainer of https://pypi.org/project/black/. I discover (e.g., through private disclosure) that one of the compiled wheels of the latest release contains malicious code due to a backdoor in cibuildwheel/GH Actions/mypyc (pretty outlandish scenarios, but bear with me). Right now, the first thing I would do is go to PyPI and delete the malicious wheel. With this PEP's proposal, I'd have to figure out the right channels to raise the issue to the PyPI admins, and possibly wait for some time.

This is not a common scenario and your answer could well be that the tradeoff is worth it, but it doesn't seem impossible.

Contributor Author

Gotcha, thanks for elaborating! That's an interesting case -- I can see the argument that it represents a slightly worse security posture, since there's now a middleman (PyPI's admins) for malware response.

I think my position is that it's "worth it," especially now that PyPI has a dedicated malware reporting flow (see screencap) as well as project "quarantine" support for temporarily suspending a project's downloads without actually deleting it. But I'll make sure to include that in a revision to fairly characterize this PEP's changes.

Screenshot 2024-10-28 at 12 10 53 PM

Member

Thanks, that position makes sense to me!


How To Teach This
=================

This PEP suggests at least two pieces of public-facing material to help
the larger Python packaging community (and its downstream consumers)
understand its changes:

* An announcement post on the `PyPI blog <https://blog.pypi.org>`_ explaining
the nature of the PEP and its behavioral implications for PyPI.
* Updates to the `PyPI user documentation <https://docs.pypi.org/>`_ explaining
the difference between deletion and yanking and the limited conditions under
which the former can still be initiated by package owners.

Rejected Ideas
==============

Conditioning deletion on dependency relationships
-------------------------------------------------

An alternative to time-based deletion windows is deletion eligibility based on
downstream dependents. For example, a release could be considered deletable
if and only if it has fewer than ``N`` downstream dependents on PyPI,
where ``N`` could be as low as 1.

This idea is appealing since it directly links deletion eligibility to
disruptiveness. `NPM <https://www.npmjs.com/>`_ uses it and
conditions project removal on the absence of any downstream dependencies
known to the index.

Despite its appeal, this PEP identifies several disadvantages and technical
limitations that make dependency-conditioned deletion not appropriate
for PyPI:

1. *PyPI is not aware of dependency relationships.* In Python packaging,
both project builds *and* metadata generation are frequently dynamic
operations, involving arbitrary project-specified code. This is typified
by source distributions containing ``setup.py`` scripts, where the execution
of ``setup.py`` is responsible for computing the set of dependencies
encoded in the project's metadata.

This is in marked contrast to ecosystems like NPM and Rust's
`crates <https://crates.io/>`_, where project *builds* can be dynamic but
the project's metadata itself is static.

As a result of this,
`PyPI doesn't know your project's dependencies <https://dustingram.com/articles/2018/03/05/why-pypi-doesnt-know-dependencies/>`_,
and is architecturally incapable of knowing them without either running
arbitrary code (a significant security risk) or performing a long-tail
deprecation of ``setup.py``-based builds in favor of :pep:`517` and
:pep:`621`-style static metadata.

2. *Results in an unintuitive permissions model.* Dependency-conditioned
deletion results in a "reversed" power relationship, where anybody
who introduces a dependency on a project can prevent that project from
being deleted.

This is reasonable at face value, but can be abused to produce unexpected
and undesirable (in the context of enabling some deletions) outcomes.
A notable example of this is NPM's
`everything package <https://www.npmjs.com/package/everything>`_, which
depends on every public package on NPM (as of 30-Dec-2023) and thereby
prevents their deletion.
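
The "reversed" permissions problem can be made concrete with a small sketch of
the rejected rule. It assumes, purely for illustration, that the index had a
complete dependency map (which, per the first point, PyPI does not). The
function names and the toy package names are hypothetical.

```python
def dependent_counts(requires: dict[str, set[str]]) -> dict[str, int]:
    """Count, for each project, how many other projects depend on it."""
    counts = {name: 0 for name in requires}
    for name, deps in requires.items():
        for dep in deps:
            if dep != name:  # ignore self-dependencies
                counts[dep] = counts.get(dep, 0) + 1
    return counts


def deletable(requires: dict[str, set[str]], n: int = 1) -> set[str]:
    """Projects with fewer than ``n`` dependents, per the rejected rule."""
    counts = dependent_counts(requires)
    return {name for name in requires if counts[name] < n}


# A toy index: "b" depends on "a"; "c" has no relationships.
index = {"a": set(), "b": {"a"}, "c": set()}
before = deletable(index)

# Anyone can upload a project that depends on everything else...
index["everything"] = {"a", "b", "c"}
after = deletable(index)
```

Here ``before`` contains ``b`` and ``c``, while ``after`` contains only
``everything``: a single third-party upload has removed every other
maintainer's ability to delete their own project.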


Conditioning deletion on download count
---------------------------------------

Another alternative to time-based deletion windows is to condition deletion on
the number of downloads. For example, a release could be considered deletable if
and only if it has fewer than ``N`` downloads during a fixed, recent time period.

While this approach has the advantage of tying a project's deletion eligibility
to its usage, this PEP identifies several limitations:

1. *Ecosystem diversity.* The Python ecosystem includes projects with widely
varying usage patterns. A fixed download threshold would not adequately account
for niche but critical projects with naturally low download counts.

2. *Time sensitivity.* Download counts do not necessarily reflect a project's
current status or importance. A previously popular project might have low
recent downloads but still be crucial for maintaining older systems.

3. *Technical complexity.* Accessing the download count of a project within
PyPI is not straightforward, and there is limited ability to gather a
project's download statistics from mirrors or other distribution systems.
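
The first limitation can be illustrated with a sketch of the rejected rule.
The function name, the project names, the download figures, and the threshold
are all invented for illustration and do not describe any real project.

```python
def deletable_by_downloads(recent_downloads: dict[str, int], n: int) -> set[str]:
    """Projects with fewer than ``n`` downloads in the chosen period."""
    return {name for name, count in recent_downloads.items() if count < n}


# Made-up figures: a popular library, a niche project that critical systems
# depend on, and an abandoned experiment.
recent_downloads = {
    "popular-lib": 1_000_000,
    "niche-but-critical": 40,
    "abandoned-toy": 3,
}

eligible = deletable_by_downloads(recent_downloads, n=100)
```

With a fixed threshold of 100, the rule cannot tell ``niche-but-critical``
apart from ``abandoned-toy``: both fall below the threshold and both become
deletable.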

Copyright
=========

This document is placed in the public domain or under the CC0-1.0-Universal
license, whichever is more permissive.