diff --git a/oeps/oep-0018-bp-python-dependencies.rst b/oeps/oep-0018-bp-python-dependencies.rst
new file mode 100644
index 000000000..cf20e73ab
--- /dev/null
+++ b/oeps/oep-0018-bp-python-dependencies.rst
@@ -0,0 +1,348 @@
+======================================
+OEP-0018: Python Dependency Management
+======================================
+
++-----------------+--------------------------------------------------------+
+| OEP             | :doc:`OEP-0018 `                                       |
++-----------------+--------------------------------------------------------+
+| Title           | Python Dependency Management                           |
++-----------------+--------------------------------------------------------+
+| Last Modified   | 2018-03-27                                             |
++-----------------+--------------------------------------------------------+
+| Authors         | Jeremy Bowman                                          |
++-----------------+--------------------------------------------------------+
+| Arbiter         | Nimisha Asthagiri                                      |
++-----------------+--------------------------------------------------------+
+| Status          | Draft                                                  |
++-----------------+--------------------------------------------------------+
+| Type            | Best Practice                                          |
++-----------------+--------------------------------------------------------+
+| Created         | 2018-03-27                                             |
++-----------------+--------------------------------------------------------+
+| `Review Period` | 2018-03-27 - 2018-04-20                                |
++-----------------+--------------------------------------------------------+
+| `Resolution`    |                                                        |
++-----------------+--------------------------------------------------------+
+
+Abstract
+========
+
+Proposes best practices for declaring and maintaining dependencies on other
+Python packages in Open edX software repositories.
+
+Motivation
+==========
+
+The Open edX project includes dozens of Python software repositories, most of
+which depend on certain other Python packages being installed in order to
+function correctly. The simple methods we originally used to declare them have
+assorted drawbacks that have repeatedly caused problems over the past few
+years: accidental upgrades to incompatible versions, strict installation
+requirements that restrict the ability of downstream packages to manage their
+own dependency versions, lack of clarity regarding the full set of packages
+actually depended upon, etc.
+
+Outlined here is a recommended standard for declaring dependencies on other
+Python packages which resolves most of these issues and will let us make all
+the Open edX Python packages consistent with each other (and many other open
+source Python projects) for ease of understanding and maintenance.
+
+Specification
+=============
+
+The key to successful Python dependency management is to break it down into
+four parts:
+
+1. Identify the different contexts in which dependencies will need to be
+   installed.
+2. For each of these contexts, declare the direct dependencies that will be
+   needed. Use the least restrictive constraints which should allow pip to
+   install a working set of dependencies.
+3. Auto-generate from the high-level dependencies declaration a complete
+   set of exact package versions that are known to work for a Python
+   `virtualenv`_ created for that context.
+4. Automate updates of the detailed dependencies listing for each context.
+
+.. _virtualenv: https://virtualenv.pypa.io/
+
+Identify Usage Contexts
+-----------------------
+
+The dependencies of Python software are typically installed or run in a
+variety of different contexts over the course of developing and using it.
+The set of dependencies needed to perform a task can easily vary between these
+contexts. The dependencies for each context will be captured in a
+separate file in the ``requirements`` directory. Here are some common
+contexts and the file names often used for them:
+
+* Just the standard set of core dependencies for execution on a production
+  server to perform its primary purpose (``base.in``)
+* Additional dependencies which are only needed when optional extra features
+  of a package are desired
+* Assorted testing libraries to run automated test suites (``test.in``)
+* Static code analysis tools to perform code quality checks (``quality.in``)
+* The utilities called directly by a CI server to create and use one or more
+  virtualenvs and report code coverage statistics to a 3rd-party service
+  (``jenkins.in``, ``travis.in``)
+* `Sphinx`_ and other utilities used to generate developer documentation
+  (``doc.in``)
+* Additional utilities needed to perform common development tasks (``dev.in``)
+* Utilities that a particular developer likes to use with a repository, but
+  aren't strictly needed for any of the regular contexts (``private.in``).
+
+.. _Sphinx: http://www.sphinx-doc.org/
+
+Declare Direct Dependencies
+---------------------------
+
+As indicated above, some of the usage contexts have a standard filename used in
+the ``requirements`` directory of an Open edX repository to list dependencies.
+Others will have an appropriate filename custom to that repository's unique
+context. Each of these is a ``pip``-compatible `requirements file`_ listing
+the direct dependencies needed for that context. Beyond complying with the
+file format, there are a few guidelines each of these files should follow:
+
+* The file should start with a brief comment explaining the context in which
+  these dependencies are needed. Examples can be found in the
+  `cookiecutter-django-app`_ repository.
+* Each listed dependency should have a brief end-of-line comment explaining
+  its primary purpose(s) in this context. These comments typically start at
+  the 27th character, but this is just a convention for consistency with files
+  generated by ``pip-compile``.
+* Version constraints should only be used to exclude dependency versions which
+  are known (or strongly suspected) to not work in this context.
+* Indirect dependencies (used by dependencies but not directly by the code in
+  the repository itself) should not be listed unless a constraint is needed to
+  enforce a compatible version; these are automatically detected and captured
+  elsewhere as described below.
+* `Environment markers`_ should be used as necessary to indicate dependencies
+  which should only be installed on specific operating systems, Python
+  versions, etc.
+* Avoid direct links to packages in local directories, GitHub, or other version
+  control systems if at all possible; all dependencies should be installed
+  from `PyPI`_. If you think you're in one of the rare circumstances where
+  installing a package from a URL is appropriate, see the notes below on
+  `Installing Dependencies from URLs`_.
+* If the dependencies in one context are a superset of those in another one,
+  do not repeat the dependencies. Instead, explicitly include the file
+  produced by ``pip-compile`` for the smaller set of dependencies in the
+  requirements file for the larger set of dependencies. For example,
+  ``test.in`` often includes a line like the following to ensure that the same
+  versions of packages used in production for a service will also be used when
+  testing it:
+
+.. code-block:: python
+
+    -r base.txt               # Core dependencies of the service being tested
+
+If the repository contains a ``setup.py`` file defining a Python package, the
+base dependencies also need to be specified there. These can be derived from
+``requirements/base.in`` with a simple Python function declared in
+``setup.py`` itself. An example can be found in the
+`setup.py file for edx-completion`_.
+
+.. _requirements file: https://pip.readthedocs.io/en/1.1/requirements.html
+.. _cookiecutter-django-app: https://github.com/edx/cookiecutter-django-app/tree/master/%7B%7Bcookiecutter.repo_name%7D%7D/requirements
+.. _Environment markers: https://www.python.org/dev/peps/pep-0508/#environment-markers
+.. _PyPI: https://pypi.org/
+.. _setup.py file for edx-completion: https://github.com/edx/completion/blob/master/setup.py
+
+Generate Exact Dependency Specifications
+----------------------------------------
+
+Although we want to keep our manually edited requirements files very simple,
+we need a separate set of requirements files which list every single package
+needed for each usage context, with exact versions of each for reproducible
+test runs and consistent development and production environments. We can
+generate these automatically using `pip-tools`_, which consists of two related
+utilities:
+
+* ``pip-compile`` generates a requirements file from one or more high-level
+  input requirements files, listing exact versions of every listed and
+  indirect dependency needed to satisfy the given constraints.
+* ``pip-sync`` ensures that the current virtualenv contains exactly (and only)
+  the packages listed in the given requirements files, installing, upgrading,
+  and uninstalling packages as needed.
+
+Open edX packages have an ``upgrade`` make target which runs ``pip-compile`` to
+automatically update the detailed requirements files (``requirements/*.txt``)
+to use the newest available packages which satisfy the constraints in the
+direct dependencies files. These generated files are then used anywhere that
+runs a command to install dependencies: ``tox.ini``, ``.travis.yml``, the
+``requirements`` make target (for updating a local development environment),
+etc.
+
+Sometimes ``pip-compile`` will be unable to find a suitable version of a
+dependency for the output file because there are incompatible version
+constraints in the input files and/or the stated installation requirements
+of the other dependencies. In cases like this, installing and running
+`pipdeptree`_ can help identify the conflicting constraints so at least one
+of them can be sufficiently relaxed such that a version of the dependency
+exists which satisfies them all.
+
+.. _pip-tools: https://github.com/jazzband/pip-tools
+.. _pipdeptree: https://github.com/naiquevin/pipdeptree
+
+Automate Updates of Exact Dependency Specifications
+---------------------------------------------------
+
+While we want all dependencies explicitly pinned in order to benefit from
+consistent testing and development environments, it isn't acceptable to leave
+these versions untouched for long stretches of time. Packages we depend on
+routinely release new versions to address security issues, fix bugs, and add
+new features. While we don't necessarily need to update our repositories
+every time a new dependency version is released, we do want to keep them
+current enough that upgrading a single package to fix a known issue doesn't
+require suddenly adapting to a few years' worth of API changes that we didn't
+pay attention to.
+
+Each Open edX repository should have the following:
+
+* An ``upgrade`` make target as described above, to update the pinned versions
+  of all dependencies (and account for any new or removed indirect
+  dependencies).
+* An automated test suite with reasonably good code coverage, configured to
+  be run on new GitHub pull requests.
+* A service configured to periodically auto-generate a GitHub pull request
+  that tests the output of running ``make upgrade`` (if it results in any
+  changes). This can either be a service such as `requires.io`_ which tracks
+  new releases of Python package dependencies, or a recurring scheduled job.
+* At least one designated maintainer who receives notifications of the
+  generated pull requests and will merge or fix them as needed. This
+  maintainer should scan the changelog for each upgraded package to look for
+  changes that merit closer inspection; services like `requires.io`_ and
+  `AllMyChanges.com`_ can make this easier.
+
+.. _requires.io: https://requires.io/
+.. _AllMyChanges.com: https://allmychanges.com/
+
+Installing Dependencies from URLs
+---------------------------------
+
+As noted above, you should generally avoid installing requirements from a URL
+or local directory instead of PyPI. But there are a few circumstances where
+it can be appropriate:
+
+* You need to test a release candidate of the dependency to make sure it will
+  work with your code.
+* You critically need a fix for a package which has not yet been included in
+  a release, and you cannot arrange for a release to be made in a timely
+  manner.
+
+In most other circumstances, the package should be added to PyPI instead.
+There are several good reasons for this:
+
+* Specified VCS branches, commits, and tags can all be deleted from a
+  repository at any time, suddenly making it impossible to find and install
+  the dependency.
+* Editable requirements (starting with "-e ") are downloaded and/or inspected
+  with each installation of the requirements file, even if the correct version
+  is already installed. This can significantly slow down updates of installed
+  requirements.
+* Packages installed from local directories don't reflect any changes to
+  package metadata (like required package versions) until the version number
+  is incremented or the package is uninstalled; just installing again doesn't
+  help.
+* Package URLs tend to be long and difficult to read, with the actual name of
+  the package hidden in the middle or not even present at all.
+* As of this writing, ``pip-tools`` still has some bugs in handling packages
+  installed from local directories or URLs that require special care to work
+  around. `Non-editable URL installations`_ are not supported, and
+  `relative local paths are expanded to absolute paths`_. These can be
+  partially worked around via a post-processing script for the generated
+  requirements files; an example can be found in `edx-platform`_ at
+  ``scripts/post-pip-compile.sh``.
+
+If you do need to include a package at a URL, it should be editable (start with
+"-e ") and have both the package name and version specified (end with
+"#egg=NAME==VERSION").
+
+.. _Non-editable URL installations: https://github.com/jazzband/pip-tools/issues/355
+.. _relative local paths are expanded to absolute paths: https://github.com/jazzband/pip-tools/issues/204
+.. _edx-platform: https://github.com/edx/edx-platform
+
+Rationale
+=========
+
+The practices outlined here help prevent the following problems that we have
+encountered in the past:
+
+* A new deployment of an Open edX release fails because an unpinned indirect
+  dependency recently released a backwards-incompatible version.
+* Tests unrelated to a new code change fail, because an unpinned dependency
+  was upgraded to a backwards-incompatible version. This can be difficult
+  to diagnose because the upgrade doesn't appear in the diff of pending
+  changes.
+* Tests have been running against a particular set of pinned versions for
+  years, but we now need to upgrade one (like Django) which requires also
+  upgrading several of the other dependencies. This can force dealing with
+  a few years' worth of backwards-incompatible changes in multiple packages
+  all at once, whereas dealing with them one at a time every few months in
+  smaller pull requests would have been more manageable.
+* We have a different version of a dependency installed than we expect,
+  because the constraints imposed on pip for choosing a version vary between
+  different requirements files and we install them one file at a time.
+* We keep using years-old package versions despite the availability of newer
+  versions with accumulated bug fixes and performance improvements.
+* We install in production environments packages which are only needed for
+  testing, because we didn't make a clean distinction between the dependencies
+  for different usage contexts. This slows down deployments.
+* We try to exhaustively pin all indirect dependencies manually, but miss some
+  (especially when a seemingly innocuous upgrade adds some new dependencies).
+* We keep installing a package long after we stopped using it, because nobody
+  remembers why it was added to the requirements file (especially true for
+  indirect dependencies that were later dropped as requirements of the package
+  we use directly).
+* We install an exhaustive set of testing dependencies in Travis, even though
+  we really only need it to run tox and codecov; the rest of the testing
+  dependencies are installed in a separate virtualenv created by tox, which
+  should have a separate requirements file.
+* An attempt to pin dependencies in setup.py (or parse its dependencies
+  automatically from a requirements file) forces us to change that package
+  before we can upgrade one of those dependencies in another repository
+  using that package.
+* We add a dependency without realizing that it requires multiple additional
+  indirect dependencies; we may have chosen an alternative if that had been
+  apparent.
+
+Reference Implementation
+========================
+
+Many of the Open edX repositories have already begun to comply with the
+recommendations outlined here. In particular, repositories generated using
+`cookiecutter-django-app`_ should be configured correctly from the outset.
+These may also be useful for reference:
+
+* `django-user-tasks `_
+* `edx-completion `_
+* `XQueue `_
+
+Rejected Alternatives
+=====================
+
+`pipenv`_ is a relatively new utility for managing Python dependencies,
+written by Kenneth Reitz (author of the `requests`_ package). Although it
+recently became the default dependency management tool recommendation of the
+`Python Packaging User Guide`_, it lacks some features that we strongly want
+for Open edX:
+
+* The ability to specify more than 2 sets of dependencies (core and
+  development)
+* The ability to add comments to the dependencies listing explaining why each
+  one is needed
+* Indication of which other dependencies caused the inclusion of indirect
+  dependencies in the full set of requirements
+* Easy interoperability with `tox`_, especially for testing multiple versions
+  of a major dependency
+
+As a younger package than ``pip-tools``, it also seems to have more
+significant still-unresolved problems, although those are gradually being
+fixed.
+
+.. _pipenv: https://docs.pipenv.org/
+.. _requests: http://python-requests.org/
+.. _Python Packaging User Guide: https://packaging.python.org/tutorials/managing-dependencies/#managing-dependencies
+.. _tox: https://tox.readthedocs.io/
+
+Change History
+==============