Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Installation documentation and issues #5

Closed
rabernat opened this issue Jul 29, 2020 · 23 comments
Closed

Installation documentation and issues #5

rabernat opened this issue Jul 29, 2020 · 23 comments

Comments

@rabernat
Copy link

This issue is in reference to my JOSS review of your paper (openjournals/joss-reviews#2407).

I am trying to understand the recommended installation procedure for this package.

On one hand, you seem to provide a pip package with dependencies:

IPART/setup.py

Lines 14 to 22 in 6bbf452

install_requires=[
"cdms2",
"matplotlib",
"scipy",
"scikit-image",
"pandas",
"cdutil",
"basemap",
"networkx",

On the other hand, your installation instructions don't mention pip and instead recommend manually creating a new conda environment from scratch. The instructions are mostly about how to create conda environments. This does not adhere to best practices for distributing python code. For example, what if you want to use this package in an existing environment? Do you have to create a new environment?

A better solution would be to provide working pip and / or conda-forge packages which automatically check for and, if needed, install the required dependencies on installation. This would allow the user to simply run pip install ipart or conda install -c conda-forge ipart. Could you provide a pip or conda-forge package for this software?

A related issue is whether the user is required to clone the master repository in order to use the software. I notice that there are various scripts and notebooks that are part of the repo but not the package itself. Are these required or optional. Again, this is not a standard practice for scientific python software. I recommend that users only be required to interact with the git repository if they wish to contribute to IPART development. Otherwise a pip / conda install should be sufficient to use the software.

Making these changes would also help resolve #2.

@rabernat
Copy link
Author

To support these suggestions, here are some references about how to properly package python software:

@Xunius
Copy link
Collaborator

Xunius commented Jul 30, 2020

@rabernat Thanks for the comments. I ran into some difficulties addressing these issues.

I tried to upload to PyPI, but it seems that basemap is no longer install-able using pip:

(test) 😷:~$ pip3 install basemap
ERROR: Could not find a version that satisfies the requirement basemap (from versions: none)
ERROR: No matching distribution found for basemap

So it won't work as a dependency.

Then I tried building a conda package, but failed again. Previously I managed to conda-build a cdat-lite based version for py2, I don't know what changes I made that make the new attempts fail. The trackback starts from a line of license='GPL-3 (see below), which is really confusing.

This is my meta.yml:

package:
    name: ipart
    version: '2.0'
source:
    git_url: https://github.com/ihesp/IPART.git
    git_tag: master
build:
    number: 0
requirements:
    build:
        - python
    run:
        - python 3.7*
        - numpy>=1.16
        - scipy
        - netcdf4
        - pandas
        - scikit-image
        - networkx
        - matplotlib
        - basemap
about:
    home: https://github.com/ihesp/IPART
    license: GPL-3.0

The build.sh:

#!/bin/bash
python setup.py install

Both build.sh and meta.yml are in a folder ipart, which also contains a LICENSE and a bld.bat.

The conda-build ipart outputs (starting from errors):

Installed $PREFIX/lib/python3.7/site-packages/ipart-2.0.2-py3.7.egg
Processing dependencies for ipart==2.0.2
Searching for basemap==1.2.0
Traceback (most recent call last):
  File "setup.py", line 37, in <module>
    license='GPL-3'
  File "/home/guangzhi/anaconda3/conda-bld/ipart_1596074610735/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.7/site-packages/setuptools/__init__.py", line 165, in setup
    return distutils.core.setup(**attrs)
  File "/home/guangzhi/anaconda3/conda-bld/ipart_1596074610735/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()

<<<a number of similar lines starting "File "/home/guangzhi/anaconda3/conda-bld/ipart_1596074610735">>>

  File "/home/guangzhi/anaconda3/conda-bld/ipart_1596074610735/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.7/site-packages/setuptools/package_index.py", line 327, in process_url
    "Be sure to add all dependencies in the meta.yaml  url=%s" % url)
RuntimeError: Setuptools downloading is disabled in conda build. Be sure to add all dependencies in the meta.yaml  url=https://pypi.org/simple/basemap/
Traceback (most recent call last):
  File "/home/guangzhi/anaconda3/bin/conda-build", line 11, in <module>
    sys.exit(main())
  File "/home/guangzhi/anaconda3/lib/python3.7/site-packages/conda_build/cli/main_build.py", line 469, in main
    execute(sys.argv[1:])
  File "/home/guangzhi/anaconda3/lib/python3.7/site-packages/conda_build/cli/main_build.py", line 460, in execute
    verify=args.verify, variants=args.variants)
  File "/home/guangzhi/anaconda3/lib/python3.7/site-packages/conda_build/api.py", line 209, in build
    notest=notest, need_source_download=need_source_download, variants=variants)
  File "/home/guangzhi/anaconda3/lib/python3.7/site-packages/conda_build/build.py", line 2344, in build_tree
    notest=notest,
  File "/home/guangzhi/anaconda3/lib/python3.7/site-packages/conda_build/build.py", line 1492, in build
    cwd=src_dir, stats=build_stats)
  File "/home/guangzhi/anaconda3/lib/python3.7/site-packages/conda_build/utils.py", line 398, in check_call_env
    return _func_defaulting_env_to_os_environ('call', *popenargs, **kwargs)
  File "/home/guangzhi/anaconda3/lib/python3.7/site-packages/conda_build/utils.py", line 378, in _func_defaulting_env_to_os_environ
    raise subprocess.CalledProcessError(proc.returncode, _args)
subprocess.CalledProcessError: Command '['/bin/bash', '-o', 'errexit', '/home/guangzhi/anaconda3/conda-bld/ipart_1596074610735/work/conda_build.sh']' returned non-zero exit status 1.

I think I've added all dependencies in both the setup.py and meta.yml. Do you spot anything wrong? What is missing?

@rabernat
Copy link
Author

For building a conda package, the only way to go is to use conda forge: https://conda-forge.org/#add_recipe
Don't waste time trying to run conda-build yourself.

it seems that basemap is no longer install-able using pip

Basemap is no longer maintained. Cartopy has replaced it.

But more generally, why does your package need to depend on a mapping tool at all? Or any other visualization package? Of course one needs visualization to plot the results, but that's not the same thing as a dependency.

@Xunius
Copy link
Collaborator

Xunius commented Aug 2, 2020

I think I've managed to get a PR merged to conda-forge, and I just tested that it is installable using conda install -c conda-forge ipart in my Linux box.
Please also checkout this commit where I updated the installation guides.

@kbarnhart
Copy link
Contributor

@Xunius these changes are a great step towards making IPART much easier to install. 👍

Looking at your meta.yaml for conda-forge, it looks like no tests run as part of making the conda-forge build. Recipes should include at least some tests (see, for example, this part of the conda forge documentation).

Your tests take about 9 seconds to run, so I would recommend having them all run.

One other minor comment:

  • In the Readme section on tests I would recommend making clear that running the tests will only work if one has done a source code install.

@Xunius
Copy link
Collaborator

Xunius commented Aug 24, 2020

@kbarnhart So if I understood you correctly, the suggested change is to add something like the following to the meta.yaml?

test:
  imports:
    - ipart
    - ipart.utils
  source_files:
    - tests
  commands:
    - python -m unittest discover -s tests

If so, is the README change still required?

@kbarnhart
Copy link
Contributor

@Xunius, based on my understanding of IPART and conda-forge, what you've described above would be the appropriate change. With the test commands you then ensure that the packages built by conda-forge pass the tests before they are distributed.

I would still recommend making the README change because a user who installed with conda-forge would not be able to run python -m unittest discover -s tests. I've had users get confused that their install was broken when, in fact, they had just installed the binaries and shouldn't have been expected to run any tests.

Xunius added a commit that referenced this issue Aug 26, 2020
Xunius added a commit to Xunius/ipart-feedstock that referenced this issue Aug 26, 2020
@Xunius
Copy link
Collaborator

Xunius commented Aug 26, 2020

@kbarnhart After a bit of trial-n-error, I added unittest to the meta.yml recipe and merged the PR.

Also edited README in this commit

@kbarnhart
Copy link
Contributor

@Xunius thanks for making that change to the conda-forge recipe.

As this is @rabernat 's issue and is a bit more substantial than #6 I'm going to let him chime in on whether he thinks the issues he has raised have been addressed.

I want to follow up with some guidance from JOSS regarding dependencies, and in particular deprecated dependencies. It appears that basemap is a listed dependency because it is used to make some built in result plots. I consider whether or not these sort of functions are in or out of the package as a point of developer preference and will not express an editorial opinion on it here.

However, as @rabernat points out, basemap was deprecated (quite a while ago) and replaced with cartopy.

I consulted with the rest of the JOSS editorial board regarding this case: Unless you can make a strong case that the deprecated dependency cannot be replaced by the new package or it would be a substantial burden to replace it, then you must replace it as part of the review. The JOSS :_ formal peer review process that is designed to improve the quality of the software submitted_. https://joss.theoj.org/about

Basemap is used relatively infrequently in ipart, and thus I do not think it is an undue burden to have you replace its use with cartopy.

@Xunius
Copy link
Collaborator

Xunius commented Aug 28, 2020

@kbarnhart @rabernat I've made the transition from basemap to cartopy in these commits:
commit 1
commit 2
commit 3
commit 4

New conda-recipe is updated here

A side note regarding the cartopy replacement, I feel that we are pushing the deprecation too hard, last time I checked out cartopy it didn't support the latitude/longitude labeling in projections other than PlateCarree and Mercator, this functionality is added in the recent 0.18.0 release in May. It took me a good while to figure out how to plot the data across the dateline, and you can see some relevant issues still open. And basemap is perfectly installable inside conda. Anyway, please let me know whether this issue has been addressed.

@kbarnhart
Copy link
Contributor

Thanks for this update @Xunius.

I'll let @rabernat weight in on whether his concerns have been addressed.

@rabernat
Copy link
Author

rabernat commented Sep 1, 2020

Hi @Xunius -- sorry for my long silence. Thanks for all the work you've put into this. It's great that the package is now conda installable!

I don't have strong feelings about basemap vs. cartopy, other than pointing out that basemap is deprecated. Relying on basemap will make it harder to maintain your package in the future. But my most relevant comment was:

... more generally, why does your package need to depend on a mapping tool at all? Or any other visualization package? Of course one needs visualization to plot the results, but that's not the same thing as a dependency.

It don't think this question has been addressed. I don't fundamentally understand why you need cartopy / matplotlib as a dependency. This is not a visualization package, is it? I see you have some utility routines (e.g. plotAR) which call those packages. But these could be made optional? Matplotlib and cartopy are huge dependencies. Imagine that someone wants to use your package as part of an automated system--they don't need any visualization, they just want to detect the ARs and write the data to a file. In its current form, you are requiring them to install matplotlib / cartopy in their environment. Your own tests don't even cover the plotAR function, so clearly it is not crucial to the package's functionality. (Conversely, if it is, it should be covered by tests, but that's a separate issue.) You could remove the matplotlib imports from your test suite, as the package is not used at all in the tests.

Regarding the original issue here, installation instructions, I see you have made some updates. However, your recommended installation is still:

Users are advised to obtain the code of this pacakge from the github page, and create a new conda environment using the environment files provided:

Considering these recommendations differ from 99% of existing python packages, I would like to understand the justification for them. Why is iHESP so special that it requires its own dedicated environment + installation from github? Why can't the installation instructions just simply be?

conda install -c conda-forge ipart

Wouldn't that be simpler and easier for your users?

@Xunius
Copy link
Collaborator

Xunius commented Sep 1, 2020

@rabernat Thanks for coming back.

I used basemap previously for plotting, but also for some computations as well. I used its coordinates transformation function to compute some polygon areas measured in km^2. But that relevant functionality was turned off in a later change. So it is now solely for plotting. Matplotlib's contour function is also used to obtain the contour coordinates, without actually plotting them out. That's why I have to import matplotlib in the test scripts, and change the backend to 'Agg' immediately.

My understanding is that since JOSS requires the authors to provide some notebooks to help the users to get to know the software, plotting modules are therefore required dependencies. Similarly for the "advised" installation method, I feel that it's encouraged to let the user get the git repository, so that they can obtain the notebooks. Besides, the conda install -c conda-forge ipart guide is provided just below that, I don't think that's too big a confusion to the readers.

@kbarnhart
Copy link
Contributor

I'll chime in quickly with some JOSS-specific guidance: JOSS does not require notebooks, specifically. However, we do require that there is sufficient narrative documentation such that a user can learn to use the software, and API documentation. Often for python packages, Jupyter Notebooks are one of the easiest ways to provide narrative documentation and examples.

However, this is beside the point of whether matplotlib and cartopy are appropriate as dependencies.

@Xunius
Copy link
Collaborator

Xunius commented Sep 2, 2020

@kbarnhart So it is ok to remove cartopy as a dependency but leave the notebooks there? Just put a sentence somewhere in the notebook to tell the user to install cartopy themselves?

@kbarnhart
Copy link
Contributor

@Xunius That is not how I would interpret my comment. My comment was meant to point out that jupyter notebooks are not an explicit requirement of JOSS. However, having narrative documentation and examples are a requirement (and these are commonly created as jupyter notebooks).

One approach that I have seen to communicate install instructions when there are packages needed to run examples but not needed for the package itself is as follows:

  • list the core dependencies for the package in the setup.py and conda recipe.
  • create a conda environment for running examples that includes the package itself and anything else needed for running the examples (packages listed are often use for visualization).
  • create a conda environment for source code development which includes the packages dependencies, and developer tools such as pytest, coveralls, and sphinx.

A nice byproduct of this approach is that one of the "examples" conda environment can be used to specify Binder.

It supports a pretty clean set of install instructions, namely:

  1. To use the package install via conda install -c conda-forge ipart
  2. To create a conda environment in which the examples can run create an environment with "environment.yml".
  3. To create a conda environment for source code modification use "environment-dev.yml"

@rabernat may have comments on this approach.

However, it appears to me that there are still some comments made by @rabernat above to address. Specifically (and I paraphrase):

  • Why are the installation instructions different than 99% of python packages
  • Why are matplotlib and cartopy dependencies? (@Xunius started to address this here).

@rabernat
Copy link
Author

rabernat commented Sep 2, 2020

This is a really nice summary @kbarnhart.

I think this is a technical issue, related to what it means to create a package vs. documentation / examples of that package. Let me use my experience as an xarray developer to help clarify.

Dependencies

Required Dependencies

Required dependencies should be the bare minimum of other packages required by your package in order to run its core functionality. Xarray's dependencies, as specified in its setup.cfg, are just

    numpy >= 1.15
    pandas >= 0.25
    setuptools >= 38.4

https://github.com/pydata/xarray/blob/2acd0fc6563c3ad57f16e6ee804d592969419938/setup.cfg#L76-L82

These must be installed for xarray to run, and if you try to install xarray with pip or conda, these dependencies will automatically be installed as well. Setuptools handles that for you.

So you should ask yourself: what are the bare minimum requirements to run iHESP's core functionality?

Optional Dependencies

Optional dependencies are extra packages which, if installed, enable extra optional features in your package. For xarray, matlplotlib in an optional dependency. It is not required, but some functions use it (like all of the plotting functions). If you try to use these functions without matplotlib installed, you just get an error when xarray tries to import matplotlib. But the rest of the package works fine.

Environments

An environment is not the same as the dependencies. An environment is a self-contained python installation. Most packages define several different environments for different purposes, but users should not be required or expected to use these exact environments. Most users already have existing environments that they know and like, and they don't want to create a new environment just for your package.

Test environments

In xarray, we define 8 different environments for testing:
https://github.com/pydata/xarray/tree/master/ci/requirements
These environments contain different combinations of python and required / optional dependencies. At least one environment should have all of the required dependencies, to make sure you test all of your package's code. These environments are built every time our CI runs.

Documentation / Demo Environments

Xarray's documentation has it's own environment:
https://github.com/pydata/xarray/blob/master/ci/requirements/doc.yml
This environment contains many optional dependencies, as well as other packages unrelated to xarray itself, like sphinx for actually building the docs.

Finally, we have a binder environment, designed to let users try xarray in the cloud without installing anything:
https://github.com/pydata/xarray/blob/master/.binder/environment.yml
It includes many optional dependencies. The binder links appear in our documentation, like on this page.


This is a lot to digest. If this is your first time putting together a python package, it may be overwhelming. My point is that looking at the best practices established by mature, widely-used python packages can be a very useful guide.

@Xunius
Copy link
Collaborator

Xunius commented Sep 6, 2020

@kbarnhart @rabernat Thanks for all these information. Now I understand your point better.

So I have tried to make cartopy an optional dependency and make it installable via the environment file approach, in these commits:

  1. make cartopy optional
  2. installation guide change
  3. minor wording change in index.rst
  4. remove cartopy from test_install.py.

and the updated conda recipe.

Regarding matplotlib, I'm still using it to compute contour coordinates, so it's not only for plotting, therefore a required dependency.

@rabernat
Copy link
Author

rabernat commented Sep 8, 2020

Regarding matplotlib, I'm still using it to compute contour coordinates, so it's not only for plotting, therefore a required dependency.

Makes sense. 👍

None of these links works for me, so it's hard for me to directly validate your claims / approve your changes.

@Xunius
Copy link
Collaborator

Xunius commented Sep 8, 2020

@rabernat My bad. I copied the wrong string. Please check the updated links below:

@kbarnhart
Copy link
Contributor

@Xunius with one small exception I think these changes look good. Of course, I'll let @rabernat chime in and indicate what he thinks.

My only recommendation would be to put the cartopy import in ipart\utils\plot.py in a try except block with an informative error message telling a user that they've tried to use a part of ipart which relies on the optional dependency. For example:

try:
    import cartopy
except ImportError:
    raise ImportError("Your message here")

@kbarnhart
Copy link
Contributor

@Xunius and @rabernat I wanted to check in about the status of this issue.

@rabernat
Copy link
Author

I second @kbarnhart's suggestion above. But I think enough has been done here that we can close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants