Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Qurro to support pandas v1 and up, and thus newer versions of QIIME 2 #322

Merged
merged 36 commits into from
Oct 20, 2022

Conversation

fedarko
Copy link
Collaborator

@fedarko fedarko commented Jul 5, 2022

  • Port CI from Travis to GitHub Actions (close Switch to GitHub actions for CI #316)

    • Made use of the nice matrix functionality in GitHub Actions to automatically test on multiple QIIME 2 versions.
    • Currently, we just test on the oldest and newest QIIME 2 versions that Qurro supports (2020.11 and whatever the latest is). (The reason 2020.11 is the oldest Q2 version we support is because it's the first that switched over to pandas >= 1.)
    • Added a standalone CI, analogous to EMPress' standalone CI.
  • Updated the code to not use pd.SparseDataFrame, and instead use pd.DataFrame with sparse arrays (close Support pandas v1: Switch from SparseDataFrame to "regular ... DataFrame with sparse values" #258, close Qurro will break on newer (>= 2020.11) versions of QIIME 2 #315)

    • This required much less work than it sounds like, because biom.Table.to_dataframe() does all the heavy lifting. Previously, Qurro was actively avoiding using that function due to a bug with it ([python] to_dataframe does not produce sparse data frames biom-format#808), but -- now that that bug is fixed in all biom versions that Qurro accepts -- we can just delegate to it.
  • Updated the code to remove various sources of warnings. A lot of things have become deprecated in the past few years (e.g. pd.DataFrame.append(), using a set as an indexer in pandas, np.matrix, ...)

  • Updated the documentation regarding the minimum Q2 version stuff

This PR is almost ready to go -- the main things I need to fix are

  • rerunning all of the example notebooks now that there aren't five million warnings being created from Support pandas v1: Switch from SparseDataFrame to "regular ... DataFrame with sparse values" #258
    • as an extra difficulty spike, some of the dependencies of these notebooks are now incompatible with the sort of conda environments we set up (e.g. Songbird explicitly requires a pandas version < 1). There are various ways to get around this, ranging from "convert the notebooks to markdowns and call it a day", "split into separate notebooks", etc; let's see what's easiest.
  • updating the changelog

fedarko added 18 commits July 5, 2022 03:38
Maybe we can make another GitHub Actions for these later; but
Songbird is causing tensorflow nonsense to pop up, and this is not
the sort of thing I think we should spend time fixing (esp with
the advent of birdman)
See biocore#258 and biocore#315. not confident this is done yet (and if nothing
else the rest of the code gleefully refers to "SparseDataFrame"
because 2019 marcus was a schmuck), but this at least fixes a fair
amount of failing tests
The problem was using .loc[] on these sparse dataframes. whoops
since apparently it's deprecated, or about to be deprecated, idk
most of these warnings were just pd.DataFrame.append() being
deprecated and replaced with pd.concat()
looks like it's a tiny floating-point thing -- probably an artifact
of working here on a new operating system, on a new python version,
a new pandas version, a new biom version, etc. shouldn't make a
noticeable difference
IIRC something about how our specific altair version works makes it
incompatible with python 3.10. let's test that here -- if needed,
we can update the README to disallow python versions >= 3.10. (And
then we can look into removing the altair pin when absolutely needed.)
@codecov-commenter
Copy link

codecov-commenter commented Jul 5, 2022

Codecov Report

Merging #322 (994cc18) into master (005cc44) will decrease coverage by 9.56%.
The diff coverage is 75.00%.

@@            Coverage Diff             @@
##           master     #322      +/-   ##
==========================================
- Coverage   95.09%   85.53%   -9.57%     
==========================================
  Files          19       19              
  Lines        1408     1417       +9     
  Branches      234      237       +3     
==========================================
- Hits         1339     1212     -127     
- Misses         57      193     +136     
  Partials       12       12              
Impacted Files Coverage Δ
qurro/_json_utils.py 92.38% <ø> (ø)
qurro/_metadata_utils.py 87.87% <ø> (-6.07%) ⬇️
qurro/generate.py 97.56% <ø> (ø)
qurro/q2/_visualizers.py 0.00% <ø> (-100.00%) ⬇️
qurro/qarcoal.py 0.00% <0.00%> (-100.00%) ⬇️
qurro/scripts/_plot.py 88.23% <ø> (ø)
qurro/__init__.py 100.00% <100.00%> (ø)
qurro/_df_utils.py 97.33% <100.00%> (-0.06%) ⬇️
qurro/_rank_utils.py 100.00% <100.00%> (ø)
qurro/support_files/js/display.js 95.86% <100.00%> (+0.08%) ⬆️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2509145...994cc18. Read the comment docs.

@gibsramen
Copy link
Collaborator

  • as an extra difficulty spike, some of the dependencies of these notebooks are now incompatible with the sort of conda environments we set up (e.g. Songbird explicitly requires a pandas version < 1). There are various ways to get around this, ranging from "convert the notebooks to markdowns and call it a day", "split into separate notebooks", etc; let's see what's easiest.

Honestly I think we should just not run the notebooks and keep them static. Dependencies issues are going to accumulate and I don't think we should be expected to maintain them continually specifically for notebooks. Maybe just add a note that dependencies may have changed since creation. (Willing to change my mind on this, though.)

@fedarko fedarko changed the title [WIP] Update Qurro to support pandas v1 and up, and thus newer versions of QIIME 2 Update Qurro to support pandas v1 and up, and thus newer versions of QIIME 2 Jul 6, 2022
fedarko added 2 commits July 5, 2022 21:53
since i thiiiink these versions mighta worked with the pandas >= 1
syntax

that being said, i don't think it makes sense to devote time/energy
to officially supporting them; just wanna check
@fedarko
Copy link
Collaborator Author

fedarko commented Jul 6, 2022

Honestly I think we should just not run the notebooks and keep them static. Dependencies issues are going to accumulate and I don't think we should be expected to maintain them continually specifically for notebooks. Maybe just add a note that dependencies may have changed since creation. (Willing to change my mind on this, though.)

That's fair -- this is definitely the more sustainable way to handle this :) That being said, I figure things are pretty stable for the time being: after looking into things, the bulk of the problems wound up coming from Songbird now being incompatible with Qurro's environments.

In my opinion, supporting the combo of Songbird + Qurro is important, so I took some time and updated the "Red Sea" notebook to explain how to run Songbird and Qurro in separate conda environments. That fixed most of the problems (the other main thing was updating the ALDEx2 notebook, which wasn't too bad); the notebooks are now all up-to-date (and, as a nice bonus, now the users don't see all of those pandas warnings about SparseDataFrame being deprecated...) However, if things break horribly in a month, ... then I'll probably just add a note, like you suggest.

This PR should now be ready for review / merging.

fedarko added 5 commits July 5, 2022 22:04
Looks like the tests themselves pass for these versions, but the
style-checking with black fails due to incompatibility with click.

yeah this is enough for me to not bother supporting these versions
imo
the apocalypse came and all i got was this pull request
@fedarko fedarko merged commit 59d0d65 into biocore:master Oct 20, 2022
@fedarko fedarko mentioned this pull request Oct 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants