ML categories on crates.io and automatically collecting crates from registry #120

Open
elpiel opened this issue Jul 19, 2022 · 1 comment

@elpiel
Contributor

elpiel commented Jul 19, 2022

Last year I added a few categories to crates.io related to aerospace (rust-lang/crates.io#4105). I think it's a feature that isn't used much, and you can see that not many categories have been added anyway.

  • What categories could we add to crates.io for ML? Maybe the ones that are defined on the website?
  • Would you consider building the crates lists automatically using provided categories?
    While this will take some time (crates need to add these categories and release a new version), I think it would help a lot with maintaining this list, and would benefit the crates.io categories feature and the community at large.
    There are a few caveats that we need to be careful with:
    • Malicious crates
    • Crates that belong to a certain category but haven't listed it

In this line of thought, I can suggest having a whitelist & blacklist per category, where we can add additional crates or exclude given ones.
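A minimal sketch of how a per-category whitelist and blacklist could be applied, assuming the crates that listed the category have already been fetched as a list of names. `CategoryRule`, `resolve_category`, and the crate names are hypothetical, not part of any existing tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CategoryRule:
    """Hypothetical per-category overrides; the field names are illustrative."""
    # Crates to include even though they have not listed the category.
    whitelist: set[str] = field(default_factory=set)
    # Crates to drop even though they have listed the category.
    blacklist: set[str] = field(default_factory=set)


def resolve_category(listed: list[str], rule: CategoryRule) -> list[str]:
    """Combine crates that listed the category with the manual overrides."""
    names = (set(listed) | rule.whitelist) - rule.blacklist
    return sorted(names)


rule = CategoryRule(
    whitelist={"unlisted-ml-crate"},   # belongs to the category but never declared it
    blacklist={"suspicious-crate"},    # declared the category but should not be shown
)
crates = resolve_category(["good-crate", "suspicious-crate"], rule)
print(crates)
```

In this sketch the blacklist is applied last, so a crate that appears in both lists is excluded. That is a design choice for the example, not something decided in this thread.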

We're looking to implement this approach for https://areweinspaceyet.org, which will become the AeroRust website. The current WIP of the new website can be found at https://github.com/AeroRust/AeroRust.github.io/

@elpiel elpiel changed the title Categories on crates.io and automatically collecting crates from registry ML Categories on crates.io and automatically collecting crates from registry Jul 19, 2022
@elpiel elpiel changed the title ML Categories on crates.io and automatically collecting crates from registry ML categories on crates.io and automatically collecting crates from registry Jul 19, 2022
@anowell
Owner

anowell commented Jul 19, 2022

What categories could we add to crates.io for ML? Maybe the ones that are defined on the website?

I'm not sure. I don't know if we even have the right set of categories currently (though it has been a couple of years since the last suggestion to tweak the categories). I wonder if the categorization will change a bit as the ecosystem matures. And would it even make sense for crates.io to create some of the more specific categories? Consider "data structures" vs "ML data structures", or "Science" vs "Scientific Computing", or whether there might be GPU programming crates that are specific to games vs ML vs other.

I'd suggest starting with categories that are clearly ML-specific and well-defined like Neural Networks and/or NLP.

Would you consider building the crates lists automatically using provided categories?

Yes. But first and foremost, I think AWLY should prioritize finding the best way to organize and surface crates in the ML ecosystem over aligning with crates.io categorization (the latter being good if not at the expense of the former).

To that end, if such crates.io categories were created, it'd be fairly easy to fetch crates from a category. For the implementation, I'd want to see:

  • continue to support topics that don't have a crates.io category
  • continue to support crates that are in crates.yaml (regardless of whether or not the topic maps to a crates.io category)
  • continue to override fields based on crates.yaml (as sometimes the crates.io metadata is wrong or missing and requires publishing a new version to update)
  • cache crates.io results, and avoid re-querying for either the category list or individual crates unless the clean task is run. Bonus here is that clean runs might be faster if enough of the crates are populated from the category query.
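The requirements above could be sketched roughly as follows. This is an illustration under stated assumptions, not the AWLY scraper: the topic layout (an optional `category` slug plus a `crates` list standing in for crates.yaml entries), the `fetch` callable that would query crates.io, the cache file naming, and the `neural-networks` slug are all hypothetical.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Callable


def cached_category(slug: str, fetch: Callable[[str], list[dict]], cache_dir: str) -> list[dict]:
    """Return the crates in a category, calling `fetch` only if no cache file exists."""
    path = Path(cache_dir) / f"category-{slug}.json"
    if path.exists():
        return json.loads(path.read_text())
    crates = fetch(slug)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(crates))
    return crates


def build_topic(topic: dict, fetch: Callable[[str], list[dict]], cache_dir: str) -> list[dict]:
    """Merge category results with manual entries; manual fields take precedence."""
    merged: dict[str, dict] = {}
    slug = topic.get("category")  # topics without a category are still supported
    if slug:
        for crate in cached_category(slug, fetch, cache_dir):
            merged[crate["name"]] = dict(crate)
    for entry in topic.get("crates", []):  # manual entries, as in crates.yaml
        merged.setdefault(entry["name"], {"name": entry["name"]}).update(entry)
    return sorted(merged.values(), key=lambda c: c["name"])


calls = []

def fake_fetch(slug: str) -> list[dict]:
    """Stand-in for a real crates.io query; records each call."""
    calls.append(slug)
    return [{"name": "nn-crate", "description": None}]


topic = {
    "category": "neural-networks",  # hypothetical category slug
    "crates": [
        {"name": "nn-crate", "description": "fixed by hand"},  # overrides missing metadata
        {"name": "manual-only"},                                # not in the category at all
    ],
}
with tempfile.TemporaryDirectory() as tmp:
    first = build_topic(topic, fake_fetch, tmp)
    second = build_topic(topic, fake_fetch, tmp)  # served from the cache file
print(first, calls)
```

Deleting the cache directory plays the role of the clean task here: the next build queries the category again.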

I think initially some category map file could also blacklist crates, but if that list grows large or needs to be updated regularly, I think we'd want to consider some heuristic to filter on instead. Being an ML ecosystem, it would be awesome if we had a classifier that could help filter crates, but I'll refrain from over-engineering a solution to a problem we don't yet have.
