[discussion] shared yarnPackages.nix to reduce evaluation overhead #83499

Open
ghost opened this issue Mar 27, 2020 · 14 comments
Labels
0.kind: enhancement (Add something new), 2.status: stale (https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md), 6.topic: nodejs

Comments

@ghost

ghost commented Mar 27, 2020

As @Mic92 described in #78810, the current approach of generating a yarn.nix file for every application is not scalable. We should try to make yarn2nix work with a shared yarnPackages.nix file, while preserving the exact result.
We currently have the following yarn.nix files in nixpkgs:

140473 ./pkgs/development/tools/yarn2nix-moretea/yarn2nix/yarn.nix
444016 ./pkgs/servers/web-apps/codimd/yarn.nix
36480 ./pkgs/applications/networking/instant-messengers/riot/riot-desktop-yarndeps.nix

I will also take into account the mastodon package which is yet to be merged:

445678 pkgs/servers/mastodon/yarn.nix

I did some further analysis. Together, these yarn.nix files contain 3534 entries, of which 1093 are duplicates that could be saved by merging the files. Assuming all entries are about equal in size, this would shrink the definitions from 1066647 bytes to 736753 bytes, which is a decrease of 31%.
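For anyone who wants to check the estimate, here is a small sketch of the arithmetic. The file sizes and entry counts are the ones quoted above; the short dictionary keys are only labels chosen for this example, and the equal-entry-size assumption is the same one made in the text.

```python
# File sizes in bytes, as listed above (keys are just short labels).
sizes = {
    "yarn2nix": 140473,
    "codimd": 444016,
    "riot-desktop": 36480,
    "mastodon": 445678,
}
total_entries = 3534      # entries across all yarn.nix files
duplicate_entries = 1093  # entries that a shared file would save

total_bytes = sum(sizes.values())
# Assumption: every entry takes up roughly the same number of bytes.
shared_bytes = round(total_bytes * (total_entries - duplicate_entries) / total_entries)
saving = 1 - shared_bytes / total_bytes

print(total_bytes, shared_bytes, f"{saving:.0%}")
```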

@ghost ghost added 0.kind: enhancement Add something new 6.topic: nodejs labels Mar 27, 2020
@Mic92
Member

Mic92 commented Mar 27, 2020

Adding yarn support to node2nix would be one solution: svanderburg/node2nix#28
Then we could also share the information with npm packages, while preserving the correct dependency tree that yarn provides.
cc @svanderburg what do you think about that?

@Mic92
Member

Mic92 commented Mar 27, 2020

Also cc @moretea

@Mic92
Member

Mic92 commented Mar 27, 2020

I would assume the potential for sharing would be even bigger than what was found here if all node packages were included.

@ghost
Author

ghost commented Mar 27, 2020

This is a nice idea. It would further reduce the number of entries from 7985 (nodePackages + yarnPackages) to 6567, a reduction of about 18%.

@ghost
Author

ghost commented Mar 27, 2020

One doubt I have is the time it would take to regenerate this shared file. yarn2nix can take the yarn.lock file and extract most information from it, including the hashes, so it does not have to fetch everything. When trying to add things to nodePackages, it took a very long time to fetch all packages.

Would it be possible to write a utility that can use package-lock.json and yarn.lock files to generate this combined file in a small amount of time?
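To illustrate why a lock-file-driven utility could be fast, here is a rough sketch that reads the version, URL and hashes straight out of a yarn v1 lock file without fetching anything. It is a simplified line-based reader written for this comment, not the parser yarn2nix actually uses. The function names, the `example-pkg` package and the hash values in the sample are made up for illustration.

```python
import re

# Matches the three top-level fields we care about inside one lock entry.
FIELD_RE = re.compile(r'^  (version|resolved|integrity) "?([^"]+)"?$')

def parse_yarn_lock(text):
    """Map each 'name@range' key to its version, resolved URL and integrity."""
    entries = {}
    fields = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if not line.startswith(" "):
            # Entry header, e.g.: "a@^1.0.0", a@^1.1.0:
            # (simplification: assumes no commas inside a version range)
            fields = {}
            for key in line.rstrip(":").split(","):
                entries[key.strip().strip('"')] = fields
        else:
            match = FIELD_RE.match(line)
            if match:
                fields[match.group(1)] = match.group(2)
    return entries

def split_resolved(resolved):
    """Split a resolved URL into (url, sha1); the sha1 is the URL fragment."""
    url, _, sha1 = resolved.partition("#")
    return url, sha1

# Hypothetical sample; the package name and hashes are placeholders.
SAMPLE = '''# yarn lockfile v1

"example-pkg@^1.0.0", example-pkg@^1.2.0:
  version "1.2.3"
  resolved "https://registry.yarnpkg.com/example-pkg/-/example-pkg-1.2.3.tgz#0123456789abcdef0123456789abcdef01234567"
  integrity sha512-PLACEHOLDERONLY==
  dependencies:
    other-pkg "^2.0.0"
'''

entries = parse_yarn_lock(SAMPLE)
url, sha1 = split_resolved(entries["example-pkg@^1.0.0"]["resolved"])
print(entries["example-pkg@^1.2.0"]["version"], url, sha1)
```

A similar reader for package-lock.json would mostly be a JSON walk, so the slow part of a combined generator would only be entries whose hashes are missing from the lock file.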

@Mic92
Member

Mic92 commented Mar 27, 2020

I suppose this information could be used in node2nix as well. Ideally we would not need to commit yarn.lock files, to keep the repository smaller. The time it takes to regenerate the shared file is indeed a problem. However, for regular users I would prefer it if node2nix implemented its own cache to speed up the process. This would then also work for normal npm packages.

@svanderburg
Member

So if I understand the topic of this discussion correctly, the goal is to reduce the overhead of yarn packages by allowing them to share common dependencies, and node2nix could potentially solve this problem?

About using node2nix to generate expressions for yarn packages: I believe this (in theory) is very well possible because from a conceptual point of view the package managers are very similar, e.g. they use the same package.json format and the same dependency concepts. Furthermore, both package managers (although yarn was clearly the pioneer in this area) support lock files.

To get yarn support, we need to do two things. The easy part is creating a node2nix submodule that interprets yarn lock files; it would probably have a structure very similar to the NPM lock file generation. The second thing we need is a build environment that runs the yarn package manager, so that any build scripts can be executed. This is currently very hard to do, because the nodeEnv implementation is very complicated and needs to be simplified/split. This is something I'm currently investigating, but it will take a bit of time to complete.

Although it is theoretically possible to use node2nix for yarn deployments, there is another important reason why yarn2nix exists: in addition to making it possible to deploy yarn packages from Nix, it also works in an entirely lock-file-driven way.

node2nix, by contrast, is also a generator with its own implementation of NPM's dependency resolution algorithm, so that it can work with projects that lack lock files and with direct installation from the NPM registry. With the generator you get all kinds of nice things in return (such as convenient end-user package deployments from the NPM registry), but it makes the integration process very complicated and time-consuming.

The advantage that yarn2nix users see is that no generation is needed: you can directly use all relevant deployment properties from a lock file that already exists, which makes the process very simple.

But there are limitations to this process obviously -- some dependencies from yarn lock files cannot be used out of the box, because they lack the information that Nix needs, such as the output hashes of Git checkouts, and tarballs that can be downloaded from external HTTP/HTTPS sites.

Furthermore, as far as I know, yarn2nix also does not run yarn in a derivation. This means that more complex projects that need build scripts will not work out of the box.

I had quite a few discussions with a variety of people about this in the past and it seems that the preferences among Nix users differ -- some people prefer accuracy and accept the slow generation process, others can live with the yarn2nix limitations and prefer the speed/integration simplicity.

That's also an important reason why both generators exist, and (for example) why I created node2nix as a new tool, rather than resuming npm2nix that is now completely abandoned.

@Mic92
Member

Mic92 commented Apr 15, 2020

Another case where we could need shared expressions: #84189

@calbrecht
Member

Maybe we could write NixOS integration plugins for the next version of yarn.

@happy-river
Contributor

happy-river commented Aug 13, 2020

My PR for Mastodon, #78810, would add to the number of duplicated yarn and ruby bundler packages in Nixpkgs. To understand the size of the problem, I wrote a little tool that looks for automatically generated yarn and ruby packages in a directory tree and counts the duplicates (based on hashes): https://github.com/happy-river/dedup

This doesn't count duplicates in the output of node2nix, but it could be made to do that. It doesn't print out the unique definitions it finds as Nix code, but, as I have written a function that renders a Nix parse tree as Nix code, it's not far away from being able to. So it's feasible that I could make this into a postprocessor for yarn2nix, bundix and node2nix that maintains shared definitions files and rewrites the individual package yarn.nix, gemset.nix and node-packages.nix to refer to the shared ones.
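As a rough illustration of the counting idea, here is a sketch that treats two definitions as duplicates when they carry the same hash. It uses a plain regular expression over the file text rather than a real Nix parse tree, so it is a stand-in for the approach and not how the tool linked above works. The function name, the sample snippets and their hash values are made up for this example.

```python
import re
from collections import Counter

# Assumption: each generated definition carries one hash attribute like this.
HASH_RE = re.compile(r'\b(?:sha1|sha256|sha512)\s*=\s*"([^"]+)"')

def count_duplicates(sources):
    """sources maps a file name to its Nix text.

    Returns (total definitions, unique definitions, duplicates).
    """
    hashes = [h for text in sources.values() for h in HASH_RE.findall(text)]
    counts = Counter(hashes)
    total = len(hashes)
    unique = len(counts)
    return total, unique, total - unique

# Hypothetical inputs; the hashes are placeholders, not real package hashes.
sources = {
    "a/yarn.nix": 'path = fetchurl { url = "https://example.invalid/x.tgz"; sha1 = "aaaa"; };\n'
                  'path = fetchurl { url = "https://example.invalid/y.tgz"; sha1 = "bbbb"; };',
    "b/gemset.nix": 'source = { sha256 = "aaaa"; type = "gem"; };',
}

total, unique, duplicates = count_duplicates(sources)
print(total, unique, duplicates)
```

Emitting the shared file would additionally require rendering each unique definition back to Nix and rewriting the per-package files to refer to it, which is where a real parser becomes necessary.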

My questions are:

  • Is having a postprocessor the best approach, or should this be built into yarn2nix, bundix and node2nix?
  • If having a postprocessor is desirable, is Common Lisp the best language to write it in? The intersection of my language skills with the tools I could find to parse Nix meant my easiest path to get something working was to feed the output of tree-sitter's command line tool to the Lisp reader, but I know that's a distinctly uncommon tooling choice. Perhaps someone who can use tree-sitter's Rust bindings should work on this.

@Mic92
Member

Mic92 commented Aug 14, 2020

It would indeed be hard to get people on board with Common Lisp; off the top of my head I only know @7c6f434c, who uses it and is also a nixpkgs committer. Usually it is a good idea to write the tooling for a specific language in that same language, since you can expect people who understand the language ecosystem to also understand the language. If you are looking for a decent Nix parser in Rust, there is https://gitlab.com/jD91mZM2/rnix, which is heavily used in nixpkgs-fmt: https://github.com/nix-community/nixpkgs-fmt

If there were a post-processor, would it not need all the information from the fetcher in the first place in order to de-duplicate? Ideally the de-duplication would happen in the original tool itself, because it could then take place while retrieving the data from external sources, i.e. npm. A post-processor would also add another step that people/bots have to go through when updating packages.
However, most of the upstream maintainers I talked with have little motivation to make their tools scale well for the nixpkgs case and have not implemented this logic, as they largely use the tools for their own out-of-nixpkgs packages.

I am currently not quite sure what your design for a post-processor would look like. Could you go into a bit more detail, so that we can then discuss this further?

@stale

stale bot commented Feb 12, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Feb 12, 2021
@Mic92
Member

Mic92 commented Feb 12, 2021

Still relevant. Also npm 7 now supports yarn.lock files itself: https://github.blog/2020-10-13-presenting-v7-0-0-of-the-npm-cli/

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Feb 12, 2021
@stale

stale bot commented Aug 13, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Aug 13, 2021

4 participants