
Rewrite the way we handle output from worker processes. #219

Closed

Conversation

CensoredUsername
Owner

Fixes #206. Putting this here first to run tests.

@CensoredUsername
Owner Author

@madeddy this is what I was referring to in #204. No more print lock, no multiprocess management required, just a single Pool.imap call.

Amusingly, it seems to run much faster on my machine now.
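The "just a single Pool.imap call" structure described above can be sketched roughly like this (a hypothetical illustration, not the actual unrpyc code; the function and file names are made up):

```python
# Hypothetical sketch: each worker decompiles one file and returns its
# output; the parent process prints results, so no shared print lock or
# manual process management is needed.
import multiprocessing as mp


def decompile_one(path):
    # Stand-in for the CPU-heavy decompilation of a single file.
    return f"decompiled {path}"


def run_all(paths):
    with mp.Pool() as pool:
        # imap yields each result as soon as it is ready (in input order),
        # so the parent can report progress file by file.
        for message in pool.imap(decompile_one, paths, 1):
            print(message)


if __name__ == "__main__":
    run_all(["a.rpyc", "b.rpyc"])
```

Because all printing happens in the parent process, the workers never contend for stdout, which is what makes the old print lock unnecessary.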

@CensoredUsername
Owner Author

It doesn't yet incorporate all your changes to the end report. Would you mind seeing if you could build those on top of this structure instead?

@madeddy
Contributor

madeddy commented Apr 18, 2024

this is what I was referring to #204

OK. Until you mentioned context objects, I had never come across the concept, so I tried to read up on it and understand it. In my experience, though, sources and code examples are hard to find, at least for Python.

What you did is roughly what I guessed, so that's at least something. In the end, we write to attributes of a class and drag the instance along through the process we're in. Pretty simple once you finally see it. How are the context instances closed when everything is done? At the end of each process, or at the end of the whole pool?

seems to run much faster

They write that imap is lazier, but it's faster for you? That sounds funny; lazy usually means slow in my book.
A thought about speed: I've read that you should use threads for I/O-bound tasks and processes for CPU-bound ones. Isn't unrpyc, with all its file reading and writing, more I/O-bound, or is decompiling really that CPU-heavy? It's hard for me to measure, since I run a PCIe 4 NVMe drive.

would you mind seeing if you could build those on top

Sure, no problem. I'll open the PRs against this branch, then. Or, since you have a PR open for this, should I wait until you merge it?

... Argh! This means I need to adjust the translate PR after this.

@CensoredUsername
Owner Author

CensoredUsername commented Apr 19, 2024

How are the context instances closed after everything? With the process-end or with the end of the whole pool?

It's just a normal Python object; no closing is necessary. In this case I only create them inside the new process, and at the end of the task they get returned (containing both log information and result information).
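A minimal sketch of such a per-file context object (hypothetical names, not the actual unrpyc class):

```python
# Hypothetical per-file context: created inside the worker, filled with
# log lines and a result, and simply returned at the end of the task.
# No explicit close is needed; it is garbage-collected like any other
# Python object once the parent is done with it.
class Context:
    def __init__(self):
        self.log_lines = []   # messages to print in the parent process
        self.result = None    # outcome of processing this file

    def log(self, message):
        self.log_lines.append(message)


def worker(path):
    ctx = Context()
    ctx.log(f"processing {path}")
    ctx.result = "ok"
    return ctx  # pickled and sent back to the parent process
```

The parent then iterates over the returned contexts, printing each one's log lines and tallying the results.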

They write imap is lazier but its faster for you? Now that sounds funny. Lazy means usually slow in my book.

Lazy in coding terms doesn't imply slow. It means doing the work only when it's needed, rather than doing it all up front and then proceeding.

In this case, that isn't really why I changed it. You could change it back to map and it would work fine, but then the tool would be quiet while working and dump all its output at once near the end. That's not great UX. imap with a queue size of 1 lets me print output on the main thread as soon as a single file is done.
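The eager-versus-lazy difference can be illustrated without multiprocessing at all (a toy example, not taken from unrpyc):

```python
import time


def slow_double(x):
    time.sleep(0.01)  # stand-in for decompiling one file
    return x * 2

# map-style (eager): all work finishes before anything can be reported,
# so a user would see nothing until the very end.
eager = [slow_double(x) for x in range(3)]

# imap-style (lazy): a generator yields each result as soon as it is
# ready, so progress can be reported item by item.
lazy = (slow_double(x) for x in range(3))
first = next(lazy)  # only one item has been processed at this point
```

Pool.imap applies the same idea across worker processes: the parent's loop body runs after each file, rather than after all of them.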

A thought about speed and things: I read you should use Threads for I/O bound tasks and Processes for CPU bound. Isn't unrpyc with all the file read/write stuff more I/O bound or is decompiling really so much CPU heavy? Its hard for me to measure as i run a PCIe4 NVME.

unrpyc is completely and utterly CPU-bound. The number of files we read and write is tiny, and we do a lot of complex processing on the data, in Python, which will always be CPU-heavy by virtue of being Python.

Sure, no problem. I'll open the PRs against this branch, then.

I think you did that already, but yeah that's fine.

@madeddy
Contributor

madeddy commented Apr 20, 2024

OK, thanks for explaining. This CPU-bound versus I/O-bound question had been nagging at me for a while, but I couldn't wrap my head around which one applies here.

About imap: I think it's better with it, and returning results as soon as they're ready, rather than like with map, is perhaps the reason for the speed gain you mentioned.

I think you did that already

Yeah, lol. Small mistake. Right after asking, I theorized that my PR merged into this branch would be automatically added to your PR to dev. Bullseye.

@CensoredUsername
Owner Author

I'm happy with this. I'm going to clean up the history a bit and merge it into dev.

@CensoredUsername
Owner Author

This was merged manually.
