GenStage and Ecto streams? #150
Comments
Temporary answer: if you have easily parallelizable work, consider using
Repo.stream with Task.async_stream.
I will expand on GenStage and Ecto tomorrow.
--
*José Valim*
|
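A minimal sketch of the Repo.stream plus Task.async_stream suggestion above. The module name `StreamWorker`, its `run/4` function, and the option defaults are hypothetical; `repo` is assumed to be an Ecto repo module (anything exposing `transaction/2` and `stream/2`), passed in as an argument so the sketch does not depend on a particular application.

```elixir
defmodule StreamWorker do
  # Hypothetical helper, not part of Ecto or GenStage.
  # `repo` is assumed to be an Ecto repo module such as MyApp.Repo.
  def run(repo, query, fun, opts \\ []) do
    max_rows = Keyword.get(opts, :max_rows, 500)
    max_concurrency = Keyword.get(opts, :max_concurrency, System.schedulers_online())

    # Ecto requires the stream to be enumerated inside a transaction.
    repo.transaction(
      fn ->
        query
        |> repo.stream(max_rows: max_rows)
        # Each row is handled in its own task, outside the streaming connection.
        |> Task.async_stream(fun, max_concurrency: max_concurrency)
        |> Stream.run()
      end,
      timeout: :infinity
    )
  end
end
```

Note that the tasks run in separate processes, so any database work done inside `fun` would not be part of the surrounding transaction.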
Thanks for the quick response. Unfortunately, GenStage fits our use case better than |
@mtwilliams unfortunately Repo.stream won't work with GenStage.from_enumerable because both need the process inbox to work. The fact that those can't work together is one of the reasons we ended up creating GenStage anyway, so the best option would be for Postgrex to support GenStage directly in the connection. Outside of that, there is not much GenStage itself can do. For now you will have to use one of the alternative solutions we have mentioned here. |
I have made a module that hacks an ecto stream into a genstage producer. Easy to use and seems quite performant. It also respects the demanded amount and won't send more than demanded, and also waits until more items have been demanded. https://gist.github.com/narrowtux/286666711864246d3dbb6859dda0d694 Might be useful for anybody that's stumbling upon this issue in the future. |
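The demand handling described above (never send more than was demanded, and hold rows until more demand arrives) can be sketched as a small pure data structure. This is not the code from the linked gist; `DemandBuffer` and its functions are hypothetical names used only for illustration.

```elixir
defmodule DemandBuffer do
  # Hypothetical helper, not part of GenStage, Ecto, or the linked gist.
  # Tracks buffered rows and outstanding demand; every call returns
  # {events_to_emit, new_state}.

  def new, do: %{queue: :queue.new(), demand: 0}

  # Store freshly fetched rows, then emit as many as demand allows.
  def add_rows(%{queue: queue} = state, rows) do
    queue = Enum.reduce(rows, queue, &:queue.in/2)
    dispatch(%{state | queue: queue})
  end

  # Record new demand, then emit as many buffered rows as possible.
  def add_demand(%{demand: demand} = state, n) when is_integer(n) and n > 0 do
    dispatch(%{state | demand: demand + n})
  end

  defp dispatch(%{queue: queue, demand: demand} = state) do
    count = min(demand, :queue.len(queue))
    {events, rest} = :queue.split(count, queue)
    {:queue.to_list(events), %{state | queue: rest, demand: demand - count}}
  end
end

buffer = DemandBuffer.new()
{[], buffer} = DemandBuffer.add_demand(buffer, 2)
{[1, 2], buffer} = DemandBuffer.add_rows(buffer, [1, 2, 3, 4, 5])
{[3, 4, 5], _buffer} = DemandBuffer.add_demand(buffer, 10)
```

In a GenStage producer, the events returned here would typically be emitted from `handle_demand/2` or `handle_info/2` as `{:noreply, events, state}`.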
@narrowtux nice that you got it working! With this approach the forwarder will fetch max_rows at a time and then push those based on demand. So even though it only sends what is demanded, it will always fetch 500 rows (by default) from the database. There is work in progress to support transactions over multiple callbacks during a process's lifecycle (e.g. begin in init/1, commit in terminate/2) and to support lower-level access to the cursors, so that each fetch would only fetch the number of demanded rows at a time: elixir-ecto/postgrex#321, xerions/mariaex#196. It has stalled on my end but I hope to return to it soon. |
Oh, that's great news. Do you think that my module does not actually fetch a lot of rows at a time from the DB? I posted it here anyway because I wasn't able to get the code snippets to work as-is.
|
Your module will fetch 500 rows (or |
Ah ok. Just scanned your response earlier and wasn't sure whose code you were talking about :) |
I edited my previous comment so there are 2 paragraphs to make it clearer |
Any update on a good approach for this problem? I am currently dealing with the same issue. |
I have found https://github.com/mtwilliams/bourne |
@sobolevn I wrote it exactly for this use case. |
During my work, I've discovered an "impedance mismatch" between GenStage and Ecto. Specifically, we drive a lot of work using Ecto.Stream and its lower-level equivalent:
We use the above pattern a lot. Unfortunately, Ecto requires streams to be run inside a transaction, thus making GenStage.from_enumerable/2 unusable.
To work around this, we spawn a forwarding process that sends chunks of events whenever our GenStage producer requests them:
We also tried to write our own producer that reduces streams in a transaction in a similar fashion to GenStage.Streamer. It didn't work because, as far as I could tell, the continuations reuse the connection from the first transaction?
While the aforementioned hack works, it is sub-optimal.
Do you see any way GenStage.Streamer can support such a use case through some sort of generalized functionality?
If not, should Ecto or another library provide a GenStage producer that produces events from a query?
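The forwarding-process workaround mentioned in the issue can be sketched roughly as follows. This is not the issue author's code: `ChunkForwarder`, its message shapes, and the `with_stream` callback are hypothetical. `with_stream` stands in for something like a function that opens a Repo transaction and passes the Ecto stream to the given callback.

```elixir
defmodule ChunkForwarder do
  # Hypothetical sketch of a forwarding process, not taken from the issue.
  # `with_stream` receives a callback and must invoke it with the enumerable,
  # e.g. from inside a database transaction.
  def start_link(with_stream, chunk_size) do
    Task.start_link(fn ->
      with_stream.(fn enumerable ->
        enumerable
        |> Stream.chunk_every(chunk_size)
        # Block on each chunk until a consumer asks for it.
        |> Enum.each(&reply({:chunk, &1}))
      end)

      # The stream is exhausted; tell the next requester.
      reply(:done)
    end)
  end

  # Ask the forwarder for the next chunk and wait for its answer.
  def next(forwarder, timeout \\ 5_000) do
    send(forwarder, {:request, self()})

    receive do
      {^forwarder, {:chunk, rows}} -> {:ok, rows}
      {^forwarder, :done} -> :done
    after
      timeout -> {:error, :timeout}
    end
  end

  defp reply(message) do
    receive do
      {:request, from} -> send(from, {self(), message})
    end
  end
end

{:ok, forwarder} = ChunkForwarder.start_link(fn fun -> fun.(1..5) end, 2)
{:ok, [1, 2]} = ChunkForwarder.next(forwarder)
{:ok, [3, 4]} = ChunkForwarder.next(forwarder)
{:ok, [5]} = ChunkForwarder.next(forwarder)
:done = ChunkForwarder.next(forwarder)
```

A real GenStage producer would probably not block in `next/2`; it would send the request from `handle_demand/2` and receive the chunk in `handle_info/2`. The blocking call is used here only to keep the sketch short.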