Is streamz not maintained anymore? What happened to cuStreamz? #476
Comments
Yeah, I'm also wondering about this. I'd be happy to help with this; I actually like the framework very much.
Let's update it then, I guess.
Hi! Maintainer here. Thanks for the interest. @rjzamora, who at NVIDIA can speak to the status of cuStreamz?
I think custreamz is still maintained within the cudf repo, but I don’t think the project is very active. @jdye64 can probably provide a more accurate status :)
Hi. Panel and HoloViz ecosystem maintainer here. I'm in constant doubt about whether I/we should still be recommending/promoting examples with the Streamz library in our docs, for example here https://panel.holoviz.org/reference/panes/Streamz.html and here https://holoviews.org/user_guide/Streaming_Data.html. When trying to use Streamz for my work-related use cases I find it hard, as it's not a very active library. And most of the functionality (if not all?) of Streamz can now be replaced by Param Reactive Expressions and Param Generators, for which I can find support in the community. Any thoughts here? Would it be better for the Python ecosystem in general to continue using/recommending Streamz? Or is it a project that should not be used any more? In holoviz/panel#6980 I suggest recommending Panel/Param native features instead of Streamz. But that is my personal recommendation based on my experience using Streamz.
Yes.
Is that so? I do not want to go off-topic here, thus I will put the code in spoiler-boxes but I think Param only covers a subset of Streamz. I did not test the following code, might very well contain bugs (just quickly drafted something that came to mind where the two libraries would be similar). If we were to build some reactive updating mechanism to dynamically change a parameter in a visualization task, then an implementation using
Whereas one might achieve something similar with
In both cases, the crucial behavior to achieve would be having "something dynamic that gets updated throughout the process". This could be dynamic/reactive parameters using However, at least in my understanding @MarcSkovMadsen, both libraries also differ a lot. In my opinion, Moreover, I fail to see where reactive behavior, parameter validation, lazy evaluation, ... would be available in
I think @Daniel451's outline comparison of the libraries is fair. As far as this library goes, indeed there has been no development, and there are no plans, as I said above. It always seemed like a great idea, but without quite enough of a demonstrative use case to get users interested. If your end use case is really connecting with panel or another viz/frontend, maybe param makes a lot of sense. If you actually want to do general-purpose async event management with complex branching/aggregation logic in Python, maybe you would end up writing something like streamz specialised to the particular use case.
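To make "writing something like streamz specialised to the particular use case" a bit more concrete, here is a minimal plain-Python sketch of a push-based pipeline with one branch that filters and maps and another that aggregates. The `Node` class and all of its method names are hypothetical and only loosely modelled on the idea of a stream graph; this is not the streamz API, and it is synchronous rather than async.

```python
class Node:
    """A tiny push-based stream node (hypothetical; not the streamz API)."""

    def __init__(self, fn=None):
        self.fn = fn            # handler for incoming values; None for a source
        self.downstreams = []   # nodes that receive whatever this node pushes

    def _connect(self, fn):
        child = Node(fn)
        self.downstreams.append(child)
        return child

    def _push(self, value):
        for child in self.downstreams:
            child.emit(value)

    def emit(self, value):
        """Feed a value into this node."""
        if self.fn is None:
            self._push(value)       # a source just passes values through
        else:
            self.fn(self, value)    # the handler decides what (if anything) to push

    def map(self, func):
        return self._connect(lambda node, x: node._push(func(x)))

    def filter(self, pred):
        return self._connect(lambda node, x: node._push(x) if pred(x) else None)

    def accumulate(self, func, start):
        state = [start]

        def step(node, x):
            state[0] = func(state[0], x)
            node._push(state[0])

        return self._connect(step)

    def sink_to_list(self):
        out = []
        self._connect(lambda node, x: out.append(x))
        return out


source = Node()
# Branch 1: keep even numbers, scale them.
evens_times_ten = source.filter(lambda x: x % 2 == 0).map(lambda x: x * 10).sink_to_list()
# Branch 2: running sum over everything.
running_total = source.accumulate(lambda acc, x: acc + x, start=0).sink_to_list()

for i in range(5):
    source.emit(i)
```

Even a toy like this needs decisions about state, branching and ordering, which is roughly where a general-purpose library starts to pay off.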
Thx for taking the time @Daniel451. I also think your comparison is very fair. For completeness, I was thinking about and referring to `pn.rx`:

```python
import numpy as np
import matplotlib.pyplot as plt
import panel as pn

source = pn.rx(1)  # same as param.rx(1)


def update_plot(frequency):
    # Build a fresh figure for each new frequency value
    fig, ax = plt.subplots()
    t = np.linspace(0, 1, 100)
    y = np.sin(2 * np.pi * frequency * t)
    ax.plot(t, y)
    plt.close(fig)
    return fig


plot_stream = source.rx.pipe(update_plot)


def set_frequency(event):
    # Push the slider's new value into the reactive source
    source.rx.value = event.new


slider = pn.widgets.FloatSlider(name='Frequency', start=0.1, end=10, value=1)
slider.param.watch(set_frequency, 'value')

pn.Column(plot_stream, slider).servable()
```

Here is an alternative version that is more `.rx`-like:

```python
import numpy as np
import matplotlib.pyplot as plt
import panel as pn

slider = pn.widgets.FloatSlider(name='Frequency', start=0.1, end=10, value=1)
frequency = slider.rx()

t = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * frequency * t)  # reactive expression, re-evaluated when the slider changes


def update_plot(y):
    fig, ax = plt.subplots()
    ax.plot(t, y)
    plt.close(fig)
    return fig


plot_rx = pn.rx(update_plot)
plot_stream = plot_rx(y)  # could also have been y.rx.pipe(update_plot)

pn.panel(plot_stream).servable()
```

In my view What is especially hard for me to understand is whether Streamz provides any kind of built-in performance optimizations, for example when displaying rolling windows of a DataFrame and streaming one row at a time. That would be a big advantage of Streamz over DIY via

For completeness, here is a streaming version based on a generator function (`pn.rx` streaming generator):

```python
import numpy as np
import matplotlib.pyplot as plt
from time import sleep
import panel as pn


def source():
    # Cycle through 0..9 forever, one value every 0.25 s
    while True:
        for i in range(0, 10):
            yield i
            sleep(0.25)


frequency = pn.rx(source)

t = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * frequency * t)


def update_plot(y):
    fig, ax = plt.subplots()
    ax.plot(t, y)
    plt.close(fig)
    return fig


plot_rx = pn.rx(update_plot)
plot_stream = plot_rx(y)  # could also have been y.rx.pipe(update_plot)

pn.panel(plot_stream).servable()
```
The plotting itself is handled by the panel/hv stack, so streamz only passes the data on. Streamz can handle stateful dataframes, dataframe updates, and per-row updates as well; and hv allows for updating in-browser data one row at a time without replacing previous data. You should get decent performance either way (within what is possible from the jupyter comm/websocket implementation).
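As a rough illustration of what "rolling windows, one row at a time" means on the data side, here is a plain-Python sketch using `collections.deque`. The function name `rolling_windows` and the sample rows are made up for this example; it does not show how streamz or HoloViews implement windowing, only the behaviour being discussed. Note that this version emits partial windows until `size` rows have arrived.

```python
from collections import deque


def rolling_windows(rows, size):
    """Yield the last `size` rows each time a new row arrives."""
    window = deque(maxlen=size)  # the oldest row is dropped automatically
    for row in rows:
        window.append(row)
        # Emit a snapshot so consumers are not affected by later appends
        yield tuple(window)


# Hypothetical rows arriving one at a time
rows = [{"t": i, "value": i * i} for i in range(5)]
windows = list(rolling_windows(rows, size=3))

first_window_len = len(windows[0])
last_window_values = [r["value"] for r in windows[-1]]
```

Each new row costs one append and at most one drop, independent of how many rows have been seen so far, which is the property a per-row streaming display relies on.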
Rx may be useful for GUI apps, but I don't think streamz can be replaced for complex data pipelines, especially because it can scale with dask. What's also important when going distributed is that data should not be updated in-place, since tasks may have to be retried. For example, streamz' accumulation node lets you account for that by letting you specify how the state is updated by giving you access to both

What's also very nice is how you can call

I use streamz pretty much every day, so it's a shame that it's not maintained more actively. While it has grouping by keys in the partition node, it should also support grouping in all other nodes that hold a state. E.g. here's an example of how I've adjusted streamz' accumulation node locally:

```python
# Fragment: agg_deltas, update_book_state, symbol_grouper_fn and
# OB_record_to_dataset are defined elsewhere in my code.
OB_latest_state = (
    agg_deltas
    .accumulate(
        update_book_state,
        start={},
        groupby=symbol_grouper_fn,  # <-- custom grouper (local modification)
    )
    .map(OB_record_to_dataset)
)
```

Another nice thing to add could be metrics per node, so it's easier to spot where the backpressure is coming from.
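The two ideas in this comment, per-key state and not updating state in place, can be sketched in plain Python as follows. This is only an illustration under assumptions: `grouped_accumulate`, `update_book` and the sample deltas are hypothetical names invented here, and the code is neither the commenter's local patch nor part of streamz.

```python
def grouped_accumulate(events, update, key_fn, start):
    """Yield (key, new_state) for each event, keeping one state per key.

    `update(old_state, event)` is expected to return a NEW state and leave
    `old_state` untouched, so retrying a step with the same inputs is safe.
    """
    states = {}
    for event in events:
        key = key_fn(event)
        old_state = states.get(key, start)
        new_state = update(old_state, event)
        states[key] = new_state
        yield key, new_state


def update_book(book, delta):
    # Build a new dict instead of mutating `book`
    return {**book, delta["price"]: delta["size"]}


# Hypothetical order-book deltas for two symbols
deltas = [
    {"symbol": "AAA", "price": 10, "size": 5},
    {"symbol": "BBB", "price": 20, "size": 1},
    {"symbol": "AAA", "price": 11, "size": 2},
]

results = list(
    grouped_accumulate(deltas, update_book, key_fn=lambda d: d["symbol"], start={})
)
```

Because `update_book` never mutates its input, the state emitted for the first `AAA` delta is still intact after the second one has been processed.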
As the title suggests I would be interested in knowing why there is no active development happening anymore? Is streamz deprecated? Not maintained anymore?
Also, does anyone know what happened to cuStreamz? (NVIDIA's GPU-accelerated streamz fork within their rapids.ai framework)