
poll_* methods to support custom futures implementations #78

Open · wants to merge 27 commits into base: main

Conversation

@dgrr commented May 29, 2024

No description provided.

@dgrr (Author) commented May 29, 2024

I'll implement the poll_* methods for WebSocketRead & WebSocketWrite.
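A minimal, self-contained illustration of the poll_* pattern in question (a toy type, not fastwebsockets' actual API): the poll form can be driven from hand-written Future or Stream implementations, and an async counterpart falls out via std::future::poll_fn.

```rust
use std::task::{Context, Poll};

struct Counter {
    remaining: u32,
}

impl Counter {
    // Poll-style method: no async fn, so callers can drive it from their
    // own Future/Stream implementations.
    fn poll_tick(&mut self, cx: &mut Context<'_>) -> Poll<()> {
        if self.remaining == 0 {
            Poll::Ready(())
        } else {
            self.remaining -= 1;
            cx.waker().wake_by_ref(); // ask the executor to poll us again
            Poll::Pending
        }
    }

    // The async counterpart comes for free via std::future::poll_fn.
    async fn tick(&mut self) {
        std::future::poll_fn(|cx| self.poll_tick(cx)).await
    }
}
```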

@dgrr (Author) commented May 29, 2024

btw, if you implement Stream + Sink from futures you can call split. It's difficult for Stream to return a Frame<'_>, or maybe impossible
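For reference, the futures Stream trait fixes Item as a plain associated type, with nothing tying its lifetime to the exclusive borrow taken by poll_next, which is why yielding Frame<'_> is not expressible:

```rust
// The futures-core Stream trait (paraphrased): Item cannot borrow from
// the stream itself across poll_next calls.
pub trait Stream {
    type Item;

    fn poll_next(
        self: std::pin::Pin<&mut Self>,
        cx: &mut std::task::Context<'_>,
    ) -> std::task::Poll<Option<Self::Item>>;
}
```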

Cargo.toml Outdated

```toml
[features]
default = ["simd"]
simd = ["simdutf8/aarch64_neon"]
upgrade = ["hyper", "pin-project", "base64", "sha1", "hyper-util", "http-body-util"]
unstable-split = []
```
Member:

Is it intentional to remove `unstable-split`? I would like to keep it.

Author:

Sure, I can put it back in

Author:

well, the question is, why is it unstable?

@matheus23 (Contributor) commented Jun 20, 2024

Would love to see this land. This library would become a great alternative to tokio-tungstenite and tokio-websockets in that case, especially since it supports true, lock-free stream splitting.

@matheus23 (Contributor):

FWIW, it'd be nice to have the same poll_ treatment for FragmentCollectorRead 👀 😁

@matheus23 (Contributor) commented Jun 24, 2024

> btw, if you implement Stream + Sink from futures you can call split. It's difficult for Stream to return a Frame<'_>, or maybe impossible

This is true, but internally it simply creates two references to the same thing using a BiLock, so you end up locking every time you use either the Stream or the Sink.
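What that looks like with futures 0.3, sketched as a generic helper (only the Stream + Sink bounds assumed):

```rust
use futures::stream::{SplitSink, SplitStream};
use futures::{Sink, Stream, StreamExt};

// StreamExt::split shares the underlying `ws` between both halves through
// a BiLock, so every poll on either half acquires that lock first.
fn split_ws<T, W>(ws: W) -> (SplitSink<W, T>, SplitStream<W>)
where
    W: Stream + Sink<T>,
{
    ws.split()
}
```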

Having a "true" split implementation that is lock free would be a lot cooler! :)


Also, while it's impossible to write an implementation of Stream<Item = Frame<'???>>, it's possible to map the Frame<'f> into your own domain-specific application struct that doesn't have a lifetime, e.g. by parsing binary and text frames into your own types, before returning them in your stream implementation as Items.

Just spelling this out for anyone reading this ✌️
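A self-contained sketch of that mapping, with simplified stand-ins for the crate's Frame and OpCode types (not their actual definitions):

```rust
// Simplified stand-ins; fastwebsockets' real types differ.
enum OpCode {
    Text,
    Binary,
    Other,
}

struct Frame<'f> {
    opcode: OpCode,
    payload: &'f [u8],
}

// An owned message with no lifetime, safe to yield as a Stream Item.
enum Message {
    Text(String),
    Binary(Vec<u8>),
}

fn to_owned_message(frame: Frame<'_>) -> Option<Message> {
    match frame.opcode {
        OpCode::Text => std::str::from_utf8(frame.payload)
            .ok()
            .map(|s| Message::Text(s.to_owned())),
        OpCode::Binary => Some(Message::Binary(frame.payload.to_vec())),
        OpCode::Other => None, // control frames handled separately
    }
}
```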

@dgrr (Author) commented Jun 24, 2024

> Having a "true" split implementation that is lock free would be a lot cooler! :)

How is this not lock free? If you use tokio::io::split it will use a Mutex internally.

> Also, while it's impossible to write an implementation of Stream<Item = Frame<'???>>, it's possible to map the Frame<'f> into your own domain-specific application struct that doesn't have a lifetime, e.g. by parsing binary and text frames into your own types, before returning them in your stream implementation as Items.

Yes, that's mostly what I do, because I want my code to be compatible with futures::Stream and Sink. I shouldn't, but I had to code it in a bit of a rush 👀

@matheus23 (Contributor):

> Having a "true" split implementation that is lock free would be a lot cooler! :)

> How is this not lock free? If you use tokio::io::split it will use a Mutex internally.

Yeah, no, I'm agreeing with you; that's exactly what I'm trying to say. You wrote "if you implement Stream + Sink from futures you can call split", but that gives you worse performance than having the stream and sink already split from the start ✌️ (which this PR enables/makes easier!)

@dgrr (Author) commented Jun 24, 2024

> Yeah, no, I'm agreeing with you; that's exactly what I'm trying to say. You wrote "if you implement Stream + Sink from futures you can call split", but that gives you worse performance than having the stream and sink already split from the start ✌️ (which this PR enables/makes easier!)

Ah yes, but it is mainly for compatibility, because sometimes it might be useful to use futures::StreamExt. Anyway, users can implement it on their own, given that Frame<'_> cannot be returned as the Item in futures::Stream.

@dgrr (Author) commented Jun 24, 2024

@matheus23 about poll in FragmentCollectorRead, what would you expect? Do you also expect a callback, or should it return early with the mandatory reply frame?

@matheus23 (Contributor):

Personally, I'd highly prefer returning the mandatory frame :)

@dgrr (Author) commented Jun 24, 2024

> Personally, I'd highly prefer returning the mandatory frame :)

Indeed. Me too.
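One hypothetical shape for such a callback-free API (names invented here, not from the PR): the poll-based read hands the mandatory reply back to the caller instead of invoking a send callback.

```rust
// Hypothetical sketch, generic over the crate's frame type.
enum PollRead<F> {
    /// A complete application frame (text/binary), ready for the caller.
    Frame(F),
    /// A mandatory reply (e.g. a Pong answering a Ping) that the caller
    /// must write to the send half before polling for the next frame.
    ObligatedSend(F),
}
```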

@littledivy (Member) left a comment

Thanks for the PR, and sorry for the slow replies. I hope you don't mind me taking some time here, as this is a relatively big change :)

It seems there is a regression in the echo_server benchmark with larger payloads:

```
# this pr
$ ./load_test 10 0.0.0.0 8080 0 0 102400
Msg/sec: 35997.750000
Msg/sec: 35021.000000

# main
$ ./load_test 10 0.0.0.0 8080 0 0 102400
Msg/sec: 42045.750000
Msg/sec: 42146.500000
```

(measured on an M1 MacBook; similar numbers on an x64 Linux server)

@matheus23 (Contributor):

Just as an FYI, I've been trying to get the benchmarks to compile & run here on my NixOS to help diagnose the regression, but I'm still fighting my way through with the linker. Will of course post once I've got something.

@dgrr (Author) commented Jun 30, 2024

> Just as an FYI, I've been trying to get the benchmarks to compile & run here on my NixOS to help diagnose the regression, but I'm still fighting my way through with the linker. Will of course post once I've got something.

I managed to reproduce the benchmarks (on Linux rather than macOS), but I cannot find where the regression is.

Two resolved review threads on src/lib.rs (outdated).
Co-authored-by: Conrad Ludgate <[email protected]>
src/lib.rs Outdated
```diff
@@ -197,7 +259,7 @@ pub(crate) struct WriteHalf {
   vectored: bool,
   auto_apply_mask: bool,
   writev_threshold: usize,
-  write_buffer: Vec<u8>,
+  buffer: BytesMut,
```


Using Vec is faster in my testing. To replicate `buffer.advance(written)` you can do `buffer.splice(..written, [0u8; 0])`. Because this is expensive, what I found works better is to instead keep a `buf_pos: usize` that I increment, and only in `start_send` run

```rust
self.buffer.splice(..self.buf_pos, [0u8; 0]);
self.buf_pos = 0;
```

before the `fmt_head` call that writes the frame into the buffer. This is because `BytesMut::advance` seems to be kinda expensive.
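A self-contained sketch of that deferred-compaction idea (hypothetical field and method names following the comment above, not necessarily the PR's actual code):

```rust
// Hypothetical sketch: track how much of the buffer has been flushed and
// compact lazily, once per frame, instead of after every partial write.
struct WriteHalf {
    buffer: Vec<u8>, // pending bytes to write
    buf_pos: usize,  // leading bytes already written to the socket
}

impl WriteHalf {
    /// Record that `written` bytes were flushed, without moving memory.
    fn consume(&mut self, written: usize) {
        self.buf_pos += written;
    }

    /// Called once per outgoing frame (start_send in the Sink sense):
    /// drop the already-written prefix in a single memmove, then append
    /// the new frame to the buffer.
    fn start_send(&mut self) {
        if self.buf_pos > 0 {
            self.buffer.splice(..self.buf_pos, [0u8; 0]);
            self.buf_pos = 0;
        }
        // ...frame header + payload encoding appends to self.buffer here
    }
}
```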

@conradludgate commented Jul 1, 2024

Further testing shows that, for large buffers, removing vectored write support was a significant source of the regression. Now the difference is just 43500 Msg/sec compared to 44000 Msg/sec.

Opened a PR against this PR 😅 dgrr#1
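For context on the vectored-write point: with vectored I/O the frame header and payload can go to the kernel as separate IoSlices in one call, instead of first copying the payload into one contiguous buffer. A minimal std sketch (tokio's AsyncWrite exposes the analogous poll_write_vectored):

```rust
use std::io::{IoSlice, Result, Write};

// Write header and payload in one vectored call; real code must loop,
// since write_vectored may accept only part of the slices.
fn write_frame(out: &mut impl Write, head: &[u8], payload: &[u8]) -> Result<usize> {
    out.write_vectored(&[IoSlice::new(head), IoSlice::new(payload)])
}
```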

@dgrr (Author) commented Jul 6, 2024

@littledivy it seems that this PR is OK now, thanks to the contributions of @conradludgate.

@ShabbirHasan1:

@littledivy All checks have passed successfully 🚀. Could you please review and merge this PR at your convenience 🙏.

Thank you @dgrr and @conradludgate for your kind contributions in making this library even better. 🫡

@dgrr (Author) commented Sep 1, 2024

@littledivy

github-merge-queue bot pushed a commit to n0-computer/iroh that referenced this pull request Dec 4, 2024

## Description

Before this change we depended on both tungstenite version 0.21 and 0.24, because:
```
tungstenite v0.21.0
└── tokio-tungstenite v0.21.0
    └── tokio-tungstenite-wasm v0.3.1
        ├── iroh v0.29.0 (/home/philipp/program/work/iroh/iroh)
        └── iroh-relay v0.29.0 (/home/philipp/program/work/iroh/iroh-relay)
            ├── iroh v0.29.0 (/home/philipp/program/work/iroh/iroh)
            └── iroh-net-report v0.29.0 (/home/philipp/program/work/iroh/iroh-net-report)
                └── iroh v0.29.0 (/home/philipp/program/work/iroh/iroh)
tungstenite v0.24.0
└── tokio-tungstenite v0.24.0
    ├── iroh v0.29.0 (/home/philipp/program/work/iroh/iroh)
    └── iroh-relay v0.29.0 (/home/philipp/program/work/iroh/iroh-relay)
        ├── iroh v0.29.0 (/home/philipp/program/work/iroh/iroh)
        └── iroh-net-report v0.29.0 (/home/philipp/program/work/iroh/iroh-net-report)
            └── iroh v0.29.0 (/home/philipp/program/work/iroh/iroh)
```

Basically, `tokio-tungstenite-wasm` pulls in `0.21` and there's no newer version of it yet, but we had updated all our dependencies, including `tungstenite`, which duplicated it.

## Notes & open questions


I want this to be temporary until we can finally switch to `fastwebsockets` entirely once it implements [`poll`-based methods](denoland/fastwebsockets#78) (but I worry the project's maintenance is ... unclear).

I checked the [tungstenite
changelog](https://github.com/snapview/tungstenite-rs/blob/master/CHANGELOG.md),
and it doesn't look like there's anything critical in there. The
`rustls` update doesn't affect us - we don't duplicate rustls versions
after this rollback.

## Change checklist

- [x] Self-review.
- [x] Documentation updates following the [style
guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text),
if relevant.
- ~~[ ] Tests if relevant.~~
- [x] All breaking changes documented.