Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: handle DmaFile::read_at called with a size larger than max_sectors_size #635

Closed
wants to merge 1 commit into from

Conversation

ozgrakkurt
Copy link

@ozgrakkurt ozgrakkurt commented Jan 5, 2024

What does this PR do?

This PR tries to prevent the case of executing DmaFile::read_at with a larger size than DmaFile::max_sectors_size. Because this causes the process to hang forever in D state in some kernels (e.g. 5.15)and just hang forever in interruptible state in other kernels. (e.g. 6.1)

Motivation

What inspired you to submit this pull request?

I have this case where I read pages of parquet files from disk in parallel and this problem causes my program to hang. I am able to mitigate this issue by just splitting my reads before calling read_at/read_many but this is still a problem since it took a lot of time to debug and it causes entire process to hang instead of just returning an error/splitting the reads internally. Imo there should be an error if read_at is called like this and read_many should split the big reads internally.

Related issues

A list of issues either fixed, containing architectural discussions, otherwise relevant
for this Pull Request.

#633

Additional Notes

Anything else we should know when reviewing?

I want to also implement the internal splitting inside read_many if it makes sense. Also can implement internal splitting for read_at as well if error doesn't make sense. Also want to do the same stuff for write functions, I am guessing the same thing happens with those as well.

Checklist

[] I have added unit tests to the code I am submitting
[] My unit tests cover both failure and success scenarios
[] If applicable, I have discussed my architecture

@ozgrakkurt
Copy link
Author

Also is it at least possible to make pub fn max_sectors_size() -> usize on both DmaFile and ImmutableFile ?

@glommer
Copy link
Collaborator

glommer commented Jan 6, 2024

There's something going on here. I don't think anything is supposed to hang when I/O is too large, and this smells a bit like a bug (where, is the question)

@glommer
Copy link
Collaborator

glommer commented Jan 6, 2024

FWIW, that file on /sys is read-write, so it doesn't really mean any intrinsic I/O size limit from the perspective of the filesystem. The kernel should be able to split transfers larger than that. Have you tried running your example in different kernels?

@ozgrakkurt
Copy link
Author

FWIW, that file on /sys is read-write, so it doesn't really mean any intrinsic I/O size limit from the perspective of the filesystem. The kernel should be able to split transfers larger than that. Have you tried running your example in different kernels?

I ran with 5.15, 6.1 and 6.2. On 5.15 and 6.2, the process goes into D state and hangs forever in uninterruptible state. On 6.1 it hangs forever but can be shut down with ctrl+c. I'll try newest kernel as well. Which kernel do you normally run on/test?

@ozgrakkurt
Copy link
Author

ozgrakkurt commented Jan 6, 2024

FWIW, that file on /sys is read-write, so it doesn't really mean any intrinsic I/O size limit from the perspective of the filesystem. The kernel should be able to split transfers larger than that. Have you tried running your example in different kernels?

I ran with 5.15, 6.1 and 6.2. On 5.15 and 6.2, the process goes into D state and hangs forever in uninterruptible state. On 6.1 it hangs forever but can be shut down with ctrl+c. I'll try newest kernel as well. Which kernel do you normally run on/test?

actually nevermind. It hangs in D state if compiled with --release flag but can be exited with ctrl-c if compiled in dev mode not sure, it is annoying

@ozgrakkurt
Copy link
Author

It does work on kernel 6.6.10-zabbly+ installed on Debian so maybe can close this

@ozgrakkurt
Copy link
Author

closing this as it seems to be a kernel bug and the hanging doesn't happen on newer kernel (6.6) for me

@ozgrakkurt ozgrakkurt closed this Jan 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants