Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

miri: Enable tests using epoll_wait #19956

Merged
merged 2 commits into from
Jul 14, 2023
Merged

Conversation

def-
Copy link
Contributor

@def- def- commented Jun 14, 2023

Seems to have been implemented recently in rust-lang/miri#2764, I'll work on fixing this up. Probably an alternative to #19950

Based on recent segfaults we definitely want more Miri testing.

Motivation

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@def- def- force-pushed the pr-epoll-miri branch 5 times, most recently from 1ac4112 to 8a3b731 Compare June 15, 2023 07:47
@def- def- changed the title miri: Try enabling epoll_wait miri: Enable tests using epoll_wait Jun 15, 2023
@def- def- force-pushed the pr-epoll-miri branch 3 times, most recently from 8e3d2d9 to d5b2c0b Compare June 15, 2023 14:15
@@ -926,7 +926,6 @@ mod tests {

#[mz_ore::test(tokio::test(flavor = "multi_thread"))]
#[cfg_attr(coverage, ignore)] // https://github.com/MaterializeInc/materialize/issues/18898
#[cfg_attr(miri, ignore)] // unsupported operation: can't call foreign function `epoll_wait` on OS `linux`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd also be totally reasonble to continue ignoring the persist stuff that actually uses external network, if that's easier or lets us use more aggressive timeouts:

  • file_blob
  • postgres_consensus
  • s3_blob

(off the top of my head, I think everything else under src/persist* uses in-mem impls). if you do, could also delete these env whitelisted above

            --env MZ_PERSIST_EXTERNAL_STORAGE_TEST_S3_BUCKET
            --env MZ_PERSIST_EXTERNAL_STORAGE_TEST_POSTGRES_URL

@def- def- force-pushed the pr-epoll-miri branch 6 times, most recently from 954bccd to d5fd537 Compare June 15, 2023 22:52
@def- def- marked this pull request as ready for review June 16, 2023 07:18
@def- def- requested review from a team June 16, 2023 07:18
@def- def- requested review from benesch, a team and umanwizard as code owners June 16, 2023 07:18
@def- def- requested a review from nrainer-materialize June 16, 2023 07:18
@def-
Copy link
Contributor Author

def- commented Jun 16, 2023

One problem is that this increases the miri runtime to 39 minutes. At least it can run in parallel with the normal build + tests but this will make our normal tests CI slower. If it's not acceptable I can disable some more tests, or have to figure out a way to split this into multiple parts (like testdrive and cloudtest), but there is some overhead in that case since they all have to build the tests again, and use builder agents for that.

The counter argument is that this found quite a few potential issues, so seems valuable. Could also maybe cut down the number of tests here further, and move the full suite to nightly. But I already put in more work than expected into this the last 2 days.

Based on recent segfaults we definitely want more Miri testing.
@def-
Copy link
Contributor Author

def- commented Jul 13, 2023

One problem is that this increases the miri runtime to 39 minutes. At least it can run in parallel with the normal build + tests but this will make our normal tests CI slower. If it's not acceptable I can disable some more tests, or have to figure out a way to split this into multiple parts (like testdrive and cloudtest), but there is some overhead in that case since they all have to build the tests again, and use builder agents for that.

The counter argument is that this found quite a few potential issues, so seems valuable. Could also maybe cut down the number of tests here further, and move the full suite to nightly. But I already put in more work than expected into this the last 2 days.

@philip-stoev @benesch Opinions? With 6 workers it's possible to keep this in normal tests. The main disadvantage is that each worker has to build the tests again. The main disadvantage of moving to nightly is that devs wouldn't notice Miri failures immediately and QA has more to do.

My preference would be a split approach, same as for SLT. Keep a small set of fast Miri running in tests, put the rest into a slow nightly. But I'd like some confirmation before I rework that again. Maybe we're fine with the extra cost of keeping it in tests.

@def- def- requested review from philip-stoev and nrainer-materialize and removed request for nrainer-materialize July 13, 2023 10:15
@philip-stoev
Copy link
Contributor

So you are saying that miri regularily detects regressions and is not one of those CI jobs that always pass without anything of interest? If this is the case, a split approach would be preferable to parallelism of 6 . six is a lot

@def-
Copy link
Contributor Author

def- commented Jul 13, 2023

Not sure how regular, devs fix them before submitting since it's part of tests run currently.

@def-
Copy link
Contributor Author

def- commented Jul 13, 2023

@philip-stoev Implemented. Should be good to go in now. Nightly to verify successfully completed: https://buildkite.com/materialize/nightlies/builds/2817#_

Copy link
Member

@ParkMyCar ParkMyCar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surface changes LGTM, glad we're getting more miri coverage!

@benesch
Copy link
Contributor

benesch commented Jul 13, 2023

My preference would be a split approach, same as for SLT. Keep a small set of fast Miri running in tests, put the rest into a slow nightly.

My preference as well!

@def- def- merged commit 5149d3d into MaterializeInc:main Jul 14, 2023
@def- def- deleted the pr-epoll-miri branch July 14, 2023 07:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants