feat: replace std Instant with wasm-compatible wrapper #9189

Merged
alamb merged 14 commits into apache:main from warp-std-instant on Feb 29, 2024

Conversation

waynexia
Member

@waynexia waynexia commented Feb 10, 2024

Which issue does this PR close?

Relates to #7651 and #7652

Rationale for this change

std::time::Instant does not work on the wasm32-unknown-unknown target: using it results in a runtime panic. A detailed explanation can be found in the repository of the replacement wrapper crate, instant.

The change makes no difference when running on other platforms.

What changes are included in this PR?

Replace all the occurrences of std::time::Instant with instant::Instant.

Are these changes tested?

No. I've used the compiled artifact on wasm32-unknown-unknown, but I'm still trying to figure out how to write unit tests for that artifact.

Are there any user-facing changes?

There is no difference on platforms other than wasm32-unknown-unknown

@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate labels Feb 10, 2024
Signed-off-by: Ruihang Xia <[email protected]>
Contributor

@alamb alamb left a comment

Thanks @waynexia

I took a look through the Cargo.lock in my datafusion checkout and it appears that adding instant would introduce a new dependency. I am always worried when we potentially increase the dependencies of DataFusion.

Would it be possible to add a wasm feature flag to datafusion and only pull in the needed libraries optionally?

Having a wasm feature flag might make it easier to disable other parts of the code that aren't compatible with WASM

For example, we could have something like

// in datafusion/common/src/wasm.rs

/// DataFusion wrapper around std::time::Instant:
/// uses std::time::Instant normally, but uses the
/// instant::Instant type when targeting WebAssembly
struct Instant {
    ...
}

🤔

It would also be really nice to get some sort of test / way to run wasm that triggers the bug without these changes

I wonder if we could extend https://github.com/apache/arrow-datafusion/tree/main/datafusion/wasmtest 🤔

@alamb alamb marked this pull request as draft February 16, 2024 11:43
@alamb
Contributor

alamb commented Feb 16, 2024

Marking as draft as I don't think this PR is waiting on review anymore and I am trying to clear the review queue.

@waynexia
Member Author

Thanks for your review @alamb 👍

Enclosing wasm-specific dependencies and logic in a feature gate looks great to me. I will wrap an Instant struct for this case.

@waynexia waynexia marked this pull request as ready for review February 18, 2024 13:49
@waynexia
Member Author

I've extracted all the references into datafusion_common and added a target-family gate. The implementation of Instant now switches automatically when compiling for wasm32-unknown-unknown.
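For readers following along, the target-family switch described here could look roughly like the sketch below. It is an illustration under assumptions, not the exact code merged in this PR: the module name `instant` and the helper `timed` are hypothetical, and the wasm branch assumes the `instant` crate is declared as a dependency for wasm targets.

```rust
use std::time::Duration;

/// Hypothetical module name; it only illustrates the gating idea.
pub mod instant {
    /// On wasm targets, defer to the `instant` crate (assumed to be listed
    /// as a dependency for those targets in Cargo.toml).
    #[cfg(target_family = "wasm")]
    pub type Instant = ::instant::Instant;

    /// Everywhere else this is only an alias for the standard library type.
    #[cfg(not(target_family = "wasm"))]
    pub type Instant = std::time::Instant;
}

/// Hypothetical helper: runs `f` and reports how long it took.
/// Call sites name the alias and never `std::time::Instant` directly.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = instant::Instant::now();
    let value = f();
    (value, start.elapsed())
}
```

Because exactly one of the two `cfg` branches survives compilation, call sites such as `timed(|| 2 + 37)` compile unchanged on both native and wasm targets.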

I also extended wasmtest to include unit tests, which can be run with wasm-pack test. Refer to datafusion/wasmtest/README.md for detailed commands. Note that I haven't added CI for it, because the --headless mode has an unknown problem and times out in my environment. Once I fix it, we can add CI for wasm by following the link I added in README.md.

The lock file of datafusion-cli still contains the instant crate. This seems to be a Cargo bug: rust-lang/cargo#10801. I've verified that the crate is not actually used.

Signed-off-by: Ruihang Xia <[email protected]>
Contributor

@alamb alamb left a comment

Thanks @waynexia -- I think this looks good to me. I wonder if you want to put an "exclude" type lint in to prevent people from accidentally using std::time::Instant

Similar to how @DDtKey did this in https://github.com/apache/arrow-datafusion/blob/c439bc73b6a9ba9efa4c8a9b5d2fb6111e660e74/clippy.toml#L1-L4 / #9318


//! WASM-related utilities

#[cfg(target_family = "wasm")]
Contributor

👍

// Parse SQL (using datafusion-sql)
let sql = "SELECT 2 + 37";
let dialect = GenericDialect {}; // or AnsiDialect, or your own dialect ...
let ast = Parser::parse_sql(&dialect, sql).unwrap();
log(&format!("Parsed SQL: {ast:?}"));
}

#[cfg(test)]
mod test {
Contributor

💯 for the test

Contributor

@DDtKey DDtKey Feb 27, 2024

Just one question:
It's a wrapper for Instant, but the module is named wasm, while whether it is actually "wasm" depends on the target family.
It's a bit confusing to use wasm::Instant in code, as if it were something wasm-specific, e.g. https://github.com/apache/arrow-datafusion/blob/857a2e34299b7ec5be97b40a5019b15d0fc23bd0/datafusion/core/benches/parquet_query_sql.rs#L28

Shouldn't it be datafusion_common::instant::Instant instead? This way we fully encapsulate wrapper and can easily extend in the future if we wish

Contributor

Shouldn't it be datafusion_common::instant::Instant instead? This way we fully encapsulate wrapper and can easily extend in the future if we wish

I agree this would be a better home

Member Author

Makes sense 👍 please see c823e38

@DDtKey
Contributor

DDtKey commented Feb 28, 2024

I wonder if you want to put an "exclude" type lint in to prevent people from accidentally using std::time::Instant

Unfortunately, I think it's not possible "as is" for this case, because it's actually just a type alias. However new-type pattern would work (wrap Instant into new type instead of type-alias)

Somebody already tried, btw: rust-lang/rust-clippy#10406
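To make the new-type suggestion concrete, here is a minimal sketch of what wrapping Instant in its own struct might look like. It is hypothetical: the PR kept a plain type alias, the method set shown is only a small subset chosen for illustration, and only the non-wasm inner type is shown (on wasm the inner field would presumably hold the `instant` crate's type instead).

```rust
use std::time::Duration;

/// Hypothetical new-type wrapper. Unlike a type alias, this is a distinct
/// type, so code cannot pass a bare `std::time::Instant` where it is expected.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Instant(std::time::Instant);

impl Instant {
    /// Capture the current moment.
    pub fn now() -> Self {
        Instant(std::time::Instant::now())
    }

    /// Time elapsed since this instant was captured.
    pub fn elapsed(&self) -> Duration {
        self.0.elapsed()
    }

    /// Time between `earlier` and this instant.
    pub fn duration_since(&self, earlier: Instant) -> Duration {
        self.0.duration_since(earlier.0)
    }
}
```

The trade-off is that every method callers need has to be forwarded by hand, whereas an alias exposes the full API of the underlying type for free.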

@waynexia
Member Author

I wonder if you want to put an "exclude" type lint in to prevent people from accidentally using std::time::Instant

Unfortunately, I think it's not possible "as is" for this case, because it's actually just a type alias. However new-type pattern would work (wrap Instant into new type instead of type-alias)

Somebody already tried, btw: rust-lang/rust-clippy#10406

TIL, thanks for your information.

I find that disallowed_types works for our case (edb67fa): it seems to work by matching the imported types and doesn't follow type aliases 😉

Contributor

@alamb alamb left a comment

Thanks @waynexia and @DDtKey

@@ -2,3 +2,7 @@ disallowed-methods = [
{ path = "tokio::task::spawn", reason = "To provide cancel-safety, use `SpawnedTask::spawn` instead (https://github.com/apache/arrow-datafusion/issues/6513)" },
{ path = "tokio::task::spawn_blocking", reason = "To provide cancel-safety, use `SpawnedTask::spawn` instead (https://github.com/apache/arrow-datafusion/issues/6513)" },
]

disallowed-types = [
{ path = "std::time::Instant", reason = "Use `datafusion_common::instant::Instant` instead for WASM compatibility" },
Contributor

👍
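With std::time::Instant listed under disallowed-types, the one place that still has to name it is the definition of the alias itself. The sketch below shows how that opt-out might look; the module layout and the `sum_timed` helper are hypothetical, and the claim that call sites going through the alias are not flagged rests on the observation in the discussion above rather than on anything verified here.

```rust
use std::time::Duration;

pub mod instant {
    // The alias must name the disallowed type, so the lint is silenced
    // here and only here.
    #[allow(clippy::disallowed_types)]
    pub type Instant = std::time::Instant;
}

/// Hypothetical call site: it refers only to the alias, so it never names
/// `std::time::Instant` directly.
pub fn sum_timed(n: u64) -> (u64, Duration) {
    let start = instant::Instant::now();
    let sum: u64 = (0..n).sum();
    (sum, start.elapsed())
}
```

Keeping the `allow` attribute on the single alias definition means any new direct use of std::time::Instant elsewhere in the workspace should still be reported by Clippy.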

@alamb alamb merged commit 90eddf6 into apache:main Feb 29, 2024
24 checks passed
@waynexia waynexia deleted the warp-std-instant branch March 1, 2024 03:05