Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GCSStore #1547

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Add GCSStore #1547

wants to merge 2 commits into from

Conversation

asr2003
Copy link

@asr2003 asr2003 commented Dec 16, 2024

Description

Please include a summary of the changes and the related issue. Please also
include relevant motivation and context.

Add Google Cloud Storage Store

GCS Store implementation closely mirrors the S3 version but adapts it to the GCS gRPC API and metadata structure. It aligns with S3's structure and behavior while leveraging GCS-specific features. And it also eliminates manual generating of protobufs

  • S3 manages concurrency using a bounded mpsc::channel and FuturesUnordered. GCS currently processes chunks sequentially in the provided implementation.. GCS lacks native support for concurrent part uploads in a single upload session which can be a performance bottleneck for very large files.
  • GCS sequential resumable uploads are better suited for the cases where data integrity and fault tolerance are more critical than performance and to support concurrency in uploads which may require out of box solution
  • In next iteration, I am going to add tests similar of s3 store

Fixes #659
/claim #659

Type of change

Please delete options that aren't relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

Please also list any relevant details for your test configuration

Checklist

  • Updated documentation if needed
  • [ TODO] Tests added/amended
  • bazel test //... passes locally
  • PR is contained in a single commit, using git amend see some docs

This change is Reviewable

@asr2003
Copy link
Author

asr2003 commented Dec 16, 2024

@SchahinRohani Here's the PR with the approach we discussed to add GCS Store

@MarcusSorealheis
Copy link
Collaborator

@asr2003 could convert to a Draft PR until ready?

Copy link
Member

@aaronmondal aaronmondal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! Before we move on to a more in-depth review, please ensure the following:

Reviewable status: 0 of 2 LGTMs obtained, and 0 of 11 files reviewed, and pending CI: Bazel Dev / ubuntu-24.04, Cargo Dev / ubuntu-22.04, Coverage, Installation / ubuntu-22.04, NativeLink.com Cloud / Remote Cache / ubuntu-24.04, Publish image, Publish nativelink-worker-init, Remote / lre-cc / large-ubuntu-22.04, Remote / lre-rs / large-ubuntu-22.04, docker-compose-compiles-nativelink (22.04), integration-tests (22.04), and 3 discussions need to be resolved

@asr2003 asr2003 marked this pull request as draft December 16, 2024 16:37
@asr2003
Copy link
Author

asr2003 commented Dec 17, 2024

To work with bazel , I will agree with you @aaronmondal

Don't import openssl or reqwest as they pull in too many dependencies and won't easily build into a statically linked executable. GCS supports gRPC, so let's go with a tonic implementation (I believe the protos are here: https://github.com/googleapis/googleapis/blob/master/google/storage/v2/storage.proto and there is infra for it in the nativelink-proto directory).
and this
Tbh I'm not sure whether the cargo build can generate protos at all 😅 My guess is that this part will only work with Bazel.

Ahh, while the google-cloud-storage crate simplifies things, It seems the tonic/gRPC approach is a cleaner and more future-proof solution for interacting with GCS without pulling in unnecessary dependencies (reqwest/openssl).

If Bazel and CI integration for tonic/protobuf generation is feasible I will rewrite them to this. Can I ahead with this @aaronmondal

@SchahinRohani
Copy link
Contributor

First of all, thank you for your effort @asr2003.
We are looking for a grpc solution as @aaronmondal mentioned. Therefore the proposed solution is not suitable.

@asr2003 asr2003 marked this pull request as ready for review December 18, 2024 13:13
@asr2003
Copy link
Author

asr2003 commented Dec 18, 2024

Dear reviewers,

Now GCS Store implementation closely mirrors the S3 version but adapts it to the GCS gRPC API and metadata structure. It aligns with S3's structure and behavior while leveraging GCS-specific features. And it also eliminates manual generating of protobufs

  • S3 manages concurrency using a bounded mpsc::channel and FuturesUnordered. GCS currently processes chunks sequentially in the provided implementation.. GCS lacks native support for concurrent part uploads in a single upload session which can be a performance bottleneck for very large files.
  • GCS sequential resumable uploads are better suited for the cases where data integrity and fault tolerance are more critical than performance and to support concurrency in uploads which may require out of box solution
  • In next iteration, I am going to add tests similar of s3 store

This are my views and any reviews and suggestions are much appreciated. Let me know if I am missing anything and I would like to incorporate them

@aaronmondal
Copy link
Member

aaronmondal commented Dec 18, 2024

@asr2003 Looks much nicer than the previous implementation!

Some things I'm noticing:

  • I wasn't aware of the google-tonic-xxx crates. Since we're only talking to an external endpoint I think it's fine to not generate the protos via https://github.com/TraceMachina/nativelink/blob/main/nativelink-proto/BUILD.bazel. I'd say we're good on that part 👍
  • The authentication might need some work. I'm not sure how GCS handles this, but IIUC currently the only way to handle auth is via the config parameter file. The S3Store uses a credential provider:
    let s3_client = {
    let http_client =
    HyperClientBuilder::new().build(TlsConnector::new(spec, jitter_fn.clone()));
    let credential_provider = credentials::default_provider().await;
    let mut config_builder = aws_config::defaults(BehaviorVersion::v2024_03_28())
    .credentials_provider(credential_provider)
    .app_name(AppName::new("nativelink").expect("valid app name"))
    .timeout_config(
    aws_config::timeout::TimeoutConfig::builder()
    .connect_timeout(Duration::from_secs(15))
    .build(),
    )
    .region(Region::new(Cow::Owned(spec.region.clone())))
    .http_client(http_client);
    // TODO(allada) When aws-sdk supports this env variable we should be able
    // to remove this.
    // See: https://github.com/awslabs/aws-sdk-rust/issues/932
    if let Ok(endpoint_url) = env::var("AWS_ENDPOINT_URL") {
    config_builder = config_builder.endpoint_url(endpoint_url);
    }
    aws_sdk_s3::Client::new(&config_builder.load().await)
    };
    Self::new_with_client_and_jitter(spec, s3_client, jitter_fn, now_fn)
    }
    It would be good to have some mechanism that works with an environment variable, similar to the S3 store as this makes it easier to handle the setup uniformly in cloud environments.
  • Ensure that we have decent test coverage. Doesn't need to be a 100% perfect branch-based coverage setup, but the core functionality should be tested (IIRC the OSSF guidelines recommend 80%). The way we generate Coverage in CI is like this: https://www.nativelink.com/docs/contribute/guidelines#generating-code-coverage. This can be somewhat slow during development so you might want to take a look at this as well which is less precise but faster: https://bazel.build/configure/coverage

@MarcusSorealheis
Copy link
Collaborator

In the first post on the issue, there is an issue template. You should complete it like the other issues.

Copy link
Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include a summary of the changes and the related issue. Please also
include relevant motivation and context.

This is a placeholder for what you are supposed to add at the top of the issue.

@asr2003
Copy link
Author

asr2003 commented Dec 20, 2024

@aaronmondal The gprc/tonic GCS store is tightly coupled with the StorageClient<Channel> , So that I am unable to mock store, I think I need to rewrite Store to implement abstract Trait for StorageClient that enable us to mock GCS Store for testing but I am not sure whether that makes sense to go. Have any guidance on this mocking?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

💎 Implement Google Cloud Bucket Store
4 participants