Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: client telemetry #121

Merged
merged 2 commits into from
Jan 21, 2025
Merged

feat: client telemetry #121

merged 2 commits into from
Jan 21, 2025

Conversation

port8080
Copy link
Contributor

No description provided.

@port8080 port8080 requested a review from jgodlew December 13, 2024 01:04
@port8080
Copy link
Contributor Author

port8080 commented Dec 13, 2024

I hacked the endpoint to prod, and saw the attached json show up under ds-hub-telemetry. NB that I will need to apply https://github.com/huggingface-internal/infra/pull/1631 to see the records as parsed timestamps and counts instead of strings
logs on the client

{"timestamp":"2024-12-13T00:53:37.183852Z","level":"INFO","fields":{"action":"clean","repo_id":"Some(\"675b6a8fa4c359344add6b23\")","file_name":"None","file_size_count":11489,"new_bytes_count":11489,"start_ts":"2024-12-13T00:53:36.793427+00:00","end_processing_ts":"2024-12-13T00:53:37.183780+00:00"},"filename":"/Users/ajitb/hf/xet-core/data/src/clean.rs","line_number":227}

and logs sent to telemetry endpoint at https://huggingface.co/api/telemetry/xet/cli attached below
Screenshot 2024-12-12 at 5 17 38 PM

record1.json

@port8080 port8080 force-pushed the ajit/clean-telemetry-publish branch from ee582ba to a4efa48 Compare December 13, 2024 01:17
@Hugoch
Copy link
Member

Hugoch commented Dec 13, 2024

@port8080 After some discussions with @ylow @XciD @julien-c , we would prefer to be extra careful when implementing client-side telemetry. We advocate for keeping server-side metrics. I understand that client-side may be useful for local cache hits assessment, if you really want to keep it let's make it opt-in for testing people but not the default 😄

@port8080 port8080 force-pushed the ajit/clean-telemetry-publish branch from a4efa48 to 54b0dea Compare January 16, 2025 21:56
@port8080 port8080 marked this pull request as ready for review January 16, 2025 22:10
@port8080
Copy link
Contributor Author

Moving fwd with this after #142 and a discussion between @rajatarya and @XciD

@port8080 port8080 requested review from seanses and ylow January 16, 2025 22:18
Copy link

@ylow ylow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this implode if we update the JWT contents?

@port8080
Copy link
Contributor Author

@ylow JWT parsing failure will cause repo_id to be None but the code should not explode.
I agree that copying pub struct TokenClaim violates DRY and is clunky, but do not see a way around it for now.

@port8080
Copy link
Contributor Author

I ran it e2e with the hash based approach and it works.

  • note 67859b5f82c688377fe022d0 is my repo "port8080/test-xet-sagemaker"
  • command run was HF_HUB_ENABLE_TELEMETRY=1 python ~/Desktop/test-poc.py
  • kibbana query is from ds-hub-telemetry | where url == "/api/telemetry/xet/cli" with the output being
    record.json

Copy link
Contributor

@jgodlew jgodlew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me!

@port8080 port8080 merged commit 8f52fad into main Jan 21, 2025
2 checks passed
@port8080 port8080 deleted the ajit/clean-telemetry-publish branch January 21, 2025 20:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants