feat: Add OrganizationSpansFrequencyStatsEndpoint for Comparison Workflows project #84765

Merged
Mitan merged 38 commits into master from dmitrii/stats-endpoint
Feb 10, 2025

Conversation

Mitan
Member

@Mitan Mitan commented Feb 7, 2025

For comparison workflows, we need the count distribution of all attribute/value pairs (both string and numeric, though we'll focus on string attributes first). This PR adds a Sentry backend endpoint for retrieving these stats.
The protos required for this task were already merged here
The snuba RPC was merged here

More info on the project can be found here
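
As a rough illustration of what "count distribution of attribute/value pairs" means, the sketch below counts how often each string value occurs per attribute and keeps only the most frequent values. It is plain Python over in-memory dictionaries, not the Snuba RPC or the endpoint added in this PR; the function name `attribute_distributions`, the sample spans, and the way `max_buckets` is applied are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def attribute_distributions(spans, max_buckets=10):
    """Return the top `max_buckets` (value, count) pairs for each string attribute."""
    counts = defaultdict(Counter)
    for span in spans:
        for name, value in span.items():
            # Only string attributes are counted, mirroring the PR's initial focus.
            if isinstance(value, str):
                counts[name][value] += 1
    return {name: counter.most_common(max_buckets) for name, counter in counts.items()}


# Hypothetical sample data, not taken from the PR.
spans = [
    {"browser": "chrome", "region": "us", "duration_ms": 12.5},
    {"browser": "chrome", "region": "eu", "duration_ms": 30.1},
    {"browser": "firefox", "region": "us", "duration_ms": 8.0},
]
result = attribute_distributions(spans, max_buckets=1)
```

With `max_buckets=1`, each attribute keeps only its single most frequent value, and the numeric `duration_ms` attribute is skipped entirely.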

@Mitan Mitan linked an issue Feb 7, 2025 that may be closed by this pull request
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Feb 7, 2025

codecov bot commented Feb 7, 2025

❌ 3 Tests Failed:

Tests completed   Failed   Passed   Skipped
23473             3        23470    289
View the top 3 failed test(s) by shortest run time
tests.sentry.api.endpoints.test_organization_spans_fields_stats.OrganizationSpansFieldsStatsEndpointTest::test_max_buckets
Stack Traces | 3.04s run time
.../api/endpoints/test_organization_spans_fields_stats.py:120: in test_max_buckets
    assert response.status_code == 200, response.data
E   AssertionError: {'detail': 'Internal Error', 'errorId': None}
E   assert 500 == 200
E    +  where 500 = <Response status_code=500, "application/json">.status_code
tests.sentry.api.endpoints.test_organization_spans_fields_stats.OrganizationSpansFieldsStatsEndpointTest::test_distribution_values
Stack Traces | 3.81s run time
.../api/endpoints/test_organization_spans_fields_stats.py:140: in test_distribution_values
    assert response.status_code == 200, response.data
E   AssertionError: {'detail': 'Internal Error', 'errorId': None}
E   assert 500 == 200
E    +  where 500 = <Response status_code=500, "application/json">.status_code
tests.sentry.api.endpoints.test_organization_spans_fields_stats.OrganizationSpansFieldsStatsEndpointTest::test_filter_query
Stack Traces | 4.21s run time
.../api/endpoints/test_organization_spans_fields_stats.py:163: in test_filter_query
    assert response.status_code == 200, response.data
E   AssertionError: {'detail': 'Internal Error', 'errorId': None}
E   assert 500 == 200
E    +  where 500 = <Response status_code=500, "application/json">.status_code


@Mitan Mitan changed the title Dmitrii/stats endpoint feat: Add OrganizationSpansFrequencyStatsEndpoint for Comparison Workflows project Feb 7, 2025
@Mitan Mitan requested a review from a team as a code owner February 7, 2025 16:37
{"attributeDistributions": []} # Empty response matching the expected structure
)

serializer = OrganizationSpansFieldsEndpointSerializer(data=request.GET)
Member

Why are we validating our request with this serializer? Looks like you're already doing the dataset validation below and this serializer contains fields not pertinent to this endpoint

Member Author

thanks, added a serializer for this endpoint

class OrganizationSpansFrequencyStatsEndpoint(OrganizationEventsV2EndpointBase):
    snuba_methods = ["GET"]
    publish_status = {
        "GET": ApiPublishStatus.PRIVATE,
Member

Let's create a ticket to document this API as a follow up! This API should likely be a public one eventually so it would be good to document it while you have the context!

src/sentry/utils/snuba_rpc.py: outdated, resolved
src/sentry/api/urls.py: outdated, resolved
    dataset = serializers.ChoiceField(["spans", "spansIndexed"], required=False, default="spans")
    # if values are not provided, we will use zeros and then snuba RPC will set the defaults
    # Top number of frequencies to return for each attribute, defaults in snuba to 10 and can't be more than 100
    max_buckets = serializers.IntegerField(required=False, min_value=0, max_value=100, default=0)
Member

let's pick a more reasonable default - like 5 or 10?

    # Top number of frequencies to return for each attribute, defaults in snuba to 10 and can't be more than 100
    max_buckets = serializers.IntegerField(required=False, min_value=0, max_value=100, default=0)
    # Total number of attributes to return, defaults in snuba to 10_000
    max_attributes = serializers.IntegerField(required=False, min_value=0, default=0)
Member

Any reason we're setting a default of 0? Let's not set a default here at all, since it defeats the purpose of making the field optional.
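
The point about optional fields can be sketched without Django REST Framework: when a parameter is optional and has no default, an absent value is simply left out of the validated output, so downstream code can tell "not provided" apart from an explicit 0. This is a plain-Python stand-in, not the PR's serializer; the helpers `parse_optional_int` and `validate` and the bounds used here are hypothetical.

```python
def parse_optional_int(params, name, min_value=None, max_value=None):
    """Return the validated int for `name`, or None when the caller omitted it."""
    raw = params.get(name)
    if raw is None:
        return None  # Absent: leave the choice of default to the downstream service.
    value = int(raw)  # Raises ValueError for non-numeric input.
    if min_value is not None and value < min_value:
        raise ValueError(f"{name} must be >= {min_value}")
    if max_value is not None and value > max_value:
        raise ValueError(f"{name} must be <= {max_value}")
    return value


def validate(params):
    """Keep only the parameters the caller actually supplied (hypothetical helper)."""
    parsed = {
        "max_buckets": parse_optional_int(params, "max_buckets", 1, 100),
        "max_attributes": parse_optional_int(params, "max_attributes", 1),
    }
    return {key: value for key, value in parsed.items() if value is not None}
```

Calling `validate({})` yields an empty dict rather than zero-filled values, which avoids using 0 as a sentinel for "use the backend default".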



class OrganizationSpansFrequencyStatsEndpointSerializer(serializers.Serializer):
    dataset = serializers.ChoiceField(["spans", "spansIndexed"], required=False, default="spans")
Member

spansIndexed should not be a valid choice here

Member Author

Then I can just remove the dataset parameter, since technically we don't need it at all.

class OrganizationSpansFieldsStatsEndpoint(OrganizationEventsV2EndpointBase):
    snuba_methods = ["GET"]
    publish_status = {
        "GET": ApiPublishStatus.PRIVATE,
Member

Just to confirm, private means this endpoint will never be useful for customers calling our APIs directly and only used in Sentry frontend. Is that right? If it is marked as private only because it's not stable yet, please make it EXPERIMENTAL

Member Author

thanks, yes, we can mark it as experimental

Member

@shruthilayaj shruthilayaj left a comment

lgtm, just a small comment

stats_type = StatsType(
    attribute_distributions=AttributeDistributionsRequest(
        max_buckets=serialized["max_buckets"],
        max_attributes=serialized.get("max_attributes", 0),
Member

don't default to 0 here, just set it to None

Suggested change
- max_attributes=serialized.get("max_attributes", 0),
+ max_attributes=serialized.get("max_attributes"),
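
The difference in this suggestion comes down to how `dict.get` behaves: with a second argument it returns that fallback for a missing key, and without one it returns `None`. A minimal sketch, using a plain dict as a stand-in for the validated serializer data (the `build_request_kwargs` helper is hypothetical and not part of the PR):

```python
# Stand-in for validated data in which max_attributes was not supplied.
serialized = {"max_buckets": 10}

with_zero = serialized.get("max_attributes", 0)  # Missing key becomes 0.
with_none = serialized.get("max_attributes")     # Missing key becomes None.


def build_request_kwargs(data):
    """Forward only the values that were actually provided (hypothetical helper)."""
    keys = ("max_buckets", "max_attributes")
    return {key: data[key] for key in keys if data.get(key) is not None}


kwargs = build_request_kwargs(serialized)
```

Here `kwargs` contains only `max_buckets`, so a receiver can apply its own default for `max_attributes` instead of being handed a literal 0.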


@region_silo_endpoint
class OrganizationSpansFieldsStatsEndpoint(OrganizationEventsV2EndpointBase):
    snuba_methods = ["GET"]
Member

can remove

@Mitan Mitan merged commit ae8a52b into master Feb 10, 2025
49 checks passed
@Mitan Mitan deleted the dmitrii/stats-endpoint branch February 10, 2025 23:30
Development

Successfully merging this pull request may close these issues.

(workflows) Build a Sentry endpoint to query the stats
3 participants