RSR per allocator #216

Open · 4 tasks · Tracked by #213
bajtos opened this issue Jan 6, 2025 · 1 comment

bajtos commented Jan 6, 2025

Provide retrieval-based RSR calculated on a per-allocator basis.

Pre-requisites:

Related discussions:

Notes:

  • In spark-evaluate, we are (or will be) mapping retrievals to clients; see RSR per client #193.

  • The missing piece is aggregating per-client data to per-allocator data. The important insight is that each client is linked to a single allocator only.

  • fil-deal-ingester maintains a mapping between clients and allocators in the table allocator_clients.

  • We need to access this mapping from spark-evaluate or spark-stats, depending on where we aggregate per-client to per-allocator stats.

    Possible options to consider:

    1. Enhance each retrieval task in the round details with the list of allocators in addition to the list of clients.
      • This may be the easiest path if we map clients to allocators in spark-evaluate.
    2. Implement a new REST API endpoint in spark-stats to allow spark-evaluate or spark-stats to map clients to allocators.
      • We must be careful about performance - we don't want to make one request for each client; see the bulk-lookup sketch after this list.
      • This endpoint can also be useful for the per-client dashboard, as it will allow us to show the allocator from which the client received DataCap.
      • spark-stats can provide a public facade for this endpoint using the same mechanism we have already in place for getting deals eligible for retrieval testing (source code).
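As a rough illustration of option 2, here is a minimal TypeScript sketch of a bulk client-to-allocator lookup. The function name, the column names (client_id, allocator_id), and the use of a node-postgres pool are assumptions made for illustration; only the allocator_clients table name comes from the notes above, and the real schema lives in fil-deal-ingester.

```ts
import pg from 'pg'

// Hypothetical bulk lookup: resolves a batch of client IDs to their allocators
// in a single query, so callers never issue one request (or query) per client.
export async function mapClientsToAllocators (
  pgPool: pg.Pool,
  clientIds: string[]
): Promise<Map<string, string>> {
  if (clientIds.length === 0) return new Map()

  // One round trip for the whole batch.
  // Column names are assumptions; adjust to the actual allocator_clients schema.
  const { rows } = await pgPool.query(
    'SELECT client_id, allocator_id FROM allocator_clients WHERE client_id = ANY($1)',
    [clientIds]
  )

  // Each client is linked to a single allocator, so a flat Map is sufficient.
  const clientToAllocator = new Map<string, string>()
  for (const row of rows) {
    clientToAllocator.set(row.client_id, row.allocator_id)
  }
  return clientToAllocator
}
```

Whether this lives behind a spark-stats endpoint or is called directly inside spark-evaluate, the important property is the single batched query rather than per-client requests.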
bajtos commented Feb 13, 2025

Since one allocator can have multiple clients, we should not combine per-client aggregated stats to produce per-allocator stats.

For example, one measurement can be linked to two clients from the same allocator. This should account for 1 total measurement. If we combine per-client aggregated stats, we would get 2 total measurements.

Based on the above, I think we need to produce per-allocator stats inside spark-evaluate using a similar algorithm to the one producing per-client stats.

  1. Loop over all measurements.
  2. Map each measurement to a list of clients of the deal(s) measured and then from clients to allocators. The goal is to get Map<allocator, measurement[]>.
  3. For each allocator, aggregate the measurements to calculate the stats (total, successful, successful_http, etc.); see the sketch below.
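A minimal sketch of this loop, assuming a simplified Measurement shape and a precomputed client-to-allocator Map (for example, one built from allocator_clients as above). The field and protocol names are assumptions; the real measurement schema in spark-evaluate may differ. The key point is that allocators are collected into a Set per measurement, so a measurement whose deal has two clients from the same allocator is counted once.

```ts
interface Measurement {
  clients: string[]
  retrievalSucceeded: boolean
  protocol: 'http' | 'graphsync' | 'bitswap'
}

interface AllocatorStats {
  total: number
  successful: number
  successful_http: number
}

export function aggregatePerAllocator (
  measurements: Measurement[],
  clientToAllocator: Map<string, string>
): Map<string, AllocatorStats> {
  const statsByAllocator = new Map<string, AllocatorStats>()

  for (const m of measurements) {
    // Dedupe: two clients of the same allocator must yield a single allocator entry,
    // so this measurement contributes at most 1 to that allocator's totals.
    const allocators = new Set(
      m.clients
        .map(client => clientToAllocator.get(client))
        .filter((a): a is string => a !== undefined)
    )

    for (const allocator of allocators) {
      const stats = statsByAllocator.get(allocator) ??
        { total: 0, successful: 0, successful_http: 0 }
      stats.total += 1
      if (m.retrievalSucceeded) {
        stats.successful += 1
        if (m.protocol === 'http') stats.successful_http += 1
      }
      statsByAllocator.set(allocator, stats)
    }
  }

  return statsByAllocator
}
```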
