Prod Burr using S3-backed API, initial scaffolding/implementation #288

Merged
11 commits merged from tracker-s3 into main on Aug 5, 2024

Conversation

elijahbenizzy
Contributor

@elijahbenizzy elijahbenizzy commented Jul 29, 2024

Will fill in with architecture later

Changes

How I tested this

Notes

Required for prod:

  • Test it out fully
    • Script for demo data to display here
    • Open to public internet to test scalability
  • Enumerate edge cases of missed data and ensure that it can run without certain datums
    • There are cases in which data files can come out of order
  • README with architecture
    • Configuration + settings
    • Architecture
    • Deployment
  • Add configuration + settings
    • Use pydantic (?)
    • Configure refresh rate
  • Add refresh buttons in the UI
    • Optional mixin on backend interface
    • Will be enabled if the backend supports forced refresh
  • Add test for from_log
    • Get filesystem version to use it too
  • Handle partition key in URL
    • Might just combine it with the application
  • Add admin view with indexing jobs
    • Mixin, same with refresh buttons
  • Break the scan_db into functions
    • Ensure we can invert control (server does saving/indexing)
  • Consider whether we want s3 namespaced by project or not
    • Decision: yes, as it makes project-level ACLs easier
  • Get it to save/load the sqlite progress to s3
    • Load up latest version by name on start, upload version on end
  • Determine the right schema for naming files – how much do we want to rely on ordering being perfectly correct
  • Figure out a plan for mismatched timestamps (perhaps every n jobs will go back an hour...)
  • Ensure the update job is idempotent
  • OpenTel integration
    • rely on FastAPI? Tortoise?
  • Add ability to run from CLI
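The configuration item above (pydantic-based settings with a configurable refresh rate) could look roughly like this. This is a minimal sketch using a stdlib dataclass, with entirely hypothetical field names; the real implementation may well use pydantic's `BaseModel` instead:

```python
from dataclasses import dataclass


@dataclass
class S3BackendSettings:
    """Hypothetical settings for the S3-backed server -- field names are illustrative."""

    bucket: str  # s3 bucket the clients write log files to
    prefix: str = "burr"  # key prefix to namespace Burr data within the bucket
    update_interval_seconds: int = 60  # refresh rate of the recurring indexing job
    snapshot_interval_seconds: int = 3600  # how often to snapshot the sqlite db to s3


settings = S3BackendSettings(bucket="my-burr-bucket")
print(settings.update_interval_seconds)  # → 60
```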

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

@elijahbenizzy elijahbenizzy force-pushed the tracker-s3 branch 16 times, most recently from 5487491 to d7070be Compare August 3, 2024 04:50
@elijahbenizzy elijahbenizzy mentioned this pull request Aug 3, 2024
This way the user can write to s3 and have Burr server pick it up.
We need this to write the log files to s3 and index them properly.
High-level architecture:
1. Clients write to an s3 bucket
2. Server powers up with a SQLite (pluggable) db
3. Server indexes the s3 bucket on a recurring job
4. We have pointers for everything in the UI stored in the db except the
   data for the traces
5. Server saves/loads sqlite database with highwatermark to s3

We have not implemented (5) yet, but the rest are done.
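Steps (3) and (5) hinge on high-watermark tracking. A minimal sketch of one indexing pass, with hypothetical function names and in-memory stand-ins (the actual implementation differs):

```python
def run_index_pass(list_keys, get_highwatermark, set_highwatermark, ingest):
    """One pass of the recurring indexing job (illustrative only).

    Only s3 keys sorting strictly after the stored high-watermark are ingested,
    so repeated passes over the same data are no-ops (idempotent).
    """
    hwm = get_highwatermark()  # last indexed key, "" on first run
    new_keys = sorted(k for k in list_keys() if k > hwm)
    for key in new_keys:
        ingest(key)  # parse the log file, store UI pointers in the db
    if new_keys:
        set_highwatermark(new_keys[-1])
    return new_keys


# In-memory stand-ins to exercise the loop:
state = {"hwm": ""}
keys = ["2024/07/29/a.jsonl", "2024/07/30/b.jsonl"]
first = run_index_pass(lambda: keys, lambda: state["hwm"],
                       lambda k: state.update(hwm=k), lambda k: None)
second = run_index_pass(lambda: keys, lambda: state["hwm"],
                        lambda k: state.update(hwm=k), lambda k: None)
print(first, second)  # the second pass ingests nothing
```

Note this assumes keys sort in ingestion order, which is exactly the "rely on ordering being perfectly correct" concern in the todo list above.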

Some specifics:
1. backend has been broken into mixins -- e.g. indexing backend,
   standard backend, etc... -- this lets each backend implement only the
   capabilities it supports, and the server calls them when present
2. If it's the indexing backend we have an admin view with jobs
3. We use tortoise ORM to make switching between DBs easy -- we will
   very likely enable postgres soon
4. The indexing function is structured so control is easy to invert --
   e.g. rather than clients writing to s3, they write to the server,
   which logs to s3.
5. We store a high-watermark so we don't index the same data twice
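The mixin idea in (1)-(2) can be sketched as follows -- class and method names here are hypothetical, not Burr's actual API:

```python
import abc


class BackendBase(abc.ABC):
    """Core interface every backend implements."""

    @abc.abstractmethod
    def list_apps(self, project_id: str) -> list:
        ...


class IndexingBackendMixin(abc.ABC):
    """Optional capability: backends that index out-of-band and support forced refresh."""

    @abc.abstractmethod
    def update(self) -> None:
        ...


class S3Backend(BackendBase, IndexingBackendMixin):
    def list_apps(self, project_id: str) -> list:
        return []  # would query the db populated by the indexing job

    def update(self) -> None:
        pass  # would trigger an indexing pass immediately


def supports_forced_refresh(backend: BackendBase) -> bool:
    # The server enables the UI refresh button (and admin view)
    # only when the mixin is present.
    return isinstance(backend, IndexingBackendMixin)


assert supports_forced_refresh(S3Backend())
```

This keeps the core interface small while letting capability checks gate UI features like the refresh button.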
We also made it so we can wire the s3 data through the command line.
This has pages on indexing jobs + a few other updates
This is a hack for the local version; soon we'll be using
postgres/others and it will be less necessary.
You can now:
1. List all apps for a partition key
2. Navigate to a specific partition key

Note that the file storage is still not distinct between partition keys.
This will change URL stability, but that's OK for now.

For null partition keys we just use __none__.
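The partition-key URL scheme with the `__none__` sentinel could look roughly like this -- the path layout is illustrative and may not match the final scheme:

```python
NONE_PARTITION = "__none__"  # sentinel for apps logged without a partition key


def app_url(project, app_id, partition_key=None):
    """Build the UI path for an application (hypothetical layout)."""
    pk = partition_key if partition_key is not None else NONE_PARTITION
    return f"/project/{project}/{pk}/{app_id}"


print(app_url("demo", "app-123", "user-42"))  # → /project/demo/user-42/app-123
print(app_url("demo", "app-123"))             # → /project/demo/__none__/app-123
```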
@elijahbenizzy elijahbenizzy marked this pull request as ready for review August 5, 2024 00:28
@elijahbenizzy elijahbenizzy changed the title WIP for prod burr on S3 Prod Burr using S3-backed API, initial scaffolding/implementation Aug 5, 2024
@elijahbenizzy elijahbenizzy requested a review from skrawcz August 5, 2024 00:29
@elijahbenizzy elijahbenizzy merged commit 40b84ff into main Aug 5, 2024
12 checks passed
@elijahbenizzy elijahbenizzy deleted the tracker-s3 branch August 5, 2024 00:49