feat: Add created_at and updated_at to all tables #3225
Comments
Can you give us more details on why exactly this is needed? I'd like us to challenge every decision to add data to the backend database that is not directly used by backend services (unless really necessary, which seems to be the case here). Otherwise, we keep adding dependencies on other teams that use our database directly, and it's becoming increasingly hard to make backend changes without breaking other stuff.
Overall I personally feel like
This has come up multiple times in the past (e.g. when discussing data bucketing by month, which requires a timestamp for the solver rewards accounting script to know in which bucket a certain settlement falls, and recently when adjusting the Tenderly web3 action). I believe the solver team has now built a workaround to fetch the timestamp for trades, and in the web3 action we have to add a ton of network-specific logic in order to express something like "check if the trade was less than 24h ago". I agree with your worry that there is risk in adding domain-specific columns which other teams depend on and which make it harder for us to refactor later. However, timestamps of when a DB row is created (and updated) are not domain specific and don't contribute to the risk you mentioned (there are DBs which store this metadata by default; unfortunately Postgres is not one of them).
For the data pipeline this would be really useful: we copy data from the backend DB to the analytics DB. We will do a one-time full load of all the tables we use in the pipeline and then automate incremental loads using dune sync v2. We could of course look at the index of the source table and only copy data when the index does not exist in the target table, but this can be quite heavy for large datasets and we need to be sure each table has an index. And in case a row is updated but the index does not change, we would not catch the change. We could also look at tables that have a monotonically increasing column, like block number, and only load the rows where the block number is higher than the max block number we have in the analytics DB. BUT, if there is ever a block that somehow was added to the backend DB later on, we would not catch that. AND, as far as I know, not all tables have a column that increases linearly. A sketch of the kind of incremental load this would enable is shown below.
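A minimal sketch of such an incremental load, assuming a hypothetical orders table and a watermark (the last successful sync time) tracked by the pipeline; both the table name and the watermark value are illustrative:

```sql
-- Copy only rows created or modified since the last successful sync.
-- The table name and the watermark value are placeholders.
SELECT *
FROM orders
WHERE updated_at > '2024-01-01 00:00:00+00'  -- last sync watermark
ORDER BY updated_at;
```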
Problem
This was sparked by a discussion regarding data analytics efforts. Mirroring the contents of a DB is easier if these fields are available. Also, it's generally nice to have timestamps for all sorts of data.
Suggested solution
Implement a DB migration that adds triggers to all tables which set created_at once when a new row gets created and updated_at whenever the row gets modified. That means whenever a new row gets created, created_at and updated_at would be initialized to the same value. Doing it in the DB means we don't have to adjust any Rust code, and it's probably less error-prone as well.
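A minimal sketch of what such a migration could look like for a single table; the table name orders is just an example, not necessarily a table from the actual schema:

```sql
-- Add the two timestamp columns; DEFAULT now() also fills existing rows at migration time.
ALTER TABLE orders
    ADD COLUMN created_at timestamptz NOT NULL DEFAULT now(),
    ADD COLUMN updated_at timestamptz NOT NULL DEFAULT now();

-- One shared trigger function that keeps updated_at current on every modification.
CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at = now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach the trigger to the table (repeated per table by the migration).
CREATE TRIGGER orders_set_updated_at
    BEFORE UPDATE ON orders
    FOR EACH ROW
    EXECUTE FUNCTION set_updated_at();
```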
Acceptance criteria
All tables receive created_at and updated_at columns storing timestamps of the respective event.
All historic data gets backfilled with the timestamp when the migration gets applied.
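A hedged sketch of how the rollout could cover all tables at once, assuming the set_updated_at() function from the sketch above. Because now() is evaluated when the columns are added, existing rows are backfilled with the migration timestamp, which matches the second criterion:

```sql
-- Loop over every base table in the public schema and apply the same change.
-- Illustrative only: CREATE TRIGGER has no IF NOT EXISTS, so this assumes a fresh run.
DO $$
DECLARE
    t text;
BEGIN
    FOR t IN
        SELECT table_name
        FROM information_schema.tables
        WHERE table_schema = 'public' AND table_type = 'BASE TABLE'
    LOOP
        EXECUTE format(
            'ALTER TABLE %I
                 ADD COLUMN IF NOT EXISTS created_at timestamptz NOT NULL DEFAULT now(),
                 ADD COLUMN IF NOT EXISTS updated_at timestamptz NOT NULL DEFAULT now()', t);
        EXECUTE format(
            'CREATE TRIGGER %I BEFORE UPDATE ON %I
                 FOR EACH ROW EXECUTE FUNCTION set_updated_at()',
            t || '_set_updated_at', t);
    END LOOP;
END;
$$;
```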