-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Explicitly specify DB connection when we need to, and don't when we don't need to #8224
base: master
Are you sure you want to change the base?
Conversation
we still need to specify the connection on - raw SQL - transactions but db routers will handle the ORM calls now
We can write these queries using the ORM so lets just do that
this file uses both, so I did this in its own commit for clarity
I don't think this was doing anything useful
def get_principal_db_connection(): | ||
return connections[get_principal_db_name()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of notes on this.
- I added this function as discussed, but given we have to pass the connection name in as a string when we use
transaction.atomic()
I'm not sure it is actually that useful. It might be more consistent to just use the string name only for both purposes. - To make this PR something we can review independently of the commits in Only serve from replica when there is a request in scope #8221 I've slightly falsified the commit history here. We'll need to resolve a conflict with that PR if we keep the function. When resolved, the "real" diff for this file will be:
diff --git a/polling_stations/db_routers.py b/polling_stations/db_routers.py
index 0a03ff41..dd41e3ca 100644
--- a/polling_stations/db_routers.py
+++ b/polling_stations/db_routers.py
@@ -3,6 +3,7 @@ import os
from django.conf import settings
from django.db import DEFAULT_DB_ALIAS
from django_middleware_global_request import get_request
+from django.db import connections
PRIMARY = settings.PRINCIPAL_DB_NAME
@@ -54,3 +55,7 @@ def get_principal_db_name():
if PRIMARY in settings.DATABASES:
return PRIMARY
return REPLICA
+
+
+def get_principal_db_connection():
+ return connections[get_principal_db_name()]
@transaction.atomic | ||
def send_error_email(self): | ||
subject = "File upload failed" | ||
message = f"File upload failure: {self}. Please investigate further." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell from reading this function, this transaction wasn't really doing anything so I just removed it.
cursor = connection.cursor() | ||
cursor = get_principal_db_connection().cursor() | ||
cursor.execute( | ||
""" | ||
SELECT a.uprn, d.polling_station_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially the 2 SQL statements in this file could be rewritten using the ORM. I think you could probably replace
get_uprns_by_district()
with either
def get_uprns_by_district(self):
districts = PollingDistrict.objects.filter(
council_id=self.council_id, area__contains=OuterRef("location")
).values("polling_station_id")[:-1]
return (
Address.objects.filter(
uprntocouncil__lad=self.gss_code,
)
.annotate(polling_station_id=Subquery(districts))
.values_list("uprn", "polling_station_id")
)
OR
def get_uprns_by_district(self):
districts = PollingDistrict.objects.filter(
council_id=self.council_id, area__contains=OuterRef("location")
).values("polling_station_id")
return (
Address.objects.filter(
uprntocouncil__lad=self.gss_code,
)
.annotate(polling_station_id=Subquery(districts))
.values_list("uprn", "polling_station_id")
)
but I couldn't quite get my head round which and there's about 10 years of thinking about edge cases that I don't want to untangle here. I settled for just specifying the connection and sticking with raw SQL for these two.
I wanted to flag a couple of places where I explicitly chose not to change anything:
|
Refs https://app.asana.com/0/1204880927741389/1208838937595665/f
This PR is a bit of a mashup of several different things. There are 3 core things I'm doing in this PR.
Prior to #8221 our "solution" to this problem was to pepper
.using()
andusing=
across scripts we thought we might want to run via SSH. #8221 should make this less necessary (but not completely unnecessary - we'll come on to this) so one of the things I'm doing in this PR is removing a lot of that stuff from places where we had done it explicitly and relying on the DB router to route queries to the right DB.Then that said.. Django's DB routers are a somewhat leaky abstraction. They apply to ORM operations, but that doesn't mean they apply to all database I/O. Specifically there are 2 things we need to worry about:
connection.cursor
This one is kind of obvious if you think about it. The DB router expects a model class as one of the inputs so anything where you're running raw SQL doesn't make sense in that world. There is also no way to enforce
allow_relation
if we're working with raw SQL so if you think about it, it is obvious that the DB routers won't work if you're using raw SQL.There's a thread about this on the django community forum https://forum.djangoproject.com/t/allow-database-router-to-override-default-connection/34529
Anyway, I can't see this changing soon. Also, we use raw SQL in several places in this application. Usually for valid reasons. So we do need to think about this.
In this PR, I've broadly done 2 things:
.cursor()
to execute SQL because if we don't explicitly specify it, django just uses the default DB connection.transaction.atomic
This one is perhaps less obvious.
Regardless of what your database routers say, transaction atomic blocks use the default DB connection unless you explicitly specify it with
using=
.https://github.com/django/django/blob/5a2c1bc07d126ce32efaa157e712a8f3a7457b74/django/db/transaction.py#L182-L183
https://github.com/django/django/blob/5a2c1bc07d126ce32efaa157e712a8f3a7457b74/django/db/transaction.py#L18-L25
In our case, because the default DB is the replica rather than the primary in production this means that when we write code like this:
The transaction applies to database IO over the replica connection but the writes we are doing are happening on the primary.
So it looks like that code is safely wrapped in a transaction, but it isn't really when we run it in prod 😱
Maybe this is telling us we should actually make the primary DB the default connection? (I accept there are tradeoffs either way round). For now I have chosen to just explicitly pass a
using=
to everytransaction.atomic
so that transactions actually working as intended·Anyway, the point is: If we're using raw SQL or transactions it is up to us to manually specify what connection we're using or django will just use the default and it doesn't matter what we put in DB routers.
What I haven't attempted to do in this PR is prevent us from writing wrong code in future. That is another issue/PR.
The scope of this PR is only to make sure all the code we have already written is talking to the correct DB and remove code that is no longer needed following #8221
I have attempted to split this down into commits which are easy to digest one at a time.