Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move expired token deletion off validation path #3231

Merged

Conversation

portante
Copy link
Member

@portante portante commented Feb 5, 2023

Work pulled from PR #3114, commits:

  • (26) 12f0adf Token without a user is an internal error
  • (22) 0e29160 Just remove all expired tokens when looking
  • (12) 0aa6f90 s:Move expired token deletion off validation path
  • (08) a1c1b94 Move expired token deletion off validation path

Suggested commit message to use below.


The process of validating an API call should NOT incur database deletion
traffic, where expired tokens could cause unwanted load on the database.
Instead, only during a new login are ALL expired tokens purged from the
database.

The `created` column of `ActiveTokens` is replaced by the new
`expiration` column to make it easy to find all expired tokens for a
user and remove them.

Further, we add a SQLAlchemy `relationship()` between the active token
table and the user table so that it is trivial to fetch the `User` from
the `ActiveTokens` object.

The operation of adding a token for a user is moved to its own method,
`add_token()`.

@portante portante added Server API Of and relating to application programming interfaces to services and functions Database labels Feb 5, 2023
@portante portante added this to the v0.72 milestone Feb 5, 2023
@portante portante self-assigned this Feb 5, 2023
@portante portante force-pushed the rm-exp-tok-off-val-path branch from 664fca5 to 8e8b16b Compare February 6, 2023 02:18
@portante portante force-pushed the rm-exp-tok-off-val-path branch 2 times, most recently from 2dcaeea to 34a0a64 Compare February 14, 2023 14:44
@portante portante marked this pull request as ready for review February 14, 2023 14:48
@portante portante requested a review from npalaska February 14, 2023 14:49
webbnh
webbnh previously approved these changes Feb 14, 2023
Copy link
Member

@webbnh webbnh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are a bunch of nits and small stuff to consider before you merge this.

dbutenhof
dbutenhof previously approved these changes Feb 15, 2023
Copy link
Member

@dbutenhof dbutenhof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some cleanup would be nice, but it doesn't look "evil" 👀

Database.db_session.commit()
except Exception:
Database.db_session.rollback()
current_app.logger.exception(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception traceback is likely to be really deep but all within the commit and not particularly useful. I might suggest just a logger.error with the str(exc) embedded to tell us the actual cause more compactly???

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbutenhof we use this same sequence in datasets.py. Can you fix it there and I'll fix this one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha; maybe elsewhere, too, then. Ah, well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those were the only two I could find.

Comment on lines 30 to 32
op.alter_column(
"active_tokens", "expiration", nullable=False, new_column_name="created"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more interesting, since we're extending the lifetime of any extant auth tokens. Again, it might be better (and in this case more "correct") to just clear the table here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't this script automatically generated? @dbutenhof, are you suggesting that we should edit these scripts after generating them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script is generated, but usually edited. And I'd be surprised (though it wouldn't be the first time) if Alembic took it upon itself to decide that renaming a column was an appropriate upgrade when one column name disappears and another appears! In this case that works with existing data (simply pre-expiring all tokens), but in general it could be a really bad idea...

Copy link
Member

@webbnh webbnh Feb 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree: if we're entitled to edit this script, then I think we should be removing and creating (i.e, wiping) these columns instead of renaming them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose to use alter column instead of drop and create. It seems silly to do anything about the table data since no token will be "created" in the future, and the first login will take care of the expirations.

Comment on lines 29 to 31
def __init__(self, auth_token: str, expiration: datetime):
self.token = auth_token
self.created = datetime.datetime.now()
self.expiration = expiration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh; I hadn't noticed this before. Normally declarative_base classes shouldn't have __init__ methods; they risk inconsistency, because while they will be called on an explicit ActiveTokens(...) call, they're not necessary (the default is a kwargs with keywords matching the column names ... and here the mismatch between auth_token and self.token is a bit annoying), and they're not called when SQLAlchemy creates proxies from the DB... which means the temptation to put extra logic into a constructor is potentially confusing. (You can use validator hooks for that, which will be called.)

That is, the only functional difference if this were removed is that you'd need to be careful to ActiveTokens(token=value, expiration=value) instead of ActiveTokens(auth_token=value, expiration=value), which wouldn't work because auth_token isn't a column. But I don't like the name mismatch anyway. (And, yeah, it's not new with this PR, but I doubt there are that many constructor calls in the code so cleaning it up probably wouldn't be hard...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix this.

npalaska
npalaska previously approved these changes Feb 16, 2023
Copy link
Member

@npalaska npalaska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this generally looks good, I agree with some of the comments already made here and I also have a comment.

"New auth token registered for user {}", user.username
)

# We take the opportunity to remove any expired tokens for ANY user
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do it during the login? I don't know the impact of it during the login process but I think the better place to remove all the expired tokens would be in the logout.

I might also suggest if we are removing the expired tokens during the login we might do it only for the user who is trying to log in instead of the Any user.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want this operation on a common but low frequency path. Login is a nice reliable spot. I don't see how logout would be better, and, for many users, they will probably just close the tab and never log out, so that wouldn't be a particularly reliable point in the flow.

As for removing expired tokens, an expired token is invalid no matter who it belongs to, and we can get them all with a single database operation, so why filter it to only the current user?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The discussion is nearly moot as we expect our API keys to be permanent (at least until manually revoked through some API) and we don't hold or control Auth tokens from Keycloak. So this code has a short lifetime but might be valuable for our initial March deployment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

People must login to do something, but they more often than not fail to logout.

@portante portante dismissed stale reviews from npalaska, dbutenhof, and webbnh via 9479552 February 17, 2023 14:55
@portante portante force-pushed the rm-exp-tok-off-val-path branch from 34a0a64 to 9479552 Compare February 17, 2023 14:55
Copy link
Member Author

@portante portante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied code review changes suggested, but did not rebase on top of main yet. Too many conflicts with #3191 to muddle the review.

Once folks review the direction this is going, then I'll rebase.

@portante
Copy link
Member Author

Bugger all, CI jobs won't execute if they have conflicts with the target branch.

@portante portante requested a review from webbnh February 17, 2023 17:30
The process of validating an API call should NOT incur database deletion
traffic, where expired tokens could cause unwanted load on the database.
Instead, only during a new login are ALL expired tokens purged from the
database.

The `created` column of `ActiveTokens` is replaced by the new
`expiration` column to make it easy to find all expired tokens for a
user and remove them.

Further, we add a SQLAlchemy `relationship()` between the active token
table and the user table so that it is trivial to fetch the `User` from
the `ActiveTokens` object.

The operation of adding a token for a user is moved to its own method,
`add_token()`.
@portante portante force-pushed the rm-exp-tok-off-val-path branch from 9479552 to e17d45d Compare February 17, 2023 17:58
webbnh
webbnh previously approved these changes Feb 17, 2023
Copy link
Member

@webbnh webbnh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GTG

@portante
Copy link
Member Author

Bugger all, I messed up the alembic migration in the conflict merge.

Copy link
Member

@webbnh webbnh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine AFAICT.

@portante portante merged commit 566d50b into distributed-system-analysis:main Feb 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Of and relating to application programming interfaces to services and functions bug Database enhancement Server
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants