Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement an atomic lock check action. #37

Closed
wants to merge 1 commit into from

Conversation

josecv
Copy link

@josecv josecv commented May 18, 2018

Analogous to a hardware lock check, this will atomically try to acquire
a specified lock (and wait until it can), then immediately release the lock.

Co-authored-by: Rafay [email protected]

@pivotal-issuemaster
Copy link

@josecv Please sign the Contributor License Agreement!

Click here to manually synchronize the status of this Pull Request.

See the FAQ for frequently asked questions.

Copy link

@william-tran william-tran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs docs as well, take a look at the last feature addition

Analogous to a hardware lock check, this will atomically try to acquire
a specified lock (and wait until it can), then immediately release the lock.

Co-authored-by: Rafay <[email protected]>
@pivotal-issuemaster
Copy link

@josecv Thank you for signing the Contributor License Agreement!

@josecv
Copy link
Author

josecv commented May 18, 2018

docs have been added

@vito
Copy link
Member

vito commented May 23, 2018

I'm interested in hearing the use case for this before merging, in the interest of minimizing knobs. What does this solve over having two puts? What part of your workflow requires just waiting on a lock without holding it before continuing? (Is that really safe?)

@josecv
Copy link
Author

josecv commented May 23, 2018

we use locks to implement environment freezes. in short, it's safe to have multiple deployments running at once, but at certain points we might choose to prevent all deployments. when a deployment job runs, it attempts to fetch a lock: if it can, that means the environment isn't frozen and deployments are permitted; the lock is immediately released so that a simultaneous deployment is allowed through.

when we need to actually freeze an env, we manually place the lock in claimed state and don't unclaim it until we want the environment to thaw.

at first we did implement this using two puts; unfortunately this turns out quite problematic. if folks interrupt their deploy (or indeed if the deploy crashes) at just the right time, it's possible for the deploy to exit while holding on to the lock, thus deadlocking an environment until the lock is manually released. we considered doing an on_abort or ensure step to release the lock but this turns out to be quite tricky. if the claim dies after it's managed to put the lock but before it's managed to get the lock, then the release will fail whether or not it's inside an ensure (since the lock metadata is not available to it). if we put a get before the release to make sure that the release will succeed, then we run the chance of an aborted deploy unlocking an intentionally locked environment.

an atomic check operation solves this problem, since it happens all at once.

@vito
Copy link
Member

vito commented May 23, 2018

Interesting. This will indeed help, but it still won't completely save you from aborts. Someone could click 'abort' and cause the put operation itself to be interrupted in between the claim and unclaim.

Going back to your use case, it actually sounds a bit like a read-write lock or exclusive/shared acquire semantics. This is discussed in some form as #35 and concourse/concourse#842. Some form of that feels like a better approach to me - the utility of the check verb here was not immediately apparent. I may be missing something here but it also seems like there's nothing preventing the 'freezer' or 'cleanup' job (i.e. the exclusive acquirer) from running while existing deploys are in flight, because they just immediately release their lock.

I'd be happy to work with y'all towards a proposed API/approach if you're interested.

@vito
Copy link
Member

vito commented Aug 23, 2018

Sorry but I'm gonna close this out; I'm still fishy on this approach to the problem. Happy to hash out the other stuff I mentioned at some point though.

@vito vito closed this Aug 23, 2018
@alnavart
Copy link

alnavart commented Aug 6, 2021

Hi! I'm experiencing the same issue than @josecv commented on #37 (comment)
Are there any update on this approach about atomic operation? or maybe disallowing to abort on some specific steps..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants