Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persisters should error if asked to save a persisted resource which doesn't already exist. #837

Closed
tpendragon opened this issue Aug 10, 2020 · 7 comments · Fixed by #867
Closed

Comments

@tpendragon
Copy link
Collaborator

Right now the following works:

resource = metadata_adapter.persister.save(resource: Resource.new)
metadata_adapter.persister.delete(resource: resource)
saved_again = metadata_adapter.persister.save(resource: resource)
saved_again.id == resource.id # => true

However, this causes race conditions with background jobs. It should do the following:

resource = metadata_adapter.persister.save(resource: Resource.new)
metadata_adapter.persister.delete(resource: resource)
saved_again = metadata_adapter.persister.save(resource: resource)
# => Valkyrie::Persistence::ObjectNotFoundError

I'm open to other ideas, instead of ObjectNotFoundError. A quick experiment with AR shows that it just returns true for save, but doesn't do anything.

@dgcliff
Copy link
Contributor

dgcliff commented Aug 10, 2020

ObjectNotFoundError seems to be the best fit, but perhaps we can highlight for the user the cause of the issue to help them untangle it. If they arrived at the error as a result of a race condition, they may not even know when/where the resource was deleted.

Setting the error message could potentially fulfill both goals.

Valkyrie::Persistence::ObjectNotFoundError.new("Attempted save on persisted resource which was not found") (stand-in language, I'm sure a better explanation is possible)

@no-reply
Copy link
Contributor

could this be adapter specific behavior?

my intuition is that making the #persisted? check an API level contract is introducing unneeded state dependence, and likely to end up being more complicated than we bargained for. is there a proposed generic implementation that doesn't involve an extra round-trip to the backend when saving an object for which #persisted? is true?

it seems reasonable (already) that an adapter or backend might want to avoid recreating items with the same ids as any that have already been deleted, and that they might use a variety of strategies to ensure that.

i think i'd also like to hear more about the nature of the race conditions. pulibrary/figgy#4174 provides a hint, but i'm curious to know more.

@tpendragon
Copy link
Collaborator Author

I don't think it should be adapter specific, but I'm sure we can find efficient methods of implementation. Fedora, for instance, will have a tombstone and just throw an error. A database save could UPDATE, and solr's cheap to check.

The race condition for us was the following:

A. User uploads an item with 600 pages. 25 workers promptly start working them.
B. Worker A pulls the object it generates derivatives for, creates a derivative. This takes a bit to process.
C. User deletes the object because they've found some mistake and want to reingest. This deletes all the pages.
C. Worker A saves the page it was generating a derivative for. Now the page is suddenly persisted again, and separated from its parent (which is gone.)

hackartisan added a commit that referenced this issue Oct 22, 2020
hackartisan added a commit that referenced this issue Oct 22, 2020
@hackartisan
Copy link
Contributor

hackartisan commented Oct 23, 2020

Work started on branch master...837-persisters-should-error

Note one iCLA will be needed before this can be merged. -- Update: CLA is received / on file

@tpendragon
Copy link
Collaborator Author

tpendragon commented Oct 29, 2020

I added some commits to master...837-persisters-should-error and now only Solr is left, but I don't have a good plan on how to make Solr efficiently.

@tpendragon
Copy link
Collaborator Author

I also don't know what to do about #save_all. Erroring beforehand would make it stop in the middle of saving them with some of these implementations.

@hackartisan
Copy link
Contributor

Hm. For save_all, perhaps the strategy AR uses would be worth considering - don't error but also don't do anything.

tpendragon pushed a commit that referenced this issue Jul 28, 2021
tpendragon pushed a commit that referenced this issue Jul 28, 2021
tpendragon pushed a commit that referenced this issue Aug 4, 2021
tpendragon pushed a commit that referenced this issue Aug 4, 2021
tpendragon pushed a commit that referenced this issue Sep 13, 2021
tpendragon pushed a commit that referenced this issue Sep 13, 2021
tpendragon pushed a commit that referenced this issue Sep 14, 2021
tpendragon pushed a commit that referenced this issue Sep 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants