-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
De-duplicate checksum verification #1012
De-duplicate checksum verification #1012
Conversation
@@ -130,11 +130,10 @@ trimVerifyChecksums_v0.0 = %clientScriptsDirectory%trimVerifyChecksums.py | |||
trimVerifyManifest_v0.0 = %clientScriptsDirectory%trimVerifyManifest.py | |||
updateSizeAndChecksum_v0.0 = %clientScriptsDirectory%archivematicaUpdateSizeAndChecksum.py | |||
validateFile_v1.0 = %clientScriptsDirectory%validateFile.py | |||
verifyAIP_v0.0 = %clientScriptsDirectory%verifyAIP.py | |||
verifyAIP_v1.0 = %clientScriptsDirectory%verify_AIP.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://www.python.org/dev/peps/pep-0008/#package-and-module-names
Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.
import sys | ||
|
||
import django | ||
django.setup() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this call inside the if __name__ == '__main__':
block? I find it's more explicit and clearer. If there was a case where someone is using this module then the caller should be responsible to initialize Django. Probably not very important, but I've been doing it this way recently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
||
|
||
def extract_aip(aip_path, extract_path): | ||
os.makedirs(extract_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default mode is ok? Just checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be ok. Extraction works
""" | ||
|
||
sip_uuid = sys.argv[1] # %sip_uuid% | ||
aip_path = sys.argv[2] # SIPDirectory%%sip_name%-%sip_uuid%.7z |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice way to put it with the examples, that's useful :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't do it, but I'll take the compliment
return_code = 1 | ||
|
||
if return_code == 0: | ||
return_code = verify_checksums(bag, sip_uuid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One could argue that it's not verify_checksums
's concern to know about exit codes - it'd be more pythonic to have it raise an exception right? Feel free to ignore this. Just a random thought, I've done the same so many times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think you're right. I'll change.
|
||
# cleanup | ||
if not is_uncompressed_aip: | ||
shutil.rmtree(extract_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to make the script report an error if the clean up did not succeed? You may want to trap OSError
if that's not the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good
- Removes the verifyPREMISChecksums_v0.0 client script verifyPREMISChecksums.py. - Removes verifyPREMISChecksums_v0.0 from archivematicaClientModules. - Introduces a migration that removes the "Verify checksums generated on ingest" micro-service.
- After performing Bag validation to verify the AIP, the new verify_AIP client script (verifyAIP_v1.0) queries the db for transfer-generated checksums and then verifies that they match what is documented in the corresponding bag-generated manifest file. - Added DEFAULT_CHECKSUM_ALGORITHM to MCPClient Django settings. - Modified "remove verify premis checksums" migration so that the "Verify AIP" micro-service would point to the new verifyAIP_v1.0 client script as its `execute` value. - Added cosmetic refactoring of databaseFunctions.py::insertIntoEvents.
Sorry I didn't realize that it wasn't really your code what I was looking at! |
defb42c
to
9451abf
Compare
Fixes #918
In short, removes the once-per-file verifyPREMISChecksums_v0.0 micro-service and replaces it with once-per-SIP equivalent functionality in a new verifyAIP_v1.0 client script (which supersedes verifyAIP_v0.0). Question: I have deleted verifyAIP_v0.0 because other superseded client scripts appear to have been deleted. Should it be retained?
The new verifyAIP_v1.0 parses the BagIt manifest file for the checksums that
bag
has just calculated instead of needlessly re-calculating them.The functional consequences of this change are as follows.
AIP METS files will no longer have
fixity check
PREMIS events for each file.The AIP pointer file will now contain a single
fixity check
PREMIS event for the entire AIP:AIPs without pointer files will have no record of the SIP/AIP-level fixity check. This was determined to be an acceptable trade-off and should be remedied once we [begin creating pointer files for uncompressed AIPs] (Problem: uncompressed AIPs need pointer files archivematica-storage-service#324), which is motivated by other considerations as well.