-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preparing changes to add format types (classification) from DROID sig file #226
base: develop
Are you sure you want to change the base?
Conversation
@@ -316,6 +316,7 @@ func IsArchive(id string) Archive { | |||
// Clear clears loc and mimeinfo details to avoid pollution when creating multiple identifiers in same session | |||
func Clear() func() private { | |||
return func() private { | |||
identifier.noContainer = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently we need a change like this, but we're not then resetting reports to pronom
so I am not sure which way is correct of if there is still some pollution somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resetting reports to pronom
probably a good idea too. In principle I'd say it's good to add anything to Clear where you are setting config options that change defaults.
@@ -119,7 +119,7 @@ func New(opts ...config.Option) (core.Identifier, error) { | |||
pronom = identifier.ApplyConfig(pronom) | |||
id := &Identifier{ | |||
Base: identifier.New(pronom, config.ZipPuid()), | |||
hasClass: config.Reports() != "" && !config.NoClass(), | |||
hasClass: !config.NoClass(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is all we need to do here when this is possible with the signature file too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest dropping the extra comments added here.
Agree absolutely that anything exported should be commented (and much to do here) but these structs aren't part of the public API (hidden by the "internal" path) & are only exported internally for the sake of unmarshalling (the golang Unmarshal funcs can only assign to exported fields).
Given these structs are all just mappings to the XML file it is hard to say anything else meaningful about them apart from what's said at the top of the file - so in this case I think it is best to just keep this mapping file lean and clean (even if linters scream about it!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even if linters scream about it!
That's just it:
C:\Users\Spencer\code\git\siegfried\pkg\pronom>%gopath%\bin\golint ./...
internal\mappings\droid.go:22:6: exported type Droid should have comment or be unexported
internal\mappings\droid.go:29:6: exported type InternalSignature should have comment or be unexported
internal\mappings\droid.go:34:6: exported type ByteSeq should have comment or be unexported
internal\mappings\droid.go:39:6: exported type SubSequence should have comment or be unexported
internal\mappings\droid.go:48:6: exported type Fragment should have comment or be unexported
internal\mappings\droid.go:55:6: exported type FileFormat should have comment or be unexported
... ~45 more warnings ...
So, as with other code you may have noticed, I follow the process of -- if I am in the code, I try to add placeholder comments so that we can say more in future, or see fewer warnings. It's probably more idiomatic not to have any comment to have some comment rather than none, but maybe there are some project level settings that makes the intent to not address these easier for maintainers? I.e. something a linter can pick up and run with?
Given these structs are all just mappings to the XML file it is hard to say anything else meaningful about them apart from what's said at the top of the file
My documentation isn't that useful, but I don't agree everyone has the same knowledge about these objects that means it's a given this resource doesn't add more. Godoc defaults its view to show the layout and comments on internal packages: https://pkg.go.dev/github.com/richardlehane/[email protected]/pkg/pronom/internal/mappings which does lead to interesting possibilities in digipes.
I'll have a think about some sort of pre-commit settings and rules to ignore internal packages, but otherwise this is a simple enough commit to drop when I rebase later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there was a discussion of this on the golint
issues, e.g. win-t feels my pain: golang/lint#186 (comment)
golint
is deprecated now. I'm using staticcheck
for my linter these days which I don't think nags about this (though it does have all sorts of other "opinions" about our code!).
we should agree on a common linter and document on the wiki: e.g. a short style guide in the contributors' docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds reasonable to me and I'm very happy to help draft/review. I keep a number of references/ideas in this area which may be useful.. I'll check out static check as well and see what tooling it offers.
thanks @ross-spencer, nice to be prepped for a change to the droid file. Did you get a sense from the TNA crew that this might be on the horizon? |
No problem. And yes, I've been putting the case forward at the PRONOM bi-weeklys and Francesca has been following up their end. Last update is they'll likely reach out to you soon (a month? a few months maybe?) to ensure there are no other incompatibilities in the code here for any proposed changes to the signature file. There should be some other interesting things coming from that change too - but I won't ruin the surprise! (NB. shouldn't result in any further code changes). |
@@ -12,30 +12,40 @@ | |||
// See the License for the specific language governing permissions and | |||
// limitations under the License. | |||
|
|||
// Package mappings contains struct mappings to unmarshal three | |||
// Package mappings contains struct mappings to unmarshall three |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While totally inconsequential, I just wanted to share my observation that the previous spelling was already the correct one (see https://en.wiktionary.org/wiki/unmarshal)?
Update from the PRONOM team is that this is likely to appear: 27th/28th/29th November- will be the same content as v.115 but with the addition of the formattypes tag -- so I will look at correcting these issues this weekend or next weekend. I have been experimenting with |
35aa4ca
to
a8c5b49
Compare
hi @richardlehane I think the necessary changes are in here now (including reverting the docs), a8c5b49 Changes are based on the latest sample DROID file: https://github.com/digital-preservation/PRONOM_Research/blob/0d0869ba854baf44e25e9f8543f4f0c4dc98273c/Test%20Releases/DROID_SignatureFile_Classification_V2.xml -- and no further changes have been discussed in the PRONOM bi-weekly. It looks like format type/classification will appear as an attribute, so: <FileFormat FormatType="Image (Raster), Dataset" ID="1807" Name="Nearly Raw Raster Data"
PUID="fmt/1002" Version="1">
<InternalSignatureID>1357</InternalSignatureID>
<Extension>nrrd</Extension>
</FileFormat> or: <FileFormat FormatType="Audio, Video" ID="655" MIMEType="video/x-msvideo"
Name="Audio/Video Interleaved Format" PUID="fmt/5">
<InternalSignatureID>51</InternalSignatureID>
<Extension>avi</Extension>
<HasPriorityOverFileFormatID>2741</HasPriorityOverFileFormatID>
</FileFormat> Building with noreports: ---
siegfried : 1.11.0
scandate : 2023-11-05T19:41:35+01:00
signature : default.sig
created : 2023-11-05T19:41:19+01:00
identifiers :
- name : 'pronom'
details : 'DROID_SignatureFile_V114.xml; container-signature-20230822.xml; built without reports'
---
filename : 'testdata/skeleton-suite/fmt/fmt-1002-signature-id-1357.nrrd'
filesize : 9
modified : 2023-09-16T22:53:24+02:00
errors :
matches :
- ns : 'pronom'
id : 'fmt/1002'
format : 'Nearly Raw Raster Data'
version : '1'
mime :
class : 'Image (Raster), Dataset'
basis : 'extension match nrrd; byte match at 0, 9'
warning : and reports: ---
siegfried : 1.11.0
scandate : 2023-11-05T19:42:18+01:00
signature : default.sig
created : 2023-11-05T19:42:07+01:00
identifiers :
- name : 'pronom'
details : 'DROID_SignatureFile_V114.xml; container-signature-20230822.xml'
---
filename : 'testdata/skeleton-suite/fmt/fmt-1002-signature-id-1357.nrrd'
filesize : 9
modified : 2023-09-16T22:53:24+02:00
errors :
matches :
- ns : 'pronom'
id : 'fmt/1002'
format : 'Nearly Raw Raster Data'
version : '1'
mime :
class : 'Image (Raster), Dataset'
basis : 'extension match nrrd; byte match at 0, 9'
warning : Looks equivalent. New tests pass as anticipated. Let me know if it needs further revision. I can rebase into a single commit once it looks okay. Not sure how to handle replacing the test signature file when time comes to merge? I think it will eventually just get replaced with the new one? |
thanks @ross-spencer this all looks great to me! |
Changes to DROID signature file parsing to enable reading of file format types (FormatType). 'FormatType' is included in soon to be released DROID signature files as a 'FormatType' attribute within the FileFormat elements of the FileFormatCollection section of the signature file.
a8c5b49
to
b89554e
Compare
Note from TNA but they haven't managed to export the changes on their side just yet, from how it sounds it seems like there's an additional process happening during signature export/publication that's removing the expected data. It's something they're looking at. |
A handful of changes left over from #209 that weren't merged plus the completion of tests to back up some of those assertions.
Code changes assume the use of an element for file format types in a DROID signature file, but can be easily changed to an attribute. We'll rebase/edit the commit if that happens. Tests should still work. The third commit should be dropped before merging.
Obviously following review there may be more changes to wrap into this.
assumed XML format:
Connected to #207