Demonstration of Software output via Siegfried YAML #152

Draft
wants to merge 4 commits into
base: main
from

ross-spencer
Collaborator

This is a conversation starter around what can be added to the Siegfried output via Wikidata. It is also a demonstration of how to do that. The example might need some work: it modifies the Siegfried writer, and if special cases need handling within the writer, I think more work will be needed to make that properly extensible going forward. That is also up for discussion.

Example:

---
siegfried   : 1.9.1
scandate    : 2020-11-15T11:37:07-05:00
signature   : default.sig
created     : 2020-11-15T11:25:27-05:00
identifiers : 
  - name    : 'wikidata'
    details : 'wikidata-definitions-2.x.x (2020-11-15)'
---
filename : 'skpro/test1'
filesize : 10
modified : 2020-07-08T23:41:53-04:00
errors   : 
matches  :
  - ns       : 'wikidata'
    id       : 'Q27596100'
    format   : 'Windows Bitmap, version 1'
    URI      : 'http://www.wikidata.org/entity/Q27596100'
    mime     : 
    basis    : 'byte match at 0, 10'
    source   : 'PRONOM (Wikidata) (source date: 2017-08-08)'
    warning  : 'extension mismatch'
    software : 
        Converseen: http://www.wikidata.org/entity/Q97012479
---
filename : 'skpro/test6'
filesize : 8
modified : 2020-07-08T23:53:57-04:00
errors   : 
matches  :
  - ns       : 'wikidata'
    id       : 'Q4045294'
    format   : 'New Executable'
    URI      : 'http://www.wikidata.org/entity/Q4045294'
    mime     : 
    basis    : 'byte match at [[0 2] [6 2]]'
    source   : 'Wikidata reference is empty'
    warning  : 'extension mismatch'
    software : 
        Windows 8: http://www.wikidata.org/entity/Q5046
        Windows 7: http://www.wikidata.org/entity/Q11215
        Windows 98: http://www.wikidata.org/entity/Q483132
        Windows 10: http://www.wikidata.org/entity/Q18168774
---
filename : 'skpro/test9'
filesize : 35
modified : 2020-07-08T23:53:34-04:00
errors   : 
matches  :
  - ns       : 'wikidata'
    id       : 'Q27596325'
    format   : 'Windows Bitmap, version 4'
    URI      : 'http://www.wikidata.org/entity/Q27596325'
    mime     : 
    basis    : 'byte match at 0, 35'
    source   : 'PRONOM (Wikidata) (source date: 2017-08-08)'
    warning  : 'extension mismatch'
    software : 

TODO

  • Should SF support more complex nested YAML results?
  • JSON and CSV output might need to be improved (or abstracted for Wikidata?)
  • Possibly need a more generic writer pattern.
  • Is a plain list type okay? e.g. just IRIs?
  • Code clean-up.
  • Anything else?

More to come...

We ask to retrieve the IRI and label for software that can read any
format harvested from Wikidata.
Upped to 2.x.x for lack of a better idea...
@richardlehane
Owner

richardlehane commented Nov 16, 2020

Thanks Ross - this is an interesting POC!

If it's desirable to have structured data within results, I'd suggest starting higher in the stack and looking at the Values() method in the Identification interface, which currently just returns a slice of strings. Making the change there would lead to a cleaner implementation in the writers, without the need for special casing.

But what would you change it to? For the software use case, it would need to be a map, because each of the software items has keys (the software name) and values (the Q reference). Perhaps you could introduce a new Value interface which would be a string normally, but could also be a map or a list (or even an int or other things down the line)? I.e. a Values() []Value signature.
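As a rough illustration of the `Values() []Value` idea, here is a minimal sketch. Every name in it (`Value`, `StringValue`, `MapValue`, the `YAML` method) is hypothetical and does not exist in Siegfried today; the quoting is also simplified and does not escape embedded quotes. The point is only that a writer could call one method on each value and never need to special-case the software field.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Value is a hypothetical replacement for the plain strings returned by
// Values(). It is an assumption for illustration, not Siegfried's API.
type Value interface {
	// YAML renders the value as it should appear after "key : ".
	// Nested lines are indented by the given number of spaces.
	YAML(indent int) string
}

// StringValue is the common case: a single quoted scalar.
type StringValue string

func (s StringValue) YAML(int) string { return "'" + string(s) + "'" }

// MapValue covers the software case: label -> IRI.
type MapValue map[string]string

func (m MapValue) YAML(indent int) string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys) // Go map iteration order is random; sort for stable output
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "\n%s%s: %s", strings.Repeat(" ", indent), k, m[k])
	}
	return b.String()
}

func main() {
	fields := []string{"format", "software"}
	values := []Value{
		StringValue("New Executable"),
		MapValue{
			"Windows 7": "http://www.wikidata.org/entity/Q11215",
			"Windows 8": "http://www.wikidata.org/entity/Q5046",
		},
	}
	// The writer treats every field the same way.
	for i, v := range values {
		fmt.Printf("    %-8s : %s\n", fields[i], v.YAML(8))
	}
}
```

A list or integer type could be added later by implementing the same method, which is what makes this shape more extensible than a fixed string type.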

A simplification would just be to say that you can either have a single string or a list of strings. I.e. a Values() [][]string signature.
This would be easier to implement. But it would be less expressive in your software case and you'd have to accept:

software:
    - Windows 8 (http://www.wikidata.org/entity/Q5046)
    - Windows 7 (http://www.wikidata.org/entity/Q11215)
    - Windows 98 (http://www.wikidata.org/entity/Q483132)
    - Windows 10 (http://www.wikidata.org/entity/Q18168774)
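For comparison, the simpler `Values() [][]string` shape could be written out along these lines. This is a sketch only: `renderField` is a hypothetical helper, not part of Siegfried, and it assumes that a single-element slice is written inline while a longer slice becomes a YAML list.

```go
package main

import (
	"fmt"
	"strings"
)

// renderField is a hypothetical helper (not Siegfried code). It writes one
// field whose value is a slice of strings: zero elements leaves the field
// empty, one element is written inline, several become a YAML list.
func renderField(name string, vals []string) string {
	switch len(vals) {
	case 0:
		return fmt.Sprintf("%s :\n", name)
	case 1:
		return fmt.Sprintf("%s : %s\n", name, vals[0])
	}
	var b strings.Builder
	fmt.Fprintf(&b, "%s :\n", name)
	for _, v := range vals {
		fmt.Fprintf(&b, "  - %s\n", v)
	}
	return b.String()
}

func main() {
	fields := []string{"format", "software"}
	// Stand-in for what a Values() [][]string method might return.
	values := [][]string{
		{"New Executable"},
		{
			"Windows 8 (http://www.wikidata.org/entity/Q5046)",
			"Windows 7 (http://www.wikidata.org/entity/Q11215)",
		},
	}
	for i, vals := range values {
		fmt.Print(renderField(fields[i], vals))
	}
}
```

The label and the IRI are fused into one string here, so a consumer has to parse them apart again; that is the loss of expressiveness mentioned above.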

But I guess if you are compromising like that, you could also just do something like:

software : Windows 8 (http://www.wikidata.org/entity/Q5046); Windows 7 (http://www.wikidata.org/entity/Q11215); Windows 98 (http://www.wikidata.org/entity/Q483132); Windows 10 (http://www.wikidata.org/entity/Q18168774)
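That single-string form needs no interface change at all, since the value stays a plain string. A minimal sketch, assuming a hypothetical `software` struct and `flatten` helper (neither exists in Siegfried):

```go
package main

import (
	"fmt"
	"strings"
)

// software pairs a label with its Wikidata IRI. Illustrative type only.
type software struct {
	Label, IRI string
}

// flatten joins every entry as "Label (IRI)", separated by "; ".
func flatten(items []software) string {
	parts := make([]string, len(items))
	for i, s := range items {
		parts[i] = fmt.Sprintf("%s (%s)", s.Label, s.IRI)
	}
	return strings.Join(parts, "; ")
}

func main() {
	items := []software{
		{"Windows 8", "http://www.wikidata.org/entity/Q5046"},
		{"Windows 7", "http://www.wikidata.org/entity/Q11215"},
	}
	fmt.Println("software : " + flatten(items))
}
```

The same string would also fit in a single CSV cell, which may matter for the CSV output question in the TODO list.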

2 participants