Create ADR to propose new field for host institution #133
With this flurry of ADRs and data model discussions, I found @JPrevost's comments helpful in thinking about data model solutions and change over time. Specifically, how we might have an `institution` field at the GraphQL layer, and while it may point to `TIMDEX.institution` for now, if we choose to move where that data is stored, that's just a mapping change at the GraphQL layer. It gets more complicated when you introduce objects and filtering by sub-fields.
For this reason, I approve this ADR. Creating a top level field called `institution` is simple and meets the current need. Thanks for the write-up @jonavellecuerdo!
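To make the "just a mapping change" point concrete, here is a minimal sketch in Python. It is not taken from the TIMDEX API: `FIELD_RESOLVERS`, `resolve`, and the sample record are hypothetical names and values used only to show the indirection between an API-level field and where the value is stored.

```python
from typing import Any, Callable, Dict, Mapping

# Hypothetical mapping from API-level field names to functions that read the
# stored record. None of these names come from the TIMDEX codebase.
FIELD_RESOLVERS: Dict[str, Callable[[Mapping[str, Any]], Any]] = {
    # Today the API field reads the top-level stored field of the same name.
    "institution": lambda stored: stored.get("institution"),
}


def resolve(field: str, stored: Mapping[str, Any]) -> Any:
    """Return the API-level value of `field` for one stored record."""
    return FIELD_RESOLVERS[field](stored)


# Made-up record for illustration.
record = {"title": "Example GIS layer", "institution": "Example University"}
print(resolve("institution", record))
```

If the value later moved elsewhere in the stored record, only the function registered under `"institution"` would need to change in this sketch; callers of `resolve` would be unaffected.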
This feels like the more straightforward solution and the easiest to work with on the UI side. Thanks for the write-up, Jonavelle!
I agree with Graham and Adam, good writeup!
For GIS records, the UI can directly reference the values from `TimdexRecord.institution` to display the host institution for a resource on [Geodata](https://geodata.libraries.mit.edu/).

For other TIMDEX sources, given the decision outlined above, there should be little to no effect. It could be beneficial to revisit these sources and see if there is a field we can map to that was previously ignored.
I'm slightly nervous that we don't know yet if other sources will use this field. I'm okay if this is only a useful field for GIS at this time, but the nervousness comes from not knowing if we want to move other things here that would make the label less apt (i.e. `institution` is perfect for GIS, but `metadata_provider` would work for both GIS and additional sources if we considered mappings more broadly before adding this field. Or not!).
I'd really prefer if we could do the investigation to be able to take a stronger stance here. Either "we don't think this field will be used by any of our current sources so there is no risk" or "we think we'd map fields X from MARC and Y from DSpace and they work perfectly with this plan".
Thanks for articulating this @JPrevost, I agree.
Reading this ADR, it felt like a top level, single string field was flexible enough that we could modify both how it's stored in the data model and how it's retrieved in the API layer, as needed. But if its immediate goal is to support the GIS data, as a thought experiment, why don't we call it `gis_data_resource_provider`? Obviously, that's too specific; so what then do we mean by `institution`? Are we expecting other sources to potentially use this?
Though I already approved, I think the ADR would benefit from discussion about what `institution` means, and how other sources may, or may not, use it. Going to leave some inline comments, now that I reread it through this lens.
## Decision

**A new field called "institution" is added to the TIMDEX data model** (i.e., `TimdexRecord.institution`). This field will denote--as its name implies--the institution or organization that provides access to the resource described in the TIMDEX record. This field exists at the top level of the TIMDEX data model, making it easily accessible for referencing in the UI or querying. This solution also avoids using existing fields (discussed below) in ways that obscure the TIMDEX data model.
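As a rough illustration of what a top-level, single-string field could look like, here is a minimal Python sketch. It is an assumption-laden stand-in, not the actual TIMDEX model: only `TimdexRecord.institution` is named in the ADR text, while the other fields and all values are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimdexRecord:
    # Illustrative placeholder fields; not a statement of the real model.
    title: str
    source: str
    # Proposed top-level field: the institution or organization that provides
    # access to the resource. Optional, so sources that do not map a value can
    # leave it unset.
    institution: Optional[str] = None


# Made-up values for illustration.
gis_record = TimdexRecord(
    title="Example GIS layer",
    source="Example GIS source",
    institution="Example University",
)
other_record = TimdexRecord(title="Example thesis", source="Example repository")

print(gis_record.institution)
print(other_record.institution)
```

Because the field sits at the top level, a UI or query in this sketch reads it directly, without traversing nested objects.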
On a closer reread, I do think this does a good job at proposing what this field "means":
"This field will denote--as its name implies--the institution or organization that provides access to the resource described in the TIMDEX record"
But, if we are thinking of the reusability or naming of this new field, perhaps that "access" word is important. Is it possible this is really about access primarily? If so, what if the field were something along the lines of `access_provider`? Or even just `provider`?
I think `provider` is an interesting concept to consider here. For GIS it seems to map clearly because the provider of the metadata and provider of the resource are generally the same from the TIMDEX perspective.
It seems to get messier for subscription resources that might appear in Alma (which might help us refine the intent of this field... even if our finding is that this field is not to be used for vended resources :) ). Would the `provider` be the vendor that supplies the content, or the subscriber to the content on behalf of the user? For example, MIT subscribes to an Ebsco database so our users can access the content. Is MIT the `provider` or is `ebsco` the provider? As a consumer service, TIMDEX API should likely take the user perspective?
As a consumer service, and a discovery one at that, I suppose I'd expect `provider` to indicate what institution/organization/company is providing physical or digital access to the resource.
"...i.e. MIT subscribes to an Ebsco database so our users can access the content. Is MIT the provider or is ebsco the provider."
I like this example because I think a) `provider="Ebsco"` would be ideal, but b) I think that level of provider granularity may vary considerably from source-to-source. Examples:
- Libguides / Research Database
  - I've confirmed that OAI-PMH records offer no indication of the "leaf" node of access
  - Research Databases: might consider a default of `provider="MIT"`
  - LibGuides: we could say `provider="Springshare"`
- DSpace
  - feels like we could default to `provider="MIT"` for all records
- Alma
  - probably the most complex, where records may have varying degrees of granularity here
- ArchivesSpace
  - perhaps another good example where `provider="MIT"` default makes sense

Don't want to dig too far into the logic, as `provider` is just a consideration at the moment. But unless we explicitly know otherwise -- from a consumer / discovery POV -- maybe a default of `provider="MIT"` wouldn't be inaccurate.
I may lean more towards `provider=unknown` for things like Alma/LibGuides that we aren't sure on... and then take that unknown data to the metadata team and ask if they could make that clearer in the records or how we could better detect it/supplement records from non-alma sources, etc. The reason I'd suggest not defaulting to "MIT" is that would make it very difficult to understand which are actually MIT and which are 🤷
Good point, makes sense. And maybe to get a bit technical, we could do `provider=NULL`? Understanding that the absence of a field value means we don't know for certain?
And clarification: when you say "LibGuides" above, do you mean "ResearchDatabases" (AZ list)? As for "LibGuides" (they are distinct sources in TIMDEX), it feels like we do know that the provider is always Springshare.
Oh sorry, I misread and thus misrepresented. LibGuides is probably either MITL or Springshare. The content is curated/created by MITL but using the Springshare platform.
Research Databases is what I meant to be replying to with the "unknown" or your follow-on idea of just not setting it (i.e. null).
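The "set it only when known, otherwise leave it null" idea from this thread could be sketched as follows. This is a hypothetical illustration, not a decided mapping: the dictionary, the function name, and the source keys other than `aspace` are assumptions, and the defaults are only the values floated in the comments above.

```python
from typing import Optional

# Hypothetical per-source defaults, limited to sources where a commenter
# suggested a single value. LibGuides and Research Databases are deliberately
# left out because the thread did not settle on a value for them.
KNOWN_PROVIDER_DEFAULTS = {
    "dspace": "MIT",
    "aspace": "MIT",
}


def provider_for(source: str, explicit: Optional[str] = None) -> Optional[str]:
    """Prefer a value stated in the record, then a known default, else None."""
    if explicit:
        return explicit
    # Returning None (rather than "MIT" or "unknown") keeps "we don't know"
    # distinguishable from a confirmed provider.
    return KNOWN_PROVIDER_DEFAULTS.get(source)


print(provider_for("dspace"))
print(provider_for("researchdatabases"))
print(provider_for("alma", explicit="Ebsco"))
```

Records that come back as `None` in this sketch are the ones that could be taken to the metadata team for clarification, as suggested above.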
**Transformer:** [Ead](https://www.loc.gov/ead/tglib/appendix_d.html)
**Source(s):** MIT ArchivesSpace (aspace)
* [`<publicationstmt><publisher>`|`<bibliography><bibref><imprint><publisher>`](https://www.loc.gov/ead/tglib/elements/publisher.html): When used in the publication statement, the name of the party responsible for issuing or distributing the encoded finding aid. Often this party is the same corporate body identified in the `<repository>` element in the finding aid. When used in a Bibliographic Reference `<bibref>`, the name of the party issuing a monograph or other bibliographic work cited in the finding aid.
* [`repository`](https://www.loc.gov/ead/tglib/elements/repository.html): The institution or agency responsible for providing intellectual access to the materials being described. Although the repository providing intellectual access usually also has physical custody over the materials, this is not always the case. When it is clear that the physical custodian does not provide intellectual access, use `<physloc>` to identify the custodian and `<repository>` to designate the intellectual caretaker. When a distinction cannot be made, assume that the custodian of the physical objects also provides intellectual access to them and should be recognized as the `<repository>`.
Wonder if mapping EAD `repository` to this new `institution/provider` field might make sense?
I think EAD's definition for `repository` lines up with what we've been defining as a `provider`, so I'm okay with this as well!
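A minimal sketch of what mapping EAD `<repository>` to the new field might involve, using only the Python standard library. This is not the Transmogrifier implementation: `provider_from_ead` is a hypothetical helper, and the XML is a made-up, namespace-free fragment (real EAD files may declare a namespace, which would require namespace-aware lookups).

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Made-up, namespace-free EAD fragment for illustration only.
SAMPLE_EAD = """
<ead>
  <archdesc>
    <did>
      <repository>
        <corpname>Example University Archives</corpname>
      </repository>
    </did>
  </archdesc>
</ead>
"""


def provider_from_ead(xml_text: str) -> Optional[str]:
    """Return the text of the first <repository>, or None if absent or empty."""
    root = ET.fromstring(xml_text)
    repository = root.find(".//repository")
    if repository is None:
        return None
    # itertext() gathers text from nested elements such as <corpname>;
    # split/join collapses the surrounding whitespace.
    text = " ".join("".join(repository.itertext()).split())
    return text or None


print(provider_from_ead(SAMPLE_EAD))
```

Returning `None` when the element is missing matches the earlier suggestion to leave the field unset rather than guess.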
Hi folks! The last three commits are in response to @JPrevost's comments above. In short, updates are made to the following section:
Let me know what you think!
See my comment about what I think is an outdated instance of `TimdexRecord.institution`. That small change aside, I'm still 👍 on this. Thanks for adding this context about non-geo sources!
In the spirit of "hopefully this is a widely applicable and longterm solution, but acknowledging future information may require reworking it", I approve of this approach to add a new top level `provider` field.
I think that it will be important to think deeply in Transmogrifier source transformations about the difference between a resource's "publisher" and this proposed "provider", but they do strike me as meaningfully distinct. And as touched on in comments, err on the side of setting this field only when confidently known or explicitly stated in the record.
In support of this decision, I would also note the field name parity with the Aardvark field `schema_provider_s`, which is defined as:
"To clarify which organization holds the resource or acts as the custodian for the metadata record and to help users understand which resources they can access."
They acknowledge this subtle but important dual purpose of the field where it may denote the institution that holds the actual resource, or just the custodian of the metadata record. Projecting this "provider" mentality to TIMDEX, I would imagine we utilize either where appropriate for this field, but could lean into the "holds the resource" side of things. Also noting that their definition of a "provider" very clearly does not imply who created the resource; it's very much access/discovery oriented.
8e8b497 to 669e7b4
669e7b4 to 6d31ee0
Purpose and background context
This PR introduces a new Architecture Decision Record (ADR) that proposes to add a new `institution` field to the TIMDEX data model. This discussion was kicked off when seeking to understand what TIMDEX fields to use for displaying the host institution on GeoData's "result" and "full record" pages for non-MIT GIS resources.
Includes new or updated dependencies?
NO
Changes expectations for external applications?
YES
What are the relevant tickets?