Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ambiguous attributes should not be accepted ? #2403

Open
vincentkelleher opened this issue Nov 8, 2024 · 10 comments · May be fixed by linkml/linkml-runtime#352
Open

Ambiguous attributes should not be accepted ? #2403

vincentkelleher opened this issue Nov 8, 2024 · 10 comments · May be fixed by linkml/linkml-runtime#352
Labels
bug Something that should work but isn't, with an example and a test case. community-generated developer-days smallish tickets that can be considered "maintenance" and fixed within a single session

Comments

@vincentkelleher
Copy link
Contributor

Describe the bug

In a case where a schema has ambiguous class attributes, the SchemaView.get_slot(...) method should throw an exception rather than returning the first attribute it can find with less details about it.

Here is the test case I used to reproduce the issue :

id: https://examples.org/get-slot-with-attribute#
name: get-slot-with-attribute

prefixes:
  test: https://examples.org/get-slot-with-attribute#

default_prefix: test
default_range: string

classes:
  ClassWithAttributes:
    attributes:
      randomAttribute:
        description: "A random attribute for testing purposes"
        range: integer
        minimum_value: 0
        maximum_value: 999

  ImportantSecondClass:
    description: "Important class to reproduce the error I got as the class loop needs to have at least a 
              second iteration"
    attributes:
      randomAttribute:
        description: "Now you see the ambiguity intensifying ?"
        range: integer
        minimum_value: 0
        maximum_value: 111

In my case, the unexpected behaviour comes from this part of the code base which returns a SlotDefinition with its name only.

I would imagine that attributes are made to bring ease of use but in the case above the user should be forced to define a slot instead through an exception.

What do you think ?

@vincentkelleher vincentkelleher added the bug Something that should work but isn't, with an example and a test case. label Nov 8, 2024
@sierra-moxon
Copy link
Member

+1 from me

@sneakers-the-rat
Copy link
Collaborator

sneakers-the-rat commented Nov 9, 2024

Totally agree. I don't think that we should ever return an attribute there, but returning an empty SlotDefinition that just has the name is just confusing.

I think that method should have attributes and strict removed from the signature and only ever return slots. attributes are only meaningful within the context of a class, and it is unclear to me when someone would want to get any attribute defined within any class by a specific name rather than with something like class_slots

if we wanted to preserve that behavior it seems like it should be in a get_attribute method that similarly does not return slots.

@vincentkelleher
Copy link
Contributor Author

@sneakers-the-rat I always got the impression that attributes were introduced to ease OOP developers into the LinkML/ontology paradigm as getting your head around the fact that a slot is a unique property that can be assigned to any class is complicated at first for us (OOP developers). Meaning that I imagined that attributes were transformed into slots but it's not exactly the case.

I find it hard to justify separating attributes and slots, can someone clarify why attributes exist so we can be on the same page ? 😊

@sneakers-the-rat
Copy link
Collaborator

sneakers-the-rat commented Nov 12, 2024

for me personally as someone who is an outside observer and was not part of 99.999% of the metamodel's creation, i think attributes make both semantic and syntactic sense - bridging OOP and other paradigms that might be more slot-forward is a genuine achievement and needs some glue between the impedance mismatches somewhere, and attributes are a pretty intuitive compromise among possible universes i think.

Semantically, things exist, and things sometimes have properties that are private/unique to them. It makes sense to differentiate between "things that could be true of anything" and "things that can only apply to this thing." I think this is especially true when thinking about the practicalities of linked data, where being able to to incremental refinement is a pretty natural modeling desire, but without a shorthand for "refinement of this thing within the context of this other thing" we would be left in RDF-soup land having to parse a million gradations of a slot disembodied from context. In the world where i was dictator of linkml, i would make it canonical for all slots and attributes used within a class to have implicit uris of base_prefix:Classname.{Slot|Attribute}Name alongside the URI for the slot used outside of the class base_prefix:SlotName with a owl:equivalentProperty for attributes/slots with no changes from e.g. attribute overrides, rules, slot_usage, etc. and rdfs:subClassOf for attributes/slots with changes.

Syntactically i think it also makes sense to differentiate between the two cases above^ (unchanged/vs. changed) as well as from an authorship POV: I myself often switch between defining everything as slots and wanting to define things within the context of a class. I think it can be easy to underrate the lowered cognitive barriers for researchers/data-havers generally who aren't used to the cognitive inversion of "predicates first" in RDF universe, and having a syntactic olive branch back to OOP thinking that has a smooth transition into predicate-land is especially easy to underrate.

Also syntactically, I have a known difference of preference between nominal/structural typing, where linkml is very much a nominally-typed system but i often want to edge towards structural typing (and am often wrong, as i am seeing while implementing rustgen how nice it is to not have to worry about structural type inference). So to me getting rid of the metamodel slot named attributes is different than getting rid of the abstract notion of attributes - i wouldn't mind a declaration like this

classes:
  MyClass:
    slots:
      pure_slot:
      pure_slot_2:
      attribute_or_modified_slot:
        description: "idk i changed it somehow"
        max_cardinality: 5 # always a good idea

or equivalently

classes:
  MyClass:
    slots:
    - pure_slot
    - pure_slot_2
    - name: attribute_or_modified_slot
      description: idk i changed it somehow
      max_cardinality: 5

anyway, getting back to the issue at hand - i can be reductive sometimes in thinking that "schemaview is all that matters" as the thing that materializes schema, but i think here it would definitely be good to make schemaview be a bit of a semantic guide, differentiating between slots as always referring to the schema-level slot-definitions and attributes as always referring to slots-or-attributes when used within the context of a class

@cmungall
Copy link
Member

I'll add something to the FAQ about this. There are nuances here...

@vincentkelleher
Copy link
Contributor Author

@sneakers-the-rat about this sentence:

i think here it would definitely be good to make schemaview be a bit of a semantic guide, differentiating between slots as always referring to the schema-level slot-definitions and attributes as always referring to slots-or-attributes when used within the context of a class

I had that understanding at the beginning but when I started using the JSON-LD context generator I realized that there wasn't any support for the concept of class scoped attributes as attributes are defined at a global level.

That brought me to questioning how LinkML can stay generic/universal: should it stop the generation at the limits of the LinkML model or should the LinkML model evolve to support almost any intricacy of every output generator it proposes at the risk of having dead LinkML markup in certain cases ?

@sierra-moxon sierra-moxon added the developer-days smallish tickets that can be considered "maintenance" and fixed within a single session label Nov 21, 2024
vincentkelleher pushed a commit to vincentkelleher/linkml-runtime that referenced this issue Dec 5, 2024
vincentkelleher pushed a commit to vincentkelleher/linkml-runtime that referenced this issue Dec 5, 2024
vincentkelleher pushed a commit to vincentkelleher/linkml-runtime that referenced this issue Dec 5, 2024
vincentkelleher pushed a commit to vincentkelleher/linkml-runtime that referenced this issue Dec 5, 2024
vincentkelleher pushed a commit to vincentkelleher/linkml-runtime that referenced this issue Dec 5, 2024
@vincentkelleher
Copy link
Contributor Author

FYI I've raised this pull request 👉 linkml/linkml-runtime#352

vincentkelleher pushed a commit to vincentkelleher/linkml-runtime that referenced this issue Dec 23, 2024
@BigBlueHat
Copy link

when I started using the JSON-LD context generator I realized that there wasn't any support for the concept of class scoped attributes as attributes are defined at a global level.

I've just found this same limitation and in looking at linkml/linkml-runtime#352 it seems the aim is to throw errors/warnings on redefinition, but not (yet?) implement scoped contexts?

Generating scoped contexts (typically + @protected) is an essential feature for locking down JSON term use via JSON-LD especially in security focused use cases like Verifiable Credentials.

It'd be great to see scoped contexts as an option ideally with @protected alongside.

Great work on LinkML regardless!
🎩

@sneakers-the-rat
Copy link
Collaborator

@BigBlueHat would you be willing to open a new issue describing what you'd like the JSON-LD generator to do re: scoped contexts/what the application for that is in a bit more detail? Selfishly i'm a bit curious to see how these would be used and how that relates both to security and how we could use it for modeling attributes, but also i think it's a sufficiently different thing from this issue that it's worth standing on its own if it's a general need ppl are having :)

@BigBlueHat
Copy link

@sneakers-the-rat definitely happy to share--as I continue to hunt for tooling to bridge the gaps and ease developer pain...and LinkML gets very close to that already!

Here is a small(ish), but pretty typical example of the sort of context we create regularly:

{
  "@context": {
    "@protected": true,

    "image": {"@id": "https://schema.org/image", "@type": "@id"},

    "BirthCertificateCredential": "https://examples.vcplayground.org/vocabs/birth-certificate-vc-v2#BirthCertificateCredential",

    "BirthCertificate": {
      "@id": "https://examples.vcplayground.org/vocabs/birth-certificate-vc-v2#BirthCertificate",
      "@protected": true,
      "@context": {
        "identifier": "https://schema.org/identifier",
        "children": "https://schema.org/children",
        "parent": "https://schema.org/parent",
        "certifyingAuthority": {
          "@id": "https://examples.vcplayground.org/vocabs/birth-certificate-vc-v2#certifyingAuthority",
          "@type": "@id"
        },
        "registrar": {
          "@id": "https://examples.vcplayground.org/vocabs/birth-certificate-vc-v2#registrar",
          "@type": "@id"
        }
      }
    },

    "Organization": {
      "@id": "https://schema.org/Organization",
      "@type": "@id"
    },

    "Person": {
      "@id": "https://schema.org/Person",
      "@type": "@id"
    },

    "givenName": "https://schema.org/givenName",
    "additionalName": "https://schema.org/additionalName",
    "familyName": "https://schema.org/familyName",
    "gender": "https://schema.org/gender",
    "birthDate": "https://schema.org/birthDate",
    "birthPlace": "https://schema.org/birthPlace",
    "roleName": "https://schema.org/roleName",
    "jobTitle": "https://schema.org/jobTitle"
  }
}

From https://github.com/credential-handler/vc-examples/blob/main/contexts/birth-certificate-vc-v2/v1rc1.json

That is currently used in https://github.com/credential-handler/vc-examples/blob/main/credentials/birth-certificate-vc-v2/credential.json

The above credential's context is:

  "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://examples.vcplayground.org/contexts/birth-certificate-vc-v2/v1rc1.json"
  ],

You'll note in the above context that it is itself marked as @protected (to prevent it's terms from being later redefined by other contexts) and that is uses a type scoped context for BirthCertificate.

The LinkML schemas already look pretty similar to that, but if I create something that matches the above context and then generate a context file...the generated one currently lacks all the key affordances I need.

Key issues the generated context has are:

  • no type scoped context generation
  • no @protected control
  • uses prefix heavily (vs. explicit URI term mapping)
  • always includes @vocab (which is a footgun...)

I also wasn't sure how best to do the mapping to external vocabulary terms--as you likely noticed we cherry pick terms from schema.org (or elsewhere) rather than including other (often massive vocabularies with terms way beyond the scope of what's needed).

Relatedly, the resulting data documents all depend on https://www.w3.org/ns/credentials/v2 terms. When I tried generating contexts without a LinkML Schema for that vocabulary, I ended up with VerifiableCredential being defined in the generated context file. So...I'm likely missing something with that last bit. 😸

LinkML seems very close to what I'd like to use to generate what amounts to a mix of context documentation (i.e. how to write the JSON data documents) and generate any additional vocabulary ephemera (for RDF devs who need/want it). That's the ideal blend I'm hoping to find.

Sorry that was lengthy...and maybe a bit off topic. 😉 Happy to continue discussing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something that should work but isn't, with an example and a test case. community-generated developer-days smallish tickets that can be considered "maintenance" and fixed within a single session
Projects
Status: No status
Development

Successfully merging a pull request may close this issue.

6 participants