SPARQL 1.2 Functions related to initial text direction and language tags #154

Open
afs opened this issue Sep 13, 2024 · 10 comments
Labels
spec:enhancement Change to enhance the spec without affecting conformance (class 2) –see also spec:editorial

Comments

@afs
Contributor

afs commented Sep 13, 2024

SPARQL 1.2 Functions for language string literals

Functions:
hasLANG(literal), hasLANGDIR(literal),
LANG(literal), LANGDIR(literal),
STRLANG(xsd:string, xsd:string), STRLANGDIR(xsd:string, xsd:string, xsd:string).

LANG(literal) is part of SPARQL 1.1 and is extended for rdf:dirLangString.

Accessors:

| RDF Term | hasLANG | hasLANGDIR | LANG | LANGDIR |
|---|---|---|---|---|
| "abc"@en | true | false | "en" | "" |
| "abc"@en--ltr | true | true | "en" | "ltr" |
| "abc"@en--LTR | true | true | "en" | "ltr" |
| "abc" | false | false | "" | "" |
| "abc"^^rdf:dirLangString | false | false | "" | "" |
| "abc"^^rdf:langString | false | false | "" | "" |
| "123"^^xsd:integer | false | false | "" | "" |
| <http://example/xyz> | error | error | error | error |

Constructors:

| Constructor | Literal |
|---|---|
| STRLANG("abc", "en") | "abc"@en |
| STRLANG("abc", "") | error |
| STRLANG(123, "") | error |
| STRLANGDIR("abc", "en", "ltr") | "abc"@en--ltr |
| STRLANGDIR("abc", "en", "LTR") | error |
| STRLANGDIR("abc", "en", "") | error |
| STRLANGDIR("abc", "", "ltr") | error |
| STRLANGDIR(123, "", "ltr") | error |
| STRLANGDIR(<x:uri>, "en", "ltr") | error |
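
The constructor rows can be read as a small validation routine. The Python sketch below is only a model of that table, not spec text: `Literal`, `IRI`, `SparqlError`, `strlang` and `strlangdir` are hypothetical names, a Python `str` stands in for an xsd:string argument, and accepting "rtl" alongside "ltr" is taken from RDF 1.2 Concepts rather than from the table itself.

```python
from dataclasses import dataclass
from typing import Optional


class SparqlError(Exception):
    """Stands in for a SPARQL expression evaluation error."""


@dataclass(frozen=True)
class IRI:
    """Hypothetical stand-in for an IRI term such as <x:uri>."""
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    lang: Optional[str] = None       # language tag component, if any
    direction: Optional[str] = None  # initial text direction component, if any


def strlang(lexical, lang):
    # Both arguments must be strings, and the language tag must not be empty.
    if not isinstance(lexical, str) or not isinstance(lang, str) or lang == "":
        raise SparqlError("STRLANG: bad argument")
    return Literal(lexical, lang)


def strlangdir(lexical, lang, direction):
    if not isinstance(lexical, str) or not isinstance(lang, str) or lang == "":
        raise SparqlError("STRLANGDIR: bad lexical form or language tag")
    # Assumption: "rtl" is allowed as in RDF 1.2 Concepts; the table only shows "ltr".
    # The direction is not forced to lowercase, so "LTR" and "" are errors.
    if direction not in ("ltr", "rtl"):
        raise SparqlError("STRLANGDIR: bad initial text direction")
    return Literal(lexical, lang, direction)
```

In this model, `strlangdir("abc", "en", "ltr")` yields a literal with both components set, while every other STRLANGDIR row in the table raises `SparqlError`.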

It is possible to write "abc"^^rdf:dirLangString and "abc"^^rdf:langString in N-Triples and Turtle.

The functions hasLANG and hasLANGDIR test whether an RDF term has the language tag or initial text direction component. See RDF Concepts, section "Literals". They don't test by datatype.

LANG is in SPARQL 1.1. This determines the choice for LANGDIR when passed a non-literal and the result of LANGDIR(123).

The accessors LANG and LANGDIR return the facet, or "" if it is absent, following LANG in SPARQL 1.1.
The argument must be a literal; otherwise it is an error.

When the facet is not present, as in "abc"^^rdf:langString or "123"^^xsd:integer, hasLANG/hasLANGDIR is false and LANG and LANGDIR return "".
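
A minimal Python sketch of the accessor behaviour described above, assuming a literal is modelled as a record with optional language and direction components. The names `Literal`, `SparqlError`, `lang_of`, `langdir_of`, `has_lang` and `has_langdir` are hypothetical; the sketch only illustrates that the accessors look at components and never at the datatype.

```python
from dataclasses import dataclass
from typing import Optional


class SparqlError(Exception):
    """Stands in for a SPARQL expression evaluation error."""


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = "xsd:string"
    lang: Optional[str] = None       # language tag component, if any
    direction: Optional[str] = None  # initial text direction component, if any


def _require_literal(term):
    # Non-literals (IRIs, blank nodes) are an error for all four functions.
    if not isinstance(term, Literal):
        raise SparqlError("argument must be a literal")
    return term


def lang_of(term):
    # Models LANG: the language tag, or "" when the component is absent.
    return _require_literal(term).lang or ""


def langdir_of(term):
    # Models LANGDIR: the initial text direction, or "" when absent.
    return _require_literal(term).direction or ""


def has_lang(term):
    # Models hasLANG as LANG(term) != "".
    return lang_of(term) != ""


def has_langdir(term):
    # Models hasLANGDIR as LANGDIR(term) != "".
    return langdir_of(term) != ""
```

For example, `Literal("abc", "rdf:langString")` has the datatype of a language-tagged string but no language component, so `has_lang` is false and `lang_of` is `""`.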

It may be possible to write a literal with text direction but no language tag in some other format (note: for RDF/XML we can require "lang=" if "dir=" is present).

Notes

hasFUNC(arg) is equivalent to FUNC(arg) != "".

The names hasLANG/hasLANGDIR differ in style from isLITERAL etc. because has* tests a component, not the RDF term as a whole.

hasLANG applies to rdf:langString and rdf:dirLangString.

Initial text direction is canonicalized to lowercase: cf. langtag being canonicalized in RDF 1.2.

It is not possible to write a literal in Turtle or N-Triples with a text direction but no language tag, nor is it possible to write a literal other than rdf:dirLangString and rdf:langString with a language tag. These are illegal in RDF Concepts but may occur naturally in other syntaxes as corner cases. The accessors approach works on components and would be well-defined.

@afs
Contributor Author

afs commented Sep 13, 2024

Difference to the earlier #113 draft

@hartig
Contributor

hartig commented Sep 13, 2024

Makes sense!

@rubensworks
Member

For reference, there's an issue open with some concerns about the current text direction approach. If an alternative approach is taken, this will have an impact here as well.
So one option might be to hold off with the work here until w3c/rdf-concepts#79 has been discussed or resolved.

@afs
Contributor Author

afs commented Sep 17, 2024

@rubensworks - thanks for pointing out that issue. Nothing is final until the publication of the REC 😄

There is no rush to get text into the SPARQL spec for these functions but at the same time, the WG has made a decision and we can't wait until RDF 1.2 is finalized before doing work.

The function list is my view on what is the natural outcome of the WG decision on initial text direction and the changes in RDF. That includes discussions with the internationalization working group.

Bidirectional text is a much larger problem and I don't see that the WG has decided to take up the issue. The only response I recall is along the lines of "use a content-focused literal" (e.g. rdf:HTML).

JSON-LD has non-normative "base direction". So initial text direction (terminology suggested by i18n IIRC) already exists.

Datatypes have been discussed and problems with them identified. A datatype is a class, and the subclass relationship does not work for scripts (a subclass must be usable in a place where the superclass is valid).

There is nothing to stop use of compound literals. The WG initial text direction decision does not block that, nor do the proposed SPARQL changes.

If the WG takes up w3c/rdf-concepts#79, things may need to change.

FWIW I think the lack of a way to give a direction to a non-language-tagged string is a bit odd. It would need a new datatype, not a munging of xsd:string or rdf:dirLangString. Such a change would fit the proposal here because the functions are accessors to components of RDF literal terms.

(General discussion about initial text direction in RDF 1.2 and on the RDF Concepts issues list please.)

@afs
Contributor Author

afs commented Sep 17, 2024

The rdf-tests PR w3c/rdf-tests#135 shows that using LTR is illegal in RDF; it is not forced to lowercase.

Therefore STRLANGDIR(?, ?, "LTR") should be an error.

| Constructor | Literal |
|---|---|
| STRLANGDIR("abc", "en", "LTR") | error |

Table in the description updated.

N.B. Langtags are compared and matched in a case-insensitive manner, but RDF Concepts does not mandate lowercase. Some systems use canonical langtags (e.g. en-GB).
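
The contrast between the two components can be sketched in Python as follows. This is an illustration under assumptions, not spec text: `same_langtag` and `is_valid_direction` are hypothetical helper names, and the accepted direction values "ltr" and "rtl" come from RDF 1.2 Concepts.

```python
def same_langtag(a: str, b: str) -> bool:
    # Language tags are compared case-insensitively; they are ASCII,
    # so lowercasing both sides is sufficient for this sketch.
    return a.lower() == b.lower()


def is_valid_direction(direction: str) -> bool:
    # The initial text direction is case-sensitive: only the exact
    # lowercase values are legal, and "LTR" is not normalised to "ltr".
    return direction in ("ltr", "rtl")
```

So `same_langtag("en-GB", "en-gb")` holds, while `is_valid_direction("LTR")` does not.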

@afs
Contributor Author

afs commented Sep 25, 2024

The WG looked at w3c/rdf-concepts#79 at TPAC'24 and resolved:

RESOLUTION: The working group has considered w3c/rdf-concepts#79 and will continue to support initial text direction in RDF Language-Tagged Literals. We will not otherwise consider full bidi.

pfps added the spec:enhancement Change to enhance the spec without affecting conformance (class 2) –see also spec:editorial label Nov 20, 2024
afs added a commit that referenced this issue Dec 27, 2024
Co-authored-by: Thomas Tanon
Co-authored-by: Olaf Hartig

@rubensworks
Member

rubensworks commented Jan 20, 2025

While implementing base direction support, I realized that (at least) the following string-based functions will also need to have their description updated and/or spec tests amended to cope with directional language strings.

  • CONCAT
  • LCASE
  • REPLACE
  • STRAFTER
  • STRBEFORE
  • SUBSTR
  • UCASE

@afs
Contributor Author

afs commented Jan 20, 2025

The argument compatibility rules which apply to STRSTARTS, STRENDS, CONTAINS, STRBEFORE and string literal return type have been updated.

Similarly for LCASE and UCASE.

Examples could do with an additional row but what else? And more tests.

REPLACE - unrelated: needs adding to string literal return text (1.1 omission).

"string literal" needs a proper anchor.

What else?

@rubensworks
Member

The CONCAT function for instance has the following text:

If all input literals are literals with the same language tag, then the returned string literal is a literal with that language tag. Otherwise, the returned literal is a literal with datatype xsd:string and no language tag.

This probably needs to be extended towards directions as well.

Functions such as STRBEFORE seem to contain similar descriptions that need updates:

For compatible arguments, if the lexical part of the second argument occurs as a substring of the lexical part of the first argument, the function returns a literal of the same kind as the first argument arg1 (literal with datatype xsd:string, literal with the same language tag).

(I haven't looked into all descriptions in detail yet; my comment above was mainly to raise the need to look into it in more detail later.)
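
One way the CONCAT rule quoted above might be extended to directions is sketched below in Python. This is an assumption for discussion, not the wording the WG adopted: the result keeps the language tag and initial text direction only when every input carries the same tag (compared case-insensitively) and the same direction, and otherwise falls back to a plain xsd:string. `Literal` and `concat` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    lexical: str
    lang: Optional[str] = None       # language tag component, if any
    direction: Optional[str] = None  # initial text direction component, if any


def concat(*args: Literal) -> Literal:
    lexical = "".join(a.lexical for a in args)
    if not args:
        return Literal(lexical)
    first = args[0]
    # Assumed rule: all inputs must agree on both components.
    all_same = first.lang is not None and all(
        a.lang is not None
        and a.lang.lower() == first.lang.lower()
        and a.direction == first.direction
        for a in args
    )
    if all_same:
        return Literal(lexical, first.lang, first.direction)
    # Otherwise the result is a plain string with no language tag or direction.
    return Literal(lexical)
```

Under this assumed rule, concatenating "a"@en--ltr with "b"@en--ltr gives "ab"@en--ltr, whereas mixing "a"@en--ltr with "b"@en gives the plain literal "ab".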

@afs afs assigned afs and unassigned afs Jan 20, 2025
@afs
Contributor Author

afs commented Jan 20, 2025

Sub-issue #180 created.
