Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-2370: Improve validation of dataset graph names #2373

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

kinow
Copy link
Member

@kinow kinow commented Mar 27, 2024

GitHub issue resolved #2370

Pull request Description:

@afs, @jmkeil, first stab at #2370. Had a look at 1 and 2, and I saw some parts of the text mentioning "percent-encoded" in requests, and that the backend is supposed to return 400, as it's doing now. From quickly reading those two (skipping parts, trying to find info for validation), I think

  1. empty is valid
  2. spaces are no-no
  3. otherwise it must be a URI
  4. encoded URL is no-no, it must be a valid URI, without encoded parts

I extended the current validation to also check 3 and 4 now. Does that sound right?

This is a draft until I confirm that the fix looks good, then I will either write the tests, or modify the PR again to move the code to some JS module as that a lot easier to write unit-test against, instead of testing code buried somewhere in Vue code/components.

Cheers


  • Tests are included.
  • Documentation change and updates are provided for the Apache Jena website
  • Commits have been squashed to remove intermediate development commit messages.
  • Key commit messages start with the issue number (GH-xxxx)

By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.


See the Apache Jena "Contributing" guide.

Footnotes

  1. https://www.w3.org/TR/2013/REC-sparql11-http-rdf-update-20130321/#http-post

  2. https://www.ietf.org/rfc/rfc3986.html

@kinow
Copy link
Member Author

kinow commented Mar 27, 2024

"" → OK
" " → ERROR
"snoopy" → ERROR
"http://example.org/dataset" → OK
"http://example.org/dataset#graph" → OK
"http://example.org/dataset" → OK
"http://example.org/dataset#graph?Aaa" → OK
"http://example.org/dataset%23graph" → ERROR

@kinow kinow self-assigned this Mar 27, 2024
@kinow kinow requested a review from afs March 27, 2024 12:09
@jmkeil
Copy link
Contributor

jmkeil commented Mar 27, 2024

Any valid absolute IRI should be permitted, including other schemes than http/https, like urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66.

"http://example.org/dataset%23graph" → ERROR

I think this should be OK too, as IRIs serve as identifiers in this case and therefore the Simple String Comparison for equivalence apples. (But @afs probably knows for sure.)

Maybe better leaf it up to the back-end, as it already has sophisticated methods to check that?

@kinow
Copy link
Member Author

kinow commented Mar 27, 2024

Any valid absolute IRI should be permitted, including other schemes than http/https, like urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66.

Haven't tried on this PR, but a quick test on the browser console accepts "urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66" too.

new URL("urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66")
URL { href: "urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66", origin: "null", protocol: "urn:", username: "", password: "", host: "", hostname: "", port: "", pathname: "uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66", search: "" }

I think this should be OK too, as IRIs serve as identifiers in this case and therefore the Simple String Comparison for equivalence apples. (But @afs probably knows for sure.)

👍

Maybe better leaf it up to the back-end, as it already has sophisticated methods to check that?

In some other systems I've seen the UI sending a request to the backend to validate (e.g. Jenkins does that)... this way validation was only done in the backend, and the UI simply relied on the response being OK or NOK + the error explanation.

It's harder to test without the backend, and has an extra cost on the server, but could be a solution here too.
So that should pass the validation without issues.

Happy to implement either way. I think this new version at least improves the previous one. Not sure if there are some corner cases covered by the backend that are not handled correctly in JS though...

Thanks @jmkeil !

@afs afs added the Fuseki UI label Mar 28, 2024
@jmkeil
Copy link
Contributor

jmkeil commented Apr 2, 2024

Not sure what would be the best solution. But the simplest solution I could think of would be to just do the request with the encoded graph name IRI and the back-end would complain, if it is invalid. Of course, that would have the draw back that the user might send a large payload only to learn about an invalid input. This could be avoided by sending a trial request with empty payload to test the graph name IRI. That way, no changes of the back-end are needed. Would that be feasible from the testing-perspective?

@kinow
Copy link
Member Author

kinow commented Apr 2, 2024

Would that be feasible from the testing-perspective?

I think that's doable too. Let's wait to see if @afs has other ideas or suggestions.

Thanks @jmkeil !

@kinow
Copy link
Member Author

kinow commented Aug 3, 2024

ping @afs , @rvesse ☝️ Don't recall much about the context here, but reading the comments, looks like I was waiting for other opinions on the best approach here. Any thoughts on this?

@afs
Copy link
Member

afs commented Aug 5, 2024

Hi @kinow,

Reading back ...

If the client is doing checking and it will give better error messages. e.g. no actual spaces !!!
The server is going to check anyway but all it can say to a request is "no" or "yes".

If the client side parses the URI with uri-js that should be good. It is RFC 3986 compliant.

Addition checks for common mistakes (spaces!) are nice - better error messages.

Aside :: On the server side, Java java.net.URI is not good - it's more like the old RFC 2396, not RFC3986. Jena does not use it.

Percent should be allowed end-to-end - a URI really does have %-2-3 in it. The URI parser should be happy then when sent the request should be making it a safe - avoid applying the algorithm twice, once in Jane JS code and once by the HTTP client. The receiving code should decode the request properly once.

}
// If it reached this part, then it's a valid graph name.
return true
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a separate module/function to make it easier to test/isolate the code.

@@ -60,7 +60,7 @@
placeholder="Leave blank for default graph"
/>
<div class="invalid-feedback">
Invalid graph name. Please remove any spaces.
Invalid graph name. Please remove any spaces, and provide a valid URI (RFC 3986).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text had the part about encoded values... I removed it as in reality the error from the SPARQL docs (http%3A//www.example.com/other/graph) is captured in the new URL call. I believe the JS URL follows this doc, which references RFC 3986 & 3987 https://url.spec.whatwg.org/

const isValidGraphName = validateGraphName(graphName)
const formValidationClass = isValidGraphName ? 'is-valid' : 'is-invalid'
this.graphNameClasses = ['form-control', formValidationClass]
return isValidGraphName
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the code in this UI component is dumb, and the logic is in the pure JS module (easier to test, etc.)

expect(validateGraphName(graphName), `Rejected valid graph name "${graphName}"`).to.equals(true)
}
})
})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afs, @ jmkeil I trust with this unit test it will be easier to review the code and decide if the new validation is looking good or not 🙂

@kinow kinow marked this pull request as ready for review October 12, 2024 14:20
@kinow
Copy link
Member Author

kinow commented Oct 12, 2024

Rebased, updated the code to simplify review... it should be ready for review again, @jmkeil , @afs.

@kinow
Copy link
Member Author

kinow commented Oct 12, 2024

UI when you visit the page to upload data:

image

When you enter a valid graph name:

image

And when you enter an invalid graph name:

image

There's one thing pending, and it's to block the "upload now" of each individual file. I can do that and add another test before it's merged, but we can review the validation for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fuseki UI Add Data Dataset graph name not escaped
3 participants