[perf] Use local schemas if available #307
Merged
Edit to add relation to prior issues:
(Partial)
Fix: linkml/linkml#866
(Full)
Fix: linkml/linkml#1012
I also see this was handled in a few different ways in prior PRs/issues.
This, to me, points to a greater need to simplify and unify the loading behavior: we seem to have a patchwork of fixes that didn't quite reach the root of the problem, because the loading behavior is quite complex.
One can validate that network requests are still being made by, well, monitoring network traffic, as well as by adding a debug flag just before the `hbread` call that prints what it's about to read.
Finally took the time to see what network requests were still happening during normal usage, because I kept hanging both on test runs and also when just trying to use the tool.
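A rough sketch of one way to do that check without a packet sniffer: temporarily wrap `urllib.request.urlopen` and record what gets requested. The helper name `record_urlopen` is hypothetical and not part of this PR, and the wrapper only sees callers that look up `urllib.request.urlopen` at call time.

```python
import urllib.request
from contextlib import contextmanager


@contextmanager
def record_urlopen():
    """Record every URL passed to urllib.request.urlopen inside the block.

    Hypothetical debugging helper. Code that did
    `from urllib.request import urlopen` before the patch keeps its own
    reference and is NOT intercepted.
    """
    seen = []
    original = urllib.request.urlopen

    def recording_urlopen(url, *args, **kwargs):
        # urlopen accepts either a string or a urllib.request.Request
        target = url.full_url if isinstance(url, urllib.request.Request) else url
        seen.append(target)
        print(f"urlopen: {target}")
        return original(url, *args, **kwargs)

    urllib.request.urlopen = recording_urlopen
    try:
        yield seen
    finally:
        # Always restore the real function, even if the block raised
        urllib.request.urlopen = original
```

Wrapping a schema load in `with record_urlopen() as seen:` and inspecting `seen` afterwards should show whether the same URL is being fetched repeatedly.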
Turns out that `hbread` doesn't use `requests` (which would be cached during testing) and just directly calls `urllib`. It also turns out that most of the time we are just requesting `types.yaml` over and over again, so we can safely use the local version of the meta schema instead. Our local version should always be the one we prefer, since it's tagged to the particular version of `linkml_runtime` that we're using, as opposed to the URI version, which could be any version (i.e. it would be the most recent version even if we wanted to use an older version of the spec).
edit: this was removed to satisfy a test that needed the fileinfo:
Perf of `request.py(urlopen)`:
Before: 288.5s (cumulative), 0.8291s per call
This PR: 18.94s (cumulative), 0.789s per call (the point is that we make fewer calls)
Difference: -269s (-93%)
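For illustration, the "prefer the local copy" idea can be sketched as below. This is a minimal sketch and not the code in this PR: `prefer_local`, the mapping, and the `example.org` URL are hypothetical names chosen for the example.

```python
from pathlib import Path
from typing import Dict, Union


def prefer_local(source: str, local_copies: Dict[str, Path]) -> Union[str, Path]:
    """Return the bundled local file for a known URL, else the source unchanged.

    `local_copies` is a hypothetical mapping from remote schema URLs to the
    files shipped with the installed package.
    """
    local = local_copies.get(source)
    if local is not None and local.is_file():
        # Known schema with a copy on disk: skip the network entirely
        return local
    # Unknown URL, or the bundled file is missing: fall back to the original
    return source


# Illustrative mapping; the URL and path are placeholders, not real locations
LOCAL_COPIES = {
    "https://example.org/schemas/types.yaml": Path("bundled/types.yaml"),
}
```

Falling back to the original source when the local file is missing keeps the old behavior for anything that isn't bundled, so only the repeated requests for known schemas are avoided.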
Edit: I have no idea why this test is failing. I tried to fix the source schema file and remove the newline at the end of the file, but otherwise I have no idea why it stopped printing the filename between 3 hours ago and now. I'll come back in the morning.