Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[perf] Use local schemas if available #307

Merged
merged 2 commits into from
Mar 21, 2024

Conversation

sneakers-the-rat
Copy link
Contributor

@sneakers-the-rat sneakers-the-rat commented Mar 13, 2024

Edit to add relation to prior issues:

(Partial)
Fix: linkml/linkml#866
(Full)
Fix: linkml/linkml#1012

And i also see this was handled in a few different ways in different prior PRs/issues:

This, to me, points to a greater need to simplify and unify the loading behavior, since it seems like we have a patchwork of fixes here that didn't quite reach the root of the problem because the loading behavior is quite complex.

One can validate that network requests are still being made by, well, monitoring network traffic, as well as adding a debug flag just before the hbread printing what it's about to read.


Finally took the time to see what network requests were still happening during normal usage, because i kept hanging both on test runs and also when just trying to use the tool.

Turns out that hbread doesn't use requests (which would be cached during testing) and just directly calls urllib. It also turns out that most of the time we are just requesting types.yaml over and over again, and so we can safely use the local version of the meta schema instead - our local version should always be the one we prefer, since it's tagged to the particular version of linkml_runtime that we're using, as opposed to the URI version which could be any version (ie. would be the most recent version even if we wanted to use an older version of the spec).

edit: this was removed to satisfy a test that needed the fileinfo:

The only thing that looks dicey in here is me also trying to just cache all identical results from reads for a given cache period, and so I am avoiding passing FileInfo to hbread so that we can use lru_cache which requires hashable args.

Perf of request.py(urlopen):
Before: 288.5s (cumulative) 0.8291s per call
This PR: 18.94s (cumulative) 0.789s per call (we make fewer calls is the point)
Difference: -269s (-93%)

Edit: i have no idea why this test is failing - I tried to fix the source schema file and remove the newline at the end of the file, but otherwise i have no idea why it decided to stop printing the filename between 3 hours ago and now. i'll come back in the morning

cache hashable calls to yaml loader if possible
… is apparently necessary for test issue 1040 to pass
Copy link

codecov bot commented Mar 16, 2024

Codecov Report

Attention: Patch coverage is 84.21053% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 62.92%. Comparing base (27b9158) to head (33ca663).
Report is 3 commits behind head on main.

Files Patch % Lines
linkml_runtime/loaders/loader_root.py 72.72% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #307      +/-   ##
==========================================
+ Coverage   62.88%   62.92%   +0.03%     
==========================================
  Files          62       62              
  Lines        8528     8545      +17     
  Branches     2436     2437       +1     
==========================================
+ Hits         5363     5377      +14     
- Misses       2554     2557       +3     
  Partials      611      611              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@cmungall
Copy link
Member

Thanks! Yes, we should be phasing out the older hbread code over time

@cmungall cmungall merged commit 3b0f817 into linkml:main Mar 21, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

offline access to core imports Implement caching mechanism for imports
2 participants