Leverage `fedora/risearch` for some query operations #17

CanOfBees · 2020-05-29T03:25:58Z

Is your feature request related to a problem? Please describe.
I'd like to think of this as an opportunity to simplify a subset of the interactions between moldybread and fedora. By reducing the number of individual HTTP requests for queries, we may be able to provide a better experience for our end users (us); i.e. possibly speed some operations up.
(See below for some additional context)

I am not convinced that this is a great idea, but I've wanted to talk about it now for several days. Thanks in advance for your thoughts!

Describe the solution you'd like
My specific use-case came up during some post-ingest cleanup for TDH. I needed to pull back the MODS and TEI dsids to do a final audit for some values. When I ran these operations with moldybread (an earlier release! v0.1.5), I ran into problems with fedora (the logs reported 'Too many open files' and I wasn't able to get the dsids serialized). My work-around was to query the fedora/risearch endpoint for the appropriate PIDs, then pull down the corresponding datastreams.

Please note: these queries used the 'tuples' functionality in risearch -- I made multiple attempts at getting the 'triples' query UI to work for me without success.

To pull back all "books" from TDH (as a SPARQL query):

SELECT ?book
FROM <#ri>
WHERE {
  ?book <info:fedora/fedora-system:def/model#hasModel> <info:fedora/islandora:bookCModel> ;
        <info:fedora/fedora-system:def/relations-external#isMemberOfCollection> <info:fedora/collections:tdh> .
}
LIMIT 10

and as a part of an encoded HTTP request (via curl or a similar mechanism, again using SPARQL):

http://localhost:8080/fedora/risearch?type=tuples&flush=false&lang=sparql&format=CSV&limit=10&query=SELECT%20%3Fbook%20FROM%20%3C%23ri%3E%20WHERE%20%7B%20%3Fbook%20%3Cinfo%3Afedora%2Ffedora-system%3Adef%2Fmodel%23hasModel%3E%20%3Cinfo%3Afedora%2Fislandora%3AbookCModel%3E%20%3B%20%3Cinfo%3Afedora%2Ffedora-system%3Adef%2Frelations-external%23isMemberOfCollection%3E%20%3Cinfo%3Afedora%2Fcollections%3Atdh%3E.%20%7D
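As a rough illustration of how a request like the one above could be assembled programmatically, here is a minimal Python sketch. It is not moldybread code (moldybread is written in Nim); `build_risearch_url` and `BOOKS_QUERY` are hypothetical names, and the parameter set simply mirrors the URL shown above.

```python
from urllib.parse import quote, urlencode

# The SPARQL query from above, collapsed onto one line (LIMIT is sent as a URL parameter).
BOOKS_QUERY = (
    "SELECT ?book FROM <#ri> WHERE { "
    "?book <info:fedora/fedora-system:def/model#hasModel> <info:fedora/islandora:bookCModel> ; "
    "<info:fedora/fedora-system:def/relations-external#isMemberOfCollection> "
    "<info:fedora/collections:tdh> . }"
)


def build_risearch_url(base_url, query, limit=10):
    """Return a risearch 'tuples' URL for a SPARQL query (hypothetical helper)."""
    params = {
        "type": "tuples",
        "flush": "false",
        "lang": "sparql",
        "format": "CSV",
        "limit": str(limit),
        "query": query,
    }
    # quote_via=quote percent-encodes spaces as %20 (as in the URL above) rather than '+'.
    return f"{base_url.rstrip('/')}/risearch?{urlencode(params, quote_via=quote)}"


url = build_risearch_url("http://localhost:8080/fedora", BOOKS_QUERY)
print(url)
```

Letting the library do the percent-encoding avoids hand-editing the query string whenever the SPARQL changes.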

(see below for documentation on the UI and API)
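The second half of the work-around (turning the risearch response into PIDs, then into datastream requests) could look roughly like the Python sketch below. It assumes the CSV response has a header row named after the query variable and one `info:fedora/<pid>` URI per row; the sample rows and the helper names `pids_from_csv` and `datastream_url` are made up for illustration. The datastream path follows the Fedora 3 REST API pattern `/objects/{pid}/datastreams/{dsid}/content`.

```python
import csv
import io


def pids_from_csv(csv_text, column="book"):
    """Extract bare PIDs from a risearch CSV tuples response (assumed layout)."""
    prefix = "info:fedora/"
    pids = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(column) or "").strip()
        if value.startswith(prefix):
            pids.append(value[len(prefix):])  # drop the URI prefix, keep e.g. "tdh:1"
    return pids


def datastream_url(base_url, pid, dsid):
    """Build the URL for a datastream's content."""
    return f"{base_url.rstrip('/')}/objects/{pid}/datastreams/{dsid}/content"


# Made-up sample response; real PIDs will differ.
sample = '"book"\ninfo:fedora/tdh:1\ninfo:fedora/tdh:2\n'

for pid in pids_from_csv(sample):
    for dsid in ("MODS", "TEI"):
        print(datastream_url("http://localhost:8080/fedora", pid, dsid))
```

This way one risearch request replaces the per-object search requests, and only the datastream downloads remain as individual HTTP calls.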

Describe alternatives you've considered
I think that Mat and I were able to successfully use moldybread the following day without any problems. As noted in the following section, there are some complex interactions happening inside the stack, and I don't think we fully understand what's causing the intermittent errors/failures we see there.

I suppose alternatives could be things like sleeps between the current queries, or re-trying queries on a non-200 response?
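For the sleep-and-retry alternative, a minimal Python sketch might look like this. `fetch_with_retry` is a hypothetical helper, `fetch` stands in for whatever HTTP client is actually used (any callable returning a `(status, body)` pair), and the doubling delay is just one possible choice.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it returns status 200 or the attempts run out."""
    status = None
    for attempt in range(attempts):
        status, body = fetch(url)
        if status == 200:
            return body
        if attempt < attempts - 1:
            # Wait longer after each failure: delay, 2 * delay, 4 * delay, ...
            sleep(delay * (2 ** attempt))
    raise RuntimeError(f"{url} failed after {attempts} attempts (last status {status})")


# Demo with a fake fetch that fails twice, then succeeds; no real waiting happens.
responses = iter([(500, ""), (503, ""), (200, "<mods/>")])
body = fetch_with_retry(lambda u: next(responses), "http://example.invalid/ds",
                        sleep=lambda seconds: None)
print(body)
```

Passing `sleep` in as an argument keeps the retry logic easy to exercise without real delays.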

Additional context
Anecdotal stuff:
There is some anecdotal evidence that the fedora API-A/API-M REST endpoint isn't able to deal with a significant number of consecutive requests that use some of the endpoint's querying methods (I want to emphasize the anecdotal -- I think we're still trying to understand some fairly complicated interactions in the fedora stack, and I'm definitely not suggesting that this feature request/idea will be a panacea for our woes).

RISearch links:
Resource Index Search - user interface for tuples
Resource Index Search - API for tuples

Lastly and most importantly: moldybread is a Really Helpful Utility for us. I've been using it off and on for bits and pieces of work and I'm glad we have it available in our toolbox. Thanks for writing it.

markpbaggett · 2020-05-29T15:59:59Z

@CanOfBees, thanks for suggesting this. I have a couple questions just so I make sure I understand what you're suggesting:

I see that you want to use RISearch to get a list of books. This would in theory add a new way to populate a list of "results" and serve as a replacement for FedoraRequest.populate_results(). After you got a list of results, what would you do to get the actual datastreams? Would you use the Fedora API?

I ask because I'm trying to think about how we'd want to tie this in and what a RisearchConnection type might look like. Right now, we have a few main types, two of which are FedoraRequest and FedoraRecord. FedoraRequest is all about your connection to the Fedora API and general things you'd like to do against it. FedoraRecords are more the individual tasks like replacing a DSID or downloading a DSID with the Fedora API. Risearch as we're talking about it now is more similar to the GsearchConnection type, but I think it's important that we consider all the ways we might want to leverage Risearch and think about how to integrate it with the moldybread pattern. Are you seeing this mainly as a way to replace FedoraRequest.populate_results(), or do you have other thoughts about how this might be used? There are no wrong answers here, and I think fleshing this out can help us move forward.

Regardless, I think we need a separate type for whatever we do. To get started thinking about this, in a separate branch off master, create a file like risearch.nim at src/moldybreadpkg. The fedora.nim file is too big, and we need to break some of the parts into separate files regardless.

In the file, let's go ahead and skeleton out a type. I think we should think about what should be tied to the type. We could think about this as a RisearchRequest or a RisearchConnection. I think there is a subtle difference here, and it could influence our attributes. Regardless, it'd look something like this:

import httpclient  # provides HttpClient

type
  RisearchRequest* = ref object
    ## Type to handle requests against Risearch
    base_url*: string
    client: HttpClient
    request_type*: string
    flush*: bool

The important piece here is starting to think about our methods -- or the actions our RisearchRequest or RisearchConnection might do. If a property or attribute should only be associated with one action and not potentially multiple actions, it should probably be an argument on the method or simply a part of the method. If it would potentially work for most methods, maybe it should be an attribute. I'm not sure of the answer here, but maybe you can help me think about this and how we should implement it.
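One possible way to draw that line, sketched in Python purely to keep the example short and testable (the real type would be Nim): connection-wide settings such as `base_url`, `request_type` and `flush` live on the object, while things that can vary per query (`lang`, `fmt`, `limit`) are method arguments. The `query_url` method and its parameter names are hypothetical, and this is only one of several reasonable splits.

```python
from dataclasses import dataclass
from urllib.parse import quote, urlencode


@dataclass
class RisearchRequest:
    # Attributes: settings that would apply to most or all actions.
    base_url: str
    request_type: str = "tuples"
    flush: bool = False

    def query_url(self, query, lang="sparql", fmt="CSV", limit=None):
        """Build the risearch URL for one query; per-query options are arguments."""
        params = {
            "type": self.request_type,
            "flush": str(self.flush).lower(),
            "lang": lang,
            "format": fmt,
            "query": query,
        }
        if limit is not None:
            params["limit"] = str(limit)
        return f"{self.base_url.rstrip('/')}/risearch?{urlencode(params, quote_via=quote)}"


request = RisearchRequest("http://localhost:8080/fedora")
print(request.query_url("SELECT ?s FROM <#ri> WHERE { ?s ?p ?o }", limit=5))
```

If most callers end up passing the same `lang` and `fmt` every time, that would be a hint those belong on the type instead.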

If you want, I can sketch out a sample module to get us started or if you want to take a stab at this, go for it. Before we get too far though, we really need to think about how Risearch will be used in general to think about where to tie it in, attributes on the type, etc.

Let me know if you want to talk about this synchronously.

CanOfBees added the enhancement New feature or request label May 29, 2020

CanOfBees assigned markpbaggett May 29, 2020
