As a User I want to get results in the same order as they were in the input #13

Open
dimus opened this issue Dec 3, 2020 · 10 comments

Comments

@dimus
Member

dimus commented Dec 3, 2020

I think this is possible by setting up a queue using a linked list and a listener.

@dimus dimus transferred this issue from gnames/gnverifier Dec 5, 2020
@dimus dimus transferred this issue from gnames/gnames Dec 5, 2020
@dimus dimus added this to the Iter 11 milestone Dec 5, 2020
@dimus dimus modified the milestones: Iter 11, iter 12 Dec 6, 2020
@dimus
Member Author

dimus commented Dec 7, 2020

Looks like this approach is not foolproof, so I am going to abandon it in favor of just running one job.

@dimus dimus closed this as completed Dec 7, 2020
@dimus dimus added the wontfix This will not be worked on label Dec 7, 2020
@dimus
Member Author

dimus commented Dec 14, 2020

Making a second attempt.

@dimus dimus reopened this Dec 14, 2020
@dimus dimus removed the wontfix This will not be worked on label Dec 14, 2020
@dimus dimus changed the title As a User I want to get results in the same order as they were in input As a User I want to get results in the same order as they were in the input Dec 15, 2020
@dimus dimus modified the milestones: iter 12, iter 13 Dec 15, 2020
@dimus dimus added this to the iter 18 milestone Jan 26, 2021
@dimus dimus modified the milestones: iter 18, iter 19 Feb 1, 2021
@dimus dimus modified the milestones: iter 19, Iter 21 Feb 17, 2021
@thompsonmj

As an alternative, the tool could allow a second column in the input file that specifies a unique input ID, so results can be matched to inputs after the job completes.
For example:

f8158395-d663-4f0f-b38d-1c6ecb16a8ca, "g:Homo sp:sapiens"

And the response could include:

{
  "responseId": "16f235a0-e4a3-529c-9b83-bd15fe722110", # Potentially change from "id" for added clarity
  "inputId": "f8158395-d663-4f0f-b38d-1c6ecb16a8ca",
  "name": "Homo sapiens",
  "cardinality": 2,
  "matchType": "FacetedSearch",
  ...

This way, an explicit map between the inputs and responses can be maintained, which could be preferable to simply matching both by indices.

@dimus
Member Author

dimus commented May 23, 2024

Hi @thompsonmj, thank you for the feedback! Do you mean using postprocessing to sort results by the unique IDs?

@thompsonmj

Not necessarily to sort, but postprocessing to match the results to each input query string.

I assume the reason for wanting results in the same order as the query strings is to match them together.

Different query strings (e.g. "g:Homo sp:sapiens", "n:Homo sapiens", "Homo sapiens", or "tx:Animalia sp:sapiens") give different values for "name:" in the response, and the "id:" UUID is created from the "name:" field rather than from the query string, so it isn't clear how to map responses back to query strings.

@dimus
Member Author

dimus commented May 23, 2024

I think I understand your point, @thompsonmj. Do I understand correctly that you use the command-line gnfinder tool?

@thompsonmj

thompsonmj commented May 23, 2024

Yep! Via the Docker container. It is extremely fast for long lists of names, which is quite nice.

Sorry, misread your reply. I use the CLI tool gnverifier.

@dimus
Member Author

dimus commented May 24, 2024

Oops, my bad, I was working on gnfinder yesterday and made a typo. I did mean gnverifier of course, @thompsonmj.

Looks like you use a file with names in a way I did not expect. I did not think it would be useful for people to run a file of 'FacetedSearch' names in bulk, because such searches quite often return a lot of results and would probably require manual human intervention to separate the useful results from the rest. Good to know that such a use case exists!

@dimus
Member Author

dimus commented May 24, 2024

Can you check whether gnverifier -j 1 ... does the trick for you, @thompsonmj?

@thompsonmj

Yes, setting just 1 job gives results back in the same order as they were entered. At ~200 names/sec, the speed is still excellent even for long lists!

Though I feel multiple concurrent jobs with the ability to map results to input strings would be helpful if #115 (optional vernacular names) gets implemented for gnverifier. With the Global Names Resolver API, asking for vernaculars adds considerable overhead. Assuming a similar performance penalty would come with vernaculars in gnverifier, the speed boost from concurrent jobs would be helpful to offset this, and mapping would be needed if the order becomes mixed up.

> Looks like you use file with names in a way I did not expect ...

We have a long list of organisms with a wide variety of taxonomic specificity that we want to get fully resolved taxonomic hierarchies for. Our preferred data source is GBIF, but it doesn't show up in all results, so in those cases we're doing some further reconciliation among tied top-scoring results. I'm still determining the best way to get results to be as pinpointed as possible using the gnverifier advanced queries, based on the info we have for each organism.
