Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query to relate JobRequest to Job failing #28

Open
pbilling opened this issue Jan 31, 2022 · 2 comments
Open

Query to relate JobRequest to Job failing #28

pbilling opened this issue Jan 31, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@pbilling
Copy link
Collaborator

I am seeing that the query to relate (:JobRequest)-[:TRIGGERED]-(:Job) is routinely not succeeding on the first try and being requeued. Looking at the query it seems inefficient and in need of refactoring.

Query from "RelateJobToJobRequest" trigger with an example trellisTaskId:

1 MATCH (b:Blob)-[:WAS_USED_BY]->(j:Job { trellisTaskId: "220128-172700-948-23484cbb" }), 
2    (b)-[:WAS_USED_BY]->(jr:JobRequest {name: j.name}) 
3 MATCH (b2:Blob)-[:WAS_USED_BY]->(jr) 
4 WHERE NOT (jr)-[:TRIGGERED]->(:Job) 
5 WITH j, jr, COLLECT(DISTINCT b) AS jobInputs, COLLECT(DISTINCT b2) AS requestInputs 
6 WITH j, jr, jobInputs, requestInputs, [b in jobInputs WHERE NOT b in requestInputs] AS mismatches, [b in requestInputs WHERE NOT b in jobInputs] AS mismatches2 
7 WHERE size(mismatches) = size(mismatches2) = 0 MERGE (jr)-[:TRIGGERED]->(j)

I've added line numbers to the query to make it easier to talk about. It looks like my goal in writing this query was to match job requests to jobs by identifying those with the same input nodes and then connecting them. First, using the identity of input sets to map requests to jobs seems like a needlessly complex way to implement this. We can see that this also led to an ugly and difficult to decipher cypher query (sorry).

Things I hate about this query in order from most to least egregious:

  1. Back-to-back WITH statements on line 5 and 6.
  2. (2) list creation statements in the WITH statement on line 6.
  3. Math on line 7. Math isn't always bad, but in general it seems to slow down queries.

I should have looked at this query, seen how convoluted it was, and realized I needed to rethink my approach. But sometimes I get so caught up in figuring out how to do something that I forget to ask: should I be doing this?

@pbilling pbilling added the bug Something isn't working label Jan 31, 2022
@pbilling
Copy link
Collaborator Author

The simplest solution seems to be to generate an ID for each JobRequest and then pass that to the job launching function. Then that value can be used for mapping rather than requiring logic (which inputs are shared?) to map requests to jobs.

@pbilling
Copy link
Collaborator Author

JobRequest nodes already have an "eventId" property that references the unique Pub/Sub message that delivered the query. I could create an index on that and use it for mapping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant