You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am seeing that the query to relate (:JobRequest)-[:TRIGGERED]-(:Job) is routinely not succeeding on the first try and being requeued. Looking at the query it seems inefficient and in need of refactoring.
Query from "RelateJobToJobRequest" trigger with an example trellisTaskId:
1 MATCH (b:Blob)-[:WAS_USED_BY]->(j:Job { trellisTaskId: "220128-172700-948-23484cbb" }),
2 (b)-[:WAS_USED_BY]->(jr:JobRequest {name: j.name})
3 MATCH (b2:Blob)-[:WAS_USED_BY]->(jr)
4 WHERE NOT (jr)-[:TRIGGERED]->(:Job)
5 WITH j, jr, COLLECT(DISTINCT b) AS jobInputs, COLLECT(DISTINCT b2) AS requestInputs
6 WITH j, jr, jobInputs, requestInputs, [b in jobInputs WHERE NOT b in requestInputs] AS mismatches, [b in requestInputs WHERE NOT b in jobInputs] AS mismatches2
7 WHERE size(mismatches) = size(mismatches2) = 0 MERGE (jr)-[:TRIGGERED]->(j)
I've added line numbers to the query to make it easier to talk about. It looks like my goal in writing this query was to match job requests to jobs by identifying those with the same input nodes and then connecting them. First, using the identity of input sets to map requests to jobs seems like a needlessly complex way to implement this. We can see that this also led to an ugly and difficult to decipher cypher query (sorry).
Things I hate about this query in order from most to least egregious:
Back-to-back WITH statements on line 5 and 6.
(2) list creation statements in the WITH statement on line 6.
Math on line 7. Math isn't always bad, but in general it seems to slow down queries.
I should have looked at this query, seen how convoluted it was, and realized I needed to rethink my approach. But sometimes I get so caught up in figuring out how to do something that I forget to ask: should I be doing this?
The text was updated successfully, but these errors were encountered:
The simplest solution seems to be to generate an ID for each JobRequest and then pass that to the job launching function. Then that value can be used for mapping rather than requiring logic (which inputs are shared?) to map requests to jobs.
JobRequest nodes already have an "eventId" property that references the unique Pub/Sub message that delivered the query. I could create an index on that and use it for mapping.
I am seeing that the query to relate (:JobRequest)-[:TRIGGERED]-(:Job) is routinely not succeeding on the first try and being requeued. Looking at the query it seems inefficient and in need of refactoring.
Query from "RelateJobToJobRequest" trigger with an example trellisTaskId:
I've added line numbers to the query to make it easier to talk about. It looks like my goal in writing this query was to match job requests to jobs by identifying those with the same input nodes and then connecting them. First, using the identity of input sets to map requests to jobs seems like a needlessly complex way to implement this. We can see that this also led to an ugly and difficult to decipher cypher query (sorry).
Things I hate about this query in order from most to least egregious:
I should have looked at this query, seen how convoluted it was, and realized I needed to rethink my approach. But sometimes I get so caught up in figuring out how to do something that I forget to ask: should I be doing this?
The text was updated successfully, but these errors were encountered: