Relationship queries have to be routinely retried #16

Open
pbilling opened this issue Apr 19, 2020 · 1 comment
Labels
bug (Something isn't working), enhancement (New feature or request)

Comments

@pbilling
Collaborator

Relationship queries currently follow this pattern:

MATCH (j:Job {trellisTaskId: 123}),
      (n:Blob {trellisTaskId: 123, id: 123})
WHERE NOT EXISTS(j.duplicate)
   OR NOT j.duplicate = True
MERGE (j)-[:OUTPUT]->(n)

The problem is that this requires the job and output nodes to be added to the database synchronously: the job node must already exist when the relationship query runs. When the output is added before the job, the current workaround is to wait a few seconds and then retry the relationship query, up to n times.

This is a bad design because 1) it violates the asynchronous nature of the system and 2) even after multiple retries there are cases where the conditions for adding the relationship (i.e. the job node is present) are still not met. Additionally, the retry queries increase the load on the database.
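For reference, the retry workaround described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `relate_with_retries`, `run_query`, `max_retries`, and `delay_seconds` are hypothetical names, and `run_query` is assumed to return `True` only when the MATCH found both nodes and the relationship was merged.

```python
import time
from typing import Callable


def relate_with_retries(
    run_query: Callable[[], bool],
    max_retries: int = 3,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry the relationship query until it succeeds or retries run out.

    run_query is a hypothetical callable that runs the MATCH ... MERGE
    query and reports whether both nodes were found.
    """
    for attempt in range(max_retries + 1):
        if run_query():
            return True
        # Wait before the next attempt, but not after the final one.
        if attempt < max_retries:
            sleep(delay_seconds)
    # The job node never appeared within the retry budget.
    return False
```

Each failed attempt costs one extra database query plus `delay_seconds` of function runtime, and the final `return False` is the case where the relationship is never added.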

The solution should be straightforward: instead of matching the job node, merge it. For example:

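MATCH (n:Blob {trellisTaskId: 123, id: 123})
// Merge the job node on its own first; a single MERGE over the whole pattern creates a new job node whenever the full pattern is missing
MERGE (j:Job {trellisTaskId: 123})
MERGE (j)-[:OUTPUT]->(n)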

The trade-off is that we lose the MATCH pattern that ensures that duplicate jobs are not related to outputs, but it's not clear how valuable this was in the first place. And now that duplication rates have been reduced, it's even less useful.
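To illustrate why merging removes the ordering requirement, here is a toy in-memory model of the two patterns. It is only a sketch under stated assumptions: `ToyGraph` and its methods are hypothetical and are not Neo4j, the duplicate filter is left out, and the job-insert step is assumed to also behave like a MERGE on `trellisTaskId` so that a placeholder job is filled in rather than duplicated.

```python
class ToyGraph:
    """Tiny in-memory stand-in for the graph database (not Neo4j itself)."""

    def __init__(self):
        self.jobs = {}        # trellisTaskId -> job properties
        self.blobs = set()    # (trellisTaskId, id) pairs
        self.outputs = set()  # (trellisTaskId, id) pairs linked via OUTPUT

    def add_job(self, task_id, **props):
        # Assumed to behave like a MERGE on trellisTaskId, so an existing
        # placeholder job is updated instead of duplicated.
        self.jobs.setdefault(task_id, {}).update(props)

    def add_blob(self, task_id, blob_id):
        self.blobs.add((task_id, blob_id))

    def relate_with_match(self, task_id, blob_id):
        # Current pattern: both nodes must already exist.
        if task_id in self.jobs and (task_id, blob_id) in self.blobs:
            self.outputs.add((task_id, blob_id))
            return True
        return False

    def relate_with_merge(self, task_id, blob_id):
        # Proposed pattern: only the blob must exist; the job is merged.
        if (task_id, blob_id) not in self.blobs:
            return False
        self.jobs.setdefault(task_id, {})
        self.outputs.add((task_id, blob_id))
        return True


graph = ToyGraph()
graph.add_blob(123, 123)                     # output arrives first
matched = graph.relate_with_match(123, 123)  # job missing, nothing is linked
merged = graph.relate_with_merge(123, 123)   # placeholder job created and linked
graph.add_job(123, name="example-job")       # job arrives later, fills placeholder
```

In this model the match-based call needs a retry, while the merge-based call links the output immediately and the later job insert reuses the same node.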

pbilling added the bug and enhancement labels on Apr 19, 2020
@pbilling
Collaborator Author

Monitoring dashboard heatmap showing function execution time over time. The functions with a runtime of about 5 sec represent retried database queries.

[Image: Function Execution times (SUM)]
