Problem:
It's common for the compose to partially fail. When this happens, the interface still boots, and the user clicks "new graph".
Graph creation then fails (since janus is dead), but not before leaving behind some rows in the database.
The user then has a non-functional "graph" name that, if deleted, will leave behind some other resources (solr index namespace). This will cause further errors when re-creating a graph with the same name, even after janus is fixed.
This causes friction between admins and users, since there's no clear indication of why the failure happened (short of going through the docker logs and reading stack traces).
Feature description:
The app would benefit from multiple levels of sanity checks:
in docker-compose, use health checks and depends_on: X: service_healthy to make sure the webserver does not start without first having working janus, solr, pg, s3, redis, etc.
in the webserver, at container boot, attempt to connect to each database type and "ping" it (e.g. list indexes, list graphs, describe tables), and crash the container if anything is wrong
before significant operations (e.g. create new graph space, create plugin), run the same "ping" operation again (in case something crashed overnight)
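The second and third checks could be sketched roughly as follows. This is only an illustration, not the app's actual code: the names run_checks, check_or_exit, require_healthy and DependencyError are hypothetical, and each "ping" is assumed to be a zero-argument callable that raises an exception when its service is unreachable.

```python
import functools
import sys
from typing import Callable, Dict

# Hypothetical registry type: service name -> "ping" callable that raises on failure.
Checks = Dict[str, Callable[[], None]]


class DependencyError(RuntimeError):
    """Raised when a required backing service fails its ping."""


def run_checks(checks: Checks) -> Dict[str, str]:
    """Run every ping and return {service name: error text} for the failures."""
    failures = {}
    for name, ping in checks.items():
        try:
            ping()
        except Exception as exc:  # any error means the service is unusable
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def check_or_exit(checks: Checks) -> None:
    """Boot-time use: print what is broken and crash the container."""
    failures = run_checks(checks)
    for name, reason in failures.items():
        print(f"dependency check failed: {name}: {reason}", file=sys.stderr)
    if failures:
        sys.exit(1)


def require_healthy(checks: Checks):
    """Decorator: re-run the pings before a significant operation."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            failures = run_checks(checks)
            if failures:
                # Fail before touching any database, so no rows are left behind.
                raise DependencyError(f"unavailable services: {failures}")
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

With something like this, the webserver entrypoint would call check_or_exit(...) once at boot, and operations such as graph creation would be wrapped in @require_healthy(...), so the user sees a clear "service unavailable" error instead of a half-created graph.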
Additional context:
The sooner it crashes, the sooner we can fix it.