fix(iatlas): unable to run iatlas-data:serve-detach #2561

Merged: 3 commits merged on Mar 13, 2024

@tschaffter tschaffter commented Mar 11, 2024

Closes #2560

Changelog

  • Update build_database.py and requirements.txt from the GitLab repo
  • Use the staging schema URL defined in SCHEMA_URL_STAGING on GitLab
  • Limit the main script of iatlas-data to creating the tables, without pushing data to them
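
The last changelog item can be pictured as a switch in the main script that skips the data-loading step. The following is a minimal sketch under stated assumptions, not the code in build_database.py: the LOAD_DATA environment variable, the build_database function, and the injected callables are all hypothetical names used for illustration.

```python
import os
from typing import Callable, Mapping


def should_load_data(env: Mapping[str, str]) -> bool:
    """Return True only when LOAD_DATA (a hypothetical variable) is explicitly enabled."""
    return env.get("LOAD_DATA", "false").strip().lower() in {"1", "true", "yes"}


def build_database(
    create_tables: Callable[[], None],
    load_data: Callable[[], None],
    env: Mapping[str, str] = os.environ,
) -> list[str]:
    """Create the tables, then load data only if requested.

    Returns the names of the steps that ran, which keeps the behavior easy to test.
    """
    steps = []
    create_tables()
    steps.append("create_tables")
    if should_load_data(env):
        load_data()
        steps.append("load_data")
    return steps


if __name__ == "__main__":
    # Stand-in callables; a real script would build the schema and push manifests here.
    ran = build_database(
        create_tables=lambda: print("Creating tables"),
        load_data=lambda: print("Loading data"),
        env={},  # LOAD_DATA unset: tables only
    )
    print(ran)
```

Defaulting to tables-only keeps container start-up fast, and data loading stays available as an explicit opt-in.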

Future Work

Get access to a sample of iAtlas data, or mock data, that can be loaded quickly into the DB for a better developer experience (DX).

Preview

The iatlas-data container now only creates the tables:

$ docker logs -f iatlas-data
[13/Mar/2024 16:23:39] INFO [root._drop_all_tables:32] Dropping all tables
[13/Mar/2024 16:23:40] INFO [root._drop_all_tables:34] Dropped all tables
[13/Mar/2024 16:23:40] INFO [root._get_database_schema:42] Getting database schema
[13/Mar/2024 16:24:05] INFO [root._get_database_schema:44] Got database schema
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:53] Building database
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: patients
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: mutation_types
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: genes
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: features
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: datasets
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: tags
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: publications
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: samples
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: mutations
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: gene_sets
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: nodes
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: snps
[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: cohorts
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: cells
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: tags_to_tags
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: tags_to_publications
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: slides
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: single_cell_pseudobulk_features
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: single_cell_pseudobulk
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: samples_to_tags
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: samples_to_mutations
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: rare_variant_pathway_associations
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: publications_to_genes_to_gene_sets
[13/Mar/2024 16:24:06] INFO [root._build_database_from_schema:55] Adding table to database schema: neoantigens
[13/Mar/2024 16:24:07] INFO [root._build_database_from_schema:55] Adding table to database schema: heritability_results
[13/Mar/2024 16:24:07] INFO [root._build_database_from_schema:55] Adding table to database schema: genes_to_samples
[13/Mar/2024 16:24:07] INFO [root._build_database_from_schema:55] Adding table to database schema: genes_to_gene_sets
[13/Mar/2024 16:24:07] INFO [root._build_database_from_schema:55] Adding table to database schema: features_to_samples
[13/Mar/2024 16:24:07] INFO [root._build_database_from_schema:55] Adding table to database schema: edges
[13/Mar/2024 16:24:07] INFO [root._build_database_from_schema:55] Adding table to database schema: germline_gwas_results
[13/Mar/2024 16:24:07] INFO [root._build_database_from_schema:55] Adding table to database schema: driver_results
[13/Mar/2024 16:24:07] INFO [root._build_database_from_schema:55] Adding table to database schema: datasets_to_tags
[13/Mar/2024 16:24:07] INFO [root._build_database_from_schema:55] Adding table to database schema: datasets_to_samples
[13/Mar/2024 16:24:07] INFO [root._build_database_from_schema:55] Adding table to database schema: copy_number_results
[13/Mar/2024 16:24:08] INFO [root._build_database_from_schema:55] Adding table to database schema: colocalizations
[13/Mar/2024 16:24:08] INFO [root._build_database_from_schema:55] Adding table to database schema: cohorts_to_tags
[13/Mar/2024 16:24:08] INFO [root._build_database_from_schema:55] Adding table to database schema: cohorts_to_samples
[13/Mar/2024 16:24:08] INFO [root._build_database_from_schema:55] Adding table to database schema: cohorts_to_mutations
[13/Mar/2024 16:24:08] INFO [root._build_database_from_schema:55] Adding table to database schema: cohorts_to_genes
[13/Mar/2024 16:24:08] INFO [root._build_database_from_schema:55] Adding table to database schema: cohorts_to_features
[13/Mar/2024 16:24:08] INFO [root._build_database_from_schema:55] Adding table to database schema: cells_to_samples
[13/Mar/2024 16:24:09] INFO [root._build_database_from_schema:55] Adding table to database schema: cells_to_genes
[13/Mar/2024 16:24:09] INFO [root._build_database_from_schema:55] Adding table to database schema: cells_to_features
[13/Mar/2024 16:24:09] INFO [root._build_database_from_schema:55] Adding table to database schema: cell_stats
[13/Mar/2024 16:24:09] INFO [root._build_database_from_schema:57] Database built
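
One way to confirm that a run created the expected tables is to scan the container log for the table names and the final "Database built" message. This is a small sketch that assumes the log format shown above; summarize_build_log is a hypothetical helper, not part of iatlas-data.

```python
import re

# Matches lines such as:
# [13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: patients
TABLE_LINE = re.compile(r"Adding table to database schema: (\w+)\s*$")


def summarize_build_log(log_text: str) -> tuple[list[str], bool]:
    """Return (table names in log order, whether 'Database built' was logged)."""
    tables = []
    built = False
    for line in log_text.splitlines():
        match = TABLE_LINE.search(line)
        if match:
            tables.append(match.group(1))
        elif line.rstrip().endswith("Database built"):
            built = True
    return tables, built


if __name__ == "__main__":
    sample = "\n".join([
        "[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:53] Building database",
        "[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: patients",
        "[13/Mar/2024 16:24:05] INFO [root._build_database_from_schema:55] Adding table to database schema: genes",
        "[13/Mar/2024 16:24:09] INFO [root._build_database_from_schema:57] Database built",
    ])
    tables, built = summarize_build_log(sample)
    print(tables, built)
```

The text passed in would be the captured output of docker logs iatlas-data.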

@tschaffter tschaffter self-assigned this Mar 11, 2024

tschaffter commented Mar 11, 2024

Error after updating script and dependencies

When using the main schema URL:

$ docker logs iatlas-data
Traceback (most recent call last):
  File "/src/build_database.py", line 1863, in <module>
    schema = Schema(
  File "/usr/local/lib/python3.10/site-packages/schematic_db/schema/schema.py", line 146, in __init__
    self.schema_graph = SchemaGraph(config.schema_url, display_label_type)
  File "/usr/local/lib/python3.10/site-packages/schematic_db/schema_graph/schema_graph.py", line 23, in __init__
    self.schema_graph = self.create_schema_graph()
  File "/usr/local/lib/python3.10/site-packages/schematic_db/schema_graph/schema_graph.py", line 31, in create_schema_graph
    subgraph = get_graph_by_edge_type(
  File "/usr/local/lib/python3.10/site-packages/schematic_db/api_utils/api_utils.py", line 202, in get_graph_by_edge_type
    response = create_schematic_api_response("schemas/get/graph_by_edge_type", params)
  File "/usr/local/lib/python3.10/site-packages/schematic_db/api_utils/api_utils.py", line 122, in create_schematic_api_response
    raise SchematicAPIError(
schematic_db.api_utils.api_utils.SchematicAPIError: Error accessing Schematic endpoint; URL: https://schematic-staging.api.sagebionetworks.org/v1/schemas/get/graph_by_edge_type; Code: 500; Reason: INTERNAL SERVER ERROR; Time (PST): 2024-03-11 15:02:27.054946-07:00; Parameters: {'schema_url': 'https://raw.githubusercontent.com/CRI-iAtlas/iAtlasSchema/main/iatlas_schema.jsonld', 'relationship': 'requiresComponent', 'data_model_labels': 'display_label'}

When using the develop schema URL, as suggested by @andrewelamb, the container runs for a few minutes before throwing an error:

[11/Mar/2024 21:28:54] INFO [root._download_manifest:189] Downloading manifest; table name: nodes; manifest id: synXXX
[WARNING] /usr/local/lib/python3.10/site-packages/schematic_db/synapse/synapse.py:71: DtypeWarning: Columns (3,6,10) have mixed types. Specify dtype option on import or set low_memory=False.
  return pandas.read_csv(entity.path, keep_default_na=False, na_values="")

[11/Mar/2024 21:28:58] WARNING [py.warnings._showwarnmsg:109] /usr/local/lib/python3.10/site-packages/schematic_db/synapse/synapse.py:71: DtypeWarning: Columns (3,6,10) have mixed types. Specify dtype option on import or set low_memory=False.
  return pandas.read_csv(entity.path, keep_default_na=False, na_values="")

[11/Mar/2024 21:28:58] INFO [root._download_manifest:195] Finished downloading manifest
[11/Mar/2024 21:28:58] INFO [root._update_table_with_manifest:238] Updating manifest; table name: nodes; manifest id: synXXX
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1960, in _exec_single_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 924, in do_execute
    cursor.execute(statement, parameters)
psycopg2.OperationalError: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/schematic_db/rdb/sql_alchemy_database.py", line 229, in insert_table_rows
    conn.execute(statement)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1408, in execute
    return meth(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 513, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1630, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1839, in _execute_context
    return self._exec_single_context(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1979, in _exec_single_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2335, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1960, in _exec_single_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 924, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

[SQL: INSERT INTO nodes (dataset_id, id, label, name, network, node_feature_id, node_gene_id, score, tag_1_id, tag_2_id, x, y) TOO MUCH TO INCLUDE
(Background on this error at: https://sqlalche.me/e/20/e3q8)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/schematic_db/rdb_updater/rdb_updater.py", line 245, in _update_table_with_manifest
    self.rdb.insert_table_rows(table_name, table)
  File "/usr/local/lib/python3.10/site-packages/schematic_db/rdb/sql_alchemy_database.py", line 231, in insert_table_rows
    raise InsertDatabaseError(table_name) from exception
schematic_db.rdb.rdb.InsertDatabaseError: Error inserting table; Table Name: nodes

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/src/build_database.py", line 1884, in <module>
    updater.update_database(method="insert")
  File "/usr/local/lib/python3.10/site-packages/schematic_db/rdb_updater/rdb_updater.py", line 119, in update_database
    self.update_table(name, method)
  File "/usr/local/lib/python3.10/site-packages/schematic_db/rdb_updater/rdb_updater.py", line 144, in update_table
    self._update_table_with_manifest_id(table_name, manifest_id, method)
  File "/usr/local/lib/python3.10/site-packages/schematic_db/rdb_updater/rdb_updater.py", line 175, in _update_table_with_manifest_id
    self._update_table_with_manifest(
  File "/usr/local/lib/python3.10/site-packages/schematic_db/rdb_updater/rdb_updater.py", line 251, in _update_table_with_manifest
    raise UpdateError(table_name, manifest_id) from exc
schematic_db.rdb_updater.rdb_updater.UpdateError: Error updating table; Table Name: nodes; Dataset ID: synXXX
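
The failing statement in the traceback above is a single, very large INSERT into nodes. Whether the statement size is what made the server close the connection is not established in this thread. If it is, sending the rows in smaller batches is one common mitigation. The sketch below uses the standard-library sqlite3 module as a stand-in for PostgreSQL; chunked, insert_in_batches, the batch size, and the reduced set of nodes columns are illustrative assumptions and not schematic_db's API.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, Sequence


def chunked(rows: Iterable[Sequence], size: int) -> Iterator[list[Sequence]]:
    """Yield successive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


def insert_in_batches(conn: sqlite3.Connection, rows: Iterable[Sequence], size: int = 1000) -> int:
    """Insert rows into `nodes` one batch per transaction; return the number of batches."""
    batches = 0
    for batch in chunked(rows, size):
        with conn:  # commits on success, rolls back the current batch on error
            conn.executemany("INSERT INTO nodes (id, label, score) VALUES (?, ?, ?)", batch)
        batches += 1
    return batches


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, label TEXT, score REAL)")
    rows = ((f"node-{i}", f"label-{i}", i / 10) for i in range(2500))
    print(insert_in_batches(conn, rows, size=1000))
    print(conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0])
```

Committing per batch bounds the size of each statement and transaction, at the cost of leaving earlier batches in place if a later one fails.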

@tschaffter tschaffter added the sonar-scan-approved-deprecated Ready for Sonar code analysis label Mar 13, 2024
@tschaffter tschaffter marked this pull request as ready for review March 13, 2024 18:00
@tschaffter tschaffter merged commit fbb3f1e into Sage-Bionetworks:main Mar 13, 2024
10 of 11 checks passed
@tschaffter tschaffter deleted the fix-2560 branch March 13, 2024 18:06
Successfully merging this pull request may close these issues.

[Bug] Unable to run iatlas-data:serve-detach
1 participant