accidentally overwrote carina with tammy. now they're both there
1 parent 7d66b32
commit 3d17c6d
Showing 2 changed files with 128 additions and 12 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,114 @@ | ||
--- | ||
title: "Tammy the Data Scientist" | ||
toc-expand: 2 | ||
--- | ||
|
||
::: grid | ||
::: {.g-col-12 .g-col-xl-3} | ||
<img src="/img/tammy.jpg" width="200" height="200"/> | ||
::: | ||
|
||
::: {.g-col-12 .g-col-xl-8} | ||
- Tammy **needs** efficient ways to help users across the Hutch with analysis and access to healthcare data. | ||
- She **struggles** to rapidly distribute the data researchers need. Because there is no governed system that provisions access and lets researchers query data on their own, dataset development falls on Tammy. | ||
- **We can help** her by creating a central platform that unifies data from disparate sources in an open (non-proprietary), common data model and provides modern data science tooling for statistical/ML model development and production, so that Tammy can spend more of her time using her skillset in statistical modeling and data science to build data products (e.g., NLP systems, predictive models) that address the data needs of multiple groups at once. | ||
::: | ||
::: | ||
|
||
::: lightblue-highlight | ||
## Tammy needs a queryable centralized data repository | ||
|
||
Researchers at the Hutch need access to adult oncology program clinical data, and Tammy is here to help! She supports researchers with a wide range of data science experience and aims to create solutions that can be leveraged by researchers regardless of the target or disease they study. Requests range from curated datasets for specific research projects, to reproducible reports/dashboards, to output from statistical/ML models and NLP systems. Tammy can’t always get them what they need because she **doesn’t have access to all the relevant clinical data assets** and doesn’t have a **PHI-secure, cloud-based computing environment for developing statistical/ML workflows**. Tammy is focused on improving the interoperability of datasets and increasing the democratization of data (along with the Data Governance Analyst) so that **researchers who have data skills can query data directly rather than wait on Tammy** to send data for their projects. Rather than spend time writing one-off code for these projects, she would like to focus on building workflows and data science products that can be leveraged across research groups. | ||
::: | ||
|
||
::: darkblue-highlight | ||
Collaborators: Data Engineers, Data Governance Analyst, Analytics Engineers, Clinical Analyst | ||
|
||
Downstream users: Clinical/Translational Researcher, Biostatistician, Program and Service Line Managers | ||
::: | ||
|
||
::: grid | ||
::: {.g-col-12 .g-col-xl-6} | ||
# Key Challenges | ||
|
||
- Understanding the landscape of clinical data applications at Fred Hutch, where data is stored, and how to acquire access | ||
|
||
- Local machines are not the best computing environment for clinical data science; some clinical databases cannot be accessed from a Mac and many computing environments for reproducible analysis cannot be re-created on Windows | ||
|
||
- Educating and nudging researchers towards best practices for clinical data science | ||
|
||
- Lack of self-service tools for researchers means that Tammy spends more time building one-off datasets than data science tools that can be used by many researchers | ||
|
||
- There is no way to clearly attach information about data use agreements and access permissions to each dataset or project | ||
|
||
- Limited time and staff constrain the pace and volume of help provided | ||
|
||
- There is no unified system with all the relevant data; data must be collated from multiple systems | ||
::: | ||
|
||
::: {.g-col-12 .g-col-xl-6} | ||
# Needs and Wants | ||
|
||
- An efficient way to store and retrieve past models/queries for future reference | ||
|
||
- A more efficient way to access multimodal clinical data that... | ||
|
||
- is PHI-approved | ||
|
||
- displays information about provenance, lineage, and data governance (e.g., whether a column contains PHI, what access restrictions are on the data) | ||
|
||
<!-- --> | ||
|
||
- supports best practices for dataset documentation | ||
|
||
- Cloud computing environments for managing statistical/machine learning workflows | ||
|
||
- Secure platform to publish and share deliverables (e.g., Quarto/Jupyter notebooks, dashboards, datasets) | ||
|
||
- A way to help users help themselves to expand capacity of the department | ||
::: | ||
::: | ||
|
||
# Types of data used | ||
|
||
- Structured and unstructured data from the current EHR (Epic/Clarity) and historical EHR systems (ORCA/Cerner, etc.) | ||
|
||
<!-- --> | ||
|
||
- Cancer registry data (CNeXT) | ||
|
||
<!-- --> | ||
|
||
- Sunquest lab data | ||
|
||
<!-- --> | ||
|
||
- Mosaiq radiation oncology data | ||
|
||
<!-- --> | ||
|
||
- OnCore Clinical Trials Management System (CTMS) data | ||
|
||
<!-- --> | ||
|
||
- Pyxis medication administration | ||
|
||
<!-- --> | ||
|
||
- Gateway transplant and immunotherapy data | ||
|
||
<!-- --> | ||
|
||
- Novel, non-clinically reported data, such as research-use-only genetic assay results | ||
|
||
<!-- --> | ||
|
||
- Survey and case report form type datasets | ||
|
||
- Validated lists of genomic data such as tumor mutations or structural variants | ||
|
||
<div> | ||
|
||
Image attribution: "[Women In Tech - 53](https://www.flickr.com/photos/136629440@N06/22344625928)" by [wocintechchat.com](https://www.flickr.com/photos/136629440@N06) is licensed under [CC BY 2.0](https://creativecommons.org/licenses/by/2.0/?ref=openverse). | ||
|
||
</div> |