auberginekenobi/pedpancan_ecdna

pedpancan_ecdna

Contains working code, scripts and data for the Chavez Lab pedpancan project. Not yet public.
Tested on an Apple M2 Pro chip with 16 GB RAM running macOS Sonoma 14.5.

Installation

(Almost) all code is in Jupyter notebook format. Packages and dependencies are installed using conda. For instructions on how to set up conda on your workstation, see Setting up your workstation. Each notebook should list its dependencies in its first cell. To install a conda environment from a .yml file, run:

## Create a new environment and install all packages
conda env create -f environment.yml

## If you're on a Mac with Apple silicon, edgeR and some other packages need to be installed for the Intel (osx-64) architecture:
CONDA_SUBDIR=osx-64 conda env create -f differential-expression.yml
conda activate differential-expression
conda config --env --set subdir osx-64

## Link the environment to your base Jupyter installation
# R environments:
conda activate myenvironment
Rscript -e "IRkernel::installspec(name = 'myenvironment', displayname = 'myenvironment')"
conda deactivate

# python environments:
conda activate myenvironment
python -m ipykernel install --user --name myenvironment --display-name "myenvironment"
conda deactivate
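
After linking an environment, it can be useful to confirm that Jupyter actually sees the new kernel. The following is a minimal sketch, not part of this repository: it assumes the `jupyter` command is on your PATH, and the helper names `registered_kernels` and `kernel_is_registered` are hypothetical. It parses the output of `jupyter kernelspec list --json`.

```python
import json
import subprocess


def registered_kernels(kernelspec_json: str) -> set:
    """Return the kernel names found in the output of `jupyter kernelspec list --json`."""
    # The JSON output maps kernel names to their specs under the "kernelspecs" key.
    return set(json.loads(kernelspec_json).get("kernelspecs", {}))


def kernel_is_registered(name: str) -> bool:
    """Hypothetical helper: ask Jupyter whether a kernel called `name` is installed."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Jupyter reports kernel names in lowercase, so compare in lowercase.
    return name.lower() in registered_kernels(result.stdout)


# Example (requires a working Jupyter installation):
# kernel_is_registered("myenvironment")
```

If the kernel is missing, re-running the corresponding `installspec` or `ipykernel install` command inside the activated environment is the usual first step.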

Dataset

The data for this publication are organized in the Supplementary Tables. Code for generating and reading the Suppl. Tbls. is in data_imports.py.

To read the Suppl. Tbls.:

# import data_imports.py
import sys
sys.path.append('../src')
from data_imports import *

patients = import_patients()
biosamples = import_biosamples()
amplicons = import_amplicons()
genes = import_genes()

To generate the Suppl. Tbls. from source data:

biosamples = generate_biosamples_table()
patients = generate_patient_table(biosamples)
amplicons = generate_amplicon_table(biosamples)
genes = generate_gene_table(biosamples)

To generate the Suppl. Tables, the following source files are required:

  • data/source/AmpliconClassifier/pedpancan_summary_map.txt # List of all biosamples analyzed. Generated by AmpliconClassifier/ampclasslib/make_input.py.
  • data/source/AmpliconClassifier/pedpancan_amplicon_classification_profiles.tsv # Amplicon classifications. Generated by AmpliconClassifier/amplicon_classifier.py.
  • data/source/AmpliconClassifier/pedpancan_gene_list.tsv # Amplified genes. Generated by the same script (amplicon_classifier.py).
  • data/source/pedpancan_mapping.xlsx # Compiled by the authors from St Jude and DKFZ ontologies.
  • data/local/sjcloud/SAMPLE_INFO_2022-03-02.tsv # File metadata generated by the St. Jude Cloud upon file provision.
  • data/local/opentarget/histologies.tsv # File metadata from the OpenPBTA project (source).
  • data/source/cavatica/X01-biosample-metadata.tsv # File metadata for CBTN dataset. Compiled using the CAVATICA API. See cavatica-api.ipynb.
  • data/source/cavatica/X00-biosample-metadata.tsv # Ibid.
  • data/source/cavatica/PNOC-biosample-metadata.tsv # Ibid.
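
Because table generation depends on all nine files above, a quick pre-flight check can save a failed run. This is a minimal sketch rather than code from this repository: the function name `missing_source_files` is hypothetical, and it assumes the listed paths are relative to the repository root.

```python
from pathlib import Path

# Source files listed above; paths are assumed to be relative to the repository root.
REQUIRED_SOURCE_FILES = [
    "data/source/AmpliconClassifier/pedpancan_summary_map.txt",
    "data/source/AmpliconClassifier/pedpancan_amplicon_classification_profiles.tsv",
    "data/source/AmpliconClassifier/pedpancan_gene_list.tsv",
    "data/source/pedpancan_mapping.xlsx",
    "data/local/sjcloud/SAMPLE_INFO_2022-03-02.tsv",
    "data/local/opentarget/histologies.tsv",
    "data/source/cavatica/X01-biosample-metadata.tsv",
    "data/source/cavatica/X00-biosample-metadata.tsv",
    "data/source/cavatica/PNOC-biosample-metadata.tsv",
]


def missing_source_files(repo_root=".", required=REQUIRED_SOURCE_FILES):
    """Return the required paths that do not exist as files under repo_root."""
    root = Path(repo_root)
    return [path for path in required if not (root / path).is_file()]


# Example: from a notebook one directory below the repository root
# (an assumption about where notebooks live), check before generating tables:
# missing = missing_source_files("..")
# if missing:
#     raise FileNotFoundError("Missing source files:\n" + "\n".join(missing))
```

Note that the files under data/local/ may not be distributed with the repository and might need to be obtained separately from their providers.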

TODO from S. Danovi

  • Include multiome data analysis
  • Is it ecDNA or just amplification?
