Output all network nodes and edges. #7

Open
3 of 6 tasks
j-andrews7 opened this issue May 1, 2023 · 0 comments

j-andrews7 commented May 1, 2023

Output all network nodes and edges in some sort of format that makes sense. For now, I'm going to copy the CRC program's output formats so that downstream reporting/viz can be used with both. File names from the example data/run that I used are in parentheses.

  • List of all node genes (genes marked by a (super)enhancer).
  • List of all TF node genes (TF genes marked by a superenhancer).
  • Full edge table (A549_CRCs_EDGE_TABLE.txt):
SOURCE	TARGET	CHROM	START	STOP	REGION_ID	TF_INTERACTION
ARNTL	ABHD2	chr15	89119809	89119818	Peak_382	0
ARNTL	AC002066.1	chr7	116354919	116354928	10_Peak_18966_lociStitched	0
ARNTL	AC004585.1	chr17	40518775	40518784	Peak_411	0
ARNTL	AC004585.1	chr17	40543266	40543275	Peak_411	0
ARNTL	AC004585.1	chr17	40551758	40551767	Peak_411	0
ARNTL	AC004585.1	chr17	40552457	40552466	Peak_411	0

This represents ARNTL motifs found in enhancers (and/or promoters?) of the genes in the "TARGET" column. It can be used to dynamically create network plots for one or more TFs selected by the user, optionally limited to TF-TF interactions as designated in the last column (1 indicates a TF-TF regulatory interaction).
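As a rough sketch of how a plotting layer might consume this table, the snippet below pulls out the edges for user-selected TFs and optionally keeps only TF-TF interactions. The function name `select_edges` and its parameters are hypothetical, and it assumes the file is tab-delimited with the header shown above.

```python
import csv
import io


def select_edges(edge_file, tfs, tf_only=False):
    """Return sorted, unique (source, target) pairs for the chosen TFs.

    edge_file: open file-like object for the tab-delimited edge table.
    tfs:       iterable of TF names to keep as SOURCE.
    tf_only:   if True, keep only rows with TF_INTERACTION == 1.
    """
    wanted = set(tfs)
    edges = set()
    for row in csv.DictReader(edge_file, delimiter="\t"):
        if row["SOURCE"] not in wanted:
            continue
        if tf_only and row["TF_INTERACTION"] != "1":
            continue
        # Several motif hits in one region collapse into a single edge.
        edges.add((row["SOURCE"], row["TARGET"]))
    return sorted(edges)


# A few rows copied from the example edge table above.
example = (
    "SOURCE\tTARGET\tCHROM\tSTART\tSTOP\tREGION_ID\tTF_INTERACTION\n"
    "ARNTL\tABHD2\tchr15\t89119809\t89119818\tPeak_382\t0\n"
    "ARNTL\tAC004585.1\tchr17\t40518775\t40518784\tPeak_411\t0\n"
    "ARNTL\tAC004585.1\tchr17\t40543266\t40543275\tPeak_411\t0\n"
)

print(select_edges(io.StringIO(example), ["ARNTL"]))
# [('ARNTL', 'ABHD2'), ('ARNTL', 'AC004585.1')]
```

Collapsing repeated motif hits into one edge is a design choice for plotting; keeping a hit count per edge would be an easy extension if edge weights are wanted.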

  • (Super) enhancer BED files (A549_CRCs_ENHANCER_TABLE.txt). Currently, this format looks like:
ENHANCER_ID	CHROM	START	STOP	GENE_LIST
68_Peak_17995_lociStitched	chr5	58960811	59309427	PDE4D
31_Peak_11846_lociStitched	chr5	59459122	59621436	PDE4D
25_Peak_83063_lociStitched	chr5	172827953	172935578	ERGIC1,RPL26L1,ATP6V0E1,RF00019,AC008429.1
4_Peak_37549_lociStitched	chr15	98842601	98908710	IGF1R
Peak_411	chr17	40504195	40561897	AC004585.1,AC018629.1,TNS4

The last column holds the gene assignments. I kind of hate this format; switching to a BED-like format would be easier to work with. Maybe add another column for whether it's a TF or not; that information is currently split out into the A549_CRCs_ENHANCER_TF_TABLE.txt file.
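One possible shape for the BED-like version, sketched under assumptions: coordinates first, the enhancer ID as the name field, then the gene list and a 0/1 flag for whether any assigned gene is a TF. The column order, the function name `enhancer_table_to_bed`, and the `tf_genes` input are all hypothetical, and coordinates are copied through unchanged since the coordinate convention of the CRC output is not stated here.

```python
import csv
import io


def enhancer_table_to_bed(table_file, tf_genes):
    """Convert enhancer-table rows to BED-like lines (hypothetical layout).

    Output columns: CHROM, START, STOP, ENHANCER_ID, GENE_LIST, HAS_TF.
    """
    tf_genes = set(tf_genes)
    lines = []
    for row in csv.DictReader(table_file, delimiter="\t"):
        genes = row["GENE_LIST"].split(",")
        # 1 if any assigned gene is in the supplied TF set, else 0.
        has_tf = int(any(gene in tf_genes for gene in genes))
        lines.append("\t".join([
            row["CHROM"], row["START"], row["STOP"],  # copied unchanged
            row["ENHANCER_ID"], row["GENE_LIST"], str(has_tf),
        ]))
    return lines


# Two rows copied from the example enhancer table above.
example = (
    "ENHANCER_ID\tCHROM\tSTART\tSTOP\tGENE_LIST\n"
    "4_Peak_37549_lociStitched\tchr15\t98842601\t98908710\tIGF1R\n"
    "Peak_411\tchr17\t40504195\t40561897\tAC004585.1,AC018629.1,TNS4\n"
)

# The TF set here is a placeholder to exercise the flag,
# not a statement about which genes are actually TFs.
for line in enhancer_table_to_bed(io.StringIO(example), {"IGF1R"}):
    print(line)
```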

  • List of self loops (A549_CRCs_SELF_LOOPS.txt). These are just TFs that have a motif in one of their own enhancers.
SOX2
JUN
EGR1
  • List of genes and their associated enhancers, and their TF designation (A549_CRCs_GENE_SUMMARY.txt). In the example data, it's only the SE-associated genes as the SEs were the only enhancers I provided:
GENE	TF	ENHANCER_LIST
AAK1	0	18_Peak_55661_lociStitched
ABCC1	0	16_Peak_3421_lociStitched
ABCC2	0	1_Peak_127_lociStitched
ABCC3	0	Peak_101,Peak_450,Peak_84

I feel it may be worth having two enhancer columns - one for super enhancers and one for "typical" enhancers.
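A minimal sketch of what the two-column summary could look like, built from the enhancer table. Everything here is an assumption for illustration: the function name `gene_summary`, the `SE_LIST`/`TE_LIST` column names, the `.` placeholder for an empty list, and the idea that the caller supplies the set of super-enhancer IDs.

```python
import csv
import io
from collections import defaultdict


def gene_summary(enhancer_file, super_ids, tf_genes=()):
    """Build GENE / TF / SE_LIST / TE_LIST lines (hypothetical layout)."""
    super_ids = set(super_ids)
    tf_genes = set(tf_genes)
    by_gene = defaultdict(lambda: {"SE": [], "TE": []})
    for row in csv.DictReader(enhancer_file, delimiter="\t"):
        # Anything not listed as a super enhancer counts as "typical".
        kind = "SE" if row["ENHANCER_ID"] in super_ids else "TE"
        for gene in row["GENE_LIST"].split(","):
            by_gene[gene][kind].append(row["ENHANCER_ID"])
    out = ["GENE\tTF\tSE_LIST\tTE_LIST"]
    for gene in sorted(by_gene):
        se = ",".join(sorted(by_gene[gene]["SE"])) or "."
        te = ",".join(sorted(by_gene[gene]["TE"])) or "."
        out.append(f"{gene}\t{int(gene in tf_genes)}\t{se}\t{te}")
    return out


# Rows copied from the example enhancer table above.
example = (
    "ENHANCER_ID\tCHROM\tSTART\tSTOP\tGENE_LIST\n"
    "68_Peak_17995_lociStitched\tchr5\t58960811\t59309427\tPDE4D\n"
    "31_Peak_11846_lociStitched\tchr5\t59459122\t59621436\tPDE4D\n"
    "4_Peak_37549_lociStitched\tchr15\t98842601\t98908710\tIGF1R\n"
)

# Arbitrary split for illustration only: in the example run every
# enhancer supplied was a super enhancer.
supers = {"68_Peak_17995_lociStitched", "4_Peak_37549_lociStitched"}
for line in gene_summary(io.StringIO(example), supers):
    print(line)
```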

This is still a WIP, will add more in a bit.

@j-andrews7 j-andrews7 changed the title Output all nodes and edges. Output all network nodes and edges. May 1, 2023