forked from poldrack/LatentStructure
This repository contains code and relevant files for the Latent Structure
modeling project described in:
Poldrack RA, Mumford JA, Schonberg T, Kalar D, Barman B, Yarkoni T (2012). Discovering relations between mind, brain, and mental disorders using topic mapping. PLOS Computational Biology, in press.
Preprint available at:
http://talyarkoni.org/papers/Poldrack_et_al_in_press_PLoS_CompBio.pdf
Guide to subdirectories:
CCA - contains results from CCA analysis
NIF-Disorders - contains files used for disorders topic modeling
clustering - contains files used for clustering of disorders
cogatlas - contains files used for cogatlas topic modeling
src - contains the source files used for all analyses. These require the following
external libraries:
http://numpy.scipy.org/
http://scikit-learn.org/stable/
http://rpy.sourceforge.net/rpy2.html
http://nipy.sourceforge.net/nibabel/
Also note that src/utils needs to be on the Python path.
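One way to satisfy the path requirement is to extend sys.path at the top of a script, before importing anything from src/utils. The following is a minimal sketch, not part of the repository: REPO_ROOT and add_utils_to_path are hypothetical names, and the sketch assumes the script is started from the top level of the checkout.

```python
import os
import sys

# Hypothetical: assumes the current directory is the top of the checkout.
REPO_ROOT = os.path.abspath(".")
UTILS_DIR = os.path.join(REPO_ROOT, "src", "utils")


def add_utils_to_path(utils_dir=UTILS_DIR):
    """Put utils_dir at the front of sys.path unless it is already listed."""
    if utils_dir not in sys.path:
        sys.path.insert(0, utils_dir)
    return utils_dir


add_utils_to_path()
```

Setting the PYTHONPATH environment variable to the same directory before launching Python should have the same effect without touching the scripts.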
1_db_foci_to_image.py - obtains coordinates from neurosynth database and creates images
- uses utils/tal_to_mni.py
1.1_merge_peakfiles.sh - creates merged image and computes mask of voxels with activation
on at least 1% of papers
1.2_extract_nidag_text.py - extract full text of each article from database
2_get_cogat_concepts.py - read cognitive atlas concepts from RDF and get loading for each document
3_create_disorder_list.py - load NIF dysfunction ontology, grab synonyms, and add
additional missing terms
4_get_disorders_NIF.py - get loading for each disorder term from corpus
5_mk_cogatlas_docs.py - make documents based on cogatlas loadings
6_mk_disorders_docs.py - make documents based on disorder loadings
7_mk_pickled_data.py - create a pickle with the full dataset to make loading easier
8_mk_8fold_cogatlas.py - make files for 8-fold CV
9_mk_8fold_disorders.py - make files for 8-fold CV
10_mk_8fold_cogatlas_mallet_data.py - make mallet data from 8-fold files
11_mk_8fold_disorders_mallet_data.py - make mallet data from 8-fold files
12_mk_8fold_cogatlas_topic_models.py - create scripts to run mallet jobs
13_mk_8fold_disorders_topic_models.py - create scripts to run mallet jobs
14_get_best_topic_likelihood.py - check likelihoods to get dimensionality
15.1_get_disorders_dimensionality.py - generate additional topic models to get disorder dimensionality
15.3_get_best_disorder_dimensionality.py - get dimensionality that has unique topic dists
15_run_topic_models.py - generate scripts to run final topic models
16_mk_cogatlas_loadingdata.py - load topic data and save loadingdata.txt
17_mk_disorders_loadingdata.py - load topic data and save loadingdata.txt
18_mk_all_chisq_maps.py - make all chisquare maps - uses utils/mk_chisq_maps_filter.py
18.2_mk_6mm_chisq_maps.sh - make 6mm versions of the topic chi-square maps
19_mk_slice_images_cogatlas.py - make slice images using p values
20_mk_slice_images_disorders.py - make slice images using p values
22_mk_latexreport_cogatlas.py - create latex report
23_mk_latexreport_disorders.py - create latex report
24_run_CCA_nonneg.py - run CCA analysis
25_cluster_disorders.R - run clustering on disorders data
26_make_cognitive_topic_figure.py - generate figure 2 from initial submission
27_make_disorders_topic_figure.py - generate figure 3 from initial submission
28_mk_8fold_fulltext_topic_models.py - create scripts to run fulltext topic models
28.1_mk_8fold_fulltext_mallet_data.py - make 8-fold data for full text analysis
28.2_run_all_fulltext_nfold.sh - run mallet jobs on 8-fold full text
28.3_get_best_dimensionality.py - get best dimensionality for full text
30_get_wordhist.py - get histograms of # of docs for each filtered paper
31_count_locations.py - count # of locations reported across all papers
32_doc_topic_hist.py - make histograms of docs/topic and topics/doc (for Figure 3)
33_plot_likelihood.py - plot empirical likelihood as function of # of topics (for Figure 2)
34_run_CCA_on_topics.py - run CCA directly on topic distributions
35_mk_fold1_loadingdata.py - get loadingdata for fold1 for each ntopics
36.1_mk_slice_corr_images.disorders.py - make images using correlation rather than p value
36_mk_slice_corr_images.cogatlas.py - make images using correlation rather than p value (for figure
37_make_cognitive_topic_corr_figure.py - make Figure 4
38_make_disorder_topic_corr_figure.py - make Figure 6
39_get_topic_hierarchy.py - create graphviz .dot file for topic hierarchy (Figure 5)
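Two of the steps listed above are easy to illustrate on toy data: the voxel mask from step 1.1 (voxels with activation in at least 1% of papers) and the 8-fold splits from steps 8 and 9. The sketch below uses only numpy and is not taken from the repository's scripts. The function names, the array layout (papers by voxels), the random seed, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def activation_mask(activation, min_fraction=0.01):
    """Return a boolean mask over voxels.

    activation is assumed to be a (n_papers, n_voxels) array that is
    nonzero where a paper reports activation at a voxel. A voxel is kept
    when at least min_fraction of the papers activate it.
    """
    activation = np.asarray(activation, dtype=bool)
    fraction_active = activation.mean(axis=0)
    return fraction_active >= min_fraction


def kfold_indices(n_docs, n_folds=8, seed=0):
    """Return a list of (train, test) index arrays, one pair per fold.

    Documents are shuffled once and then cut into n_folds nearly equal
    test sets; each training set is everything outside that test set.
    """
    rng = np.random.RandomState(seed)
    order = rng.permutation(n_docs)
    folds = np.array_split(order, n_folds)
    splits = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        splits.append((train, test))
    return splits


# Toy data, made up for illustration: 200 papers, 3 voxels.
toy = np.zeros((200, 3), dtype=int)
toy[0, 0] = 1      # voxel 0: active in 1 paper (0.5%)
toy[:10, 1] = 1    # voxel 1: active in 10 papers (5%)
toy_mask = activation_mask(toy)
toy_splits = kfold_indices(n_docs=20)
```

Fixing the seed keeps the fold assignment reproducible across runs, which matters if the per-fold files are regenerated separately for the cogatlas and disorders analyses.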