# Important _Bioconductor_ Package Development Features {#important-bioconductor-package-development-features}
## biocViews {#biocviews}
Packages added to the _Bioconductor_ Project require a `biocViews:`
field in their DESCRIPTION file. The field name "biocViews" is
case-sensitive and must begin with a lower-case 'b'.
biocViews terms are "keywords" used to describe a given package. They
are broadly divided into four categories, representing the types of
packages present in the _Bioconductor_ Project:
1. Software
2. Annotation Data
3. Experiment Data
4. Workflow
biocViews are available for the [release][release-biocviews] and [devel][biocViews] branches of
_Bioconductor_. The devel branch has a check box under the tree
structure which, when checked, displays biocViews that are defined but
not used by any package, in addition to biocViews that are in use. See also
the [description section](#description-biocviews).
### Motivation {#biocviews-motivation}
One can use biocViews for two broad purposes.
1. A researcher might want to identify all packages in the
_Bioconductor_ Project which are related to a specific purpose.
For example, one may want to look for all packages related to "Copy
Number Variants".
2. During development, a package contributor can "tag" their package
with biocViews so that someone looking for packages (as in
scenario 1) can easily find it.
### biocViews during new package development {#biocviews-pkg-devel}
Visit the ['devel' biocViews][biocViews] when you are in the process of
adding biocViews to your new package. Identify as many terms as
appropriate from the hierarchy. Prefer 'leaf' terms at the end of the
hierarchy, over more inclusive terms. Remember to check the box
displaying all available terms.
Please note:
1. Your package will belong to only one part of _Bioconductor_ Project
(Software, Annotation Data, Experiment Data, Workflow), so choose only
biocViews from that category.
2. biocViews listed in your package must match exactly (e.g.,
spelling, capitalization) the terms in the biocViews hierarchy.
When you submit your new package for review, your package is checked
and built by the _Bioconductor_ Project. We check the following for
biocViews:
1. Package contributor has added biocViews.
2. biocViews are valid.
3. Package contributor has added biocViews from only one of the categories.
If you receive a "RECOMMENDED" direction for any of these biocViews
after you have submitted your package, you can try correcting them on
your own following the directions given here or ask your package
reviewer for more information.
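
The three checks above can be approximated locally before submission. The following is a minimal base R sketch, not the code the _Bioconductor_ build system runs. The package name `myCNVtools` and the small `vocab` list are assumptions made for illustration; the full vocabulary should be taken from the 'devel' biocViews page (it is also shipped as the `biocViewsVocab` dataset in the `biocViews` package).

```r
# Hypothetical DESCRIPTION excerpt; the package name is made up.
description <- c(
    "Package: myCNVtools",
    "Type: Package",
    "Version: 0.99.0",
    "biocViews: Software, CopyNumberVariation, Sequencing"
)
desc_file <- tempfile()
writeLines(description, desc_file)
desc <- read.dcf(desc_file)

# Illustrative subset of the vocabulary, keyed by top-level category.
# The real hierarchy is much larger; this list is an assumption.
vocab <- list(
    Software = c("Software", "CopyNumberVariation", "Sequencing"),
    ExperimentData = c("ExperimentData", "CancerData")
)

# Check 1: the field is present under its exact, case-sensitive name.
has_field <- "biocViews" %in% colnames(desc)

# Split the comma-separated field into individual terms.
terms <- trimws(strsplit(desc[1, "biocViews"], ",")[[1]])

# Check 2: every term matches the vocabulary exactly.
invalid <- setdiff(terms, unlist(vocab))

# Check 3: all terms come from a single top-level category.
in_category <- vapply(vocab, function(v) any(terms %in% v), logical(1))
categories <- names(vocab)[in_category]

has_field            # TRUE when the field exists
invalid              # misspelled or unknown terms, if any
categories           # should name exactly one category
```

A misspelled term such as `CopyNumberVariants` would show up in `invalid`, and mixing in a term like `CancerData` would make `categories` contain two entries.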
If you think a biocViews term should be added to the current list of accepted
terms, please email [email protected] with the proposed term, the hierarchy
under which it should be placed, and a justification for the new term.
## Common Bioconductor Methods and Classes {#reusebioc}
We strongly recommend reusing existing methods for importing data, and
reusing established classes for representing data. Here are some
suggestions for importing different file types and commonly used
_Bioconductor_ classes. For more classes and functionality also try
searching in [biocViews][] for your data type.
### Importing data {#commonimport}
+ GTF, GFF, BED, BigWig, etc., -- `r BiocStyle::Biocpkg("rtracklayer")` `::import()`
+ VCF -- `r BiocStyle::Biocpkg("VariantAnnotation")` `::readVcf()`
+ SAM / BAM -- `r BiocStyle::Biocpkg("Rsamtools")` `::scanBam()`,
`r BiocStyle::Biocpkg("GenomicAlignments")` `::readGAlignment*()`
+ FASTA -- `r BiocStyle::Biocpkg("Biostrings")` `::readDNAStringSet()`
+ FASTQ -- `r BiocStyle::Biocpkg("ShortRead")` `::readFastq()`
+ MS data (XML-based and mgf formats) -- `r BiocStyle::Biocpkg("Spectra")` `::Spectra()`,
`r BiocStyle::Biocpkg("Spectra")` `::Spectra(source = MsBackendMgf::MsBackendMgf())`
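
As an illustration of reusing an existing importer rather than parsing a file by hand, here is a minimal sketch for the FASTA case. It assumes `Biostrings` is installed; the two sequences and the temporary file are made up for the example.

```r
library(Biostrings)

# Write a tiny, made-up FASTA file to a temporary location.
fasta <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "ACGTACGT", ">seq2", "GGCC"), fasta)

# Read it into a DNAStringSet instead of parsing the text manually.
dna <- readDNAStringSet(fasta)

names(dna)    # record names taken from the FASTA headers
width(dna)    # sequence lengths

# The result works directly with other Biostrings functions.
reverseComplement(dna)
```

Returning a `DNAStringSet` from a package function, rather than a character vector, lets users pass the result straight to other _Bioconductor_ tools.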
### Common Classes {#commonclass}
+ Rectangular feature x sample data --
`r BiocStyle::Biocpkg("SummarizedExperiment")` `::SummarizedExperiment()`
(RNAseq count matrix, microarray, ...)
+ Genomic coordinates -- `r BiocStyle::Biocpkg("GenomicRanges")` `::GRanges()`
(1-based, closed interval)
+ Genomic coordinates from multiple samples --
`r BiocStyle::Biocpkg("GenomicRanges")` `::GRangesList()`
+ Ragged genomic coordinates -- `r BiocStyle::Biocpkg("RaggedExperiment")`
`::RaggedExperiment()`
+ DNA / RNA / AA sequences -- `r BiocStyle::Biocpkg("Biostrings")`
`::*StringSet()`
+ Gene sets -- `r BiocStyle::Biocpkg("BiocSet")` `::BiocSet()`,
`r BiocStyle::Biocpkg("GSEABase")` `::GeneSet()`,
`r BiocStyle::Biocpkg("GSEABase")` `::GeneSetCollection()`
+ Multi-omics data --
`r BiocStyle::Biocpkg("MultiAssayExperiment")` `::MultiAssayExperiment()`
+ Single cell data --
`r BiocStyle::Biocpkg("SingleCellExperiment")` `::SingleCellExperiment()`
+ Mass spec data -- `r BiocStyle::Biocpkg("Spectra")` `::Spectra()`
+ File formats -- `r BiocStyle::Biocpkg("BiocIO")` `` ::`BiocFile-class` ``
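
To make the first two entries concrete, the sketch below builds a small `GRanges` object and attaches it to a `SummarizedExperiment`. It assumes `GenomicRanges` and `SummarizedExperiment` are installed; all feature names, sample names, and counts are invented for illustration.

```r
library(GenomicRanges)
library(SummarizedExperiment)

# Two hypothetical features on chr1. Coordinates are 1-based and closed,
# so start = 1, end = 10 covers ten positions.
gr <- GRanges(
    seqnames = "chr1",
    ranges = IRanges(start = c(1, 101), end = c(10, 150)),
    strand = c("+", "-")
)
names(gr) <- c("featureA", "featureB")
width(gr)

# Made-up counts: 2 features (rows) x 3 samples (columns).
counts <- matrix(
    c(5L, 0L, 12L, 3L, 7L, 9L),
    nrow = 2,
    dimnames = list(names(gr), c("s1", "s2", "s3"))
)

# Keep the assay, feature ranges, and sample annotation in one object.
se <- SummarizedExperiment(
    assays = list(counts = counts),
    rowRanges = gr,
    colData = DataFrame(
        condition = c("ctrl", "ctrl", "treated"),
        row.names = colnames(counts)
    )
)

dim(se)
assay(se, "counts")["featureB", "s3"]
```

Subsetting `se` by row or column keeps the counts, ranges, and sample annotation in sync, which is the main reason to prefer it over passing separate matrices and data frames around.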
In general, a package will not be accepted if it does not show interoperability
with the current [_Bioconductor_][] ecosystem.
## Vignette {#bioc-vignette}
Every submitted Bioconductor package should have at least one Rmd (preferred) or
Rnw vignette, ideally rendered with `BiocStyle::html_document`. The vignette
should include evaluated R code and a detailed introduction/abstract section
that motivates the package's inclusion in Bioconductor and, when appropriate,
reviews and compares it with existing Bioconductor packages of similar
functionality or scope. See the [vignette documentation
section](#vignettes) for more details.
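
For orientation, a minimal YAML header for an Rmd vignette rendered with `BiocStyle::html_document` might look like the sketch below. The title and index entry are placeholders, and this is only one common arrangement rather than a required template.

```yaml
---
title: "Introduction to myPackage"   # placeholder title
output: BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{Introduction to myPackage}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```

With this header, `BiocStyle` and `knitr` typically need to be listed under `Suggests:` and `VignetteBuilder: knitr` set in the DESCRIPTION file.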