Matrioskas approach #4
Done
@brunocontrerasmoreira, for info.

Impact of the choice of the N top sequences

So far we have tested 3 values for the number of top sequences to keep: 250, 500 and 1000.

RORB with 250 top-ranking PBM spots

RORB with 500 top-ranking PBM spots

RORB with 1000 top-ranking PBM spots
Choice of the normalisation method

PBM data are provided with 2 normalisation methods: SD or QNZS (z-scores). The signal intensities and the ranking of the spots differ substantially depending on the normalisation method. We tested motif discovery with both.

For NACC2, the motif discovery results are quite different. With QNZS normalisation, the sequence logos show reasonably good motifs:

Motifs discovered in the 1000 top-ranking spots of the NACC2 QNZS dataset

Motifs discovered in the 1000 top-ranking spots of the NACC2 SD dataset

Although both datasets return significant motifs, with SD the logos show high error bars and very irregular successions of high- and low-scoring columns in terms of information content.
This effect seems to depend on the TF: it is not observed with RORB or TIGD3.
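To put a number on how much the spot ranking changes between the two normalisations, one option is to compare the sets of top-N spots they select. The snippet below is only a minimal sketch: the function names (`top_n_ids`, `top_n_jaccard`) and the input layout (one dictionary of spot ID to score per normalisation) are assumptions made for illustration, not part of the existing pipeline.

```python
from typing import Dict, Set


def top_n_ids(scores: Dict[str, float], n: int) -> Set[str]:
    """Return the IDs of the n highest-scoring spots (hypothetical helper)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n])


def top_n_jaccard(scores_a: Dict[str, float],
                  scores_b: Dict[str, float],
                  n: int) -> float:
    """Jaccard index between the top-n spot sets of two normalisations.

    1.0 means both normalisations select the same spots,
    0.0 means they share none.
    """
    top_a = top_n_ids(scores_a, n)
    top_b = top_n_ids(scores_b, n)
    union = top_a | top_b
    if not union:
        return 1.0  # two empty selections are trivially identical
    return len(top_a & top_b) / len(union)


if __name__ == "__main__":
    # Toy scores, invented purely to exercise the functions.
    sd = {"spot1": 3.0, "spot2": 2.0, "spot3": 1.0}
    qnzs = {"spot1": 1.0, "spot2": 3.0, "spot3": 2.0}
    print(top_n_jaccard(sd, qnzs, n=2))
```

A low index for a given TF would suggest that the two normalisations feed largely different sequences to motif discovery, which could be one reason for diverging logos.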
Some data types associate a score with each sequence. This is for example the case for PBM, CHS, ...

We could run motif discovery (oligo-analysis) successively on the top 100, 200, 300, 500, 1000, 2000, ... sequences.

PBM example
CHS
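The nested top-N selection described in this issue could be sketched as follows. This is an illustration under assumptions, not the project's implementation: the input is taken to be a list of `(sequence, score)` pairs where higher scores are better, and `nested_top_subsets` is a hypothetical helper name. Each returned subset would then be written to a file and passed to the motif discovery tool separately.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def nested_top_subsets(scored: Sequence[Tuple[str, float]],
                       sizes: Iterable[int]) -> Dict[int, List[str]]:
    """Build nested top-N subsets from scored sequences.

    Sequences are ranked by decreasing score (assumption: higher is better).
    Sizes larger than the dataset are skipped, so every returned subset
    has exactly N sequences and contains all smaller subsets.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    subsets: Dict[int, List[str]] = {}
    for n in sorted(sizes):
        if n > len(ranked):
            break  # remaining sizes are larger still
        subsets[n] = [seq for seq, _score in ranked[:n]]
    return subsets


if __name__ == "__main__":
    # Toy data, invented purely to exercise the function.
    toy = [("AAA", 1.0), ("CCC", 5.0), ("GGG", 3.0), ("TTT", 4.0)]
    for n, seqs in nested_top_subsets(toy, [1, 2, 4, 8]).items():
        print(n, seqs)
```

Because the ranking is computed once and each subset is a prefix of it, the subsets are nested by construction, which is what makes results comparable across sizes.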