-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hicPCA and compartmentalization. Setting interaction range. #647
Comments
Hi Jun, so you basically need a I need to check the source and the mathematical properties to see if this is possible. Best, Joachim |
Yes! thank you - Jun |
I had the same issue here: https://www.cell.com/cell/pdf/S0092-8674(20)30621-8.pdf (early drosophila embryos with low compartmentalization). We ended-up computing PCA for submatrices of pre-defined size (near the diagonal, tried different sizes) with a sliding window approach and stitched the resulting eigen vectors. The advantage of sliding window is that you can compare if the eigenvector "signal" is similar or not between overlapping regions, and they most likely are very similar, so it's safe to stitch the eigenvector values of adjacent windows. The method is described in the supplementary of that paper. I might have a jupyternotebook about this lying around. |
Oh man. This is exactly what we were looking for! Finding the proper values for sliding window (a,a+w) will be important for our system, c elegans. Conceptually I can understand the method section, but I come from biology background, so I will definitely have a lot of trouble executing the method section. Could you share with us if you find that notebook? Thanks! - Jun |
This is a very, very bad idea. PCA is not numerical addition/multiplication. It is a matrix transformation, so when you do a subset of your data, you change the space that you are operating in, and therefore the decomposition (which is what the PCA is, essentially) get modified in unpredictable ways. A Compartment analysis of a subset of a chromosome does not necessarily retain the Compartment done on the whole chromosome arm. There are algorithm to approximate PCAs for large matrix. This is not the way at all. See for yourself. I did a 8x8 vs 4x4 example.
|
I am not sure if this about our tool or the latest comment on this issue. If you are talking about how we handle hicPCA after hicAdjustmatrix , this is not about the size of matrix but about dealing with the biological question we are trying to answer. The reason to remove a region of chromosome is to remove a bias from the not very well assembled parts or very repetitive centromeric regions. If those regions well be kept, we will introduce variation which are nothing but artifact. Of course is up to the user to be careful about what part of genome they are removing. |
Thanks for your contribution, but I don't get your point. A different input will lead to a different result, that's quite clear. However, even if the result is not 1:1 comparable to a PCA on the full matrix, and also if the result is something else in mathematical terms, it might still be a valid result in terms of biological interpretation. It's then just a different algorithm. You always need to verify your results with orthogonal data, and if that shows it's valid, the approach can be used. Therefore I am not sure what kind of issue you are raising here. |
Hi,
Thank you for the amazing tool.
I was wondering if it is possible to compute hicPCA or hicCompartmentalization on matrix with limited interaction range (ex. use interactions between loci less than 5mb).
This largely comes from problem that is specific our system: the segments of the compartments are super tiny, and the checker-board pattern falls off quickly with increasing genomic distance. So we have to use fairly small bin size (no greater than 10kb) for PCA to work, but this also increases noise in long-range interactions.
Heres a snap shot of 2mb region.
We've dealt with this issue by combination of sequencing more reads and subseting matrix using hicAdjustMatrix, but we don't like that we are computing PCA for different regions of chromosomes separately. So, I was hoping if you guys had some insights on this.
Thank you,
Jun
The text was updated successfully, but these errors were encountered: