interpretation of output values #24

Open
edg1983 opened this issue Dec 10, 2021 · 1 comment

edg1983 commented Dec 10, 2021

Hi,

I've used your pre-compiled index files to compute mappability with -K 150, assuming this is a good way to estimate the expected mappability for 150 bp sequencing reads (I also tried -K 100 and -K 75, and the considerations below still hold).

In the resulting BED file, the computed values are either in the range 0-0.5 or exactly 1, with no values between 0.5 and 1. Is this expected?
Are the output values actual mappability values, so that lower values correspond to regions that are difficult to map? If so, why are there no values between 0.5 and 1?

If low values are associated with mapping problems and the computed values are correct (so that most values are < 0.5), do you have any suggestion for a threshold to define difficult-to-map regions for variant filtering?

Thanks!

Edoardo

cpockrandt (Owner) commented

Hi @edg1983,

Yes, it is correct that there are no values between 0.5 and 1.0. The mappability value is the multiplicative inverse of the number of occurrences of a k-mer. A value of 1.0 means the k-mer is unique in the genome, 0.5 means it occurs twice, and 0.33 means it occurs three times.

So your assumption is correct: lower values represent regions that are more repetitive, hence more difficult to map.
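To make the gap between 0.5 and 1.0 concrete, here is a minimal Python sketch of the relationship described above. The function names `mappability` and `occurrences_from_value` are made up for this illustration and are not part of the tool; the sketch only assumes that the value is 1 divided by the occurrence count.

```python
from fractions import Fraction


def mappability(occurrences: int) -> Fraction:
    """Mappability as the multiplicative inverse of the k-mer occurrence count."""
    if occurrences < 1:
        raise ValueError("a k-mer taken from the genome occurs at least once")
    return Fraction(1, occurrences)


def occurrences_from_value(value: float) -> int:
    """Recover the occurrence count from a (possibly rounded) mappability value."""
    if value <= 0:
        raise ValueError("value must be positive to be inverted")
    return round(1 / value)


# The first few attainable values: 1, 1/2, 1/3, 1/4, 1/5.
possible = [float(mappability(n)) for n in range(1, 6)]

# None of them can fall strictly between 0.5 and 1.0, because there is
# no integer occurrence count between 1 and 2.
gap = [v for v in possible if 0.5 < v < 1.0]
```

Because the occurrence count is an integer, the attainable values jump straight from 1.0 (one occurrence) to 0.5 (two occurrences), which matches what shows up in the BED file.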

I don't have a magic threshold number, but the section on Mappability and SNP calling might be of interest to you.
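If you want to experiment with a cutoff yourself, something like the following Python sketch could be a starting point. It is not a recommended threshold. It assumes the mappability value is the last tab-separated column of each BED line, and both the helper name `low_mappability_regions` and the `max_occurrences` parameter are hypothetical, so please check them against your actual output.

```python
def low_mappability_regions(lines, max_occurrences=1):
    """Yield (chrom, start, end, value) for regions whose k-mer occurs more
    than `max_occurrences` times, i.e. whose value is below 1 / max_occurrences.

    Assumption: the mappability value is the last tab-separated field.
    """
    threshold = 1.0 / max_occurrences
    for line in lines:
        # Skip blank lines and common BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        value = float(fields[-1])
        # Small tolerance so a value equal to the threshold is not flagged.
        if value < threshold - 1e-9:
            yield chrom, start, end, value


# Made-up example lines, only to show the behaviour.
example = [
    "chr1\t0\t100\t1.0",
    "chr1\t100\t150\t0.5",
    "chr1\t150\t200\t0.333333",
]

non_unique = list(low_mappability_regions(example, max_occurrences=1))
more_than_twice = list(low_mappability_regions(example, max_occurrences=2))
```

Expressing the cutoff as a maximum occurrence count rather than a raw fraction keeps it aligned with the discrete values the tool can actually produce.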

Christopher
