Invariant Sites Removal #9

Open
RachelxGray opened this issue Dec 17, 2024 · 1 comment

Comments

@RachelxGray

Hi,

Thank you for the package! I am working with a non-model species and projecting putative hybrids onto the PCA space of the source individuals (because they are recent hybrids, I have turned off scaling for genetic drift). I created my input .traw from a VCF of SNPs, so I'm a bit confused about why so many of my sites are being removed by the invariant-site filter.

Could you please explain how it decides which sites to remove? Is invariance assessed within groups, between groups, or among the projected individuals? I have checked my VCF for invariant sites and there aren't any. I have posted my code and output below. Thanks for your help :)

Code:

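library(smartsnp)

# Each of the 140 samples gets its own group label
my_groups <- 1:140
# Column indices of the samples to project (the putative hybrids)
my_ancient <- 1:96

# Sanity check: number of samples listed in the .fam file
numSamples <- nrow(read.table("RefsHybs.fam"))
print(numSamples)

# PCA on the remaining samples, projecting the hybrids, with no SNP scaling
pcaR2 <- smart_pca(snp_data = "RefsHybs_genotypeMatrix.traw",
                   sample_group = my_groups,
                   sample_project = my_ancient,
                   missing_value = NA,
                   scaling = "none")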

Output:

Checking argument options selected...
Argument options are correct...
Loading data...
Imported 2523831 SNP by 140 sample genotype matrix
Time elapsed: 0h 0m 2s
Filtering data...
2523831 SNPs included in PCA computation
44 samples included in PCA computation
96 samples projected after PCA computation
Completed data filtering
Time elapsed: 0h 0m 3s
Scanning for invariant SNPs...
Scan complete: removed 1017534 invariant SNPs
Time elapsed: 0h 0m 5s
Checking for missing values...
1474461 SNPs contain missing values
Imputing SNPs with missing values...
Imputation with means completed: 7770780 missing values imputed
Time elapsed: 0h 0m 6s
Note: SNP-based scaling not used
Computing singular value decomposition using RSpectra...
Completed singular value decomposition using RSpectra
Time elapsed: 0h 0m 8s
Extracting eigenvalues and eigenvectors...
Eigenvalues and eigenvectors extracted
Time elapsed: 0h 0m 8s
Projecting ancient samples onto PCA space
PCA space = PC1PC2
Completed ancient sample projection
96 ancient samples projected
Time elapsed: 0h 0m 14s
Tabulating PCA outputs...
Completed tabulations of PCA outputs...
@ChristianHuber
Owner

Hi! I'm not sure why it's filtering out so many sites. Invariant sites are defined as sites with only one allele across all analyzed individuals. If you send me your .traw and .fam files (or a subset of them), I could have a closer look.
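
In the meantime, one way to narrow this down is to count invariant sites separately for all samples and for only the samples that enter the PCA computation. The sketch below does that on a toy matrix. It assumes genotypes coded 0/1/2 with NA for missing, SNPs in rows and samples in columns, and it treats a site as invariant when every non-missing genotype is identical. The helper `is_invariant` and the toy data are hypothetical, and whether `smart_pca` scans only the non-projected samples is an assumption to test, not something confirmed in this thread.

```r
# Toy genotype matrix (hypothetical data): rows = SNPs, columns = samples,
# coded 0/1/2 with NA for missing calls.
geno <- matrix(
  c(0, 1, 2, 0, 0,    # varies only among samples 1-3
    0, 1, 2, 0, 1,    # varies everywhere
    2, 2, NA, 2, 2,   # a single genotype everywhere
    0, 0, 1, NA, 2),  # samples 4-5 have one non-missing call
  nrow = 4, byrow = TRUE
)

# Hypothetical helper: TRUE for rows where all non-missing genotypes are equal
is_invariant <- function(g) {
  apply(g, 1, function(x) length(unique(x[!is.na(x)])) <= 1)
}

# Assumption: samples 1-3 are projected, the rest enter the PCA computation
projected <- 1:3
pca_samples <- setdiff(seq_len(ncol(geno)), projected)

inv_all <- is_invariant(geno)
inv_pca <- is_invariant(geno[, pca_samples, drop = FALSE])

cat("Invariant across all samples:", sum(inv_all), "\n")
cat("Invariant among PCA samples only:", sum(inv_pca), "\n")
```

On real data, the same two counts could be computed from the .traw after dropping its leading metadata columns (six in a standard PLINK .traw). If the second count is close to the number reported in the log while the first is near zero, that would suggest the sites are variable only among the projected individuals.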
