Lift over error: invalid literal for int() with base 10: '2_KI270773v1_alt' #47

Open
Dazcam opened this issue May 19, 2021 · 0 comments

Dazcam commented May 19, 2021

I'm trying to use `sumstats.py lift` to lift hg19 SNPs in 5 GWAS sumstats files over to hg38. I have already run `sumstats.py csv` to standardise these files, which now look like this:

SNP	CHR	BP	PVAL	A1	A2	N	Z	OR	BETA	SE
rs11579922	1	1036860	.1662	A	C	50914	-1.3868004	.97278	-.02759733	.0199
rs11579015	1	1036959	.1067	T	C	49514	-1.6133769	.96435	-.03630098	.0225
rs11260592	1	1037303	.1716	T	C	50914	-1.3683987	.97287	-.02750481	.0201
rs11260593	1	1037313	.169	A	G	50914	-1.3730014	.97278	-.02759733	.0201
rs66622470	1	1038088	.1659	C	G	50914	1.3867192	1.02798	.02759571	.0199

However, I'm getting the following error for 2 of the 5 files so far (the others are still running):

Traceback (most recent call last):
  File "python_convert/sumstats.py", line 2212, in <module>
    args.func(args, log)
  File "python_convert/sumstats.py", line 1375, in make_lift
    df.loc[index, cols.CHR] = int(lifted[0][0][3:])
ValueError: invalid literal for int() with base 10: '2_KI270773v1_alt'
Analysis finished at Tue May 18 18:02:06 2021
Total time elapsed: 2.0h:7.0m:48.60999999999967s

This appears to relate to entries in the `hg19ToHg38.over.chain.gz` file, as there are no alt chromosomes in the original GWAS sumstats files. The chain file contains 114 alt chromosomes in total.
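
To make the failure concrete: the line in the traceback drops the first three characters of the lifted contig name and passes the rest to `int()`. The contig was presumably `chr2_KI270773v1_alt`, which leaves `2_KI270773v1_alt`. Below is a minimal sketch of a more tolerant parse, assuming UCSC-style contig names. `parse_primary_chrom` is a hypothetical helper for illustration and is not part of sumstats.py.

```python
import re
from typing import Optional

# Matches plain numbered chromosomes only, with or without the "chr" prefix.
_NUMBERED = re.compile(r"(?:chr)?([0-9]+)")


def parse_primary_chrom(contig: str) -> Optional[int]:
    """Return the chromosome number for names like 'chr2', else None.

    Hypothetical helper, not part of sumstats.py. Alt contigs such as
    'chr2_KI270773v1_alt' (and also chrX, chrY, chrM) yield None instead
    of raising ValueError.
    """
    match = _NUMBERED.fullmatch(contig)
    return int(match.group(1)) if match else None


print(parse_primary_chrom("chr2"))                 # 2
print(parse_primary_chrom("chr2_KI270773v1_alt"))  # None
```

A caller could then treat `None` as "could not be lifted to a primary chromosome" and skip or flag the SNP rather than aborting the whole run.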

I'm wondering if there is a way around this, i.e. can I add a parameter to ignore or otherwise deal with these loci? What exactly does `--keep-bad-snps` do? I'm reluctant to use it without fully understanding its effect.

Interestingly, this error does not arise when I use the standard liftOver tool, but using that means I need to generate BED files first. sumstats.py would be the neatest option for me.
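
For context, the BED detour could look roughly like the sketch below. It assumes the tab-separated `SNP`, `CHR` and `BP` columns shown earlier, 1-based `BP` positions, and `chr`-prefixed contig names in the chain file. `sumstats_to_bed` is a hypothetical name, not an existing tool.

```python
import csv
import io


def sumstats_to_bed(sumstats_handle, bed_handle):
    """Write one BED line per SNP: chrom, 0-based start, end, rsID.

    Assumes 1-based BP positions, so a single-base SNP at position p
    becomes the half-open BED interval [p - 1, p).
    """
    reader = csv.DictReader(sumstats_handle, delimiter="\t")
    writer = csv.writer(bed_handle, delimiter="\t", lineterminator="\n")
    for row in reader:
        pos = int(row["BP"])
        writer.writerow([f"chr{row['CHR']}", pos - 1, pos, row["SNP"]])


# Small in-memory demonstration using the first row shown above.
example = "SNP\tCHR\tBP\nrs11579922\t1\t1036860\n"
out = io.StringIO()
sumstats_to_bed(io.StringIO(example), out)
print(out.getvalue(), end="")
```

Keeping the rsID in the fourth BED column means the lifted coordinates can be joined back onto the original sumstats rows afterwards.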

Here is my code:

rule lift_over:
    input:   SCRATCH + GWAS_DIR + "GWAS_sumstats_standardised/{GWAS}_hg19_withZ_sumstats.tsv"
    output:  SCRATCH + GWAS_DIR + "GWAS_sumstats_standardised/{GWAS}_hg38_sumstats.tsv"
    message: "Formatting {input} sumstats"
    log:     SCRATCH + "logs/lift_over/{GWAS}_hg38.log"
    params:  SCRATCH + GWAS_DIR + "hg19ToHg38.over.chain.gz"
    shell:
             """

             python python_convert/sumstats.py lift \
             --sumstats {input} \
             --out {output} \
             --chain-file {params} \
             --log {log}

             """

I could also remove these entries from the chain file, but I thought I'd ask if there is a way to deal with them before proceeding.
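
Removing those entries could be sketched as below. In the UCSC chain format each chain starts with a header line `chain score tName tSize tStrand tStart tEnd qName ...`, and for `hg19ToHg38` the `qName` field is the hg38 contig. The pattern for "primary" contigs and the function name `filter_chain` are assumptions for illustration. SNPs that only map through a dropped chain would presumably then come back as unmapped rather than raising the error.

```python
import gzip
import re

# Assumption: keep numbered chromosomes plus X, Y and M; adjust as needed.
PRIMARY = re.compile(r"chr([0-9]+|X|Y|M)")


def filter_chain(src_path, dst_path):
    """Copy a gzipped chain file, dropping chains whose query contig
    (the destination-assembly contig) is not a primary chromosome."""
    keep = False
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        for line in src:
            if line.startswith("chain"):
                fields = line.split()
                # fields[7] is qName, e.g. 'chr2' or 'chr2_KI270773v1_alt'.
                keep = PRIMARY.fullmatch(fields[7]) is not None
            # Alignment lines and the trailing blank line follow their header.
            if keep:
                dst.write(line)
```

It would be called as, for example, `filter_chain("hg19ToHg38.over.chain.gz", "hg19ToHg38.primary.over.chain.gz")`, where the output file name is made up here.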

Many Thanks.
