biomcmc_realloc error #4

Open
baojianfan opened this issue Jul 4, 2022 · 2 comments

@baojianfan

Hello the tatajuba team,

Thank you for developing this very nice tool. I installed tatajuba:1.0.4--h7132678_1 through Singularity. After launching, the tool successfully built the index files for the FASTA reference but failed while processing the FASTQ files.

Here is my command line:
cd /mnt/premiumfileshare/bfan/projects/homopolymer/RD-5729/tatajuba/
docker run -v `pwd`:`pwd` -w `pwd` quay.io/biocontainers/tatajuba:1.0.4--h7132678_1 tatajuba \
  -p -k 28 -i 25 -V \
  -g /mnt/premiumfileshare/bfan/projects/homopolymer/RD-5729/tatajuba/GRCh37_latest_genomic.gff.gz \
  -f /mnt/premiumfileshare/bfan/projects/homopolymer/RD-5729/tatajuba/GRCh37_latest_genomic.fna.gz \
  -o /mnt/premiumfileshare/bfan/projects/homopolymer/RD-5729/tatajuba/ \
  /sg/seqData/private/targeted/illumina/SG_wetlab/xGen/CES_Immuno_v4/20191031_NE-1885_aFFPE/00a_S13_R1_001.fastq.gz \
  /sg/seqData/private/targeted/illumina/SG_wetlab/xGen/CES_Immuno_v4/20191031_NE-1885_aFFPE/00a_S13_R2_001.fastq.gz \
  /sg/seqData/private/targeted/illumina/SG_wetlab/xGen/CES_Immuno_v4/20191031_NE-1885_aFFPE/10a_S1_R1_001.fastq.gz \
  /sg/seqData/private/targeted/illumina/SG_wetlab/xGen/CES_Immuno_v4/20191031_NE-1885_aFFPE/10a_S1_R2_001.fastq.gz

The error message is:

[warning] Decreasing number of threads to match number of samples
tatajuba 1.0.4
Reference genome fasta file: /mnt/premiumfileshare/bfan/projects/homopolymer/RD-5729/tatajuba/GRCh37_latest_genomic.fna.gz
Reference GFF3 file prefix: /mnt/premiumfileshare/bfan/projects/homopolymer/RD-5729/tatajuba/GRCh37_latest_genomic
Output directory: /mnt/premiumfileshare/bfan/projects/homopolymer/RD-5729/tatajuba/
Number of samples: 2 (paired-end)
Max distance per flanking k-mer: 1
Levenshtein distance for merging: 2
Flanking k-mer size (context): 28
Min tract length to consider: 4
Min depth of tract lengths: 25
Remove biased tracts: yes
Number of threads (requested or optimised): 2
Assuming paired-end samples: their file names should be consecutive (no file name check is conducted)
Read GFF3 reference genome in 3196.322516 secs

processing paired files /sg/seqData/private/targeted/illumina/SG_wetlab/xGen/CES_Immuno_v4/20191031_NE-1885_aFFPE/00a_S13_R1_001.fastq.gz and /sg/seqData/private/targeted/illumina/SG_wetlab/xGen/CES_Immuno_v4/20191031_NE-1885_aFFPE/00a_S13_R2_001.fastq.gz
processing paired files /sg/seqData/private/targeted/illumina/SG_wetlab/xGen/CES_Immuno_v4/20191031_NE-1885_aFFPE/10a_S1_R1_001.fastq.gz and /sg/seqData/private/targeted/illumina/SG_wetlab/xGen/CES_Immuno_v4/20191031_NE-1885_aFFPE/10a_S1_R2_001.fastq.gz
[ error ] biomcmc_realloc error on pointer 0x35100EF0 of 0 bites

[note to developers] If you want to debug me, set a breakpoint on function biomcmc_error()

Could you please let me know your comments?

Thanks,
Baojian

Member

leomrtns commented Jul 4, 2022

Hello! thanks for the kind words, and for trying tatajuba!
Apparently tatajuba did not find any quality homopolymer tract (HT) in at least one of your samples. This may be due to the setting Min depth of tract lengths: 25, which excludes every HT that is not seen at least 25 times with exactly the same context and tract length (in other words, as a k-mer of length 60).

This seems quite a strict coverage depth. Maybe you can try again with -i 4 to see if more HTs are kept. It may also be that there are no HTs at all in the fastq files. I added a more informative warning for such cases, and tatajuba now tries to exclude these samples instead of exiting (currently available only in the source code, not on conda/singularity).

I assume this is the problem, since from your report it exited before doing the BWA mapping. However, I notice you are trying to use a human reference genome, which was not its original intent; I am not sure it can handle such large genomes. It will certainly be slow (it will need to create the index once and then map all samples against it). Just loading the GFF took over 50 minutes!
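
To make the depth threshold discussed above more concrete, here is a minimal Python sketch of how a "min depth" filter can leave a sample with no tracts at all. It is an illustration only, not tatajuba's actual implementation: the function names (`find_tracts`, `count_tracts`, `filter_by_depth`, `shortest_kmer_length`) are hypothetical, and it ignores reverse complements and the fuzzy context matching that the tool's distance settings suggest.

```python
import re
from collections import Counter


def shortest_kmer_length(k, min_tract):
    """Shortest context + tract + context string: two flanks of k bases plus the tract."""
    return 2 * k + min_tract


def find_tracts(read, k=28, min_tract=4):
    """Yield (left_context, base, tract_length, right_context) for each
    homopolymer run of at least min_tract bases that has k bases of
    context available on both sides (hypothetical helper)."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_tract - 1))
    for m in pattern.finditer(read):
        start, end = m.start(), m.end()
        if start < k or end + k > len(read):
            continue  # not enough flanking sequence in this read
        yield (read[start - k:start], m.group(1), end - start, read[end:end + k])


def count_tracts(reads, k=28, min_tract=4):
    """Count how often each exact (context, tract length) combination occurs."""
    counts = Counter()
    for read in reads:
        counts.update(find_tracts(read, k, min_tract))
    return counts


def filter_by_depth(counts, min_depth):
    """Keep only tracts observed at least min_depth times."""
    return {tract: n for tract, n in counts.items() if n >= min_depth}
```

With toy reads and a small context size (k=3), five copies of `ACGAAAAACGT` plus one `TTGCCCCGAT` give one tract seen five times and one seen once. A minimum depth of 4 keeps the first tract, while a minimum depth of 25 keeps nothing, which is the "empty sample" situation described above. With the settings in the report, `shortest_kmer_length(28, 4)` gives the k-mer length of 60 mentioned earlier.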

Author

baojianfan commented Jul 5, 2022

Hi Leonardo,

Thank you for your prompt response. I changed the coverage threshold as you suggested, but I got a similar error. Here is the error message:

[warning] Decreasing number of threads to match number of samples
tatajuba 1.0.4
Reference genome fasta file: /mnt/premiumfileshare/bfan/projects/homopolymer/RD-5729/tatajuba/GRCh37_latest_genomic.fna.gz
Reference GFF3 file prefix: /mnt/premiumfileshare/bfan/projects/homopolymer/RD-5729/tatajuba/GRCh37_latest_genomic
Output directory: /mnt/premiumfileshare/bfan/projects/homopolymer/RD-5729/tatajuba/
Number of samples: 2 (paired-end)
Max distance per flanking k-mer: 1
Levenshtein distance for merging: 2
Flanking k-mer size (context): 28
Min tract length to consider: 4
Min depth of tract lengths: 4
Remove biased tracts: yes
Number of threads (requested or optimised): 2
Assuming paired-end samples: their file names should be consecutive (no file name check is conducted)
Read GFF3 reference genome in 46.612328 secs

processing paired files /sg/seqData/private/targeted/illumina/SG_wetlab/xGen/CES_Immuno_v4/20191031_NE-1885_aFFPE/00a_S13_R1_001.fastq.gz and /sg/seqData/private/targeted/illumina/SG_wetlab/xGen/CES_Immuno_v4/20191031_NE-1885_aFFPE/00a_S13_R2_001.fastq.gz
processing paired files /sg/seqData/private/targeted/illumina/SG_wetlab/xGen/CES_Immuno_v4/20191031_NE-1885_aFFPE/10a_S1_R1_001.fastq.gz and /sg/seqData/private/targeted/illumina/SG_wetlab/xGen/CES_Immuno_v4/20191031_NE-1885_aFFPE/10a_S1_R2_001.fastq.gz
[ error ] biomcmc_realloc error on pointer 0xC0090EF0 of 0 bites

This was probably caused by tatajuba not being able to handle the human reference genome. In that case, do you know of any other tools that can handle a human reference genome?

Thanks,
Baojian
