Align Template Issues-Demo Data and Failure to Respect Config #4

Open
cfljam opened this issue Sep 25, 2016 · 3 comments

cfljam commented Sep 25, 2016

When I run the alignment template, some zombie data comes through in the output. I presume it comes from the default/demo data.

My config file looks like this:

sample file rep read experiment date comments
Pool1 C95VLANXX-2143-01-11-1_L007_R1.fastq.gz 1 R1 Req_10471_HighHealthAlleles 1/01/16 HighVitC/High Fruit Weight
Pool1 C95VLANXX-2143-01-11-1_L007_R2.fastq.gz 1 R2 Req_10471_HighHealthAlleles 1/01/16 HighVitC/High Fruit Weight
Pool1 C95VLANXX-2143-01-11-1_L008_R1.fastq.gz 2 R1 Req_10471_HighHealthAlleles 1/01/16 HighVitC/High Fruit Weight
Pool1 C95VLANXX-2143-01-11-1_L008_R2.fastq.gz 2 R2 Req_10471_HighHealthAlleles 1/01/16 HighVitC/High Fruit Weight
Pool2 C95VLANXX-2143-02-11-1_L007_R1.fastq.gz 1 R1 Req_10471_HighHealthAlleles 1/01/16 HighVitC/Low Fruit Weight
Pool2 C95VLANXX-2143-02-11-1_L007_R2.fastq.gz 1 R2 Req_10471_HighHealthAlleles 1/01/16 HighVitC/Low Fruit Weight
Pool2 C95VLANXX-2143-02-11-1_L008_R1.fastq.gz 2 R1 Req_10471_HighHealthAlleles 1/01/16 HighVitC/Low Fruit Weight
Pool2 C95VLANXX-2143-02-11-1_L008_R2.fastq.gz 2 R2 Req_10471_HighHealthAlleles 1/01/16 HighVitC/Low Fruit Weight
Pool3 C95VLANXX-2143-03-11-1_L007_R1.fastq.gz 1 R1 Req_10471_HighHealthAlleles 1/01/16 LowVitC/High Fruit Weight
Pool3 C95VLANXX-2143-03-11-1_L007_R2.fastq.gz 1 R2 Req_10471_HighHealthAlleles 1/01/16 LowVitC/High Fruit Weight
Pool3 C95VLANXX-2143-03-11-1_L008_R1.fastq.gz 2 R1 Req_10471_HighHealthAlleles 1/01/16 LowVitC/High Fruit Weight
Pool3 C95VLANXX-2143-03-11-1_L008_R2.fastq.gz 2 R1 Req_10471_HighHealthAlleles 1/01/16 LowVitC/High Fruit Weight
Pool4 C95VLANXX-2143-04-11-1_L007_R1.fastq.gz 1 R2 Req_10471_HighHealthAlleles 1/01/16 LowVitC/Low Fruit Weight
Pool4 C95VLANXX-2143-04-11-1_L007_R2.fastq.gz 1 R1 Req_10471_HighHealthAlleles 1/01/16 LowVitC/Low Fruit Weight
Pool4 C95VLANXX-2143-04-11-1_L008_R1.fastq.gz 2 R2 Req_10471_HighHealthAlleles 1/01/16 LowVitC/Low Fruit Weight
Pool4 C95VLANXX-2143-04-11-1_L008_R2.fastq.gz 2 R1 Req_10471_HighHealthAlleles 1/01/16 LowVitC/Low Fruit Weight

but the output contains spurious, unexplained files:

(py3r-env) [19:49][cfljam@aklppf31:align (master)] $ ls 240.add_read_group_id/ -lh
total 52G
-rw-rw-r--. 1 cfljam powerplant  6.9K Sep 25 13:48 add_read_group_id_HW1_1.bai
-rw-rw-r--. 1 cfljam powerplant  114K Sep 25 13:48 add_read_group_id_HW1_1.bam
-rw-rw-r--. 1 cfljam powerplant  6.9K Sep 25 13:48 add_read_group_id_HW2_2.bai
-rw-rw-r--. 1 cfljam powerplant  113K Sep 25 13:48 add_read_group_id_HW2_2.bam
-rw-rw-r--. 1 cfljam powerplant  956K Sep 25 18:04 add_read_group_id_Pool1_1.bai
-rw-rw-r--. 1 cfljam powerplant  5.6G Sep 25 18:04 add_read_group_id_Pool1_1.bam
-rw-rw-r--. 1 cfljam powerplant  1.6M Sep 25 19:04 add_read_group_id_Pool1_2.bai
-rw-rw-r--. 1 cfljam powerplant  8.6G Sep 25 19:04 add_read_group_id_Pool1_2.bam
-rw-rw-r--. 1 cfljam powerplant  942K Sep 25 16:40 add_read_group_id_Pool2_1.bai
-rw-rw-r--. 1 cfljam powerplant  4.7G Sep 25 16:40 add_read_group_id_Pool2_1.bam
-rw-rw-r--. 1 cfljam powerplant  1.6M Sep 25 19:07 add_read_group_id_Pool2_2.bai
-rw-rw-r--. 1 cfljam powerplant  8.3G Sep 25 19:07 add_read_group_id_Pool2_2.bam
-rw-rw-r--. 1 cfljam powerplant  974K Sep 25 16:21 add_read_group_id_Pool3_1.bai
-rw-rw-r--. 1 cfljam powerplant  4.4G Sep 25 16:21 add_read_group_id_Pool3_1.bam
-rw-rw-r--. 1 cfljam powerplant  993K Sep 25 17:13 add_read_group_id_Pool4_1.bai
-rw-rw-r--. 1 cfljam powerplant  6.5G Sep 25 17:13 add_read_group_id_Pool4_1.bam
-rw-rw-r--. 1 cfljam powerplant 1004K Sep 25 16:32 add_read_group_id_Pool5_1.bai
-rw-rw-r--. 1 cfljam powerplant  4.9G Sep 25 16:32 add_read_group_id_Pool5_1.bam

Two issues:

  1. What are the HW1 and HW2 files doing in here? They appear to come from the demo data.
  2. The config file lists 4 pool samples x 2 reps, but Pool3_2 has become a (non-existent) Pool5 rep 1.
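
One way to make the two points above concrete is to compare the (sample, rep) pairs named in the design file with the pairs that appear in the output file names. The following is a minimal Python sketch, not part of the pipeline. The helper names (`expected_pairs`, `observed_pairs`, `compare`) are hypothetical, the design is assumed to be whitespace-separated with columns in the order sample, file, rep, and the output names are assumed to follow `add_read_group_id_<sample>_<rep>.bam` as in the listing above. The design excerpt keeps only one row per (sample, rep) pair.

```python
import re

# One row per (sample, rep) pair, first three columns of the design only.
DESIGN_EXCERPT = """\
sample file rep
Pool1 C95VLANXX-2143-01-11-1_L007_R1.fastq.gz 1
Pool1 C95VLANXX-2143-01-11-1_L008_R1.fastq.gz 2
Pool2 C95VLANXX-2143-02-11-1_L007_R1.fastq.gz 1
Pool2 C95VLANXX-2143-02-11-1_L008_R1.fastq.gz 2
Pool3 C95VLANXX-2143-03-11-1_L007_R1.fastq.gz 1
Pool3 C95VLANXX-2143-03-11-1_L008_R1.fastq.gz 2
Pool4 C95VLANXX-2143-04-11-1_L007_R1.fastq.gz 1
Pool4 C95VLANXX-2143-04-11-1_L008_R1.fastq.gz 2
"""

# BAM names copied from the directory listing.
OUTPUT_FILES = [
    "add_read_group_id_HW1_1.bam",
    "add_read_group_id_HW2_2.bam",
    "add_read_group_id_Pool1_1.bam",
    "add_read_group_id_Pool1_2.bam",
    "add_read_group_id_Pool2_1.bam",
    "add_read_group_id_Pool2_2.bam",
    "add_read_group_id_Pool3_1.bam",
    "add_read_group_id_Pool4_1.bam",
    "add_read_group_id_Pool5_1.bam",
]


def expected_pairs(design_text):
    """Return the set of (sample, rep) pairs named in the design text.

    Assumes one header line and whitespace-separated columns
    in the order sample, file, rep.
    """
    pairs = set()
    for line in design_text.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 3:
            pairs.add((fields[0], fields[2]))
    return pairs


def observed_pairs(filenames, prefix="add_read_group_id_"):
    """Return (sample, rep) pairs parsed from <prefix><sample>_<rep>.bam names."""
    pattern = re.compile(re.escape(prefix) + r"(.+)_(\d+)\.bam$")
    pairs = set()
    for name in filenames:
        match = pattern.match(name)
        if match:
            pairs.add((match.group(1), match.group(2)))
    return pairs


def compare(design_text, filenames):
    """Return (unexpected, missing) lists of (sample, rep) pairs."""
    expected = expected_pairs(design_text)
    observed = observed_pairs(filenames)
    return sorted(observed - expected), sorted(expected - observed)


unexpected, missing = compare(DESIGN_EXCERPT, OUTPUT_FILES)
print("in output but not in design:", unexpected)
print("in design but not in output:", missing)
```

On the listing above this reports HW1_1, HW2_2 and Pool5_1 as unexpected, and Pool3_2 and Pool4_2 as missing.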

cfljam added the bug label Sep 25, 2016

hdzierz (Contributor) commented Oct 19, 2016

@cfljam

Can you point me to your notebook?

Thanks

Helge

cfljam (Author) commented Oct 23, 2016

I have reproduced this at /workspace/cfljam/HighHealth/PoolSeq/alignEA/

with the command line:

$NXF_HOME/nextflow run \
    PlantandFoodResearch/VariantAnalysis/align.nf \
    --genus 'Actinidia' \
    --input_dir $INPUTDIR \
    --genome $EAFASTA \
    --design ./design.config \
    --output_dir $EAOUTPUTDIR

Notebook visible at /workspace/cfljam/HighHealth/PoolSeq/2016-10-23AlignPoolsEANextFlow.html

The config file is the same as https://github.com/Actinidia/HighHealth/blob/master/PoolSeq/design.config
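
Since the design file drives how samples and reps are grouped, a quick consistency check on it may help narrow things down. The Python sketch below flags rows whose `read` column disagrees with the `_R1`/`_R2` suffix of the file name. It is only an illustration: `read_mismatches` is a hypothetical helper, the columns are assumed to be whitespace-separated in the order sample, file, rep, read, and the rows are the Pool3 and Pool4 entries from the listing in the first comment, cut to their first four columns. Whether such mismatches are related to the behaviour reported here has not been established.

```python
import re

# Pool3 and Pool4 rows from the design listing, first four columns only.
DESIGN_ROWS = """\
sample file rep read
Pool3 C95VLANXX-2143-03-11-1_L007_R1.fastq.gz 1 R1
Pool3 C95VLANXX-2143-03-11-1_L007_R2.fastq.gz 1 R2
Pool3 C95VLANXX-2143-03-11-1_L008_R1.fastq.gz 2 R1
Pool3 C95VLANXX-2143-03-11-1_L008_R2.fastq.gz 2 R1
Pool4 C95VLANXX-2143-04-11-1_L007_R1.fastq.gz 1 R2
Pool4 C95VLANXX-2143-04-11-1_L007_R2.fastq.gz 1 R1
Pool4 C95VLANXX-2143-04-11-1_L008_R1.fastq.gz 2 R2
Pool4 C95VLANXX-2143-04-11-1_L008_R2.fastq.gz 2 R1
"""


def read_mismatches(design_text):
    """Return (file, read) for rows whose read column disagrees with the file name.

    Rows without an _R1/_R2 suffix before .fastq.gz (including the header)
    are skipped rather than reported.
    """
    suffix = re.compile(r"_(R[12])\.fastq\.gz$")
    mismatches = []
    for line in design_text.strip().splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        file_name, read = fields[1], fields[3]
        match = suffix.search(file_name)
        if match and match.group(1) != read:
            mismatches.append((file_name, read))
    return mismatches


for file_name, read in read_mismatches(DESIGN_ROWS):
    print(f"{file_name} is labelled {read}")
```

For these rows the check flags the Pool3 `L008_R2` file and all four Pool4 files.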

cfljam changed the title from "Default Config Data Issues" to "Align Template Issues-Demo Data and Failure to Respect Config" Oct 23, 2016

cfljam (Author) commented Nov 2, 2016

Here is another possible issue:

ad position: EA01_02_scaffold378265:1
  INFO  2016-10-24 19:59:19     MarkDuplicates  Tracking 425633 as yet unmatched pairs. 3160 records in RAM.
  INFO  2016-10-24 20:12:15     MarkDuplicates  Read 116811705 records. 0 pairs never matched.
  INFO  2016-10-24 20:12:27     MarkDuplicates  After buildSortedReadEndLists freeMemory: 2309320448; totalMemory: 17644388352; maxMemory: 30542397440
  INFO  2016-10-24 20:12:27     MarkDuplicates  Will retain up to 954449920 duplicate indices before spilling to disk.
  [Mon Oct 24 20:12:49 NZDT 2016] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 1,837.42 minutes.
  Runtime.totalMemory()=24965545984
  To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
  Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at htsjdk.samtools.util.SortingLongCollection.<init>(SortingLongCollection.java:112)
        at picard.sam.markduplicates.MarkDuplicates.generateDuplicateIndexes(MarkDuplicates.java:570)
        at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:195)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:209)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
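
The numbers in the log can be turned into a rough picture of the heap at the point of failure. The arithmetic below is a sketch under one assumption: that each retained duplicate index is held as an 8-byte Java `long`, which the class name `SortingLongCollection` in the stack trace suggests. All input values are copied from the log above and the variable names are only illustrative.

```python
GIB = 2 ** 30

# Values copied from the MarkDuplicates log lines above.
free_memory = 2309320448
total_memory = 17644388352
max_memory = 30542397440
duplicate_indices = 954449920

BYTES_PER_LONG = 8  # assumption: one Java long per retained index

# Size of the buffer MarkDuplicates says it will retain before spilling.
index_buffer_bytes = duplicate_indices * BYTES_PER_LONG

# Heap already in use when the buffer is requested, and the room
# left before the JVM reaches its maximum heap size.
used_bytes = total_memory - free_memory
headroom_bytes = max_memory - used_bytes

print(f"index buffer : {index_buffer_bytes / GIB:.2f} GiB")
print(f"heap in use  : {used_bytes / GIB:.2f} GiB")
print(f"headroom     : {headroom_bytes / GIB:.2f} GiB")
print(f"buffer / max : {index_buffer_bytes / max_memory:.2f}")
```

Under that assumption the buffer is exactly one quarter of the maximum heap, requested as a single allocation while roughly half of the heap is already in use. This does not by itself explain the `OutOfMemoryError`, but it may help when choosing a larger `-Xmx` value for the job.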
