Commit
Fix optim chunk size bug in step 6 (very large datasets overflow hdf5 max chunksize 4GB limit)
isaacovercast committed Aug 22, 2016
1 parent 0a50347 commit 898d1a5
Showing 1 changed file with 9 additions and 0 deletions.
ipyrad/assemble/cluster_across.py: 9 additions & 0 deletions
@@ -411,6 +411,15 @@ def build_h5_array(data, samples, ipyclient):
     chunks = 2000
     if data.nloci > 500000:
         chunks = 5000
+
+    ## The number of elements in an hdf5 chunk may not exceed the 4GB limit.
+    ## Chunks this enormous are probably not actually optimal; it could be
+    ## worth exploring the efficiency of smaller chunk sizes on very large
+    ## datasets.
+    chunklen = chunks * len(samples) * maxlen * 4
+    if chunklen > 4000000000:
+        chunks = int(round(4000000000 / (len(samples) * maxlen * 4)))
+
     data.chunks = chunks
     LOGGER.info("data.nloci is %s", data.nloci)
     LOGGER.info("chunks is %s", data.chunks)
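For context, HDF5 enforces a hard upper limit of 4 GB on the byte size of a single chunk, and h5py refuses to create a dataset whose chunks exceed it, which is why the commit caps the per-chunk locus count before creating the array. The sketch below is a minimal standalone illustration of that capping logic, not ipyrad's actual code: the `capped_chunks` helper and the dataset name, shape, and dtype are all hypothetical.

```python
# Minimal sketch (hypothetical, not ipyrad's code) of capping the first
# chunk dimension so a single HDF5 chunk stays under the 4 GB limit.
import h5py
import numpy as np

CHUNK_LIMIT = 4000000000  # same byte threshold the commit uses

def capped_chunks(chunks, nsamples, maxlen, itemsize):
    """Shrink the per-chunk locus count until one chunk fits under the cap."""
    if chunks * nsamples * maxlen * itemsize > CHUNK_LIMIT:
        # floor division cannot round up past the cap, unlike round()
        chunks = int(CHUNK_LIMIT // (nsamples * maxlen * itemsize))
    return chunks

nloci, nsamples, maxlen = 600000, 1000, 300
itemsize = np.dtype("uint32").itemsize  # 4 bytes per element
chunks = capped_chunks(5000, nsamples, maxlen, itemsize)  # 5000 -> 3333

with h5py.File("example.h5", "w") as io5:
    # h5py would raise an error here if one chunk exceeded 4 GB
    io5.create_dataset(
        "catgs",
        shape=(nloci, nsamples, maxlen),
        dtype="uint32",
        chunks=(chunks, nsamples, maxlen),
    )
```

With the sample numbers above, the uncapped chunk would be 5000 × 1000 × 300 × 4 = 6 GB, so the helper trims it to 3333 loci per chunk, just under the 4 GB limit.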
