Skip to content

Commit

Permalink
Fix an incorrect memory assumption for hash join statistics (#1184)
Browse files Browse the repository at this point in the history
When a hashjoin's hashtable runs out of allowed memory, it divides the hash
table into batches, and stores each batch to disk. When using EXPLAIN ANALYZE,
an additional description is created for the number of batches and the memory
they used. (With normal EXPLAIN, the request is not executed, and accordingly,
batches are not created). To store information about the disk space occupied by
a batch, the HashJoinTableStats structure is used, which has a batchstats
member of type HashJoinBatchStats, which is an array of structures for each
batch.

Usually, the number of batches is calculated at the beginning of hashjoin, but
sometimes the calculation algorithm may miss. Then, the
ExecHashIncreaseNumBatches() is used, which increases the current number of
batches, multiplying the previous one by two, and allocating additional memory
to the structures that need it.

Each structure of type HashJoinBatchStats takes 80 bytes, opposed to 8 bytes
for a pointer. Since memory for batchstats member is allocated for the
structures themselves, and not pointers to them, the size of realloc() call
for batchstats in ExecHashIncreaseNumBatches() can become more than
MaxAllocSize and cause an error.

In the beginning of ExecHashIncreaseNumBatches(), there is a check that
assumes that each item in any of the reallocated arrays would not be more than 8
bytes (sizeof(void *)). oldnbatch is equal to nbatch / 2. When we take the
size of the HashJoinBatchStats into account, we may notice that this check is
not correct specifically for this allocation, and is off by 10 times of the
value of the faulty repalloc() call.

The issue is that batchstats grows faster than other arrays containing
pointers. So this check does not prevent the faulty statistics batches
reallocation. When we are nearing MaxAllocSize with
nbatch * sizeof(stats->batchstats[0]), and try to reallocate the batchstats
member, we will encounter an error, because allocation size requested will be
more than MaxAllocSize.

Use repalloc_huge() instead of repalloc() for statistics, to allow
requesting memory up to:
MaxAllocSize / ((sizeof(void *) * 2) = 67 108 863;
67 108 863 * 80 = 5 368 709 040, which is slightly less than 5GB.

On a 32-bit machines, nbatch * sizeof(stats->batchstats[0]) could exceed
MaxAllocHugeSize, which is 2 ^ 32 = 4GB. Add a separate check and stop if
nbatch > MaxAllocHugeSize / sizeof(HashJoinBatchStats) to avoid integer
overflow.

On 64-bit machines, maximum allowed amount of batches is unchanged.

(cherry picked from commit 2cd2c76)
  • Loading branch information
bandetto authored and RekGRpth committed Jan 20, 2025
1 parent ab19e25 commit aa6df3a
Showing 1 changed file with 14 additions and 1 deletion.
15 changes: 14 additions & 1 deletion src/backend/executor/nodeHash.c
Original file line number Diff line number Diff line change
Expand Up @@ -1041,6 +1041,10 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
if (oldnbatch > Min(INT_MAX / 2, MaxAllocSize / (sizeof(void *) * 2)))
return;

/* avoid repalloc_huge overflow on 32 bit systems */
if (stats && oldnbatch > MaxAllocHugeSize / (sizeof(HashJoinBatchStats) * 2))
return;

/* A reusable hash table can only respill during first pass */
AssertImply(hashtable->hjstate->reuse_hashtable, hashtable->first_pass);

Expand Down Expand Up @@ -1078,8 +1082,17 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
{
Size sz = nbatch * sizeof(stats->batchstats[0]);

/*
* We use repalloc_huge because the condition in the beginning assumes
* that oldnbatches and nbatches will be used to index values of size 8
* or less (size of void*), but the size of HashJoinBatchStats
* structure is 80 bytes, so even if we pass that check, this still
* might not fit in the MaxAllocSize. The maximum amount of memory we
* can request here is slightly less than 5GB, estimated for
* MaxAllocSize = 1GB.
*/
stats->batchstats =
(HashJoinBatchStats *) repalloc(stats->batchstats, sz);
(HashJoinBatchStats *) repalloc_huge(stats->batchstats, sz);
sz = (nbatch - stats->nbatchstats) * sizeof(stats->batchstats[0]);
memset(stats->batchstats + stats->nbatchstats, 0, sz);
stats->nbatchstats = nbatch;
Expand Down

0 comments on commit aa6df3a

Please sign in to comment.