From aa6df3a95a1a73418a1eb7bb589eaba2e59aa4c0 Mon Sep 17 00:00:00 2001 From: Vladimir Sarmin Date: Fri, 26 Jan 2024 08:17:54 +0300 Subject: [PATCH] Fix an incorrect memory assumption for hash join statistics (#1184) When a hashjoin's hashtable runs out of allowed memory, it divides the hash table into batches, and stores each batch to disk. When using EXPLAIN ANALYZE, an additional description is created for the number of batches and the memory they used. (With normal EXPLAIN, the request is not executed, and accordingly, batches are not created). To store information about the disk space occupied by a batch, the HashJoinTableStats structure is used, which has a batchstats member of type HashJoinBatchStats, which is an array of structures for each batch. Usually, the number of batches is calculated at the beginning of hashjoin, but sometimes the calculation algorithm may miss. Then, the ExecHashIncreaseNumBatches() is used, which increases the current number of batches, multiplying the previous one by two, and allocating additional memory to the structures that need it. Each structure of type HashJoinBatchStats takes 80 bytes, opposed to 8 bytes for a pointer. Since memory for batchstats member is allocated for the structures themselves, and not pointers to them, the size of realloc() call for batchstats in ExecHashIncreaseNumBatches() can become more than MaxAllocSize and cause an error. In the beginning of ExecHashIncreaseNumBatches(), there is a check that assumes that each item in any of the reallocated arrays would not be more than 8 bytes (sizeof(void *)). oldnbatch is equal to nbatch / 2. When we take the size of the HashJoinBatchStats into account, we may notice that this check is not correct specifically for this allocation, and is off by 10 times of the value of the faulty repalloc() call. The issue is that batchstats grows faster than other arrays containing pointers. So this check does not prevent the faulty statistics batches reallocation. When we are nearing MaxAllocSize with nbatch * sizeof(stats->batchstats[0]), and try to reallocate the batchstats member, we will encounter an error, because allocation size requested will be more than MaxAllocSize. Use repalloc_huge() instead of repalloc() for statistics, to allow requesting memory up to: MaxAllocSize / ((sizeof(void *) * 2) = 67 108 863; 67 108 863 * 80 = 5 368 709 040, which is slightly less than 5GB. On a 32-bit machines, nbatch * sizeof(stats->batchstats[0]) could exceed MaxAllocHugeSize, which is 2 ^ 32 = 4GB. Add a separate check and stop if nbatch > MaxAllocHugeSize / sizeof(HashJoinBatchStats) to avoid integer overflow. On 64-bit machines, maximum allowed amount of batches is unchanged. (cherry picked from commit 2cd2c76c3731a8648167072f0b95ce93a876d95e) --- src/backend/executor/nodeHash.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c index 29ca405de7f7..652ce7cfcbe2 100644 --- a/src/backend/executor/nodeHash.c +++ b/src/backend/executor/nodeHash.c @@ -1041,6 +1041,10 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable) if (oldnbatch > Min(INT_MAX / 2, MaxAllocSize / (sizeof(void *) * 2))) return; + /* avoid repalloc_huge overflow on 32 bit systems */ + if (stats && oldnbatch > MaxAllocHugeSize / (sizeof(HashJoinBatchStats) * 2)) + return; + /* A reusable hash table can only respill during first pass */ AssertImply(hashtable->hjstate->reuse_hashtable, hashtable->first_pass); @@ -1078,8 +1082,17 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable) { Size sz = nbatch * sizeof(stats->batchstats[0]); + /* + * We use repalloc_huge because the condition in the beginning assumes + * that oldnbatches and nbatches will be used to index values of size 8 + * or less (size of void*), but the size of HashJoinBatchStats + * structure is 80 bytes, so even if we pass that check, this still + * might not fit in the MaxAllocSize. The maximum amount of memory we + * can request here is slightly less than 5GB, estimated for + * MaxAllocSize = 1GB. + */ stats->batchstats = - (HashJoinBatchStats *) repalloc(stats->batchstats, sz); + (HashJoinBatchStats *) repalloc_huge(stats->batchstats, sz); sz = (nbatch - stats->nbatchstats) * sizeof(stats->batchstats[0]); memset(stats->batchstats + stats->nbatchstats, 0, sz); stats->nbatchstats = nbatch;