Обсуждение: BRIN: Prevent the heapblk overflow during index summarization on very large tables resulting in an infinite loop
Previously, heapBlk was defined as an unsigned 32-bit integer. When incremented
by pagesPerRange on very large tables, it could wrap around, causing the condition
heapBlk < nblocks to remain true indefinitely — resulting in an infinite loop.
This could cause the PostgreSQL backend to hang, consuming 100% CPU indefinitely
and preventing operations from completing on large tables.
The solution is straightforward — the data type of `heapBlk` has been changed
from a 32-bit integer to a 64-bit `BlockNumber` (int64), ensuring it can safely
handle extremely large tables without risk of overflow.
This was explained very nicely by Tomas Vondra[1] and below two solutions were
suggested.
i) Change to int64
ii) Tracking the prevHeapBlk
Among these two I feel using solution #1 would be more feasible(similar to previously used solution 4bc6fb57f774ea18187fd8565aad9994160bfc17[2]), though
other solution also works.
I’ve attached a patch with the changes for solution #1.
Kindly review it and share your feedback or suggestions — your input would be greatly appreciated.
Reference:
[1] https://www.postgresql.org/message-id/b8a4e04c-c091-056c-a379-11d35c7b2d8d%40enterprisedb.com
[2] https://github.com/postgres/postgres/commit/4bc6fb57f774ea18187fd8565aad9994160bfc17
Вложения
On Sun, 19 Oct 2025 at 19:03, sunil s <sunilfeb26@gmail.com> wrote: > Previously, heapBlk was defined as an unsigned 32-bit integer. When incremented > by pagesPerRange on very large tables, it could wrap around, causing the condition > heapBlk < nblocks to remain true indefinitely — resulting in an infinite loop. > This was explained very nicely by Tomas Vondra[1] and below two solutions were > > suggested. > i) Change to int64 > ii) Tracking the prevHeapBlk This is similar to what table_block_parallelscan_nextpage() has to do to avoid wrapping around when working with tables containing near 2^32 pages. I'd suggest using uint64 rather than int64 and also adding a comment to mention why that type is being used rather than BlockNumber. Something like: "We make use of uint64 for heapBlk as a BlockNumber could wrap for tables with close to 2^32 pages." You could move the declaration of heapBlk into the for loop line so that the comment about using uint64 is closer to the declaration. I suspect you will also want to switch to using uint64 for the "pageno" variable at the end of the loop. My compiler isn't warning about the "pageno = heapBlk;", but maybe other compilers will. Not for this patch, but I wonder why we switch memory contexts so often in that final tbm_add_page() loop. I think it would be better to switch once before the loop and back again after it. What's there seems a bit wasteful for any pagesPerRange > 1. David
I’ve addressed your comments and attached a rebased patch.
Вложения
On Mon, Oct 20, 2025 at 03:11:01PM +0530, sunil s wrote: > Thanks, David, for providing feedback on the changes. > I’ve addressed your comments and attached a rebased patch. Using signed or unsigned is not going to matter much at the end. We would be far from the count even if the number is signed. + * Since heapBlk is incremented by opaque->bo_pagesPerRange, it can exceed + * the maximum 32-bit limit (2^32) on very large tables, potentially causing + * the loop to become infinite. + * + * To prevent this overflow, the counter must use a 64-bit type, ensuring it + * can handle cases where nblocks approaches 2^32. 2^32 is mentioned twice. A simpler suggestion: Since heapBlk is incremented by opaque->bo_pagesPerRange, it could exceed the maximum 32-bit limit (2^32) on very large tables and wraparound. The counter must be 64 bits wide for this reason. Like totalpages, there is an argument about consistency based on the result type of bringetbitmap(). It's minor, still. It would be simpler to switch "pageno" to be 64-bit wide as well, rather than casting it back to BlockNumber. -- Michael
Вложения
On Tue, 21 Oct 2025 at 15:55, Michael Paquier <michael@paquier.xyz> wrote: > Using signed or unsigned is not going to matter much at the end. We > would be far from the count even if the number is signed. I'd leave it as uint64. There's no reason to mixup the signedness between these two variables. > + * Since heapBlk is incremented by opaque->bo_pagesPerRange, it can exceed > + * the maximum 32-bit limit (2^32) on very large tables, potentially causing > + * the loop to become infinite. > + * > + * To prevent this overflow, the counter must use a 64-bit type, ensuring it > + * can handle cases where nblocks approaches 2^32. > > 2^32 is mentioned twice. A simpler suggestion: > Since heapBlk is incremented by opaque->bo_pagesPerRange, it could > exceed the maximum 32-bit limit (2^32) on very large tables and > wraparound. The counter must be 64 bits wide for this reason. I wasn't a fan of that change either. I suggested "We make use of uint64 for heapBlk as a BlockNumber could wrap for tables with close to 2^32 pages.", but that's not what happened. > Like totalpages, there is an argument about consistency based on the > result type of bringetbitmap(). It's minor, still. I don't think totalpages being int64 is an argument to make the heapBlk int64. > It would be simpler to switch "pageno" to be 64-bit wide as well, > rather than casting it back to BlockNumber. I suggested that too, but ... I'm happy to finish this one off. I was leaving it for Tomas to comment, but I think he'll be busy with pgconf.eu for the next few days. David
On Tue, Oct 21, 2025 at 06:32:01PM +1300, David Rowley wrote: > I'd leave it as uint64. There's no reason to mixup the signedness > between these two variables. That's fine as well. > I'm happy to finish this one off. I was leaving it for Tomas to > comment, but I think he'll be busy with pgconf.eu for the next few > days. Cool, thanks. I'm leaving that stuff up to any of you, have fun. FWIW, I am on the same line as you. Your suggestions are better than what the proposed patch does, as far as I've looked. -- Michael
Вложения
On Tue, 21 Oct 2025 at 18:53, Michael Paquier <michael@paquier.xyz> wrote: > FWIW, I am on the same line as you. Your suggestions are better than > what the proposed patch does, as far as I've looked. Pushed with the agreed comment change and type change to pageno. David