BTMaxItemSize seems to be subtly incorrect

Поиск
Список
Период
Сортировка
От Robert Haas
Тема BTMaxItemSize seems to be subtly incorrect
Дата
Msg-id CA+Tgmoa7UBxivM7f6Ocx_qbq4=ky3uXc+WZNOBcVX+kvJvWOEA@mail.gmail.com
обсуждение исходный текст
Ответы Re: BTMaxItemSize seems to be subtly incorrect  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
We have these two definitions in the source code:

#define BTMaxItemSize(page) \
     MAXALIGN_DOWN((PageGetPageSize(page) - \
                    MAXALIGN(SizeOfPageHeaderData + \
                             3*sizeof(ItemIdData)  + \
                             3*sizeof(ItemPointerData)) - \
                    MAXALIGN(sizeof(BTPageOpaqueData))) / 3)
#define BTMaxItemSizeNoHeapTid(page) \
     MAXALIGN_DOWN((PageGetPageSize(page) - \
                    MAXALIGN(SizeOfPageHeaderData + 3*sizeof(ItemIdData)) - \
                    MAXALIGN(sizeof(BTPageOpaqueData))) / 3)

In my tests, PageGetPageSize(page) = 8192, SizeOfPageHeaderData = 24,
sizeof(ItemIdData) = 4, sizeof(ItemPointerData) = 6, and
sizeof(BTPageOpaqueData) = 16. Assuming MAXIMUM_ALIGNOF == 8, I
believe that makes BTMaxItemSize come out to 2704 and
BTMaxItemSizeNoHeapTid come out to 2712. I have no quibble with the
formula for BTMaxItemSizeNoHeapTid. It's just saying that if you
subtract out the page header data, the special space, and enough space
for 3 line pointers, you have a certain amount of space left (8152
bytes) and so a single item shouldn't use more than a third of that
(2717 bytes) but since items use up space in increments of MAXALIGN,
you have to round down to the next such multiple (2712 bytes).

But what's up with BTMaxItemSize? Here, the idea as I understand it is
that we might need to add a TID to the item, so it has to be small
enough to squeeze one in while still fitting under the limit. And at
first glance everything seems to be OK, because BTMaxItemSize comes
out to be 8 bytes less than BTMaxItemSizeNoHeapTid and that's enough
space to fit a heap TID for sure. However, it seems to me that the
formula calculates this as if those additional 6 bytes were being
separately added to the page header or the line pointer array, whereas
in reality they will be part of the tuple itself. I think that we
should be subtracting sizeof(ItemPointerData) at the very end, rather
than subtracting 3*sizeof(ItemPointerData) from the space available to
be divided by three.

To see why, suppose that sizeof(BTPageOpaqueData) were 24 rather than
16. Then we'd have:

BTMaxItemSize = MAXALIGN_DOWN((8192 - MAXALIGN(24 + 3 * 4 + 3 * 6) -
MAXALIGN(24)) / 3) = MAXALIGN_DOWN((8192 - MAXALIGN(54)  - 24) / 3) =
MAXALIGN_DOWN(2704) = 2704
BTMaxItemSizeNoHeapTid = MAXALIGN_DOWN((8192 - MAXALIGN(24 + 3 * 4) -
MAXALIGN(24)) / 3 = MAXALIGN_DOWN((8192 - MAXALIGN(36)  - 24) / 3) =
MAXALIGN_DOWN(2709) = 2704

That's a problem, because if in that scenario you allow three 2704
byte items that don't need a heap TID and later you find you need to
add a heap TID to one of those items, the result will be bigger than
2704 bytes, and then you can't fit three of them into a page.

Applying the attached patch and running 'make check' suffices to
demonstrate the problem for me:

diff -U3 /Users/rhaas/pgsql/src/test/regress/expected/vacuum.out
/Users/rhaas/pgsql/src/test/regress/results/vacuum.out
--- /Users/rhaas/pgsql/src/test/regress/expected/vacuum.out
2022-06-06 14:46:17.000000000 -0400
+++ /Users/rhaas/pgsql/src/test/regress/results/vacuum.out
2022-06-08 17:20:58.000000000 -0400
@@ -137,7 +137,9 @@
     repeat('1234567890',269));
 -- index cleanup option is ignored if VACUUM FULL
 VACUUM (INDEX_CLEANUP TRUE, FULL TRUE) no_index_cleanup;
+ERROR:  cannot insert oversized tuple of size 2712 on internal page
of index "no_index_cleanup_idx"
 VACUUM (FULL TRUE) no_index_cleanup;
+ERROR:  cannot insert oversized tuple of size 2712 on internal page
of index "no_index_cleanup_idx"
 -- Toast inherits the value from its parent table.
 ALTER TABLE no_index_cleanup SET (vacuum_index_cleanup = false);
 DELETE FROM no_index_cleanup WHERE i < 15;

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Justin Pryzby
Дата:
Сообщение: Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch
Следующее
От: Roberto C. Sánchez
Дата:
Сообщение: Re: Request for assistance to backport CVE-2022-1552 fixes to 9.6 and 9.4