[Patch][Bug] Incorrect usage of PageGetFreeSpace instead of PageGetHeapFreeSpace in heap_xlog_visible
| От | Alexey Makhmutov |
|---|---|
| Тема | [Patch][Bug] Incorrect usage of PageGetFreeSpace instead of PageGetHeapFreeSpace in heap_xlog_visible |
| Дата | |
| Msg-id | 66c011a7-2f26-4ab9-a5b0-1afcb8523367@postgrespro.ru обсуждение |
| Список | pgsql-hackers |
This is a spin-off from the thread https://www.postgresql.org/message-id/flat/596c4f1c-f966-4512-b9c9-dd8fbcaf0928%40postgrespro.ru to cover the second issue discussed in it. This issue is not more relevant for PG19, but is still present in previous versions. In issues seems to be just a small overlook in the ab7dbd6, which used 'PageGetFreeSpace' in 'heap_xlog_visible' function to calculate available space in page block instead of 'PageGetHeapFreeSpace'. The difference between these functions is that ‘PageGetHeapFreeSpace’ additionally checks that there is a space in line pointers section (i.e. checks that MaxHeapTuplesPerPage is not exceeded). If MaxHeapTuplesPerfPage items are already used on the page, then free space for this block need to be reported as 0, as no new tuple could be added to the page. Such case could be observed if page has large number of redirect slots created as result of HOT cleanups. Basically, every code working with heap pages should use ‘PageGetHeapFreeSpace’ rather than ‘PageGetFreeSpace’ and function ‘heap_xlog_visible’ is the only place in the code where ‘PageGetFreeSpace’ is used for heap pages. Usage of incorrect function could cause incorrect data being written to the FSM on replica: if block still have free space, but has already reached the MaxHeapTuplesPerPage limit, then it should be marked as unavailable for new rows in FSM, otherwise inserter will need to check and update its FSM data as well. This issue was already fixed in PG 19 as result of a881cc9, but is still present in previous versions. I think that situation with difference between results of these two functions is rarely observed, but it seems logical to just call the correct function in all cases. A patch is attached which could be applied on PG 14-17 to fix the issue (just replacing PageGetFreeSpace with PageGetHeapFreeSpace). For PG 18 the function was moved to the heapam_xlog.c instead of heapam.c, so patch may need a small adjustment (version attached to the previous thread could be used in this case). A small synthetic test case (test_case2.zip) is also attached. It could simulate a situation leading to the incorrect FSM data on the replica side. The archive contains script ‘test_prepare.sh’, which could be edited to specify path to PG installation and port numbers. Invocation of this script under the 'postgres' OS user creates two databases (master and replica), performs inserts and updates using the 'insert_hot1.sql' script to create block with large number of HOT updates chains and then checks state of the FSM data for the table on both the master and replica server. Without the fix script shows full block (0 free space) on master and partially filled block (~1150 bytes of free space) on replica. With the fix both master and replica correctly shows 0 free space in the block. To stop and cleanup created servers './test_prepare.sh --cleanup' command could be used. Thanks, Alexey
Вложения
В списке pgsql-hackers по дате отправления: