Microvacuum support for Hash Index

Поиск
Список
Период
Сортировка
От Ashutosh Sharma
Тема Microvacuum support for Hash Index
Дата
Msg-id CAE9k0PkRSyzx8dOnokEpUi2A-RFZK72WN0h9DEMv_ut9q6bPRw@mail.gmail.com
обсуждение исходный текст
Ответы Re: Microvacuum support for Hash Index  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
Hi All,

I have added a microvacuum support for hash index access method and
attached is the v1 patch for the same. The patch basically takes care
of the following things:

1. Firstly, it changes the marking of dead tuples from
'tuple-at-a-time' to 'page-at-a-time' during hash index scan. For this
we accumulate the heap tids and offset of all the hash index tuples if
it is pointed by kill_prior_tuple during scan and then mark all
accumulated tids as LP_DEAD either while stepping from one page to
another (assuming the scan in both forward and backward direction) or
during end of the hash index scan or during rescan.

2. Secondly, when inserting tuple into hash index table, if not enough
space is found on a current page then it ensures that we first clean
the dead tuples if found in the current hash index page before moving
to the next page in a bucket chain or going for a bucket split. This
basically increases the page reusability and reduces the number of
page splits, thereby reducing the overall size of hash index table.

I have compared the hash index size with and without my patch
(microvacuum_hash_index_v1.patch attached with this mail) on a high
end machine at various scale factors and the results are shown below.
For testing this, i have created hash index (pgbench_accounts_aid) on
aid column of 'pgbench_accounts' table instead of primary key and the
results shown in below table are for the same. The patch
(pgbench.patch) having these changes is also attached with this mail.
Moreover, I am using my own script file (file_hash_kill_prior_tuple)
for updating the index column with pgbench read-write command. The
script file 'file_hash_kill_prior_tuple' is also attached with this
mail.

Here are some initial test results showing the benefit of this patch:

postgresql.conf and pgbench settings:
autovacuum=off
client counts = 64
run time duration = 15 mins

./pgbench -c $threads -j $threads -T 900 postgres -f
~/file_hash_kill_prior_tuple

Scale Factor    hash index size @ start         HEAD        HEAD + Patch
10                   32 MB                                 579 MB      158 MB
50                   128 MB                               630 MB      350 MB
100                 256 MB                              1255 MB      635 MB
300                 1024 MB                            2233 MB      1093 MB


As shown in above result, at 10 scale factor the hash index size has
reduced by almost 4 times whereas at 50 and 300 scale factors it has
reduced by half with my patch. This basically proves that we can
reduce the hash index size to a good extent with this patch.

System specifications:
---------------------------------
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             8
NUMA node(s):          8
Vendor ID:             GenuineIntel


Note: The patch (microvacuum_hash_index_v1.patch) is prepared on top
of concurrent_hash_index_v8.patch-[1] and wal_hash_index_v5.1.patch[2]
for hash index.


[1] - https://www.postgresql.org/message-id/CAA4eK1%2BX%3D8sUd1UCZDZnE3D9CGi9kw%2Bkjxp2Tnw7SX5w8pLBNw%40mail.gmail.com
[2] -
https://www.postgresql.org/message-id/CAA4eK1KE%3D%2BkkowyYD0vmch%3Dph4ND3H1tViAB%2B0cWTHqjZDDfqg%40mail.gmail.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Julien Rouhaud
Дата:
Сообщение: Re: issue with track_commit_timestamp and server restart
Следующее
От: Etsuro Fujita
Дата:
Сообщение: Re: Push down more full joins in postgres_fdw