[HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

From: Sokolov Yura
Subject: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
Msg-id: 35960b8af917e9268881cd8df3f88320@postgrespro.ru
Replies: Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian abit)  (Sokolov Yura <funny.falcon@postgrespro.ru>)
List: pgsql-hackers

Good day, everyone.

In an attempt to improve the performance of YCSB on a zipfian distribution,
it was found that significant time is spent in XidInMVCCSnapshot
scanning the snapshot->xip array. While the overall CPU time is not very
noticeable, it has a measurable impact on scalability.

First I tried sorting snapshot->xip in GetSnapshotData and searching the
sorted array. But since snapshot->xip is not touched if no transaction
contention occurs, always sorting xip is not the best option.

Then I sorted the xip array on demand in XidInMVCCSnapshot, only if a
search in snapshot->xip actually occurs (i.e. lazy sorting). It performs much
better, but since sorting is O(N log N), its execution time is noticeable
for a large number of clients.
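
A minimal sketch of the lazy-sorting idea described above, not the code
that was actually tried. LazySortSnapshot, xip_sorted and
lazy_sort_xid_in_xip are hypothetical names for illustration; the real
snapshot structure in PostgreSQL has more fields, and the comparison here is
a plain numeric one that ignores xid wraparound (which is enough for an
equality search).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;     /* 32-bit xid, as in PostgreSQL */

/* Hypothetical, simplified stand-in for the snapshot's xip array. */
typedef struct LazySortSnapshot
{
    TransactionId *xip;         /* in-progress xids, unsorted when taken */
    uint32_t    xcnt;           /* number of entries in xip */
    bool        xip_sorted;     /* set once xip has been sorted */
} LazySortSnapshot;

/* Plain numeric ordering; any consistent order works for bsearch. */
static int
xid_cmp(const void *a, const void *b)
{
    TransactionId x = *(const TransactionId *) a;
    TransactionId y = *(const TransactionId *) b;

    return (x > y) - (x < y);
}

/*
 * Returns true if xid is in the in-progress list.  The O(N log N) sort is
 * paid only by the first lookup; a snapshot that is never searched is
 * never sorted.
 */
static bool
lazy_sort_xid_in_xip(LazySortSnapshot *snap, TransactionId xid)
{
    if (!snap->xip_sorted)
    {
        qsort(snap->xip, snap->xcnt, sizeof(TransactionId), xid_cmp);
        snap->xip_sorted = true;
    }
    return bsearch(&xid, snap->xip, snap->xcnt,
                   sizeof(TransactionId), xid_cmp) != NULL;
}
```

Every lookup after the first is a binary search, but the first one still
pays for the whole sort, which is the cost that grows with the number of
clients.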

The third approach (present in the attached patch) is to build a hash table
lazily on the first search in the xip array.

Note: the hash table is not built if the number of "in-progress" xids is less
than 60. Tests show there is no big benefit from doing so (at least
on Intel Xeon).
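
A rough sketch of how a lazily built hash table with such a threshold might
look. This is not the attached patch: the open-addressing layout, the hash
function, the table sizing and all names (LazyHashSnapshot,
lazy_hash_xid_in_xip, LAZY_HASH_THRESHOLD) are assumptions made for
illustration. Only the threshold of 60 and the "build on first search" rule
come from the description above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

#define LAZY_HASH_THRESHOLD 60  /* below this, keep scanning linearly */
#define EMPTY_SLOT 0            /* xid 0 is invalid, so it can mark empty slots */

/* Hypothetical, simplified stand-in for the snapshot's xip array. */
typedef struct LazyHashSnapshot
{
    TransactionId *xip;         /* in-progress xids */
    uint32_t    xcnt;           /* number of entries in xip */
    TransactionId *hash;        /* NULL until the first search builds it */
    uint32_t    hash_mask;      /* table size - 1 (size is a power of two) */
} LazyHashSnapshot;

/* Assumed hash: multiplicative mixing, not taken from the patch. */
static uint32_t
hash_xid(TransactionId xid)
{
    uint32_t    h = xid * 2654435761u;

    return h ^ (h >> 15);
}

/* Build an open-addressing table at most half full; false on OOM. */
static bool
build_xip_hash(LazyHashSnapshot *snap)
{
    uint32_t    size = 1;

    while (size < snap->xcnt * 2)
        size <<= 1;
    snap->hash = calloc(size, sizeof(TransactionId));  /* zeroed = all empty */
    if (snap->hash == NULL)
        return false;
    snap->hash_mask = size - 1;

    for (uint32_t i = 0; i < snap->xcnt; i++)
    {
        uint32_t    pos = hash_xid(snap->xip[i]) & snap->hash_mask;

        while (snap->hash[pos] != EMPTY_SLOT)   /* linear probing */
            pos = (pos + 1) & snap->hash_mask;
        snap->hash[pos] = snap->xip[i];
    }
    return true;
}

/* Returns true if xid is in the in-progress list. */
static bool
lazy_hash_xid_in_xip(LazyHashSnapshot *snap, TransactionId xid)
{
    if (snap->hash == NULL && snap->xcnt >= LAZY_HASH_THRESHOLD)
        build_xip_hash(snap);   /* on failure hash stays NULL */

    if (snap->hash == NULL)
    {
        /* Small snapshot (or allocation failed): plain linear scan. */
        for (uint32_t i = 0; i < snap->xcnt; i++)
            if (snap->xip[i] == xid)
                return true;
        return false;
    }

    for (uint32_t pos = hash_xid(xid) & snap->hash_mask;
         snap->hash[pos] != EMPTY_SLOT;
         pos = (pos + 1) & snap->hash_mask)
        if (snap->hash[pos] == xid)
            return true;
    return false;
}
```

Building the table is a single O(N) pass, compared with O(N log N) for the
lazy sort, and a snapshot that is never searched pays nothing.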

For this letter I tested with pgbench, using random_exponential to pick the
rows to update in pgbench_tellers (scale=300, so 3000 rows in the table).
Scripts are in the attached archive. With this test configuration, the numbers
are quite close to those from the YCSB benchmark workloada.

The test machine is 4x Xeon CPU E7-8890, 72 cores (144 HT), fsync=on,
synchronous_commit=off.

Results:
 clients |   master | hashsnap
---------+----------+----------
      25 |    67652 |    70017
      50 |   102781 |   102074
      75 |    81716 |    79440
     110 |    68286 |    69223
     150 |    56168 |    60713
     200 |    45073 |    48880
     250 |    36526 |    40893
     325 |    28363 |    32497
     400 |    22532 |    26639
     500 |    17423 |    21496
     650 |    12767 |    16461
     800 |     9599 |    13483

(Note: if pgbench_accounts is updated instead (30000000 rows), the exponential
distribution behaves differently from zipfian with the parameter used.)

Remarks:
- it could be combined with "Cache data in GetSnapshotData"
   https://commitfest.postgresql.org/14/553/
- if CSN ever lands, there will be no need for this optimization at all.

PS.
Excuse me for the following little promotion of the lwlock patch
https://commitfest.postgresql.org/14/1166/

 clients |   master | hashsnap | hashsnap+lwlock
---------+----------+----------+-----------------
      25 |    67652 |    70017 |   127601
      50 |   102781 |   102074 |   134545
      75 |    81716 |    79440 |   128655
     110 |    68286 |    69223 |   110420
     150 |    56168 |    60713 |    86715
     200 |    45073 |    48880 |    68266
     250 |    36526 |    40893 |    56638
     325 |    28363 |    32497 |    45704
     400 |    22532 |    26639 |    38247
     500 |    17423 |    21496 |    32668
     650 |    12767 |    16461 |    25488
     800 |     9599 |    13483 |    21326

With regards,
-- 
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company
