Re: Improve heapgetpage() performance, overhead from serializable

Поиск
Список
Период
Сортировка
От Zhang Mingli
Тема Re: Improve heapgetpage() performance, overhead from serializable
Дата
Msg-id 4281c260-ec5e-40aa-bcc9-5e3888ca4a48@Spark
обсуждение исходный текст
Ответ на Improve heapgetpage() performance, overhead from serializable  (Andres Freund <andres@anarazel.de>)
Ответы Re: Improve heapgetpage() performance, overhead from serializable  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
Hi,

Regards,
Zhang Mingli
On Jul 16, 2023 at 09:57 +0800, Andres Freund <andres@anarazel.de>, wrote:
Hi,

Several loops which are important for query performance, like heapgetpage()'s
loop over all tuples, have to call functions like
HeapCheckForSerializableConflictOut() and PredicateLockTID() in every
iteration.

When serializable is not in use, all those functions do is to to return. But
being situated in a different translation unit, the compiler can't inline
(without LTO at least) the check whether serializability is needed. It's not
just the function call overhead that's noticable, it's also that registers
have to be spilled to the stack / reloaded from memory etc.

On a freshly loaded pgbench scale 100, with turbo mode disabled, postgres
pinned to one core. Parallel workers disabled to reduce noise. All times are
the average of 15 executions with pgbench, in a newly started, but prewarmed
postgres.

SELECT * FROM pgbench_accounts OFFSET 10000000;
HEAD:
397.977

removing the HeapCheckForSerializableConflictOut() from heapgetpage()
(incorrect!), to establish the baseline of what serializable costs:
336.695

pulling out CheckForSerializableConflictOutNeeded() from
HeapCheckForSerializableConflictOut() in heapgetpage(), and avoiding calling
HeapCheckForSerializableConflictOut() in the loop:
339.742

moving the loop into a static inline function, marked as pg_always_inline,
called with static arguments for always_visible, check_serializable:
326.546

marking the always_visible, !check_serializable case likely():
322.249

removing TestForOldSnapshot() calls, which we pretty much already decided on:
312.987


FWIW, there's more we can do, with some hacky changes I got the time down to
273.261, but the tradeoffs start to be a bit more complicated. And 397->320ms
for something as core as this, is imo worth considering on its own.




Now, this just affects the sequential scan case. heap_hot_search_buffer()
shares many of the same pathologies. I find it a bit harder to improve,
because the compiler's code generation seems to switch between good / bad with
changes that seems unrelated...


I wonder why we haven't used PageIsAllVisible() in heap_hot_search_buffer() so
far?


Greetings,

Andres Freund

LGTM and I have a fool question: 

if (likely(all_visible))
{
if (likely(!check_serializable))
scan->rs_ntuples = heapgetpage_collect(scan, snapshot, page, buffer,
   block, lines, 1, 0);
else
scan->rs_ntuples = heapgetpage_collect(scan, snapshot, page, buffer,
   block, lines, 1, 1);
}
else
{
if (likely(!check_serializable))
scan->rs_ntuples = heapgetpage_collect(scan, snapshot, page, buffer,
   block, lines, 0, 0);
else
scan->rs_ntuples = heapgetpage_collect(scan, snapshot, page, buffer,
   block, lines, 0, 1);


Does it make sense to combine if else condition and put it to the incline function’s param?

Like:
scan->rs_ntuples = heapgetpage_collect(scan, snapshot, page, buffer,
   block, lines, all_visible, check_serializable);











В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Requiring recovery.signal or standby.signal when recovering with a backup_label
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: logicalrep_message_type throws an error