Re: Prefetch the next tuple's memory during seqscans

From: Andres Freund
Subject: Re: Prefetch the next tuple's memory during seqscans
Date:
Msg-id: 20221102172544.hoszrut7tfepc3dc@awork3.anarazel.de
In reply to: Re: Prefetch the next tuple's memory during seqscans  (Andres Freund <andres@anarazel.de>)
Responses: Re: Prefetch the next tuple's memory during seqscans  (Andres Freund <andres@anarazel.de>)
Re: Prefetch the next tuple's memory during seqscans  (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-hackers
Hi,

On 2022-11-01 20:00:43 -0700, Andres Freund wrote:
> I suspect that prefetching in heapgetpage() would provide gains as well, at
> least for pages that aren't marked all-visible, pretty common in the real
> world IME.

Attached is an experimental patch/hack for that. It ended up being more
beneficial to make the access ordering more optimal than to prefetch the tuple
contents, but I'm not at all sure that's the be-all and end-all.


I separately benchmarked pinning the CPU and memory to the same socket,
different socket and interleaving memory.

I did this for HEAD, for your patch, for your patch combined with mine, and
for mine alone.

BEGIN;
DROP TABLE IF EXISTS large;
CREATE TABLE large(a int8 not null, b int8 not null default '0', c int8);
INSERT INTO large SELECT generate_series(1, 50000000);
COMMIT;


server is started with
local: numactl --membind 1 --physcpubind 10
remote: numactl --membind 0 --physcpubind 10
interleave: numactl --interleave=all --physcpubind 10

benchmark started with:
psql -qX -f ~/tmp/prewarm.sql && \
    pgbench -n -f ~/tmp/seqbench.sql -t 1 -r > /dev/null && \
    perf stat -e task-clock,LLC-loads,LLC-load-misses,cycles,instructions -C 10 \
    pgbench -n -f ~/tmp/seqbench.sql -t 3 -r

seqbench.sql:
SELECT count(*) FROM large WHERE c IS NOT NULL;
SELECT sum(a), sum(b), sum(c) FROM large;
SELECT sum(c) FROM large;

branch            memory        time s   miss %
head              local         31.612   74.03
david             local         32.034   73.54
david+andres      local         31.644   42.80
andres            local         30.863   48.05

head              remote        33.350   72.12
david             remote        33.425   71.30
david+andres      remote        32.428   49.57
andres            remote        30.907   44.33

head              interleave    32.465   71.33
david             interleave    33.176   72.60
david+andres      interleave    32.590   46.23
andres            interleave    30.440   45.13

It's cool to see how optimizing heapgetpage() pretty much removes the
performance difference between local and remote memory.


It makes some sense that David's patch doesn't help in this case: without
all-visible being set, the tuple headers will already have been pulled in for
the HTSV call.

I've not yet experimented with moving the prefetch for the tuple contents from
David's location to before the HTSV. I suspect that might benefit both
workloads.

Greetings,

Andres Freund

Attachments
