Re: Prefetch the next tuple's memory during seqscans

From: Andres Freund
Subject: Re: Prefetch the next tuple's memory during seqscans
Date:
Msg-id: 20221102172544.hoszrut7tfepc3dc@awork3.anarazel.de
In reply to: Re: Prefetch the next tuple's memory during seqscans  (Andres Freund <andres@anarazel.de>)
Responses: Re: Prefetch the next tuple's memory during seqscans  (Andres Freund <andres@anarazel.de>)
Re: Prefetch the next tuple's memory during seqscans  (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-hackers
Hi,

On 2022-11-01 20:00:43 -0700, Andres Freund wrote:
> I suspect that prefetching in heapgetpage() would provide gains as well, at
> least for pages that aren't marked all-visible, pretty common in the real
> world IME.

Attached is an experimental patch/hack for that. It ended up being more
beneficial to make the access ordering more optimal than to prefetch the tuple
contents, but I'm not at all sure that's the be-all and end-all.


I separately benchmarked pinning the CPU and memory to the same socket,
different socket and interleaving memory.

I did this for HEAD, for your patch, for your patch combined with mine, and
for mine alone.

BEGIN;
DROP TABLE IF EXISTS large;
CREATE TABLE large(a int8 not null, b int8 not null default '0', c int8);
INSERT INTO large SELECT generate_series(1, 50000000);
COMMIT;


server is started with
local: numactl --membind 1 --physcpubind 10
remote: numactl --membind 0 --physcpubind 10
interleave: numactl --interleave=all --physcpubind 10

benchmark started with:
psql -qX -f ~/tmp/prewarm.sql && \
    pgbench -n -f ~/tmp/seqbench.sql -t 1 -r > /dev/null && \
    perf stat -e task-clock,LLC-loads,LLC-load-misses,cycles,instructions -C 10 \
    pgbench -n -f ~/tmp/seqbench.sql -t 3 -r

seqbench.sql:
SELECT count(*) FROM large WHERE c IS NOT NULL;
SELECT sum(a), sum(b), sum(c) FROM large;
SELECT sum(c) FROM large;

branch            memory        time s   miss %
head              local         31.612   74.03
david             local         32.034   73.54
david+andres      local         31.644   42.80
andres            local         30.863   48.05

head              remote        33.350   72.12
david             remote        33.425   71.30
david+andres      remote        32.428   49.57
andres            remote        30.907   44.33

head              interleave    32.465   71.33
david             interleave    33.176   72.60
david+andres      interleave    32.590   46.23
andres            interleave    30.440   45.13

It's cool to see how optimizing heapgetpage() pretty much removes the
performance difference between local and remote memory.


It makes some sense that David's patch doesn't help in this case: without
all-visible being set, the tuple headers will already have been pulled in for
the HTSV call.

I've not yet experimented with moving the prefetch for the tuple contents from
David's location to before the HTSV. I suspect that might benefit both
workloads.

Greetings,

Andres Freund

Attachments
