Re: AIO v2.5

Поиск

Список

Период

Сортировка

От	Andres Freund
Тема	Re: AIO v2.5
Дата	14 июля 21:44:21
Msg-id	dhohimismkcohflmvd6hkrfwgy2c7v5g4gcg5wnfrd3rtxpzxy@aqdwtknayaub обсуждение исходный текст
Ответ на	Re: AIO v2.5 (Tomas Vondra <tomas@vondra.me>)
Ответы	Re: AIO v2.5
Список	pgsql-hackers

Дерево обсуждения

On 2025-07-13 20:04:51 +0200, Tomas Vondra wrote:
> On 7/11/25 23:03, Tomas Vondra wrote:
> > ...
> >
> > e) indexscan regression (ryzen-indexscan-uniform-pg17-checksums.png)
> >
> > There's an interesting difference difference I noticed in the run with
> > checksums on PG17. The full PDF is available here:
> >
> > https://github.com/tvondra/iomethod-tests/blob/run2-17-checksums-on/ryzen-rows-cold-32GB-16-unscaled.pdf
> >
> > The interesting thing is that PG17 indexscans on uniform dataset got a
> > little bit faster. In the attached PDF it's exactly on par with PG18,
> > but here it got a bit faster. Which makes no sense, if it has to also
> > verify checksums. I haven't had time to investigate this yet.
>
> I was intrigued by this, so I looked into this today.
>
> TL;DR I believe it was caused by something in the filesystem or even the
> storage devices, making the "PG17" data directory (or maybe even just
> the "uniform" table) a bit faster.
>
> I started by reproducing the behavior with an indexscan matching 10% of
> the rows, and it was very easy to reproduce the difference shows on the
> chart (all timings in milliseconds):
>
> PG17: 14112.800 ms
> PG18: 21612.090 ms

That's a decidedly nontrivial difference.

Did you keep any metrics from those runs? E.g. whether there were larger IOs
or such?

> This was perfectly reproducible, affecting the whole table (not just one
> part of it), etc. At some point I recalled that I might have initialized
> the databases in slightly different ways - one by running the SQL, the
> other one by pg_dump/pg_restore (likely with multiple jobs).

I guess that's an INSERT ...  SELECT vs COPY?

Which one was the faster one?

If this ever re-occurs, it might be interesting to look at the fragmentation
of the underlying files with filefrag.

> I couldn't think of any other difference between the data directories,
> so I simply reloaded them by pg_restore (from the same dump). Which
> however made them both slow :O

So that suggests that COPY is the slow case, interesting.

One potentially relevant factor could be that parallel COPY into logged tables
currently leads to really sub-optimal write patterns, due to us writing back
buffers one-by-one, interspersed by WAL writes and file extensions. I know how
that affects write speed, but it's not entirely obvious how it would affect
read speed...

> And it didn't matter how many jobs are used, or anything else I tried.
> But every now and then an instance (17 or 18) happened to be fast
> (~14000 ms). Consistently, for all queries on the table, not randomly.
>
> In the end I recreated the (ext4) filesystem, loaded the databases and
> now both instances are fast. I have no idea what the root cause was, and
> I assume recreating the filesystem destroyed all the evidence.

Besides differences in filesystem level fragmentation, another potential
theory is that the SSDs were internally more fragmented. Occasionally
dumping/restoring the data could allow the drive to do internal wear
leveling before the new data is loaded, leading to a better layout.

I found that I get more consistent benchmark performance if I delete as much
of the data as possible, run fstrim -v -a and then load the data. And do
another round of fstrim.

Greetings,

Andres Freund

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: AIO v2.5