Re: Performance degradation on concurrent COPY into a single relation in PG16.

From Jakub Wartak
Subject Re: Performance degradation on concurrent COPY into a single relation in PG16.
Date
Msg-id CAKZiRmyQ76T83FCsQxNDxq_mf8fcwE4O=yZk8re0GVfJDS1mhg@mail.gmail.com
In response to Re: Performance degradation on concurrent COPY into a single relation in PG16.  (Andres Freund <andres@anarazel.de>)
Responses Re: Performance degradation on concurrent COPY into a single relation in PG16.  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Mon, Jul 10, 2023 at 6:24 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2023-07-03 11:53:56 +0200, Jakub Wartak wrote:
> > Out of curiosity I've tried and it is reproducible as you have stated : XFS
> > @ 4.18.0-425.10.1.el8_7.x86_64:
> >...
> > According to iostat and blktrace -d /dev/sda -o - | blkparse -i - output ,
> > the XFS issues sync writes while ext4 does not, xfs looks like constant
> > loop of sync writes (D) by kworker/2:1H-kblockd:
>
> That clearly won't go well.  It's not reproducible on newer systems,
> unfortunately :(. Or well, fortunately maybe.
>
>
> I wonder if a trick to avoid this could be to memorialize the fact that we
> bulk extended before and extend by that much going forward? That'd avoid the
> swapping back and forth.

I haven't seen the thread [1] "Question on slow fallocate" from the XFS
mailing list mentioned here (it was started by Masahiko), but I
think it contains very important hints that even challenge the whole
idea of zeroing out files (or using posix_fallocate()). Please see
Dave's reply in particular. He also argues that posix_fallocate() !=
fallocate().  What's interesting is that this behaviour is by design, so
newer kernel versions should not be expected to prevent it; see my
testing results below.

All I can add is that those kernel versions (4.18.0) seem to be very
popular across customers (RHEL, Rocky) right now, and that I've tested
on the most recent one available (4.18.0-477.15.1.el8_8.x86_64) using
Masahiko's test.c and still got 6-7x slower times when using XFS on that
kernel. After installing kernel-ml (6.4.2) the test.c result seems to
be the same (note that it occurs only when first allocating space; of
course it doesn't when the same file is rewritten/"reallocated"):

[root@rockyora ~]# uname -r
6.4.2-1.el8.elrepo.x86_64
[root@rockyora ~]# time ./test test.0 0
total   200000
fallocate       0
filewrite       200000

real    0m0.405s
user    0m0.006s
sys     0m0.391s
[root@rockyora ~]# time ./test test.0 1
total   200000
fallocate       200000
filewrite       0

real    0m0.137s
user    0m0.005s
sys     0m0.132s
[root@rockyora ~]# time ./test test.1 1
total   200000
fallocate       200000
filewrite       0

real    0m0.968s
user    0m0.020s
sys     0m0.928s
[root@rockyora ~]# time ./test test.2 2
total   200000
fallocate       100000
filewrite       100000

real    0m6.059s
user    0m0.000s
sys     0m0.788s
[root@rockyora ~]# time ./test test.2 2
total   200000
fallocate       100000
filewrite       100000

real    0m0.598s
user    0m0.003s
sys     0m0.225s
[root@rockyora ~]#
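
For readers without test.c at hand, the following is a rough sketch of the kind of loop the three modes above appear to exercise. It is not Masahiko's actual test.c, which is not reproduced in this message. The function name extend_file, the 8kB block size, and the reading of the mode argument (0 = zero-filling writes only, 1 = posix_fallocate() only, 2 = strictly alternating between the two) are assumptions inferred from the counters printed above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192 /* assumption: PostgreSQL's default block size */

/*
 * Hypothetical helper: extend `path` by `nblocks` blocks, one block at a time.
 * mode 0: write zeros; mode 1: posix_fallocate(); mode 2: alternate (assumed).
 * The file is deliberately not truncated, so a second run over the same file
 * touches already-allocated space.
 * Returns the final file size, or -1 on error.
 */
off_t extend_file(const char *path, int mode, int nblocks,
                  int *n_fallocate, int *n_filewrite)
{
    static char zeros[BLCKSZ]; /* static storage is zero-initialized */
    off_t end;
    int fd = open(path, O_CREAT | O_RDWR, 0600);

    if (fd < 0)
        return -1;
    *n_fallocate = 0;
    *n_filewrite = 0;

    for (int i = 0; i < nblocks; i++) {
        off_t off = (off_t) i * BLCKSZ;
        int use_falloc = (mode == 1) || (mode == 2 && i % 2 == 0);

        if (use_falloc) {
            /* posix_fallocate() returns an errno value, not -1 */
            if (posix_fallocate(fd, off, BLCKSZ) != 0)
                goto fail;
            (*n_fallocate)++;
        } else {
            if (pwrite(fd, zeros, BLCKSZ, off) != BLCKSZ)
                goto fail;
            (*n_filewrite)++;
        }
    }

    end = lseek(fd, 0, SEEK_END);
    close(fd);
    return end;

fail:
    close(fd);
    return -1;
}
```

Under these assumptions, extend_file("test.2", 2, 200000, &nf, &nw) would correspond to the "./test test.2 2" runs above, with nf and nw both ending at 100000. Timing the call on a fresh file versus an existing one is what separates the first-allocation case from the "reallocated" case.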

iostat -x output during the first "time ./test test.2 2" run (as you can
see, w_await is not that high, but it accumulates):
Device        r/s       w/s  rMB/s   wMB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm   %util
sda          0.00  15394.00   0.00  122.02    0.00   13.00   0.00   0.08     0.00     0.05    0.75      0.00      8.12   0.06  100.00
dm-0         0.00  15407.00   0.00  122.02    0.00    0.00   0.00   0.00     0.00     0.06    0.98      0.00      8.11   0.06  100.00

So maybe that's just a hint that you should try on slower storage
instead? (I think that on NVMe this issue would be hardly noticeable
due to low I/O latency, unlike here.)

-J.

[1] - https://www.spinics.net/lists/linux-xfs/msg73035.html


