BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Поиск
Список
Период
Сортировка
От PG Bug reporting form
Тема BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
Дата
Msg-id 17717-6c50eb1c7d23a886@postgresql.org
обсуждение исходный текст
Ответы Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
Список pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17717
Logged by:          Gunnar L
Email address:      postgresql@taljaren.se
PostgreSQL version: 15.0
Operating system:   Ubuntu Linux
Description:

We have observed a significant slowdown in vacuumdb performance between
different versions of postgresql. And possibly also a memory issue.

We run a specific data model, where each customer has its own schema with
its own set of tables. Each database server hosts 16 databases, each
containing around 250 customer schemas. Due to postgres creating a new file
for each database object, we end up with around 5 million files on each
database server. This may or may not be related to the issue we're seeing
(new algorithms with new time complexity?)

We upgraded from postgresql 9.5 to postgresql 13, and noticed a significant
slowdown in how vacuumdb performs. Before, we could run a vacuumdb -a -z
each night, taking around 2 hours to complete. After the upgrade, we see a
constant 100% CPU utilization during the vacuumdb process (almost no I/O
activity), and vacuumdb cannot complete within a reasonable time. We're able
to vacuum about 3-4 databases each night.

We are able to recreate this issue, using a simple bash script to generate a
similar setup.

From local testing, here are our findings:

Concerning speed:
* Version 9.5, 10, 11 are fast  (9.5 slower than 10 and 11)
* Version 12, 13, 14 are very, very slow
* Version 15 is faster (a lot faster than 12,13,14) but not nearly as fast
as 10 or 11.

Concerning memory usage:
* Version 15 is using a lot more shared memory OR it might not be releasing
it properly after vacuuming a db.

These are the timings for vacuuming the 16 dbs.

Version   Seconds   Completed
------------------------------
9.5       412       16/16
10        178       16/16
11        166       16/16
12       8319        1/16 or 2/16 (manually aborted)
13      18853        3/16 or 4/16 (manually aborted)
14      16857        3/16 or 4/16 (manually aborted)
15        617        1/16 (crashed!)
15       4158        6/16 (crashed! --shm-size=256mb)
15       9500       16/16 (--shm-size=4096mb)

The timing of the only successful run for postgres 15 is somewhat flaky,
since the machine was suspended for about 1-1.5 hours so 9500 is only an
estimate, but the first run (1 db completed in 10 minutes) gives that it is
faster than 12-14 but slower than 10 and 11 (3 minutes to complete
everything)


The following describes our setup
This is the script (called  setup.sh)  we’re using to populate the databases
(we give a port number as parameter)

##### start of setup.sh
export PGPASSWORD=mysecretpassword
PORT=$1

echo ""> tables_$PORT.sql
for schema in `seq -w 1 250`; do
    echo "create schema schema$schema;" >> tables_$PORT.sql
    for table in `seq -w 1 500`; do
        echo "create table schema$schema.table$table (id int);" >>
tables_$PORT.sql
    done
done

echo "Setting up db: 01"
createdb -h localhost -U postgres -p $PORT  db01
psql -q -h localhost -U postgres -p $PORT db01 -f tables_$PORT.sql

# This seems to be the fastest way to create the databases
for db in `seq -w 2 16`; do
    echo "Setting up db: $db"
    createdb -h localhost -U postgres -p $PORT --template db01 db$db
done
####### end of setup.sh



To execute a test for a particular postgres version (in this example PG
9.5), we run the following. It will setup PG 9.5 on port 15432.

docker run --rm --name pg95 -e POSTGRES_PASSWORD=mysecretpassword -p
15432:5432 -d postgres:9.5
./setup.sh 15432
date; time docker exec -it pg95 bash -c "vacuumdb -a -z -U postgres"; date

(The date commands are added to keep track of when tasks were started).





Here are complete set of commands and output and  comments 
(We use different ports for different versions of PG)

date; time docker exec -it pg95 bash -c "vacuumdb -a -z -U postgres"; date
(The date commands since it takes some time to run)









time docker exec -it pg95 bash -c "vacuumdb -a -z -U postgres"
vacuumdb: vacuuming database "db01"
…<snip>...
vacuumdb: vacuuming database "db16"
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template1"

real    6m52,070s
user    0m0,048s
sys    0m0,029s


time docker exec -it pg10 bash -c "vacuumdb -a -z -U postgres"
vacuumdb: vacuuming database "db01"
…<snip>...
vacuumdb: vacuuming database "db16"
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template1"

real    2m58,354s
user    0m0,043s
sys    0m0,013s





time docker exec -it pg11 bash -c "vacuumdb -a -z -U postgres"
vacuumdb: vacuuming database "db01"
…<snip>...
vacuumdb: vacuuming database "db16"
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template1"

real    2m46,181s
user    0m0,047s
sys    0m0,012s




date; time docker exec -it pg12 bash -c "vacuumdb -a -z -U postgres"; date
lör 10 dec 2022 18:57:43 CET
vacuumdb: vacuuming database "db01"
vacuumdb: vacuuming database "db02"
^CCancel request sent
vacuumdb: error: vacuuming of table "schema241.table177" in database "db02"
failed: ERROR:  canceling statement due to user request

real    138m39,600s
user    0m0,177s
sys    0m0,418s
lör 10 dec 2022 21:16:22 CET




date;time docker exec -it pg13 bash -c "vacuumdb -a -z -U postgres"
lör 10 dec 2022 07:22:32 CET
vacuumdb: vacuuming database "db01"
vacuumdb: vacuuming database "db02"
vacuumdb: vacuuming database "db03"
vacuumdb: vacuuming database "db04"
^CCancel request sent

real    314m13,172s
user    0m0,551s
sys    0m0,663s
lör 10 dec 2022 12:37:03 CET



date;time docker exec -it pg14 bash -c "vacuumdb -a -z -U postgres"; date
lör 10 dec 2022 14:15:37 CET
vacuumdb: vacuuming database "db01"
vacuumdb: vacuuming database "db02"
vacuumdb: vacuuming database "db03"
vacuumdb: vacuuming database "db04"
^CCancel request sent

real    280m57,172s
user    0m0,586s
sys    0m0,559s
lör 10 dec 2022 18:56:34 CET



date;time docker exec -it pg15 bash -c "vacuumdb -a -z -U postgres"; date
lör 10 dec 2022 12:50:25 CET
vacuumdb: vacuuming database "db01"
vacuumdb: vacuuming database "db02"
vacuumdb: error: processing of database "db02" failed: ERROR:  could not
resize shared memory segment "/PostgreSQL.2952321776" to 27894720 bytes: No
space left on device

real    10m17,913s
user    0m0,030s
sys    0m0,049s
lör 10 dec 2022 13:00:43 CET

# it was faster, but we need  to extend shared memory to make it work


docker run --rm --name pg15 --shm-size=256mb -e
POSTGRES_PASSWORD=mysecretpassword -p 55555:5432 -d postgres:15

date;time docker exec -it pg15 bash -c "vacuumdb -a -z -U postgres"; date
mån 12 dec 2022 08:56:17 CET
vacuumdb: vacuuming database "db01"
…<snip>...
vacuumdb: vacuuming database "db07"
vacuumdb: error: processing of database "db07" failed: ERROR:  could not
resize shared memory segment "/PostgreSQL.1003084622" to 27894720 bytes: No
space left on device

real    69m18,345s
user    0m0,217s
sys    0m0,086s
mån 12 dec 2022 10:05:36 CET



docker run --rm --name pg15 --shm-size=4096mb -e
POSTGRES_PASSWORD=mysecretpassword -p 55555:5432 -d postgres:15

date;time docker exec -it pg15 bash -c "vacuumdb -a -z -U postgres"; date
mån 12 dec 2022 11:16:11 CET
vacuumdb: vacuuming database "db01"
…<snip>...
vacuumdb: vacuuming database "db16"
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template1"

real    232m46,168s
user    0m0,227s
sys    0m0,467s
mån 12 dec 2022 15:08:57 CET



Here is the hardware that was used
AMD Ryzen 7 PRO 5850U with Radeon Graphics
8 Cores, 16 threads

$ free
               total        used        free      shared  buff/cache
available
Mem:        28562376     5549716      752624     1088488    22260036
21499752
Swap:         999420      325792      673628

Disk:    NVMe device, Samsung SSD 980 1TB


В списке pgsql-bugs по дате отправления:

Предыдущее
От: PG Bug reporting form
Дата:
Сообщение: BUG #17716: walsender process hang while decoding 'DROP PUBLICATION' XLOG
Следующее
От: Bowen Shi
Дата:
Сообщение: Re: BUG #17716: walsender process hang while decoding 'DROP PUBLICATION' XLOG