Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
From: Tom Lane
Subject: Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
Date:
Msg-id: 3439037.1671130590@sss.pgh.pa.us
In reply to: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue) (PG Bug reporting form <noreply@postgresql.org>)
Responses:
  Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
  Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
List: pgsql-bugs
PG Bug reporting form <noreply@postgresql.org> writes:
> We run a specific data model, where each customer has its own schema with
> its own set of tables. Each database server hosts 16 databases, each
> containing around 250 customer schemas. Due to postgres creating a new file
> for each database object, we end up with around 5 million files on each
> database server. This may or may not be related to the issue we're seeing
> (new algorithms with new time complexity?)

> We upgraded from postgresql 9.5 to postgresql 13, and noticed a significant
> slowdown in how vacuumdb performs. Before, we could run a vacuumdb -a -z
> each night, taking around 2 hours to complete. After the upgrade, we see a
> constant 100% CPU utilization during the vacuumdb process (almost no I/O
> activity), and vacuumdb cannot complete within a reasonable time. We're able
> to vacuum about 3-4 databases each night.

I poked into this a little bit.  On HEAD, watching things with "perf"
identifies vac_update_datfrozenxid() as the main time sink.  It's not hard
to see why: that does a seqscan of pg_class, and it's invoked at the end of
each vacuum() call.  So if you try to vacuum each table in the DB
separately, you're going to end up spending O(N^2) time in often-useless
rescans of pg_class.  This isn't a huge problem in ordinary-sized DBs, but
with 125000 small tables in the DB it becomes the dominant cost.

> Concerning speed:
> * Version 9.5, 10, 11 are fast (9.5 slower than 10 and 11)
> * Version 12, 13, 14 are very, very slow
> * Version 15 is faster (a lot faster than 12,13,14) but not nearly as fast
> as 10 or 11.

The reason for the v12 performance change is that up through v11,
"vacuumdb -a -z" would just issue "VACUUM (ANALYZE);" in each DB, so
vac_update_datfrozenxid only ran once.  Beginning in v12 (commit e0c2933a7),
vacuumdb issues a separate VACUUM command for each targeted table, which
causes the problem.

I'm not sure why there's a performance delta from 14 to 15.
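To make the quadratic blow-up concrete, here is a toy cost model (plain Python with made-up unit costs, not PostgreSQL code) comparing one database-wide VACUUM, where vac_update_datfrozenxid scans pg_class once, with N per-table VACUUMs, each of which rescans all N pg_class rows:

```python
# Toy cost model of the O(N^2) behavior described above.  The helper and
# its numbers are illustrative assumptions, not real PostgreSQL internals:
# every vacuum() call ends with vac_update_datfrozenxid(), which seqscans
# all N rows of pg_class.

def pgclass_rows_scanned(n_tables: int, per_table: bool) -> int:
    """pg_class rows read by vac_update_datfrozenxid over a full vacuum run."""
    if per_table:
        # vacuumdb >= v12: one VACUUM per table, so N scans of N rows each.
        return n_tables * n_tables
    # "VACUUM (ANALYZE);" for the whole DB: one scan of N rows.
    return n_tables

assert pgclass_rows_scanned(1000, per_table=False) == 1_000
assert pgclass_rows_scanned(1000, per_table=True) == 1_000_000
# At the reporter's scale of 125000 tables, the rescans dominate everything:
print(pgclass_rows_scanned(125_000, per_table=True))  # 15625000000
```

With ordinary table counts the extra scans are lost in the noise, which is why this only surfaced on schemas-per-customer deployments like the reporter's.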
It doesn't look like vacuumdb itself had any material changes, so we must
have done something different on the backend side.  This may indicate that
there's another O(N^2) behavior that we got rid of in v15.  Anyway, that
change isn't bad, so I did not poke into it too much.

Conclusions:

* As a short-term fix, you could try using vacuumdb from v11 with the newer
servers.  Or just do "psql -c 'vacuum analyze'" and not bother with vacuumdb
at all.  (On HEAD, with this example database, 'vacuum analyze' takes about
7 seconds per DB for me, versus ~10 minutes using vacuumdb.)

* To fix vacuumdb properly, it might be enough to get it to batch VACUUMs,
say by naming up to 1000 tables per command instead of just one.  I'm not
sure how that would interact with its parallelization logic, though.  It's
not really solving the O(N^2) issue either, just pushing it further out.

* A better idea, though sadly not very back-patchable, could be to expose a
VACUUM option to control whether it runs vac_update_datfrozenxid, so that
vacuumdb can do that just once at the end.  Considering that
vac_update_datfrozenxid requires an exclusive lock, the current behavior is
poison for parallel vacuuming, quite aside from the O(N^2) issue.  This
might tie into some work Peter G. has been pursuing, too.

			regards, tom lane
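The batching idea in the second bullet could be sketched as follows. This is a hypothetical helper, not actual vacuumdb code; it relies only on the fact that VACUUM has accepted a list of tables since v11, so each generated statement triggers the end-of-command pg_class scan once per batch instead of once per table:

```python
# Hypothetical sketch of batched VACUUM generation (an assumption about how
# vacuumdb *could* work, not its real implementation): name up to
# batch_size tables per statement, so vac_update_datfrozenxid runs once per
# batch rather than once per table.

def batch_vacuum_commands(tables: list[str], batch_size: int = 1000) -> list[str]:
    """Group table names into VACUUM (ANALYZE) statements of <= batch_size tables."""
    commands = []
    for i in range(0, len(tables), batch_size):
        chunk = tables[i:i + batch_size]
        commands.append("VACUUM (ANALYZE) " + ", ".join(chunk) + ";")
    return commands

# 2500 tables -> 3 statements instead of 2500.
cmds = batch_vacuum_commands([f"s{i}.t" for i in range(2500)])
print(len(cmds))  # 3
```

As the text notes, this only divides the quadratic term by the batch size rather than eliminating it, and it would need care around vacuumdb's parallel-jobs logic, since the jobs are currently distributed one table at a time.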
In the pgsql-bugs list, by date:

Next
From: Peter Geoghegan
Message: Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)