Обсуждение: Performance issue after migration from 9.4 to 15
Hi list, We have recently migrated from our old PostgreSQL 9.4 server to a PostgreSQL 15 server. Even though the new machine has moreresources, we see a considerable decrease in the performance of some of our heavier queries, and I have no idea whereI should start tuning. ? Old database: PostgreSQL 9.4.26 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17), 64-bit New database: PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit I tested the same query against the old and the new database. Old: https://explain.dalibo.com/plan/b7d7gab73f6c7274#plan/node/21 New: https://explain.dalibo.com/plan/84h16cg1f19266be#plan/node/34 Particularly interesting are the sequential scans. In the old plan, we have node #21, which took 32 seconds. Almost all ofthe time goes into actual I/O. In the new plan, the corresponding node is #34. It took 55 seconds, but less than 1 secondwas actual I/O (because most of the data was already in memory). Why did this step still take about twice the time? There is another Seq Scan of the same table. Node #10 in the old plan took 3 seconds, whereas the corresponding node #21in the new plan took more than 2 minutes (of which less than 2 seconds was actual I/O). Am I misreading the plans? If not, I have no idea why the sequential scans take so much longer in the new database, eventhough the I/O is even faster than before. The configuration was left almost unchanged, with only some adjustments due to changes between the versions. As far as Ican tell, none of these changes is performance related. Can anybody give me a hint into which direction I should investigate further? Thanks, Christian
Hi, I will definitely give it a try; however, I do not understand how this could explain the observed performance changes. Asfar as I understand, an outdated/non-optimized index would not be relevant in a sequential scan, and outdated statisticscould cause a non-optimal plan to be chosen, but they should not affect the performance of a single sequentialscan. Best, Christian From: Inzamam Shafiq <inzamam.shafiq@hotmail.com> Sent: Tuesday, April 11, 2023 1:25 PM To: Christian Schröder <Christian.Schroeder@solvians.com> Subject: Re: Performance issue after migration from 9.4 to 15 Hi, How did you upgrade the databse? if it is done by pg_upgrade then I think you have to do the following, Try re-indexing the database Update/analyze statistics Regards, Inzamam Shafiq Sr. DBA ________________________________________ From: Christian Schröder <mailto:Christian.Schroeder@solvians.com> Sent: Tuesday, April 11, 2023 3:58 PM To: mailto:pgsql-general@lists.postgresql.org <mailto:pgsql-general@lists.postgresql.org> Subject: Performance issue after migration from 9.4 to 15 Hi list, We have recently migrated from our old PostgreSQL 9.4 server to a PostgreSQL 15 server. Even though the new machine has moreresources, we see a considerable decrease in the performance of some of our heavier queries, and I have no idea whereI should start tuning. ? Old database: PostgreSQL 9.4.26 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17), 64-bit New database: PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit I tested the same query against the old and the new database. Old: https://explain.dalibo.com/plan/b7d7gab73f6c7274#plan/node/21 New: https://explain.dalibo.com/plan/84h16cg1f19266be#plan/node/34 Particularly interesting are the sequential scans. In the old plan, we have node #21, which took 32 seconds. Almost all ofthe time goes into actual I/O. In the new plan, the corresponding node is #34. It took 55 seconds, but less than 1 secondwas actual I/O (because most of the data was already in memory). Why did this step still take about twice the time? There is another Seq Scan of the same table. Node #10 in the old plan took 3 seconds, whereas the corresponding node #21in the new plan took more than 2 minutes (of which less than 2 seconds was actual I/O). Am I misreading the plans? If not, I have no idea why the sequential scans take so much longer in the new database, eventhough the I/O is even faster than before. The configuration was left almost unchanged, with only some adjustments due to changes between the versions. As far as Ican tell, none of these changes is performance related. Can anybody give me a hint into which direction I should investigate further? Thanks, Christian
On Tue, 11 Apr 2023 at 23:03, Christian Schröder <Christian.Schroeder@solvians.com> wrote: > We have recently migrated from our old PostgreSQL 9.4 server to a PostgreSQL 15 server. Even though the new machine hasmore resources, we see a considerable decrease in the performance of some of our heavier queries, and I have no idea whereI should start tuning. ? Using pg_upgrade? Did you run ANALYZE? If not then you may be suffering from lack of statistics leading to bad plans. > Old: https://explain.dalibo.com/plan/b7d7gab73f6c7274#plan/node/21 > New: https://explain.dalibo.com/plan/84h16cg1f19266be#plan/node/34 I'm not well versed in looking at JSON output plans. You might get a better overall response posting to https://explain.depesz.com/. Personally, I always look for raw output, and at least with depesz, that's text-based. > Particularly interesting are the sequential scans. In the old plan, we have node #21, which took 32 seconds. Almost allof the time goes into actual I/O. In the new plan, the corresponding node is #34. It took 55 seconds, but less than 1second was actual I/O (because most of the data was already in memory). Why did this step still take about twice the time? Perhaps your 15 server is under more load than 9.4 due to all concurrent plans being slower from bad statistics? Load averages might be a good indicator. (I assume the server is busy due to the "Workers Launched": 0) > Am I misreading the plans? If not, I have no idea why the sequential scans take so much longer in the new database, eventhough the I/O is even faster than before. Looks that way to me too. > Can anybody give me a hint into which direction I should investigate further? Probably just run ANALYZE on the database in question. David
David Rowley <dgrowleyml@gmail.com> writes: > On Tue, 11 Apr 2023 at 23:03, Christian Schröder > <Christian.Schroeder@solvians.com> wrote: >> Particularly interesting are the sequential scans. In the old plan, we have node #21, which took 32 seconds. Almost allof the time goes into actual I/O. In the new plan, the corresponding node is #34. It took 55 seconds, but less than 1second was actual I/O (because most of the data was already in memory). Why did this step still take about twice the time? > Perhaps your 15 server is under more load than 9.4 due to all > concurrent plans being slower from bad statistics? Load averages might > be a good indicator. (I assume the server is busy due to the "Workers > Launched": 0) I think the extra time is due to useless overhead from trying and failing to parallelize: the leader has to do all the work, but there's probably overhead added anyway. 9.4 of course knew nothing of parallelism. My guess is that the OP is trying to run with a large number of backends and has not raised the max number of parallel workers to match. It does look like the stats might need updating (since 9.4's rowcount estimate is OK and 15's less so) but that is not why we see "Workers planned: 2, Workers launched: 0". Either provision enough parallel workers to fix that, or disable parallelism. regards, tom lane
Hi all, Thanks for all your tips! I have meanwhile run "ANALYZE" on all tables. Either that was as important as everybody tells me 😉 or it is just a coincidence(e.g., fewer other queries on the database). In any case, the performance has indeed improved considerably. Iwill observe the situation tomorrow in the morning, when there is typically more load on the system. I also noticed that, after analyzing, the sequential scan node shows fewer columns in the "output" section (https://explain.dalibo.com/plan/b8bfa5a3d2dc33bc#plan/node/18).I am not sure if this has an impact on the performance ofthis particular operation. I will also follow the suggestion and increase the number of parallel workers. The new machine has enough cores, so thisshould be possible. Best, Christian -----Original Message----- From: Tom Lane <tgl@sss.pgh.pa.us> Sent: Tuesday, April 11, 2023 4:09 PM To: David Rowley <dgrowleyml@gmail.com> Cc: Christian Schröder <Christian.Schroeder@solvians.com>; pgsql-general@lists.postgresql.org Subject: Re: Performance issue after migration from 9.4 to 15 David Rowley <dgrowleyml@gmail.com> writes: > On Tue, 11 Apr 2023 at 23:03, Christian Schröder > <Christian.Schroeder@solvians.com> wrote: >> Particularly interesting are the sequential scans. In the old plan, we have node #21, which took 32 seconds. Almost allof the time goes into actual I/O. In the new plan, the corresponding node is #34. It took 55 seconds, but less than 1second was actual I/O (because most of the data was already in memory). Why did this step still take about twice the time? > Perhaps your 15 server is under more load than 9.4 due to all > concurrent plans being slower from bad statistics? Load averages might > be a good indicator. (I assume the server is busy due to the "Workers > Launched": 0) I think the extra time is due to useless overhead from trying and failing to parallelize: the leader has to do all the work,but there's probably overhead added anyway. 9.4 of course knew nothing of parallelism. My guess is that the OP is trying to run with a large number of backends and has not raised the max number of parallel workersto match. It does look like the stats might need updating (since 9.4's rowcount estimate is OK and 15's less so)but that is not why we see "Workers planned: 2, Workers launched: 0". Either provision enough parallel workers to fix that, or disable parallelism. regards, tom lane
On 4/11/23 07:41, Christian Schröder wrote: > Hi all, > Thanks for all your tips! > I have meanwhile run "ANALYZE" on all tables. Either that was as important as everybody tells me 😉 or it is just a coincidence(e.g., fewer other queries on the database). It is: https://www.postgresql.org/docs/current/pgupgrade.html Statistics Because optimizer statistics are not transferred by pg_upgrade, you will be instructed to run a command to regenerate that information at the end of the upgrade. You might need to set connection parameters to match your new cluster. -- Adrian Klaver adrian.klaver@aklaver.com