Обсуждение: pg_upgrade —link does it remove table bloat
On 2/13/20 4:31 AM, Jason Ralph wrote: > When using the pg_upgrade link method to upgrade Postgres a major > version. Let’s say 9.3 to 11.6 on Centos Linux. Will table bloat carry > over to the new version. I know using —link will use hard link pointers > to the new data. So I assume all table bloat will carry over to the new > version. I also know that pg_upgrade will reset statistics, so does the > table remain bloated but statistics show otherwise? Can Someone please > help me answer this? Or link where it’s outlined in the manual. Thanks > as always. Well table bloat and table statistics are two different things. Bloat is the accumulation of dead or potentially dead tuples whose space has not been marked as available for reuse by VACUUM or whose space has been returned to the OS with VACUUM FULL. For more information see: https://www.postgresql.org/docs/12/routine-vacuuming.html I would think it would not matter if the files where copied or linked if the space was being held open as result of regular VACUUM. Statistics are just that statistics collected about the distribution of values in the table. For more information see: https://www.postgresql.org/docs/12/monitoring-stats.html https://www.postgresql.org/docs/12/planner-stats-details.html They are collected as part of the autovacuum process or by running ANALYZE by itself or with VACUUM. FYI, pg_upgrade does not automatically update the statistics, it just writes a script that you can then run manually to do that: https://www.postgresql.org/docs/12/pgupgrade.html 14.Statistics Because optimizer statistics are not transferred by pg_upgrade, you will be instructed to run a command to regenerate that information at the end of the upgrade. You might need to set connection parameters to match your new cluster. > > Jason Ralph -- Adrian Klaver adrian.klaver@aklaver.com
>Well table bloat and table statistics are two different things. Bloat is the accumulation of dead or potentially dead tupleswhose space has not been marked > as available for reuse by VACUUM or whose space has been returned to the OS withVACUUM FULL. For more information see: Thanks for the helpful response @Adrian Klaver, Let me try to rephrase my question, If a table has bloat before the upgrade, autvacuum was not aggressive enough, once pg_upgrade is complete, the same tablewill contain the same amount of bloat(dead tuples)? Meaning its not the same as pg_dump / pg_restore since it’s a hardlink to the previous data location. Pg_upgrade with link will not recreate the table. Jason Ralph -----Original Message----- From: Adrian Klaver <adrian.klaver@aklaver.com> Sent: Thursday, February 13, 2020 10:46 AM To: Jason Ralph <jralph@affinitysolutions.com>; pgsql-general@lists.postgresql.org Subject: Re: pg_upgrade —link does it remove table bloat On 2/13/20 4:31 AM, Jason Ralph wrote: > When using the pg_upgrade link method to upgrade Postgres a major > version. Let’s say 9.3 to 11.6 on Centos Linux. Will table bloat carry > over to the new version. I know using —link will use hard link > pointers to the new data. So I assume all table bloat will carry over > to the new version. I also know that pg_upgrade will reset statistics, > so does the table remain bloated but statistics show otherwise? Can > Someone please help me answer this? Or link where it’s outlined in the > manual. Thanks as always. Well table bloat and table statistics are two different things. Bloat is the accumulation of dead or potentially dead tupleswhose space has not been marked as available for reuse by VACUUM or whose space has been returned to the OS with VACUUMFULL. For more information see: https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql.org%2Fdocs%2F12%2Froutine-vacuuming.html&data=01%7C01%7Cjralph%40affinitysolutions.com%7Cf3d88565688a48b9aa2908d7b09bd2b5%7Cfbf1a257f1104fc19456b0ca3038c6f0%7C1&sdata=sjxZiW0rGprUUjZUESwMH4ZHpMjYZ%2BB65t1mbqS0ah4%3D&reserved=0 I would think it would not matter if the files where copied or linked if the space was being held open as result of regularVACUUM. Statistics are just that statistics collected about the distribution of values in the table. For more information see: https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql.org%2Fdocs%2F12%2Fmonitoring-stats.html&data=01%7C01%7Cjralph%40affinitysolutions.com%7Cf3d88565688a48b9aa2908d7b09bd2b5%7Cfbf1a257f1104fc19456b0ca3038c6f0%7C1&sdata=lJmE9TwHc%2FKbcjJLWOMWlO%2FDGrNSv6SovQoRoJrKbyA%3D&reserved=0 https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql.org%2Fdocs%2F12%2Fplanner-stats-details.html&data=01%7C01%7Cjralph%40affinitysolutions.com%7Cf3d88565688a48b9aa2908d7b09bd2b5%7Cfbf1a257f1104fc19456b0ca3038c6f0%7C1&sdata=mYiAy2XOBfe%2FKSec3OK%2FjniVga9jNNFkjyKDPJBzjVE%3D&reserved=0 They are collected as part of the autovacuum process or by running ANALYZE by itself or with VACUUM. FYI, pg_upgrade doesnot automatically update the statistics, it just writes a script that you can then run manually to do that: https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql.org%2Fdocs%2F12%2Fpgupgrade.html&data=01%7C01%7Cjralph%40affinitysolutions.com%7Cf3d88565688a48b9aa2908d7b09bd2b5%7Cfbf1a257f1104fc19456b0ca3038c6f0%7C1&sdata=ZDGMLC6QYa%2FCDDc0p7PDukm9Hwr8NYiKBC7DUQ6tI3o%3D&reserved=0 14.Statistics Because optimizer statistics are not transferred by pg_upgrade, you will be instructed to run a command to regenerate thatinformation at the end of the upgrade. You might need to set connection parameters to match your new cluster. > > Jason Ralph -- Adrian Klaver adrian.klaver@aklaver.com This message contains confidential information and is intended only for the individual named. If you are not the named addresseeyou should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if youhave received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteedto be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete,or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contentsof this message, which arise as a result of e-mail transmission. If verification is required please request a hard-copyversion.
On 2/13/20 10:04 AM, Jason Ralph wrote: >> Well table bloat and table statistics are two different things. Bloat is the accumulation of dead or potentially deadtuples whose space has not been marked > as available for reuse by VACUUM or whose space has been returned to the OSwith VACUUM FULL. For more information see: > > Thanks for the helpful response @Adrian Klaver, > Let me try to rephrase my question, > If a table has bloat before the upgrade, autvacuum was not aggressive enough, once pg_upgrade is complete, the same tablewill contain the same amount of bloat(dead tuples)? Meaning its not the same as pg_dump / pg_restore since it’s a hardlink to the previous data location. Pg_upgrade with link will not recreate the table. Yes pg_upgrade is a transfer of the binary data(sort of) whereas pg_dump/restore is a logical transfer. So pg_upgrade via copy/clone/link will not recreate user tables. For a good overview see IMPLEMENTATION file in src: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/bin/pg_upgrade/IMPLEMENTATION;h=69fcd70a7c526d9f27b43a49f71e9b591a6b889f;hb=HEAD7 > > Jason Ralph -- Adrian Klaver adrian.klaver@aklaver.com
>There is more than one type of statistics though. Stats on the distribution of data is easily recreated with analyze table_name or analyzing the whole >database. What about the stats on how many rows have been inserted or updated since the last (auto)vacuum and that will be used to trigger autovacuum? >Are those set back to zero by an upgrade? I would assume usage counts like how many times an index scan has been done would be reset, but if the >numbers in pg_stat_user_tables like n_tup_upd or n_tup_del are zero'd out during an upgrade, than it would seem like a manual vacuum would always be a >good idea to ensure a table wasn't 99% of the way to needing one and then the stats got reset by upgrading.
I agree @Michael Lewis, thank you for this comment.
I am thinking a vacuum full is what I am going to need. Or pg_dump / pg_restore. I have tuned auto vacuum after the upgrade to be aggressive, it finishes fine after a couple hours on a large table, statistics look good on the pg_stat_user_tables. However, when I run the bloat check from the wiki https://wiki.postgresql.org/wiki/Show_database_bloat it still shows bloat. Thinking it may be left over from before the pg_upgrade and auto vacuum tuning.
Best,
Jason Ralph
From: Michael Lewis <mlewis@entrata.com>
Sent: Thursday, February 13, 2020 1:02 PM
To: Adrian Klaver <adrian.klaver@aklaver.com>
Cc: Jason Ralph <jralph@affinitysolutions.com>; pgsql-general@lists.postgresql.org
Subject: Re: pg_upgrade —link does it remove table bloat
There is more than one type of statistics though. Stats on the distribution of data is easily recreated with analyze table_name or analyzing the whole database. What about the stats on how many rows have been inserted or updated since the last (auto)vacuum and that will be used to trigger autovacuum? Are those set back to zero by an upgrade? I would assume usage counts like how many times an index scan has been done would be reset, but if the numbers in pg_stat_user_tables like n_tup_upd or n_tup_del are zero'd out during an upgrade, than it would seem like a manual vacuum would always be a good idea to ensure a table wasn't 99% of the way to needing one and then the stats got reset by upgrading.
On 2/13/20 11:07 AM, Jason Ralph wrote: > >There is more than one type of statistics though. Stats on the > distribution of data is easily recreated with analyze table_name or > analyzing the whole >database. What about the stats on how many rows > have been inserted or updated since the last (auto)vacuum and that will > be used to trigger autovacuum? >Are those set back to zero by an > upgrade? I would assume usage counts like how many times an index scan > has been done would be reset, but if the >numbers in pg_stat_user_tables > like n_tup_upd or n_tup_del are zero'd out during an upgrade, than it > would seem like a manual vacuum would always be a >good idea to ensure a > table wasn't 99% of the way to needing one and then the stats got reset > by upgrading. > > I agree @Michael Lewis <mailto:mlewis@entrata.com>, thank you for this > comment. > > I am thinking a vacuum full is what I am going to need. Or pg_dump / > pg_restore. I have tuned auto vacuum after the upgrade to be aggressive, > it finishes fine after a couple hours on a large table, statistics look > good on the pg_stat_user_tables. However, when I run the bloat check > from the wiki https://wiki.postgresql.org/wiki/Show_database_bloat it > still shows bloat. Thinking it may be left over from before the > pg_upgrade and auto vacuum tuning. What values are you getting? The script you are using comes from this: https://bucardo.org/check_postgres/check_postgres.pl.html#bloat "If you want to output the bloat ratio instead (how many times larger the relation is compared to how large it should be),..." So I'm pretty sure bloat is where tbloat > 1.0. > > Best, > > *Jason Ralph* > > *From:* Michael Lewis <mlewis@entrata.com> > *Sent:* Thursday, February 13, 2020 1:02 PM > *To:* Adrian Klaver <adrian.klaver@aklaver.com> > *Cc:* Jason Ralph <jralph@affinitysolutions.com>; > pgsql-general@lists.postgresql.org > *Subject:* Re: pg_upgrade —link does it remove table bloat > > There is more than one type of statistics though. Stats on the > distribution of data is easily recreated with analyze table_name or > analyzing the whole database. What about the stats on how many rows have > been inserted or updated since the last (auto)vacuum and that will be > used to trigger autovacuum? Are those set back to zero by an upgrade? I > would assume usage counts like how many times an index scan has been > done would be reset, but if the numbers in pg_stat_user_tables like > n_tup_upd or n_tup_del are zero'd out during an upgrade, than it would > seem like a manual vacuum would always be a good idea to ensure a table > wasn't 99% of the way to needing one and then the stats got reset by > upgrading. > > This message contains confidential information and is intended only for > the individual named. If you are not the named addressee you should not > disseminate, distribute or copy this e-mail. Please notify the sender > immediately by e-mail if you have received this e-mail by mistake and > delete this e-mail from your system. E-mail transmission cannot be > guaranteed to be secure or error-free as information could be > intercepted, corrupted, lost, destroyed, arrive late or incomplete, or > contain viruses. The sender therefore does not accept liability for any > errors or omissions in the contents of this message, which arise as a > result of e-mail transmission. If verification is required please > request a hard-copy version. -- Adrian Klaver adrian.klaver@aklaver.com
@Adrian Klaver, I was concerned with the 1.4 value of tbloat and wastedbytes value, then again the last autovacuum was at 2020-02-13 02:25:22.533372-05and I took this snapshot at 3:44PMEST. So it may be ok, what do you think? current_database | schemaname | tablename | tbloat | wastedbytes | iname | ibloat | wastedibytes notimportant | public | members | 1.4 | 3080314880 | members_cobrid | 0.2 | 0 notimportant=# select * from pg_stat_user_tables where relname = 'members'; -[ RECORD 1 ]-------+------------------------------ relid | 2045245 schemaname | public relname | members seq_scan | 55065 seq_tup_read | 201069350222 idx_scan | 5349501175 idx_tup_fetch | 7201402647 n_tup_ins | 910616 n_tup_upd | 46730942 n_tup_del | 1 n_tup_hot_upd | 41845682 n_live_tup | 18262438 n_dead_tup | 14740 n_mod_since_analyze | 2476 last_vacuum | 2019-10-13 01:01:40.587534-04 last_autovacuum | 2020-02-13 02:25:22.533372-05 last_analyze | 2019-10-13 01:01:41.916929-04 last_autoanalyze | 2020-02-13 13:44:46.273096-05 vacuum_count | 15 autovacuum_count | 92 analyze_count | 15 autoanalyze_count | 243 Jason Ralph -----Original Message----- From: Adrian Klaver <adrian.klaver@aklaver.com> Sent: Thursday, February 13, 2020 3:19 PM To: Jason Ralph <jralph@affinitysolutions.com>; Michael Lewis <mlewis@entrata.com> Cc: pgsql-general@lists.postgresql.org Subject: Re: pg_upgrade —link does it remove table bloat On 2/13/20 11:07 AM, Jason Ralph wrote: > >There is more than one type of statistics though. Stats on the > distribution of data is easily recreated with analyze table_name or > analyzing the whole >database. What about the stats on how many rows > have been inserted or updated since the last (auto)vacuum and that > will be used to trigger autovacuum? >Are those set back to zero by an > upgrade? I would assume usage counts like how many times an index scan > has been done would be reset, but if the >numbers in > pg_stat_user_tables like n_tup_upd or n_tup_del are zero'd out during > an upgrade, than it would seem like a manual vacuum would always be a > >good idea to ensure a table wasn't 99% of the way to needing one and > then the stats got reset by upgrading. > > I agree @Michael Lewis <mailto:mlewis@entrata.com>, thank you for this > comment. > > I am thinking a vacuum full is what I am going to need. Or pg_dump / > pg_restore. I have tuned auto vacuum after the upgrade to be > aggressive, it finishes fine after a couple hours on a large table, > statistics look good on the pg_stat_user_tables. However, when I run > the bloat check from the wiki > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwiki > .postgresql.org%2Fwiki%2FShow_database_bloat&data=01%7C01%7Cjralph%40affinitysolutions.com%7C6ead29e3b0bf4238f50e08d7b0c1fd0b%7Cfbf1a257f1104fc19456b0ca3038c6f0%7C1&sdata=d8kSilB5eqJGWujd7tzlX5xaRhm5Z335G34MAO6%2BHOY%3D&reserved=0 itstill shows bloat. Thinking it may be left over from before the pg_upgrade and auto vacuum tuning. What values are you getting? The script you are using comes from this: https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbucardo.org%2Fcheck_postgres%2Fcheck_postgres.pl.html%23bloat&data=01%7C01%7Cjralph%40affinitysolutions.com%7C6ead29e3b0bf4238f50e08d7b0c1fd0b%7Cfbf1a257f1104fc19456b0ca3038c6f0%7C1&sdata=XugOdnmx%2BcElhEKTPqL30cjIKEDUyYHl8WvD75r82G8%3D&reserved=0 "If you want to output the bloat ratio instead (how many times larger the relation is compared to how large it should be),..." So I'm pretty sure bloat is where tbloat > 1.0. > > Best, > > *Jason Ralph* > > *From:* Michael Lewis <mlewis@entrata.com> > *Sent:* Thursday, February 13, 2020 1:02 PM > *To:* Adrian Klaver <adrian.klaver@aklaver.com> > *Cc:* Jason Ralph <jralph@affinitysolutions.com>; > pgsql-general@lists.postgresql.org > *Subject:* Re: pg_upgrade —link does it remove table bloat > > There is more than one type of statistics though. Stats on the > distribution of data is easily recreated with analyze table_name or > analyzing the whole database. What about the stats on how many rows > have been inserted or updated since the last (auto)vacuum and that > will be used to trigger autovacuum? Are those set back to zero by an > upgrade? I would assume usage counts like how many times an index scan > has been done would be reset, but if the numbers in > pg_stat_user_tables like n_tup_upd or n_tup_del are zero'd out during > an upgrade, than it would seem like a manual vacuum would always be a > good idea to ensure a table wasn't 99% of the way to needing one and > then the stats got reset by upgrading. > > This message contains confidential information and is intended only > for the individual named. If you are not the named addressee you > should not disseminate, distribute or copy this e-mail. Please notify > the sender immediately by e-mail if you have received this e-mail by > mistake and delete this e-mail from your system. E-mail transmission > cannot be guaranteed to be secure or error-free as information could > be intercepted, corrupted, lost, destroyed, arrive late or incomplete, > or contain viruses. The sender therefore does not accept liability for > any errors or omissions in the contents of this message, which arise > as a result of e-mail transmission. If verification is required please > request a hard-copy version. -- Adrian Klaver adrian.klaver@aklaver.com This message contains confidential information and is intended only for the individual named. If you are not the named addresseeyou should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if youhave received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteedto be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete,or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contentsof this message, which arise as a result of e-mail transmission. If verification is required please request a hard-copyversion.
On 2/13/20 12:45 PM, Jason Ralph wrote: > @Adrian Klaver, > I was concerned with the 1.4 value of tbloat and wastedbytes value, then again the last autovacuum was at 2020-02-13 02:25:22.533372-05and I took this snapshot at 3:44PMEST. So it may be ok, what do you think? What is your concern, storage space or performance of queries? If performance then an EXPLAIN ANALYZE on a query will help show whether there is an issue or not. > > > current_database | schemaname | tablename | tbloat | wastedbytes | iname | ibloat | wastedibytes > notimportant | public | members | 1.4 | 3080314880 | members_cobrid | 0.2 | 0 > > > > > > notimportant=# select * from pg_stat_user_tables where relname = 'members'; > -[ RECORD 1 ]-------+------------------------------ > relid | 2045245 > schemaname | public > relname | members > seq_scan | 55065 > seq_tup_read | 201069350222 > idx_scan | 5349501175 > idx_tup_fetch | 7201402647 > n_tup_ins | 910616 > n_tup_upd | 46730942 > n_tup_del | 1 > n_tup_hot_upd | 41845682 > n_live_tup | 18262438 > n_dead_tup | 14740 > n_mod_since_analyze | 2476 > last_vacuum | 2019-10-13 01:01:40.587534-04 > last_autovacuum | 2020-02-13 02:25:22.533372-05 > last_analyze | 2019-10-13 01:01:41.916929-04 > last_autoanalyze | 2020-02-13 13:44:46.273096-05 > vacuum_count | 15 > autovacuum_count | 92 > analyze_count | 15 > autoanalyze_count | 243 > > Jason Ralph > -- Adrian Klaver adrian.klaver@aklaver.com
>What is your concern, storage space or performance of queries? >If performance then an EXPLAIN ANALYZE on a query will help show whether there is an issue or not. @Adrian Klaver I guess I am after both, I would like to squeeze the most performance out of my tables and installation aspossible. Also I would like to save space wherever possible. I have received a *bloat* load of information in this thread,so thanks. 😊 @Michael Lewis thanks for the idea of pg_repack, this looks awesome and I cant wait to test it. Jason Ralph -----Original Message----- From: Adrian Klaver <adrian.klaver@aklaver.com> Sent: Thursday, February 13, 2020 6:21 PM To: Jason Ralph <jralph@affinitysolutions.com>; Michael Lewis <mlewis@entrata.com> Cc: pgsql-general@lists.postgresql.org Subject: Re: pg_upgrade —link does it remove table bloat On 2/13/20 12:45 PM, Jason Ralph wrote: > @Adrian Klaver, > I was concerned with the 1.4 value of tbloat and wastedbytes value, then again the last autovacuum was at 2020-02-13 02:25:22.533372-05and I took this snapshot at 3:44PMEST. So it may be ok, what do you think? What is your concern, storage space or performance of queries? If performance then an EXPLAIN ANALYZE on a query will help show whether there is an issue or not. > > > current_database | schemaname | tablename | tbloat | wastedbytes | iname | ibloat | wastedibytes > notimportant | public | members | 1.4 | 3080314880 | members_cobrid | 0.2 | 0 > > > > > > notimportant=# select * from pg_stat_user_tables where relname = > 'members'; -[ RECORD 1 ]-------+------------------------------ > relid | 2045245 > schemaname | public > relname | members > seq_scan | 55065 > seq_tup_read | 201069350222 > idx_scan | 5349501175 > idx_tup_fetch | 7201402647 > n_tup_ins | 910616 > n_tup_upd | 46730942 > n_tup_del | 1 > n_tup_hot_upd | 41845682 > n_live_tup | 18262438 > n_dead_tup | 14740 > n_mod_since_analyze | 2476 > last_vacuum | 2019-10-13 01:01:40.587534-04 > last_autovacuum | 2020-02-13 02:25:22.533372-05 > last_analyze | 2019-10-13 01:01:41.916929-04 > last_autoanalyze | 2020-02-13 13:44:46.273096-05 > vacuum_count | 15 > autovacuum_count | 92 > analyze_count | 15 > autoanalyze_count | 243 > > Jason Ralph > -- Adrian Klaver adrian.klaver@aklaver.com This message contains confidential information and is intended only for the individual named. If you are not the named addresseeyou should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if youhave received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteedto be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete,or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contentsof this message, which arise as a result of e-mail transmission. If verification is required please request a hard-copyversion.