Обсуждение: unexpected result for wastedbytes query after vacuum full
Hi all,
This question is somewhat related to my previous question:
https://www.postgresql.org/message-id/0871fcf35ceb4caa8a2204ca9c38e330%40USEPRDEX1.corp.kns.com
I was attempting to measure the benefit of doing a VACUUM FULL on my database. I was using the query found here:
https://wiki.postgresql.org/wiki/Show_database_bloat
However, I got an unexpected result in that the “wastedbytes” value actually increased for some tables after doing the vacuum.
Before VACUUM FULL:
current_database | schemaname | tablename | tbloat | wastedbytes | iname | ibloat | wastedibytes
------------------+----------------+---------------------------+--------+-------------+-----------------------------------------------------------------+--------+--------------
postgres | myschema | mytableA | 1.1 | 74440704 | myindex1 | 0.2 | 0
postgres | myschema | mytableA | 1.1 | 74440704 | myindex2 | 0.2 | 0
postgres | myschema | mytableA | 1.1 | 74440704 | myindex3 | 0.2 | 0
postgres | myschema | mytableA | 1.1 | 74440704 | myindex4 | 0.2 | 0
postgres | myschema | mytableB | 1.0 | 63324160 | myindex5 | 0.0 | 0
...
After VACUUM FULL:
current_database | schemaname | tablename | tbloat | wastedbytes | iname | ibloat | wastedibytes
------------------+----------------+---------------------------+--------+-------------+-----------------------------------------------------------------+--------+--------------
postgres | myschema | mytableA | 1.1 | 74506240 | myindex4 | 0.2 | 0
postgres | myschema | mytableA | 1.1 | 74506240 | myindex3 | 0.2 | 0
postgres | myschema | mytableA | 1.1 | 74506240 | myindex2 | 0.2 | 0
postgres | myschema | mytableA | 1.1 | 74506240 | myindex1 | 0.2 | 0
postgres | myschema | mytableB | 1.0 | 63332352 | myindex5 | 0.0 | 0
...
This is the schema for mytableA above:
Column | Type | Modifiers
---------------+-----------------------------+----------------------------------------------------------------
colA | integer | not null default nextval('myschema.myseq'::regclass)
colB | integer |
colC | integer |
colD | timestamp without time zone |
colE | json |
colF | integer |
colG | integer |
I was wondering if the fact that we use a json column could be interfering with the wastedbytes calculation. Can anyone explain how wastedbytes could increase from a vacuum?
Thanks,
Mike
This email is non-binding, is subject to contract, and neither Kulicke and Soffa Industries, Inc. nor its subsidiaries (each and collectively “K&S”) shall have any obligation to you to consummate the transactions herein or to enter into any agreement, other than in accordance with the terms and conditions of a definitive agreement if and when negotiated, finalized and executed between the parties. This email and all its contents are protected by International and United States copyright laws. Any reproduction or use of all or any part of this email without the express written consent of K&S is prohibited.
On Fri, Dec 06, 2019 at 05:18:20PM +0000, Mike Schanne wrote: > Hi all, > > This question is somewhat related to my previous question: > https://www.postgresql.org/message-id/0871fcf35ceb4caa8a2204ca9c38e330%40USEPRDEX1.corp.kns.com > > I was attempting to measure the benefit of doing a VACUUM FULL on my database. I was using the query found here: > https://wiki.postgresql.org/wiki/Show_database_bloat > > However, I got an unexpected result in that the "wastedbytes" value actually increased for some tables after doing thevacuum. > I was wondering if the fact that we use a json column could be interfering with the wastedbytes calculation. Can anyoneexplain how wastedbytes could increase from a vacuum? Is it due to dropped columns, like Tom explained here ? https://www.postgresql.org/message-id/18375.1520723971%40sss.pgh.pa.us
Yes, the additional bitmap could certainly explain the increase. Thanks, Mike -----Original Message----- From: Justin Pryzby [mailto:pryzby@telsasoft.com] Sent: Friday, December 06, 2019 6:29 PM To: Mike Schanne Cc: pgsql-performance@postgresql.org Subject: Re: unexpected result for wastedbytes query after vacuum full On Fri, Dec 06, 2019 at 05:18:20PM +0000, Mike Schanne wrote: > Hi all, > > This question is somewhat related to my previous question: > https://www.postgresql.org/message-id/0871fcf35ceb4caa8a2204ca9c38e330%40USEPRDEX1.corp.kns.com > > I was attempting to measure the benefit of doing a VACUUM FULL on my database. I was using the query found here: > https://wiki.postgresql.org/wiki/Show_database_bloat > > However, I got an unexpected result in that the "wastedbytes" value actually increased for some tables after doing thevacuum. > I was wondering if the fact that we use a json column could be interfering with the wastedbytes calculation. Can anyoneexplain how wastedbytes could increase from a vacuum? Is it due to dropped columns, like Tom explained here ? https://www.postgresql.org/message-id/18375.1520723971%40sss.pgh.pa.us ________________________________ This email is non-binding, is subject to contract, and neither Kulicke and Soffa Industries, Inc. nor its subsidiaries (eachand collectively “K&S”) shall have any obligation to you to consummate the transactions herein or to enter into anyagreement, other than in accordance with the terms and conditions of a definitive agreement if and when negotiated, finalizedand executed between the parties. This email and all its contents are protected by International and United Statescopyright laws. Any reproduction or use of all or any part of this email without the express written consent of K&Sis prohibited.
Hi all,
This question is somewhat related to my previous question:
https://www.postgresql.org/message-id/0871fcf35ceb4caa8a2204ca9c38e330%40USEPRDEX1.corp.kns.com
I was attempting to measure the benefit of doing a VACUUM FULL on my database. I was using the query found here:
https://wiki.postgresql.org/wiki/Show_database_bloat
However, I got an unexpected result in that the “wastedbytes” value actually increased for some tables after doing the vacuum.
Before VACUUM FULL:
current_database | schemaname | tablename | tbloat | wastedbytes | iname | ibloat | wastedibytes
------------------+----------------+---------------------------+--------+-------------+-----------------------------------------------------------------+--------+--------------
postgres | myschema | mytableA | 1.1 | 74440704 | myindex1 | 0.2 | 0
postgres | myschema | mytableA | 1.1 | 74440704 | myindex2 | 0.2 | 0
postgres | myschema | mytableA | 1.1 | 74440704 | myindex3 | 0.2 | 0
postgres | myschema | mytableA | 1.1 | 74440704 | myindex4 | 0.2 | 0
postgres | myschema | mytableB | 1.0 | 63324160 | myindex5 | 0.0 | 0
...
After VACUUM FULL:
current_database | schemaname | tablename | tbloat | wastedbytes | iname | ibloat | wastedibytes
------------------+----------------+---------------------------+--------+-------------+-----------------------------------------------------------------+--------+--------------
postgres | myschema | mytableA | 1.1 | 74506240 | myindex4 | 0.2 | 0
postgres | myschema | mytableA | 1.1 | 74506240 | myindex3 | 0.2 | 0
postgres | myschema | mytableA | 1.1 | 74506240 | myindex2 | 0.2 | 0
postgres | myschema | mytableA | 1.1 | 74506240 | myindex1 | 0.2 | 0
postgres | myschema | mytableB | 1.0 | 63332352 | myindex5 | 0.0 | 0
...
This is the schema for mytableA above:
Column | Type | Modifiers
---------------+-----------------------------+----------------------------------------------------------------
colA | integer | not null default nextval('myschema.myseq'::regclass)
colB | integer |
colC | integer |
colD | timestamp without time zone |
colE | json |
colF | integer |
colG | integer |
I was wondering if the fact that we use a json column could be interfering with the wastedbytes calculation. Can anyone explain how wastedbytes could increase from a vacuum?
--
This query uses the column statistics to estimate bloat. AFAIK, json columns don't have statistics, so the estimation can't be relied on (for this specific table at least).
On Tue, Dec 10, 2019 at 11:43 AM Guillaume Lelarge <guillaume@lelarge.info> wrote:This query uses the column statistics to estimate bloat. AFAIK, json columns don't have statistics, so the estimation can't be relied on (for this specific table at least).This was true prior to 9.5 (for xml at least, I don't know about json), but should not be true from that release onward. But still the difference between 74440704 and 74506240, this does seem to me to be straining at a gnat to swallow a camel.