Обсуждение: unexpected result for wastedbytes query after vacuum full

Поиск
Список
Период
Сортировка

unexpected result for wastedbytes query after vacuum full

От
Mike Schanne
Дата:

Hi all,

 

This question is somewhat related to my previous question:

https://www.postgresql.org/message-id/0871fcf35ceb4caa8a2204ca9c38e330%40USEPRDEX1.corp.kns.com

 

I was attempting to measure the benefit of doing a VACUUM FULL on my database.  I was using the query found here:

 

https://wiki.postgresql.org/wiki/Show_database_bloat

 

However, I got an unexpected result in that the “wastedbytes” value actually increased for some tables after doing the vacuum. 

 

Before VACUUM FULL:

current_database |   schemaname   |         tablename         | tbloat | wastedbytes |                              iname                              | ibloat | wastedibytes

------------------+----------------+---------------------------+--------+-------------+-----------------------------------------------------------------+--------+--------------

postgres         | myschema | mytableA                      |    1.1 |    74440704 | myindex1                                                         |    0.2 |            0

postgres         | myschema | mytableA                      |    1.1 |    74440704 | myindex2                                                         |    0.2 |            0

postgres         | myschema | mytableA                      |    1.1 |    74440704 | myindex3                                                         |    0.2 |            0

postgres         | myschema | mytableA                      |    1.1 |    74440704 | myindex4                                                         |    0.2 |            0

postgres         | myschema | mytableB                      |    1.0 |    63324160 | myindex5                                                         |    0.0 |            0

...

After VACUUM FULL:

  current_database |   schemaname   |         tablename         | tbloat | wastedbytes |                              iname                             | ibloat | wastedibytes

------------------+----------------+---------------------------+--------+-------------+-----------------------------------------------------------------+--------+--------------

postgres         | myschema | mytableA                      |    1.1 |    74506240 | myindex4                                                          |    0.2 |            0

postgres         | myschema | mytableA                      |    1.1 |    74506240 | myindex3                                                          |    0.2 |            0

postgres         | myschema | mytableA                      |    1.1 |    74506240 | myindex2                                                          |    0.2 |            0

postgres         | myschema | mytableA                      |    1.1 |    74506240 | myindex1                                                          |    0.2 |            0

postgres         | myschema | mytableB                      |    1.0 |    63332352 | myindex5                                                          |    0.0 |            0

...

 

This is the schema for mytableA above:

 

    Column     |            Type             |                           Modifiers

---------------+-----------------------------+----------------------------------------------------------------

colA          | integer                     | not null default nextval('myschema.myseq'::regclass)

colB          | integer                     |

colC          | integer                     |

colD          | timestamp without time zone |

colE          | json                        |

colF          | integer                     |

colG          | integer                     |

 

I was wondering if the fact that we use a json column could be interfering with the wastedbytes calculation.  Can anyone explain how wastedbytes could increase from a vacuum?

 

Thanks,

Mike




This email is non-binding, is subject to contract, and neither Kulicke and Soffa Industries, Inc. nor its subsidiaries (each and collectively “K&S”) shall have any obligation to you to consummate the transactions herein or to enter into any agreement, other than in accordance with the terms and conditions of a definitive agreement if and when negotiated, finalized and executed between the parties. This email and all its contents are protected by International and United States copyright laws. Any reproduction or use of all or any part of this email without the express written consent of K&S is prohibited.

Re: unexpected result for wastedbytes query after vacuum full

От
Justin Pryzby
Дата:
On Fri, Dec 06, 2019 at 05:18:20PM +0000, Mike Schanne wrote:
> Hi all,
> 
> This question is somewhat related to my previous question:
> https://www.postgresql.org/message-id/0871fcf35ceb4caa8a2204ca9c38e330%40USEPRDEX1.corp.kns.com
> 
> I was attempting to measure the benefit of doing a VACUUM FULL on my database.  I was using the query found here:
> https://wiki.postgresql.org/wiki/Show_database_bloat
> 
> However, I got an unexpected result in that the "wastedbytes" value actually increased for some tables after doing
thevacuum.
 

> I was wondering if the fact that we use a json column could be interfering with the wastedbytes calculation.  Can
anyoneexplain how wastedbytes could increase from a vacuum?
 

Is it due to dropped columns, like Tom explained here ?
https://www.postgresql.org/message-id/18375.1520723971%40sss.pgh.pa.us



RE: unexpected result for wastedbytes query after vacuum full

От
Mike Schanne
Дата:
Yes, the additional bitmap could certainly explain the increase.

Thanks,
Mike

-----Original Message-----
From: Justin Pryzby [mailto:pryzby@telsasoft.com]
Sent: Friday, December 06, 2019 6:29 PM
To: Mike Schanne
Cc: pgsql-performance@postgresql.org
Subject: Re: unexpected result for wastedbytes query after vacuum full

On Fri, Dec 06, 2019 at 05:18:20PM +0000, Mike Schanne wrote:
> Hi all,
>
> This question is somewhat related to my previous question:
> https://www.postgresql.org/message-id/0871fcf35ceb4caa8a2204ca9c38e330%40USEPRDEX1.corp.kns.com
>
> I was attempting to measure the benefit of doing a VACUUM FULL on my database.  I was using the query found here:
> https://wiki.postgresql.org/wiki/Show_database_bloat
>
> However, I got an unexpected result in that the "wastedbytes" value actually increased for some tables after doing
thevacuum.
 

> I was wondering if the fact that we use a json column could be interfering with the wastedbytes calculation.  Can
anyoneexplain how wastedbytes could increase from a vacuum?
 

Is it due to dropped columns, like Tom explained here ?
https://www.postgresql.org/message-id/18375.1520723971%40sss.pgh.pa.us

________________________________

This email is non-binding, is subject to contract, and neither Kulicke and Soffa Industries, Inc. nor its subsidiaries
(eachand collectively “K&S”) shall have any obligation to you to consummate the transactions herein or to enter into
anyagreement, other than in accordance with the terms and conditions of a definitive agreement if and when negotiated,
finalizedand executed between the parties. This email and all its contents are protected by International and United
Statescopyright laws. Any reproduction or use of all or any part of this email without the express written consent of
K&Sis prohibited.
 

Re: unexpected result for wastedbytes query after vacuum full

От
Guillaume Lelarge
Дата:
Le ven. 6 déc. 2019 à 18:18, Mike Schanne <mschanne@kns.com> a écrit :

Hi all,

 

This question is somewhat related to my previous question:

https://www.postgresql.org/message-id/0871fcf35ceb4caa8a2204ca9c38e330%40USEPRDEX1.corp.kns.com

 

I was attempting to measure the benefit of doing a VACUUM FULL on my database.  I was using the query found here:

 

https://wiki.postgresql.org/wiki/Show_database_bloat

 

However, I got an unexpected result in that the “wastedbytes” value actually increased for some tables after doing the vacuum. 

 

Before VACUUM FULL:

current_database |   schemaname   |         tablename         | tbloat | wastedbytes |                              iname                              | ibloat | wastedibytes

------------------+----------------+---------------------------+--------+-------------+-----------------------------------------------------------------+--------+--------------

postgres         | myschema | mytableA                      |    1.1 |    74440704 | myindex1                                                         |    0.2 |            0

postgres         | myschema | mytableA                      |    1.1 |    74440704 | myindex2                                                         |    0.2 |            0

postgres         | myschema | mytableA                      |    1.1 |    74440704 | myindex3                                                         |    0.2 |            0

postgres         | myschema | mytableA                      |    1.1 |    74440704 | myindex4                                                         |    0.2 |            0

postgres         | myschema | mytableB                      |    1.0 |    63324160 | myindex5                                                         |    0.0 |            0

...

After VACUUM FULL:

  current_database |   schemaname   |         tablename         | tbloat | wastedbytes |                              iname                             | ibloat | wastedibytes

------------------+----------------+---------------------------+--------+-------------+-----------------------------------------------------------------+--------+--------------

postgres         | myschema | mytableA                      |    1.1 |    74506240 | myindex4                                                          |    0.2 |            0

postgres         | myschema | mytableA                      |    1.1 |    74506240 | myindex3                                                          |    0.2 |            0

postgres         | myschema | mytableA                      |    1.1 |    74506240 | myindex2                                                          |    0.2 |            0

postgres         | myschema | mytableA                      |    1.1 |    74506240 | myindex1                                                          |    0.2 |            0

postgres         | myschema | mytableB                      |    1.0 |    63332352 | myindex5                                                          |    0.0 |            0

...

 

This is the schema for mytableA above:

 

    Column     |            Type             |                           Modifiers

---------------+-----------------------------+----------------------------------------------------------------

colA          | integer                     | not null default nextval('myschema.myseq'::regclass)

colB          | integer                     |

colC          | integer                     |

colD          | timestamp without time zone |

colE          | json                        |

colF          | integer                     |

colG          | integer                     |

 

I was wondering if the fact that we use a json column could be interfering with the wastedbytes calculation.  Can anyone explain how wastedbytes could increase from a vacuum?

 


This query uses the column statistics to estimate bloat. AFAIK, json columns don't have statistics, so the estimation can't be relied on (for this specific table at least).


--
Guillaume.

Re: unexpected result for wastedbytes query after vacuum full

От
Jeff Janes
Дата:
On Tue, Dec 10, 2019 at 11:43 AM Guillaume Lelarge <guillaume@lelarge.info> wrote:

This query uses the column statistics to estimate bloat. AFAIK, json columns don't have statistics, so the estimation can't be relied on (for this specific table at least).

This was true prior to 9.5 (for xml at least, I don't know about json), but should not be true from that release onward.  But still the difference between 74440704 and 74506240, this does seem to me to be straining at a gnat to swallow a camel.

Cheers,

Jeff

Re: unexpected result for wastedbytes query after vacuum full

От
Guillaume Lelarge
Дата:
Le mar. 10 déc. 2019 à 20:48, Jeff Janes <jeff.janes@gmail.com> a écrit :
On Tue, Dec 10, 2019 at 11:43 AM Guillaume Lelarge <guillaume@lelarge.info> wrote:

This query uses the column statistics to estimate bloat. AFAIK, json columns don't have statistics, so the estimation can't be relied on (for this specific table at least).

This was true prior to 9.5 (for xml at least, I don't know about json), but should not be true from that release onward.  But still the difference between 74440704 and 74506240, this does seem to me to be straining at a gnat to swallow a camel.


I just checked, and you're right. There are less statistics with json, but the important ones (null_frac and avg_width) are available for json and jsonb datatypes. So the query should work even for tables using these datatypes.

Thanks for the information, that's very interesting. And I apologize for the noise.


--
Guillaume.