Обсуждение: Is it ever necessary to vacuum a table that only gets inserts/updates?

Поиск
Список
Период
Сортировка

Is it ever necessary to vacuum a table that only gets inserts/updates?

От
Jason Buberel
Дата:
Just wondering if there is ever a reason to vacuum a very large table (> 1B rows) containing rows that never has rows deleted.

Under what circumstance would the table benefit from a vacuum?

-jason

--
Jason L. Buberel
CTO, Altos Research
http://www.altosresearch.com/
650.603.0907

Re: Is it ever necessary to vacuum a table that only gets inserts/updates?

От
John R Pierce
Дата:
On 11/16/11 4:24 PM, Jason Buberel wrote:
> Just wondering if there is ever a reason to vacuum a very large table
> (> 1B rows) containing rows that never has rows deleted.
>
> Under what circumstance would the table benefit from a vacuum?

no updates either?

you still want to do a vacuum analyze every so often to update the
statistics used by the planner.



--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast


Re: Is it ever necessary to vacuum a table that only gets inserts/updates?

От
Tom Lane
Дата:
John R Pierce <pierce@hogranch.com> writes:
> On 11/16/11 4:24 PM, Jason Buberel wrote:
>> Just wondering if there is ever a reason to vacuum a very large table
>> (> 1B rows) containing rows that never has rows deleted.

> no updates either?

To clarify: in Postgres, an "update" means an insert and a delete.
So unless you mean that this table is insert-only, you certainly
still need vacuum.

> you still want to do a vacuum analyze every so often to update the
> statistics used by the planner.

If it's purely an insert-only table, such as a logging table, then in
principle you only need periodic ANALYZEs and not any VACUUMs.

VACUUM could still be worthwhile though, because (a) it will set commit
hint bits on all pages and (b) it will set visibility-map bits on all
pages.  An ANALYZE would only do those things for the random sample of
pages that it visits.  While neither of those things are critical, they
do offload work from future queries that would otherwise have to do that
work in-line.  So if you've got a maintenance window where the database
isn't answering queries anyway, it could be worthwhile to run a VACUUM
just to get those bits set.

            regards, tom lane

Re: Is it ever necessary to vacuum a table that only gets inserts/updates?

От
Jason Buberel
Дата:
Thank you Tom & John.

In this case, there are no updates/deleted - only inserts. For now, I have set per-table autovacuum rules in order to minimize the frequency of vacuums but to ensure the statistics are updated frequently with analyze:

Table auto-vacuum VACUUM base threshold    500000000   
Table auto-vacuum VACUUM scale factor    0.3   
Table auto-vacuum ANALYZE base threshold    50000   
Table auto-vacuum ANALYZE scale factor    0.02   
Table auto-vacuum VACUUM cost delay    20   
Table auto-vacuum VACUUM cost limit    200   
 
     


On Wed, Nov 16, 2011 at 9:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
John R Pierce <pierce@hogranch.com> writes:
> On 11/16/11 4:24 PM, Jason Buberel wrote:
>> Just wondering if there is ever a reason to vacuum a very large table
>> (> 1B rows) containing rows that never has rows deleted.

> no updates either?

To clarify: in Postgres, an "update" means an insert and a delete.
So unless you mean that this table is insert-only, you certainly
still need vacuum.

> you still want to do a vacuum analyze every so often to update the
> statistics used by the planner.

If it's purely an insert-only table, such as a logging table, then in
principle you only need periodic ANALYZEs and not any VACUUMs.

VACUUM could still be worthwhile though, because (a) it will set commit
hint bits on all pages and (b) it will set visibility-map bits on all
pages.  An ANALYZE would only do those things for the random sample of
pages that it visits.  While neither of those things are critical, they
do offload work from future queries that would otherwise have to do that
work in-line.  So if you've got a maintenance window where the database
isn't answering queries anyway, it could be worthwhile to run a VACUUM
just to get those bits set.

                       regards, tom lane



--
Jason L. Buberel
CTO, Altos Research
http://www.altosresearch.com/
650.603.0907

Re: Is it ever necessary to vacuum a table that only gets inserts/updates?

От
Craig Ringer
Дата:


On Nov 17, 2011 1:32 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>
> John R Pierce <pierce@hogranch.com> writes:
> > On 11/16/11 4:24 PM, Jason Buberel wrote:
> >> Just wondering if there is ever a reason to vacuum a very large table
> >> (> 1B rows) containing rows that never has rows deleted.
>
> > no updates either?
>
> To clarify: in Postgres, an "update" means an insert and a delete.
> So unless you mean that this table is insert-only, you certainly
> still need vacuum.
>
> > you still want to do a vacuum analyze every so often to update the
> > statistics used by the planner.
>
> If it's purely an insert-only table, such as a logging table, then in
> principle you only need periodic ANALYZEs and not any VACUUMs.
>

Won't a VACUUM FREEZE (or autovac equivalent) be necessary eventually, to handle xid wraparound? If so, doing it pre-emptively might help avoid a giant I/O load and work pause when its forced.

Or am I just confused?

Re: Is it ever necessary to vacuum a table that only gets inserts/updates?

От
Tom Lane
Дата:
Craig Ringer <ringerc@ringerc.id.au> writes:
> On Nov 17, 2011 1:32 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> If it's purely an insert-only table, such as a logging table, then in
>> principle you only need periodic ANALYZEs and not any VACUUMs.

> Won't a VACUUM FREEZE (or autovac equivalent) be necessary eventually, to
> handle xid wraparound?

Sure, but if he's continually adding new rows, I don't see much point in
launching extra freeze operations.

            regards, tom lane

Re: Is it ever necessary to vacuum a table that only gets inserts/updates?

От
Gavin Flower
Дата:
On 18/11/11 04:59, Tom Lane wrote:
> Craig Ringer<ringerc@ringerc.id.au>  writes:
>> On Nov 17, 2011 1:32 PM, "Tom Lane"<tgl@sss.pgh.pa.us>  wrote:
>>> If it's purely an insert-only table, such as a logging table, then in
>>> principle you only need periodic ANALYZEs and not any VACUUMs.
>> Won't a VACUUM FREEZE (or autovac equivalent) be necessary eventually, to
>> handle xid wraparound?
> Sure, but if he's continually adding new rows, I don't see much point in
> launching extra freeze operations.
>
>             regards, tom lane
>
Just curious...

Will the pattern of inserts be at all relevant?

For example random inserts compared to apending records.  I thought that
random inserts would lead to bloat, as there would be lots of blocks far
from the optimum fill factor.


Regards,
Gavin

Re: Is it ever necessary to vacuum a table that only gets inserts/updates?

От
Adam Cornett
Дата:

On Fri, Nov 18, 2011 at 2:56 PM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:
On 18/11/11 04:59, Tom Lane wrote:
Craig Ringer<ringerc@ringerc.id.au>  writes:
On Nov 17, 2011 1:32 PM, "Tom Lane"<tgl@sss.pgh.pa.us>  wrote:
If it's purely an insert-only table, such as a logging table, then in
principle you only need periodic ANALYZEs and not any VACUUMs.
Won't a VACUUM FREEZE (or autovac equivalent) be necessary eventually, to
handle xid wraparound?
Sure, but if he's continually adding new rows, I don't see much point in
launching extra freeze operations.

                       regards, tom lane

Just curious...

Will the pattern of inserts be at all relevant?

For example random inserts compared to apending records.  I thought that random inserts would lead to bloat, as there would be lots of blocks far from the optimum fill factor.


Regards,
Gavin


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

I might be wrong (I'm sure Tom will correct me if so), but Postgres does not store tuples in an ordered format on disk, they are on disk in the order they are inserted, unless the table is re-ordered by cluster, which only does a one time sort.

Table bloat (and the table fill factor) are usually associated with deletes and updates.  If you delete a row, or update it so that it takes up less room (by say removing a large text value) then postgres could use the now free space on that page to store a new tuple.

-Adam

Re: Is it ever necessary to vacuum a table that only gets inserts/updates?

От
Gavin Flower
Дата:
On 19/11/11 11:32, Adam Cornett wrote:

On Fri, Nov 18, 2011 at 2:56 PM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:
On 18/11/11 04:59, Tom Lane wrote:
Craig Ringer<ringerc@ringerc.id.au>  writes:
On Nov 17, 2011 1:32 PM, "Tom Lane"<tgl@sss.pgh.pa.us>  wrote:
If it's purely an insert-only table, such as a logging table, then in
principle you only need periodic ANALYZEs and not any VACUUMs.
Won't a VACUUM FREEZE (or autovac equivalent) be necessary eventually, to
handle xid wraparound?
Sure, but if he's continually adding new rows, I don't see much point in
launching extra freeze operations.

                       regards, tom lane

Just curious...

Will the pattern of inserts be at all relevant?

For example random inserts compared to apending records.  I thought that random inserts would lead to bloat, as there would be lots of blocks far from the optimum fill factor.


Regards,
Gavin


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

I might be wrong (I'm sure Tom will correct me if so), but Postgres does not store tuples in an ordered format on disk, they are on disk in the order they are inserted, unless the table is re-ordered by cluster, which only does a one time sort.

Table bloat (and the table fill factor) are usually associated with deletes and updates.  If you delete a row, or update it so that it takes up less room (by say removing a large text value) then postgres could use the now free space on that page to store a new tuple.

-Adam
HI Adam,

I suspect that you are right - noiw I come to think of it- I think I got caught out by the ghost of VSAM creeping up on me )You seriously do NOT want to know about IBM's VSAM!).


Regards,
Gavin

Re: Is it ever necessary to vacuum a table that only gets inserts/updates?

От
Scott Marlowe
Дата:
On Sat, Nov 19, 2011 at 12:53 PM, Gavin Flower
<GavinFlower@archidevsys.co.nz> wrote:
> On 19/11/11 11:32, Adam Cornett wrote:
>
> On Fri, Nov 18, 2011 at 2:56 PM, Gavin Flower
> <GavinFlower@archidevsys.co.nz> wrote:
>>
>> On 18/11/11 04:59, Tom Lane wrote:
>>>
>>> Craig Ringer<ringerc@ringerc.id.au>  writes:
>>>>
>>>> On Nov 17, 2011 1:32 PM, "Tom Lane"<tgl@sss.pgh.pa.us>  wrote:
>>>>>
>>>>> If it's purely an insert-only table, such as a logging table, then in
>>>>> principle you only need periodic ANALYZEs and not any VACUUMs.
>>>>
>>>> Won't a VACUUM FREEZE (or autovac equivalent) be necessary eventually,
>>>> to
>>>> handle xid wraparound?
>>>
>>> Sure, but if he's continually adding new rows, I don't see much point in
>>> launching extra freeze operations.
>>>
>>>                        regards, tom lane
>>>
>> Just curious...
>>
>> Will the pattern of inserts be at all relevant?
>>
>> For example random inserts compared to apending records.  I thought that
>> random inserts would lead to bloat, as there would be lots of blocks far
>> from the optimum fill factor.
>>
>>
>> Regards,
>> Gavin
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
> I might be wrong (I'm sure Tom will correct me if so), but Postgres does not
> store tuples in an ordered format on disk, they are on disk in the order
> they are inserted, unless the table is re-ordered by cluster, which only
> does a one time sort.
> Table bloat (and the table fill factor) are usually associated with deletes
> and updates.  If you delete a row, or update it so that it takes up less
> room (by say removing a large text value) then postgres could use the now
> free space on that page to store a new tuple.
> -Adam
>
> HI Adam,
>
> I suspect that you are right - noiw I come to think of it- I think I got
> caught out by the ghost of VSAM creeping up on me )You seriously do NOT want
> to know about IBM's VSAM!).

Careful, on a list with as many old timers as this one, you may be
sending that message to the guy who wrote the original implementation.
 :)  I only go as far back as Rexx and JCL and RBase 5000, but never
used VSAM. ISAM yes.

Re: Is it ever necessary to vacuum a table that only gets inserts/updates?

От
Gavin Flower
Дата:
On 20/11/11 11:57, Scott Marlowe wrote:
On Sat, Nov 19, 2011 at 12:53 PM, Gavin Flower
<GavinFlower@archidevsys.co.nz> wrote:
On 19/11/11 11:32, Adam Cornett wrote:

On Fri, Nov 18, 2011 at 2:56 PM, Gavin Flower
<GavinFlower@archidevsys.co.nz> wrote:
On 18/11/11 04:59, Tom Lane wrote:
Craig Ringer<ringerc@ringerc.id.au>  writes:
On Nov 17, 2011 1:32 PM, "Tom Lane"<tgl@sss.pgh.pa.us>  wrote:
If it's purely an insert-only table, such as a logging table, then in
principle you only need periodic ANALYZEs and not any VACUUMs.
Won't a VACUUM FREEZE (or autovac equivalent) be necessary eventually,
to
handle xid wraparound?
Sure, but if he's continually adding new rows, I don't see much point in
launching extra freeze operations.

                       regards, tom lane

Just curious...

Will the pattern of inserts be at all relevant?

For example random inserts compared to apending records.  I thought that
random inserts would lead to bloat, as there would be lots of blocks far
from the optimum fill factor.


Regards,
Gavin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
I might be wrong (I'm sure Tom will correct me if so), but Postgres does not
store tuples in an ordered format on disk, they are on disk in the order
they are inserted, unless the table is re-ordered by cluster, which only
does a one time sort.
Table bloat (and the table fill factor) are usually associated with deletes
and updates.  If you delete a row, or update it so that it takes up less
room (by say removing a large text value) then postgres could use the now
free space on that page to store a new tuple.
-Adam

HI Adam,

I suspect that you are right - noiw I come to think of it- I think I got
caught out by the ghost of VSAM creeping up on me )You seriously do NOT want
to know about IBM's VSAM!).
Careful, on a list with as many old timers as this one, you may be
sending that message to the guy who wrote the original implementation.:)  I only go as far back as Rexx and JCL and RBase 5000, but never
used VSAM. ISAM yes.
Brings back memories... AAAARRRGGGGHHHHHHH!!!!!!!!!!!!!!!!!

Many years ago (when I was not so old as I am now) I had a junior analyst/programmer, who I Had asked to insert about 20k ordered records from a tape file into a VSAM file where its primary was the sort key of the records on tape.  He wrote a COBOL program that took 7 hours to do so.  Now, he and a Systems Programmer had each been on a 5 day VSAM course, and they managed to optimise the download to take only half the time.  I went and looked at a COBOL manual for the first time in a few yeas (I was a Systems Analyst, and above 'mere' programming), and suggested they change the file type from 'RANDOM' to 'INDEX-SEQUENTIAL' - the download now took a mere 70 seconds!

At that I time I was adept at designing index sequential files on ICL mainframes, then VSAM on FACOM mainframes. So I avoided databases, especially as it involved another (rather snooty) team to do anything with a database, and program code had to be changed when migrating form development to UAT and then to production. Once they insisted I create a data model of a system I had designed with 5 files – after 4 hours overtime 2 members of that team and myself came up with a data model; that exactly matched the 5 files and fields I had used...

I left the mainframe world many years ago, and did not fall into the temptation to get back into COBOL programming for Y2K.

Now my favourite software stack is Linux/PostgreSQL/JBoss – and now I not only design systems, I am expected to code them too!