Обсуждение: VACUUM and read-mostly tables

Поиск
Список
Период
Сортировка

VACUUM and read-mostly tables

От
"Ian Westmacott"
Дата:
I have a near-real-time system writing into a Postgres 7.4.2 database
on the order of 340 million rows per day in about 300 million
transactions.  So we quickly bump up against the XID wrap-around
issue.  To address this, we divide the tables into 24-hour periods.
Once we roll over to a new period, the old tables are read-mostly.
We then run vacuum every 24 hours.

The problem is that we are writing rows every 1/15 second, 24x7.  There
is no down time.  I'm wondering if there is any way to avoid vacuuming
the old tables over and over.  The documentation seems to indicate that
a full vacuum is needed to avoid XID wrap-around.  Can vacuum freeze
help me?

Thanks,

    --Ian


Re: VACUUM and read-mostly tables

От
"Jim C. Nasby"
Дата:
We'll only answer if you do a write-up on your database. :P

Seriously, those are some seriously big numbers. What else is the
database doing? What hardware is it running on?
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: VACUUM and read-mostly tables

От
Tom Lane
Дата:
"Ian Westmacott" <ianw@intellivid.com> writes:
> The problem is that we are writing rows every 1/15 second, 24x7.  There
> is no down time.  I'm wondering if there is any way to avoid vacuuming
> the old tables over and over.  The documentation seems to indicate that
> a full vacuum is needed to avoid XID wrap-around.  Can vacuum freeze
> help me?

VACUUM FULL isn't really relevant.  VACUUM FREEZE on a particular table
should "fix" that table permanently, as long as you don't make any more
changes to it.  Keep in mind though that you still have to vacuum the
system catalogs often enough to avoid wraparound in them.  The real risk
here is in overlooking any one table.  You should probably use some kind
of automated vacuum driver ... have you looked at pg_autovacuum?

I believe Fujitsu is looking into what it'd take to make a variant
Postgres with 64-bit XIDs.  This'd probably imply also expanding CIDs,
OIDs, and some other things, so the space penalty is not to be sneezed
at ... but it might be worth it for installations like yours.

            regards, tom lane

Re: VACUUM and read-mostly tables

От
Ian Westmacott
Дата:
On Tue, 2005-04-05 at 00:41, Jim C. Nasby wrote:
> We'll only answer if you do a write-up on your database. :P
>
> Seriously, those are some seriously big numbers. What else is the
> database doing? What hardware is it running on?


We run on a dual 3.2GHz P4 with 2GB RAM, but are still
finalizing the storage hardware.  We've tried various
flavors of RAID, filesystems and volume management (and
are anxious to try out tablespaces in 8).  We've found
fragmentation to be our largest limiting factor.  XFS
helps with that, and seems to provide the highest
sustained throughput on raw tables, but its not the end
of the story since fragmentation is still high.

            --Ian



Re: VACUUM and read-mostly tables

От
Ian Westmacott
Дата:
On Tue, 2005-04-05 at 02:42, Tom Lane wrote:
> "Ian Westmacott" <ianw@intellivid.com> writes:
> > The problem is that we are writing rows every 1/15 second, 24x7.  There
> > is no down time.  I'm wondering if there is any way to avoid vacuuming
> > the old tables over and over.  The documentation seems to indicate that
> > a full vacuum is needed to avoid XID wrap-around.  Can vacuum freeze
> > help me?
>
> VACUUM FULL isn't really relevant.  VACUUM FREEZE on a particular table
> should "fix" that table permanently, as long as you don't make any more
> changes to it.  Keep in mind though that you still have to vacuum the
> system catalogs often enough to avoid wraparound in them.  The real risk
> here is in overlooking any one table.  You should probably use some kind
> of automated vacuum driver ... have you looked at pg_autovacuum?

We did look at pg_autovacuum.  We found the cost of row-
level statistics too high to be workable.  Looking at the
pg_autovacuum code, it appears that a full vacuum is
performed whenever the XIDs get old enough.  The problem
is that it is unpredictable when this will occur.

So we have started work on our own automated driver,
primarily for the analyze side of things at the moment,
but I would like to extend it to vacuum as well.  For
example, we could update the row-level statistics on a
less frequent basis than every insert.

But the question is whether vacuum freezing tables will
help me reduce the frequency of a full vacuum, or reduce
its cost when we do it?  That is, if more transactions
are frozen, will a full vacuum be more efficient
(primarily in the I/O)?

> I believe Fujitsu is looking into what it'd take to make a variant
> Postgres with 64-bit XIDs.  This'd probably imply also expanding CIDs,
> OIDs, and some other things, so the space penalty is not to be sneezed
> at ... but it might be worth it for installations like yours.

Thanks for the pointer, I'll take a look.


    --Ian




Re: VACUUM and read-mostly tables

От
Tom Lane
Дата:
Ian Westmacott <ianw@intellivid.com> writes:
> But the question is whether vacuum freezing tables will
> help me reduce the frequency of a full vacuum, or reduce
> its cost when we do it?  That is, if more transactions
> are frozen, will a full vacuum be more efficient
> (primarily in the I/O)?

I'm not sure if you are actually confused, or are just using confusing
terminology ... but there's a significant difference between VACUUM FULL
(ie, vacuum and try to reclaim space) and a database-wide vacuum.  I
*think* you are using "full vacuum" to mean "database-wide vacuum" but
it's not entirely clear.

Anyway, the frequency with which you have to do database-wide vacuums to
avoid XID wraparound is determined entirely by the rate at which you use
up XIDs.  Doing piecemeal VACUUM FREEZEs would reduce the amount of work
that needs to be done in the eventual database-wide vacuum, but it's not
clear that it'd be a net win given the added work of the extra VACUUM
scans.

Have you looked at whether you can slow down the rate of XID consumption
(ie, by bundling operations into larger transactions)?  That might be a
more useful route to limiting the costs involved.

            regards, tom lane

Re: VACUUM and read-mostly tables

От
"Jim C. Nasby"
Дата:
On Tue, Apr 05, 2005 at 11:13:06AM -0400, Ian Westmacott wrote:
> On Tue, 2005-04-05 at 00:41, Jim C. Nasby wrote:
> > We'll only answer if you do a write-up on your database. :P
> >
> > Seriously, those are some seriously big numbers. What else is the
> > database doing? What hardware is it running on?
>
>
> We run on a dual 3.2GHz P4 with 2GB RAM, but are still
> finalizing the storage hardware.  We've tried various
> flavors of RAID, filesystems and volume management (and
> are anxious to try out tablespaces in 8).  We've found
> fragmentation to be our largest limiting factor.  XFS
> helps with that, and seems to provide the highest
> sustained throughput on raw tables, but its not the end
> of the story since fragmentation is still high.

What else is the database doing besides the inserts?

And if UFS is available for linux you should might try it.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: VACUUM and read-mostly tables

От
Ian Westmacott
Дата:
On Tue, 2005-04-05 at 11:34, Tom Lane wrote:
> Ian Westmacott <ianw@intellivid.com> writes:
> > But the question is whether vacuum freezing tables will
> > help me reduce the frequency of a full vacuum, or reduce
> > its cost when we do it?  That is, if more transactions
> > are frozen, will a full vacuum be more efficient
> > (primarily in the I/O)?
>
> I'm not sure if you are actually confused, or are just using confusing
> terminology ... but there's a significant difference between VACUUM FULL
> (ie, vacuum and try to reclaim space) and a database-wide vacuum.  I
> *think* you are using "full vacuum" to mean "database-wide vacuum" but
> it's not entirely clear.

Yes, sorry, bad choice of words.  By "full vacuum" I meant
"database-wide".

> Anyway, the frequency with which you have to do database-wide vacuums to
> avoid XID wraparound is determined entirely by the rate at which you use
> up XIDs.  Doing piecemeal VACUUM FREEZEs would reduce the amount of work
> that needs to be done in the eventual database-wide vacuum, but it's not
> clear that it'd be a net win given the added work of the extra VACUUM
> scans.

But potentially I could spread the cost out over time,
making it less disruptive when it occurs, right?

> Have you looked at whether you can slow down the rate of XID consumption
> (ie, by bundling operations into larger transactions)?  That might be a
> more useful route to limiting the costs involved.

Yes, I'd like to explore that.  Essentially what happens
now is that a number of rows are written to each of about
two dozen tables, each with a COPY FROM STDIN, and then
there are a few INSERTS and UPDATES to boot.  As far as
we are concerned, all of this could be a single transaction.
My understanding is that all the inserts resulting from a
COPY are a single transaction, but is it possible to make
multiple COPYs a single transaction?  Or would I have to
do them all as individual INSERTs and make the whole thing
a single transaction?  Would that be more costly?

Thanks,

    --Ian



Re: VACUUM and read-mostly tables

От
Ian Westmacott
Дата:
On Tue, 2005-04-05 at 11:39, Jim C. Nasby wrote:
> On Tue, Apr 05, 2005 at 11:13:06AM -0400, Ian Westmacott wrote:
> > On Tue, 2005-04-05 at 00:41, Jim C. Nasby wrote:
> > > We'll only answer if you do a write-up on your database. :P
> > >
> > > Seriously, those are some seriously big numbers. What else is the
> > > database doing? What hardware is it running on?
> >
> >
> > We run on a dual 3.2GHz P4 with 2GB RAM, but are still
> > finalizing the storage hardware.  We've tried various
> > flavors of RAID, filesystems and volume management (and
> > are anxious to try out tablespaces in 8).  We've found
> > fragmentation to be our largest limiting factor.  XFS
> > helps with that, and seems to provide the highest
> > sustained throughput on raw tables, but its not the end
> > of the story since fragmentation is still high.
>
> What else is the database doing besides the inserts?

A proportionally small number of updates, no deletes, and
a set of moderately complex queries.

> And if UFS is available for linux you should might try it.

Would UFS help the fragmentation issue?  We have seen ext3
allocating blocks in 2-4 pages, while XFS manages 8-16
pages.

Thanks,

    --Ian



Re: VACUUM and read-mostly tables

От
Tom Lane
Дата:
Ian Westmacott <ianw@intellivid.com> writes:
> On Tue, 2005-04-05 at 11:34, Tom Lane wrote:
>> Have you looked at whether you can slow down the rate of XID consumption
>> (ie, by bundling operations into larger transactions)?  That might be a
>> more useful route to limiting the costs involved.

> Yes, I'd like to explore that.  Essentially what happens
> now is that a number of rows are written to each of about
> two dozen tables, each with a COPY FROM STDIN, and then
> there are a few INSERTS and UPDATES to boot.  As far as
> we are concerned, all of this could be a single transaction.
> My understanding is that all the inserts resulting from a
> COPY are a single transaction, but is it possible to make
> multiple COPYs a single transaction?

Sure.  Just wrap BEGIN/COMMIT around the whole mess.  There are only
a very small number of commands that can't be inside a transaction
block.

Keep in mind also that SELECT queries are also transactions;
so you might need to think about bundling read operations too.

            regards, tom lane

Re: VACUUM and read-mostly tables

От
"Jim C. Nasby"
Дата:
On Tue, Apr 05, 2005 at 01:56:05PM -0400, Ian Westmacott wrote:
> > And if UFS is available for linux you should might try it.
>
> Would UFS help the fragmentation issue?  We have seen ext3
> allocating blocks in 2-4 pages, while XFS manages 8-16
> pages.

Hrm... I'm not sure what the allocation stratedgy is. I guess this is a
case where it would be useful to tell postgresql to allocate space for a
table in large chunks.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: VACUUM and read-mostly tables

От
"Jim C. Nasby"
Дата:
On Tue, Apr 05, 2005 at 01:56:05PM -0400, Ian Westmacott wrote:
> > And if UFS is available for linux you should might try it.
>
> Would UFS help the fragmentation issue?  We have seen ext3
> allocating blocks in 2-4 pages, while XFS manages 8-16
> pages.

Actually, I spoke too soon in my last reply. UFS has done work to be
more intelligent about where it allocates space from, so it's very
possible that even if it's only allocating a block at a time that it
will be smart enough not to allocate nearby space for other things. I
believe there's also some tuning parameters that affect that behavior.
Of course, other linux filesystems might be tunable in that regard as
well.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"