Thread: Catching up Production from Warm Standby after maintenance - Please help


Catching up Production from Warm Standby after maintenance - Please help

From:
Jennifer Spencer
Date:
Hello All -

I will soon be receiving a new production and warm-standby machine.  Prior to now, we have only had one database machine and no warm standby. 

We had planned to feed the standby with one backup and then use WAL files rsync'd over until either catastrophe strikes or maintenance is required on the primary machine.  I will be able to properly close out my users on the primary, force a closeout of the log file (pg_switch_xlog), and shut it down.  Then I can catch up and ingest the last WAL on my warm standby, let the users in, and have the warm standby become the new primary.  All okay so far.

But.  When the primary is vacuumed, re-indexed and all clean and shiny again, HOW do I catch up with the data changes that happened in the meantime on my warm standby, without corruption or >30 minutes of user downtime?  I cannot re-ingest WALs from the warm standby into the cleaned-up primary or I get a PANIC, and I don't have time to run a full backup on the warm standby and ingest it into the primary while leaving everything down.

I know some of you must have tackled this before, so I am really hoping you can help me.  I checked the archives but didn't see anything about how to recover the primary after the repair or maintenance had been completed.  I was really hoping to use the logs on the warm standby to feed into the cleaned up primary to catch up, but it seems that's a hopeless idea since any vacuuming or other cleaning I may do changes my binary data files beyond the WAL's recognition.   

Thanks in advance for your assistance. 

Best regards,
Jennifer Spencer



Re: Catching up Production from Warm Standby after maintenance - Please help

From:
Scott Marlowe
Date:
On Mon, Jul 6, 2009 at 8:38 PM, Jennifer Spencer <jenniferm411@hotmail.com> wrote:
> Hello All -
>
> I will soon be receiving a new production and warm-standby machine.  Prior
> to now, we have only had one database machine and no warm standby.
>
> We had planned to feed the standby with one backup and then use WAL files
> rsync'd over until either catastrophe or maintenance required on the primary
> machine.  I will be able to properly close out my users on the primary,
> force a closeout of the log file (pg_switch_xlog), and shut it down.  Then I
> can catch up and ingest the last WAL on my warm standby, let the users in,
> and have the warm standby become the new primary.  All okay so far.
>
> But.  When the primary is vacuumed, re-indexed and all clean and shiny
> again, HOW do I catch up with data changes that happened in the meantime on
> my warm standby without corruption or >30 minutes of user down-time?  I
> cannot re-ingest WALs from the warm standby into the cleaned up primary or I
> get a PANIC, and I don't have time to run a full backup on the warm standby
> and ingest it into the primary leaving everything down.

If you've moved on, so to speak, to the new primary, you rebuild the
old primary, now the warm standby, the same way you initially created
the warm standby: issue the start-hot-backup command
(pg_start_backup()) on the new primary, copy over the whole data
directory, and start shipping WAL files to it before you start
continuous recovery.
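
For 8.3 that boils down to something like this (paths and hostnames
are invented; adjust to your setup):

    -- on the new primary:
    SELECT pg_start_backup('resync_old_primary');

    # shell: copy the cluster across; pg_xlog contents can be skipped
    # since the standby will pull WAL from the archive instead
    rsync -a --exclude=pg_xlog/ /var/lib/pgsql/data/ oldbox:/var/lib/pgsql/data/

    -- back on the new primary:
    SELECT pg_stop_backup();

    # recovery.conf on the old box, now the standby:
    restore_command = 'cp /wal_archive/%f %p'

Then start postgres on the old box and it sits in continuous recovery,
replaying WAL as it arrives.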

Re: Catching up Production from Warm Standby after maintenance - Please help

От
Jennifer Spencer
Дата:
Hi Scott -

> > But.  When the primary is vacuumed, re-indexed and all clean and shiny
> > again, HOW do I catch up with data changes that happened in the meantime on
> > my warm standby without corruption or >30 minutes of user down-time?  I
> > cannot re-ingest WALs from the warm standby into the cleaned up primary or I
> > get a PANIC, and I don't have time to run a full backup on the warm standby
> > and ingest it into the primary leaving everything down.
>
> If you've moved on, so to speak, with the new primary, you restart the
> old primary, now warm standby, the same way you initially created the
> warm standby. issue the start hot backup command to the primary, copy
> over all the data dir and start shipping WAL files to it before you
> start continuous recovery.

If I do that, the primary will not be clean anymore.  It will be as unvacuumed and index-bloated as the warm standby.  Or am I missing something?

Thanks,
Jennifer



Re: Catching up Production from Warm Standby after maintenance - Please help

From:
Scott Mead
Date:
On Tue, Jul 7, 2009 at 5:12 AM, Jennifer Spencer <jenniferm411@hotmail.com> wrote:

>
> If you've moved on, so to speak, with the new primary, you restart the
> old primary, now warm standby, the same way you initially created the
> warm standby. issue the start hot backup command to the primary, copy
> over all the data dir and start shipping WAL files to it before you
> start continuous recovery.

If I do that, the primary will not be clean anymore.  It will be as unvacuumed and index-bloated as the warm standby.  Or am I missing something?

   I think that Scott's point was that once you have brought the standby 'alive', you have no option but to start over.  Warm standby isn't for reindex-type operations: it's a failover mechanism, not to be confused with a switchover mechanism, which lets you move back and forth easily.  Once you cut over to the standby, you have to do a full re-sync to the old primary system.  What you're looking for is a replication system like Slony.


    Are indexing and vacuuming hurting so much that you can't do them online?   Why not use 'create index concurrently' and set vacuum_cost_delay to help keep these operations from impacting your production system?  What version of PG are you using?
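
    For instance (table and index names invented):

    -- builds without taking a write lock on the table:
    CREATE INDEX CONCURRENTLY samples_ts_idx ON samples (ts);

    -- throttle a manual vacuum so it yields I/O to your users:
    SET vacuum_cost_delay = 20;   -- sleep 20ms each time the cost limit is hit
    VACUUM ANALYZE samples;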

-- Another Scott :-)
   

Re: Catching up Production from Warm Standby after maintenance - Please help

From:
Jennifer Spencer
Date:
We are using v. 8.3.1 at present.  We anticipate a terabyte of data each year starting in November, and I am concerned about what happens maintenance-wise a couple of years down the line.  I think that we won't be able to do vacuuming/reindexing with the machine online and serving users if the database is over a certain size.  Am I wrong?

Our set up allows the users to create and delete ad-hoc tables in their own namespaces (each user has his own schema in addition to some overall schemas for the project).  Since Slony does not automatically handle "create table" and "drop table", I would have to incorporate that into the infrastructure APIs that create & drop tables and sequences.  I think it would be non-trivial to implement a full-on, total replication scenario.  We are using Slony now for a select group of tables, but it's a separate API to add a table or remove it from replication.
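
Today, adding a table means pushing a slonik fragment through our API, something like (set/table ids and names are illustrative):

    set add table (set id = 1, origin = 1, id = 42,
                   fully qualified name = 'jdoe.adhoc_table');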

What do you do when you have to do maintenance?  Don't you take your primary offline and clean it?  Or is this old-school thinking?  I am coming from a Sybase environment, and previously I was able to use transaction logs to catch up post-maintenance. 

It seems odd to me to have a fast, powerful machine left solely in warm standby recovery mode that  cannot be used to alleviate the pressures of DB maintenance.  My systems admin has done nothing but complain about a "wasted" machine - he does not see the value of having the standby.  Of course, if it were Slony'd, we could use it, I suppose.

Thanks for your help,
Jennifer



Re: Catching up Production from Warm Standby after maintenance - Please help

From:
"Kevin Grittner"
Date:
Jennifer Spencer <jenniferm411@hotmail.com> wrote:

> I think that we won't be able to do vacuuming/reindexing with the
> machine online and serving users if the database is over a certain
> size.  Am I wrong?

Probably.  My first concern is to make sure you aren't doing VACUUM
FULL as part of your maintenance cycle.  That option is meant for
recovery from extreme bloat, and is sort of a "last resort".  Other
vacuums coexist with normal usage as long as you have things properly
configured for your environment.  You probably won't have a need to
reindex if you stay away from VACUUM FULL and otherwise avoid unusual
activity which causes index bloat; however, if you do need to reindex
without down time, there is CREATE INDEX CONCURRENTLY which can be
used to achieve that.

-Kevin

Re: Catching up Production from Warm Standby after maintenance - Please help

From:
Jennifer Spencer
Date:
>> Am I wrong?
>
> Probably. My first concern is to make sure you aren't doing VACUUM
> FULL as part of your maintenance cycle. That option is meant for
> recovery from extreme bloat, and is sort of a "last resort".

Good - glad to be wrong about that!  We do mostly inserts, no updates and very few deletes.  We drop entire tables but don't delete often.  We have very long rows, though.  Do you think the above is a situation likely to create extreme bloat?  My Sybase experience with extreme bloat was that it was caused by a three-field clustered index in a very long short-row table over time (~a year).  This job doesn't use clustered indexes.

I thought we had to do vacuum full to avoid transaction ID wraparound/reset issues?  We do have a lot of transactions, a whole lot. 

Are you saying that most admins avoid VACUUM FULL as much as possible?  What about XID?
Thanks,
Jennifer

> Other
> vacuums coexist with normal usage as long as you have things properly
> configured for your environment. You probably won't have a need to
> reindex if you stay away from VACUUM FULL and otherwise avoid unusual
> activity which causes index bloat; however, if you do need to reindex
> without down time, there is CREATE INDEX CONCURRENTLY which can be
> used to achieve that.
>
> -Kevin



Re: Catching up Production from Warm Standby after maintenance - Please help

From:
Scott Marlowe
Date:
On Tue, Jul 7, 2009 at 10:42 AM, Jennifer Spencer <jenniferm411@hotmail.com> wrote:
> We are using v. 8.3.1 at present.

You should really update to the latest 8.3.x version.  It's simple and fast.

> We anticipate a terabyte of data each
> year starting in November, and I am concerned about what happens
> maintenance-wise a couple of years down the line.  I think that we won't be
> able to do vacuuming/reindexing with the machine online and serving users if
> the database is over a certain size.  Am I wrong?

Depends on how fast your IO subsystem is really.  Are you planning for
enough IO capacity for your uses?  Using high quality RAID controllers
with plenty of drives (spinning OR solid state)?

> Our set up allows the users to create and delete ad-hoc tables in their own
> namespaces (each user has his own schema in addition to some overall schemas
> for the project).  Since Slony does not automatically handle "create table"
> and "drop table", I would have to incorporate that into the infrastructure
> APIs that create & drop tables and sequences.  I think it would be
> non-trivial to implement a full-on, total replication scenario.  We are
> using Slony now for a select group of tables, but it's a separate API to add
> a table or remove it from replication.

So, you're allowing users to do ad-hoc DDL on a production system?  No
testing / staging environment beforehand?  You are asking for trouble
IMHO.  OTOH, there are instances where well-defined applications can
run DDL.

> What do you do when you have to do maintenance?

Define "maintenance".  Maybe you're bringing along a few more bad
habits from Sybase.

>  Don't you take your primary
> offline and clean it?

I can (and have once or twice) because I use slony.

> Or is this old-school thinking?  I am coming from a
> Sybase environment, and previously I was able to use transaction logs to
> catch up post-maintenance.

Well, that's not gonna work here.

> It seems odd to me to have a fast, powerful machine left solely in warm
> standby recovery mode that  cannot be used to alleviate the pressures of DB
> maintenance.

Well, maybe you're thinking of log-shipping replication in a different
way because of your experience.  8.5 might bring along true hot WAL
replication standbys.  They didn't make it into 8.4.


> My systems admin has done nothing but complain about a
> "wasted" machine - he does not see the value of having the standby.

Then he's not a particularly well-informed or educated sysadmin.  If
the primary server fails, hard, the secondary can take over in minutes.
Is there no value in that?  If he can't see the value there, then
find a new sysadmin.

>  Of course, if it were Slony'd, we could use it, I suppose.

Yep, and if the hot WAL replication stuff were available.  I hear
there's a patch for it that works, mostly.





--
When fascism comes to America, it will be intolerance sold as diversity.

Re: Catching up Production from Warm Standby after maintenance - Please help

From:
Scott Marlowe
Date:
On Tue, Jul 7, 2009 at 11:10 AM, Scott Whitney <swhitney@journyx.com> wrote:
> I'd like to phone in with a slightly different opinion on VACUUM FULL. Yeah,
> it should be avoided when possible, but it's not always possible. In our
> case, I've got 300ish databases backing to a single database server. Each of
> those dbs has a couple of hundred tables and a hundred or more views. The
> product (Journyx Timesheet) is pretty complex, and I find that if I do _not_
> perform a full vacuum once per week, my customer dbs start to slow down
> inordinately. Queries which would run in 1-2 seconds will run in 30-40
> seconds after a few weeks of not performing a full vacuum.

Wait, full vacuum on the whole db, or vacuum full?

> I've got autovac
> running on all dbs.
>
> Now, that could well be due to index bloat with complex indexes, or it could
> be due to a variety of other factors, but also my pg_clog directory does not
> clear out, but continues to create new clog segments. Running my weekly
> vac-full-analyze resolves that problem for me. This might not be the case
> for you if you have a less complex schema, especially noting how you say you
> use it.

You likely have very long-running transactions.  Look for 'idle in
transaction' entries in the pg_stat_activity view.

It may be that right now vacuum full is the only fix, but if you can
identify the reason regular vacuum isn't working, you could eliminate
the need for vacuum full.
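
Something like this will show them (8.3 column names; it needs
stats_command_string, renamed track_activities in 8.3, turned on):

    SELECT procpid, usename, xact_start, current_query
    FROM pg_stat_activity
    WHERE current_query = '<IDLE> in transaction'
    ORDER BY xact_start;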

> I _think_ autovacuum, somewhere around early 8.x resolves the transaction
> wrap-around issues, but someone else should verify that.

Ayup.

Re: Catching up Production from Warm Standby after maintenance - Please help

From:
"Scott Whitney"
Date:
I'd like to phone in with a slightly different opinion on VACUUM FULL. Yeah,
it should be avoided when possible, but it's not always possible. In our
case, I've got 300ish databases backing to a single database server. Each of
those dbs has a couple of hundred tables and a hundred or more views. The
product (Journyx Timesheet) is pretty complex, and I find that if I do _not_
perform a full vacuum once per week, my customer dbs start to slow down
inordinately. Queries which would run in 1-2 seconds will run in 30-40
seconds after a few weeks of not performing a full vacuum. I've got autovac
running on all dbs.

Now, that could well be due to index bloat with complex indexes, or it could
be due to a variety of other factors, but also my pg_clog directory does not
clear out, but continues to create new clog segments. Running my weekly
vac-full-analyze resolves that problem for me. This might not be the case
for you if you have a less complex schema, especially noting how you say you
use it.

I _think_ autovacuum, somewhere around early 8.x resolves the transaction
wrap-around issues, but someone else should verify that.



Re: Catching up Production from Warm Standby after maintenance - Please help

From:
"Scott Whitney"
Date:
>> I'd like to phone in with a slightly different opinion on VACUUM FULL. Yeah,
>> it should be avoided when possible, but it's not always possible. In our
>> case, I've got 300ish databases backing to a single database server. Each of
>> those dbs has a couple of hundred tables and a hundred or more views. The
>> product (Journyx Timesheet) is pretty complex, and I find that if I do _not_
>> perform a full vacuum once per week, my customer dbs start to slow down
>> inordinately. Queries which would run in 1-2 seconds will run in 30-40
>> seconds after a few weeks of not performing a full vacuum.

>Wait, full vacuum on the whole db, or vacuum full?

Vac full analyze on each and every database weekly. Specifically:

PGCMD = 'vacuumdb -a -f -v -z'
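
(For reference, those flags are: -a all databases, -f FULL, -v verbose,
-z ANALYZE.)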

>> I've got autovac
>> running on all dbs.
>>
>> Now, that could well be due to index bloat with complex indexes, or it could
>> be due to a variety of other factors, but also my pg_clog directory does not
>> clear out, but continues to create new clog segments. Running my weekly
>> vac-full-analyze resolves that problem for me. This might not be the case
>> for you if you have a less complex schema, especially noting how you say you
>> use it.

>You likely have very long running transactions.  Look for idle in
>transaction queries in the pg_stat_activity table.

>It may be that right now vacuum full is the only fix but if you can
>identify a reason regular vacuum isn't working you could eliminate the
>need for vacuum full.

I suspect it has to do with our architecture. Each application (out of the
box) has 2 daemons constantly connected to the pg backend, or there are more
if configured. My _guess_ would be that since the database is detected as
"in use," autovac isn't fully clearing. I could be wrong. Some of my
backend_starts go back to February of this year (last time I restarted the
server, as I recall). I would _think_ that a vac full would have the same
issues, but maybe not. In the case of the vac full, I'm not stopping and
restarting the app servers, so it should be the same as an autovac, in
theory.

In my pg_stat_activity, all I see is a constant "command string not enabled."
stats_command_string is commented out in my conf (the default is off).  What
are the performance implications, if any, of turning that on?


Re: Catching up Production from Warm Standby after maintenance - Please help

From:
Alvaro Herrera
Date:
Scott Whitney wrote:
> I'd like to phone in with a slightly different opinion on VACUUM FULL. Yeah,
> it should be avoided when possible, but it's not always possible. In our
> case, I've got 300ish databases backing to a single database server. Each of
> those dbs has a couple of hundred tables and a hundred or more views. The
> product (Journyx Timesheet) is pretty complex, and I find that if I do _not_
> perform a full vacuum once per week, my customer dbs start to slow down
> inordinately. Queries which would run in 1-2 seconds will run in 30-40
> seconds after a few weeks of not performing a full vacuum. I've got autovac
> running on all dbs.

That's most likely because you have too small an FSM.  Have you tuned
that?

> Now, that could well be due to index bloat with complex indexes,

VACUUM FULL does not clean indexes.

> or it could be due to a variety of other factors, but also my pg_clog
> directory does not clear out, but continues to create new clog
> segments.

That's expected.  If pg_clog size bothers you, there's another parameter
you can tweak.  However, pg_clog size should not normally be a problem;
it's just 32kB for every million transactions or something like that.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Catching up Production from Warm Standby after maintenance - Please help

From:
"Scott Whitney"
Date:
>That's most likely because you have too small an FSM.  Have you tuned
>that?

My settings are:

max_fsm_pages = 1500000                 # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 70000               # min 100, ~70 bytes each

It's quite possible that those settings are too low, but I've honestly no
idea how to estimate these other than trial and error. I've not had the time
to really brush up on that level of tuning.

>> or it could be due to a variety of other factors, but also my pg_clog
>> directory does not clear out, but continues to create new clog
>> segments.

>That's expected.  If pg_clog size bothers you, there's another parameter
>you can tweak.  However, pg_clog size should not normally be a problem;
>it's just 32kB for every million transactions or something like that.

Right, the problem was that I ran into db corruption a couple of years ago,
and I had thousands of clog segments sitting out there with uncommitted
transactions, for whatever reason. Turns out it had something to do with
template1 not getting vacuumed, near as I could tell. Once I recovered from
that nightmare, the only way I found to ensure that the clogs were properly
removed was to vac full.


Re: Catching up Production from Warm Standby after maintenance - Please help

From:
Jennifer Spencer
Date:
You are sure that the XID wraparound is gone?  That's good news.  No other reasons for vacuum full on the entire database. 

We could do it a table at a time if we absolutely have to do it, and that would minimize down time on the rest of the system. 

-Jennifer

> > I _think_ autovacuum, somewhere around early 8.x resolves the transaction
> > wrap-around issues, but someone else should verify that.
>
> Ayup.
>



Re: Catching up Production from Warm Standby after maintenance - Please help

From:
Lewis Kapell
Date:
Scott Whitney wrote:
> max_fsm_pages = 1500000                 # min max_fsm_relations*16, 6 bytes each
> max_fsm_relations = 70000               # min 100, ~70 bytes each
>
> It's quite possible that those settings are too low, but I've honestly no
> idea how to estimate these other than trial and error. I've not had the time
> to really brush up on that level of tuning.
>

If you were planning to upgrade to 8.4 in the near future, then you
wouldn't have to worry about these settings any more.  Those settings
tune themselves and they are no longer specified in the configuration file.

Lewis Kapell


Re: Catching up Production from Warm Standby after maintenance - Please help

From:
Scott Mead
Date:
On Tue, Jul 7, 2009 at 1:40 PM, Jennifer Spencer <jenniferm411@hotmail.com> wrote:
You are sure that the XID wraparound is gone?  That's good news.  No other reasons for vacuum full on the entire database. 

   I think we're talking apples and gorillas on the use of the word 'full'.

   There is a command:

  VACUUM FULL;

   When you do that, you lock the table and much hatred rains upon you from user-land.

   When you:

    vacuum all databases in a cluster (notice, no 'FULL' here; could be through autovacuum),

      you are doing a vacuum that is capable of operating alongside transactions.

  XID wrap-around is still an issue, but if you have autovacuum on and you pay attention to your logs, you'll be okay.
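
   Side by side (table name invented):

    VACUUM ANALYZE mytable;   -- lazy vacuum; runs alongside normal traffic
    VACUUM FULL mytable;      -- compacts the table under an exclusive lock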
 

We could do it a table at a time if we absolutely have to do it, and that would minimize down time on the rest of the system. 

  No need, see above.


   Also, a quick note about your growth pattern (~1 TB / year):

   If you're going to be growing that much, index builds on a TB of data really stink.  You may want to look into constraint_exclusion partitioning (maybe by quarter, or something else easy to chunk up).  When you do need to do index rebuilds, you can:
   A) Do them concurrently
        Rebuilding an index becomes

          CREATE INDEX CONCURRENTLY my_index_1 ....

          DROP INDEX my_index_0;

           ANALYZE table;

       That way, you aren't blocking during that rebuild.

  B) Your indexes will be a subset of your data; in my example, you'd only be building one quarter's worth of indexes.

  C) It's possible that after a quarter is over, you won't even need to reindex because maybe you won't need data from a previous quarter.
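
  A minimal sketch of the quarterly idea (names invented; 8.3-style inheritance partitioning):

    CREATE TABLE readings (ts timestamptz NOT NULL, payload text);

    CREATE TABLE readings_2009q3 (
        CHECK (ts >= '2009-07-01' AND ts < '2009-10-01')
    ) INHERITS (readings);

    -- route inserts to the right child (in the app, or with a trigger),
    -- and set constraint_exclusion = on so the planner skips the
    -- quarters a query can't touch.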


Good - glad to be wrong about that!  We do mostly inserts, no updates and very few deletes.  We drop entire tables but don't delete often.  We have very long rows, though.  Do you think the above is a situation likely to create extreme bloat?

  I wonder, what do you use to decide when to reindex?  Under this situation, it's very possible that you don't need to do it all that often.  Are you just flying 'willy-nilly' about reindexing things, or is there some indicator you use?

--Scott




Re: Catching up Production from Warm Standby after maintenance - Please help

From:
Scott Mead
Date:

On Tue, Jul 7, 2009 at 1:46 PM, Lewis Kapell <lkapell@setonhome.org> wrote:


If you were planning to upgrade to 8.4 in the near future, then you wouldn't have to worry about these settings any more.  Those settings tune themselves and they are no longer specified in the configuration file.

 +1 

  The visibility map also helps with vacuum: it keeps the amount of unnecessary I/O down (vacuum can skip a block whose rows are all 'live' anyway).

--Scott



Re: Catching up Production from Warm Standby after maintenance - Please help

From:
"Kevin Grittner"
Date:
Jennifer Spencer <jenniferm411@hotmail.com> wrote:

> We do mostly inserts, no updates and very few deletes.  We drop
> entire tables but don't delete often.  We have very long rows,
> though.  Do you think the above is a situation likely to create
> extreme bloat?

No.  Only deletes and updates can cause dead rows, and if you use
regular vacuums with reasonable frequency (including just having a
reasonably configured autovacuum) you won't get extreme bloat from
even those -- provided you avoid a few pitfalls.

(1)  Avoid long-running transactions to the degree that you can.
Vacuum must leave a dead row if there's any transaction that might
still be able to see it.

(2)  If you're running a version prior to 8.4, make sure your Free
Space Manager settings are adequate.  (A VACUUM VERBOSE at the
database level will report where you're at in that regard in the last
few lines.)

(3)  Avoid updating too many rows in a single database transaction.
(A series of smaller updates can allow autovacuum to reclaim dead row
space from earlier updates to use for the new rows generated by later
updates.)
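
For (3), that means replacing one giant UPDATE with batches, roughly
like this (table and column names invented):

UPDATE events SET flagged = true WHERE id BETWEEN 1 AND 100000;
-- commit, let autovacuum (or a manual VACUUM) reclaim the dead rows
-- from this batch, then move on to the next id range.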

> My Sybase experience with extreme bloat was that it was caused by a
> three-field clustered index in a very long short-row table over time
> (~a year).  This job doesn't use clustered indexes.

The issues are entirely different in PostgreSQL.  Clustered indexes
don't mean remotely the same thing.  (In PostgreSQL a CLUSTER
operation rewrites the table, putting the rows in the order of the
specified index, but there is no effort to maintain that sequence
after that point.  An index flagged for "cluster" in PostgreSQL is
merely the default index to use if you ask to cluster a table without
specifying an index.)  Sybase doesn't (or didn't last I used it) use
MVCC, so an UPDATE modified the row in place, protecting the operation
with blocking locks; it wasn't the DELETE/INSERT pair that it is in
PostgreSQL.

> I thought we had to do vacuum full to avoid transaction ID
> wraparound/reset issues?  We do have a lot of transactions, a whole
> lot.

Protecting against that requires vacuum to *freeze* the tuples, which
does not require VACUUM FULL.  Don't confuse VACUUM FULL with VACUUM
FREEZE or with a VACUUM of the entire database.  Autovacuum can
normally cover your needs for tuple freezing without your explicitly
running anything to do so.

> Are you saying that most admins avoid VACUUM FULL as much as
> possible?

Yes.  We tried it a few times early on and discovered that aggressive
maintenance is unnecessary if you do a good job with your regular
maintenance, and that if you have room for a second copy of a table,
CLUSTER is almost always a better option than VACUUM FULL.
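
In practice that looks like (names invented):

CLUSTER mytable USING mytable_pkey;

which rewrites the table in index order; it takes an exclusive lock and
needs room for the second copy while it runs.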

-Kevin

Re: Catching up Production from Warm Standby after maintenance - Please help

From:
Scott Marlowe
Date:
On Tue, Jul 7, 2009 at 11:01 AM, Jennifer Spencer <jenniferm411@hotmail.com> wrote:
>>> Am I wrong?
>>
>> Probably. My first concern is to make sure you aren't doing VACUUM
>> FULL as part of your maintenance cycle. That option is meant for
>> recovery from extreme bloat, and is sort of a "last resort".
>
> Good - glad to be wrong about that!  We do mostly inserts, no updates and
> very few deletes.  We drop entire tables but don't delete often.  We have
> very long rows, though.  Do you think the above is a situation likely to
> create extreme bloat?

Nope.  Updates and deletes are where bloat comes from.

>  My Sybase experience with extreme bloat was that it
> was caused by a three-field clustered index in a very long short-row table
> over time (~a year).  This job doesn't use clustered indexes.

You could use them; they don't cause bloat in pgsql.  In fact, they help get rid of it.

> I thought we had to do vacuum full to avoid transaction ID wraparound/reset
> issues?  We do have a lot of transactions, a whole lot.

I think you've mixed up VACUUM FULL with the requirement to run a
(plain) vacuum over the full database.

> Are you saying that most admins avoid VACUUM FULL as much as possible?  What
> about XID?

Yes, and XID wraparound is handled by both autovacuum and by running a
regular old vacuum across the whole database.

Re: Catching up Production from Warm Standby after maintenance - Please help

From:
Scott Marlowe
Date:
On Tue, Jul 7, 2009 at 10:42 AM, Jennifer Spencer <jenniferm411@hotmail.com> wrote:
> We are using v. 8.3.1 at present.  We anticipate a terabyte of data each
> year starting in November, and I am concerned about what happens
> maintenance-wise a couple of years down the line.  I think that we won't be
> able to do vacuuming/reindexing with the machine online and serving users if
> the database is over a certain size.  Am I wrong?

I wanted to comment a bit more on this.  At my last job I had a head
dev guy whose experience with pgsql was back in the 7.0 days or so,
and his standard phrase was "vacuum in postgresql isn't fast enough."
The problem is that vacuum in postgres, to him, meant the full, locking
vacuum, and it was being run on a machine with a mirrored set of PATA
hard drives that could read / write in the low megs / second range.

Enter 2009.  Individual hard drives can read / write in the 100MB/s
range, and a good RAID array controller can aggregate many drives and
get 400 to 800 MB/s easily.  Now the issue is controlling vacuum so
that it doesn't eat up all of your IO bandwidth, which is why you have
autovacuum_vacuum_cost_delay, and other settings, that let you tone
down vacuum so it's not in the way.
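
The knobs look like this in postgresql.conf (values illustrative; 20ms
is the 8.3 default):

autovacuum_vacuum_cost_delay = 20ms   # sleep between bursts of I/O
autovacuum_vacuum_cost_limit = 200    # how much work before each sleep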

So the real issue for you will be do you have enough IO bandwidth to
handle all your users AND autovacuum at the same time.  Vacuuming can
run just fine with users online, as long as you've set your autovac
cost parameters to not make it get in the way, and as long as you have
enough IO bandwidth to handle your load.  Routine reindexing and
vacuum full and such are not generally recommended.

So, what class machines are these, and specifically how much ram, what
kind of RAID controllers, and how many hard drives are you throwing at
the problem?

Re: Catching up Production from Warm Standby after maintenance - Please help

From:
Jennifer Spencer
Date:
Hi -  Some answers, questions and comments below

> At my last job I had a head dev guy whose experience with pgsql was back in the 7.0 days or so, and his standard phrase was "vacuum in postgresql isn't fast enough."
Yeah, I don't want to turn into that guy. 
 
> So, what class machines are these, and specifically how much ram, what kind of RAID controllers, and how many hard drives are you throwing at the problem?
We have two identical enterprise Linux machines coming.  They are not here yet (hence my planning concerns).  Presently, we have a lesser but still good enterprise machine with 64gb of RAM, 8 CPUs and ~1 TB of space.  The new ones coming have 5 TB each of RAID disks mirrored across to another 5TB (20 5.4gb spinning disks).  Not sure the RAID level but I think it's level six.  They each have 16 CPUs, and (I believe) 128 Gb of RAM, connected to our NFS network on a 10g switch (at least I think it's the 10 gigabit switch - it might be one gigabit).

> I wonder, what do you use to decide when to reindex?  Under this situation, it's very possible that you don't need to do it all that often.  Are you just flying 'willy-nilly' about reindexing things, or is there some indicator you use?
I am hoping to use system statistics for the table(s).  In theory, once the index size gets to be a larger logical fraction of the table size than it reasonably should be, it's time to consider re-indexing.  I thought to use some combination of pg_stat_user_indexes info and pg_statio_all_indexes, but I haven't had to do this yet and if you have suggestions, that'd be helpful.  I should be able to run a check of things once/week and cron it to email me the results with a warning if the numbers get too high. 
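
I was thinking of something along these lines (untested sketch):

SELECT schemaname, relname, indexrelname,
       pg_relation_size(indexrelid) AS index_bytes,
       pg_relation_size(relid)      AS table_bytes
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;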
> So, you're allowing users to do ad-hoc DDL on a production system?  No testing / staging environment beforehand?  You are asking for trouble IMHO.  OTOH, there are instances where well-defined applications can run DDL.

We have developed a big set of C APIs that work with Postgres and allow the users to create data tables based on a definition file.  They can specify an index or three, and can choose among a few data types.  The thing works remarkably well, much better than I would have thought at first blush.  The "production" tables that are the big ones are designed carefully with DBA input.

> Define "maintenance".  Maybe you're bringing along a few more bad habits from Sybase.
Oh, I am sure that's likely.  Heh.  I lock out the users; reindex things that need reindexing, based on use patterns and size; update all histograms and statistics for user and system tables; remove any dead rows possible (hopefully all of them); and I make a nice clean tape set when I am done - before letting anyone back in.  I often cycle the server (pg_ctl start/stop) as well.  Sometimes we take that time to power cycle the whole machine and do any machine maintenance we can after the nice clean backup is done.
> XID wrap-around is still an issue, but if you have autovacuum on and you pay attention to your logs, you'll be okay.
Having re-read the section on XID wraparound, and having the support of our manager to keep people from doing long-running transactions, I think we're okay here too.  I can always set a cron for the combing through the logs (which I need to do anyway), and I can keep a better eye on LRTs.  Do you prefer parsing through 'select * from pg_stat_activity;' for checking for long-running trans, or is there an easier way?

>You may want to look into constraint_exclusion partitioning
Thanks for the specifics on this tip.  I think I'll be able to use that since our data is largely time-based. 

> If you're running a version prior to 8.4, make sure your Free Space Manager settings are adequate. (A VACUUM VERBOSE at the database level will report where you're at in that regard in the last few lines.)
I show the following in postgresql.conf (this has not been tuned):
# - Free Space Map -
max_fsm_pages = 204800                  # min max_fsm_relations*16, 6 bytes each
                                        # (change requires restart)
#max_fsm_relations = 1000               # min 100, ~70 bytes each
                                        # (change requires restart)

I have a couple of weeks to consider the possibility of upgrading to 8.4.  Since it's only been out for a short time, I doubt we'll upgrade before 8.4.2. 

> We tried it a few times early on and discovered that aggressive maintenance is unnecessary if you do a good job with your regular maintenance, and that if you have room for a second copy of a table, CLUSTER is almost always a better option than VACUUM FULL.

Hm - Kevin, when you issue CLUSTER on your table, when do you switch the users over and do you let them keep writing during the process?  Or is the second copy table read-only to the users while the cluster-build is happening on the primary (making it unavailable)?  Thanks - and thanks for the Sybase difference explanation.

Thanks again everyone for your time and your help.  I really appreciate it.
-Jennifer



Re: Catching up Production from Warm Standby after maintenance - Please help

From:
Scott Marlowe
Date:
On Tue, Jul 7, 2009 at 4:58 PM, Jennifer Spencer <jenniferm411@hotmail.com> wrote:
> Hi -  Some answers, questions and comments below
>
>> So, what class machines are these, and specifically how much ram, what
>> kind of RAID controllers, and how many hard drives are you throwing at the
>> problem?
> We have two identical enterprise Linux machines coming.  They are not here
> yet (hence my planning concerns).  Presently, we have a lesser but still
> good enterprise machine with 64gb of RAM, 8 CPUs and ~1 TB of space.  The
> new ones coming have 5 TB each of RAID disks mirrored across to another 5TB
> (20 5.4gb spinning disks).  Not sure the RAID level but I think it's level
> six.  They each have 16 CPUs, and (I believe) 128 Gb of RAM, connected to
> our NFS network on a 10g switch (at least I think it's the 10 gigabit switch
> - it might be one gigabit).

Assuming that your db is mostly read, RAID-6+1 is ok, but see if you
can benchmark things with RAID-10 to be sure, as RAID-10 is generally
the best choice.  Also, get a battery backed cache if you can.

>>  I wonder, what do you use to decide when to reindex?  Under this
>> situation, it's very possible that you don't need to do it all that often.
>>  Are you just flying 'willy-nilly' about reindexing things, or is there some
>> indicator you use?
> I am hoping to use system statistics for the table(s).  In theory, once the
> index size gets to be a larger logical fraction of the table size than it
> reasonably should be, it's time to consider re-indexing.  I thought to use
> some combination of pg_stat_user_indexes info and pg_statio_all_indexes, but
> I haven't had to do this yet and if you have suggestions, that'd be
> helpful.  I should be able to run a check of things once/week and cron it to

Generally speaking vacuuming should take care of index and table bloat
to prevent this.

>>Define "maintenance".  Maybe you're bringing along a few more bad habits
>> from Sybase.
>  Oh, I am sure that's likely.  Heh.  I lock out the users, reindex things
> that need reindexing, based on use patterns and size, I update all
> histograms and statistics for user and system tables, I remove any dead rows
> possible
> (hopefully all of them),

Depending on your usage patterns, some dead space is actually desirable.

> and I make a nice clean tape set when I am done -
> before letting anyone back in.  I often cycle the server (pg_ctl start/stop)
> as well.  Sometimes we take that time to power cycle the whole machine
> and do any machine maintenance we can after the nice clean backup is done.

Good time to fsck the volumes as well.

>> If you're running a version prior to 8.4, make sure your Free Space
>> Manager settings are adequate. (A VACUUM VERBOSE at the database level will
>> report where you're at in that regard in the last few lines.)
> I show the following in postgresql.conf (this has not been tuned):
> # - Free Space Map -
> max_fsm_pages = 204800                  # min max_fsm_relations*16, 6 bytes
> each
>                                         # (change requires restart)
> #max_fsm_relations = 1000               # min 100, ~70 bytes each
>                                         # (change requires restart)

Both are probably way too low.  I'm guessing there's more than 1000
things in your db to vacuum.

Re: Catching up Production from Warm Standby after maintenance - Please help

From:
"Kevin Grittner"
Date:
Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Jennifer Spencer <jenniferm411@hotmail.com> wrote:

>> new ones coming have 5 TB each of RAID disks mirrored across to
>> another 5TB (20 5.4gb spinning disks).  Not sure the RAID level but
>> I think it's level six.  They each have 16 CPUs, and (I believe)
>> 128 Gb of RAM, connected to our NFS network on a 10g switch (at
>> least I think it's the 10 gigabit switch - it might be one
>> gigabit).

> get a battery backed cache if you can.

I'd say that with the rest of the hardware described, performance will
be very disappointing without a good RAID controller with battery
backed cache configured for write-back.  Also, file system (we use
XFS) and mount options (we use noatime and nobarrier), and elevator
(we use deadline) can make a big difference.
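
Concretely (device and mount point are examples only):

/dev/sdb1  /var/lib/pgsql  xfs  noatime,nobarrier  0 0   # /etc/fstab

echo deadline > /sys/block/sdb/queue/scheduler           # I/O elevator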

-Kevin

Re: Catching up Production from Warm Standby after maintenance - Please help

From:
"Kevin Grittner"
Date:
Jennifer Spencer <jenniferm411@hotmail.com> wrote:

> Hm - Kevin, when you issue CLUSTER on your table, when do you switch
> the users over and do you let them keep writing during the process?
> Or is the second copy table read-only to the users while the
> cluster-build is happening on the primary (making it unavailable)?
> Thanks - and thanks for the Sybase difference explanation.

We only use CLUSTER occasionally, to recover from bloat caused by
exceptional circumstances (like mass deletes, or a long-running
transaction from some ill-behaved programmer).  CLUSTER has to lock
the table for the duration, blocking other users.  We schedule this
off-hours.

I'm glad the info was useful.  Our move from Sybase was incredibly
smooth, due largely to help from these PostgreSQL lists; I figure I
owe it to the community to give back as I am able.  :-)

-Kevin

Re: Catching up Production from Warm Standby after maintenance - Please help

From:
Chris Browne
Date:
jenniferm411@hotmail.com (Jennifer Spencer) writes:

>>> Am I wrong?
>>
>> Probably. My first concern is to make sure you aren't doing VACUUM
>> FULL as part of your maintenance cycle. That option is meant for
>> recovery from extreme bloat, and is sort of a "last resort".

> Good - glad to be wrong about that!  We do mostly inserts, no
> updates and very few deletes.  We drop entire tables but don't
> delete often.  We have very long rows, though.  Do you think the
> above is a situation likely to create extreme bloat? 

That kind of situation is one I'd not expect to lead to much, if any,
bloat.

The usual cases that lead to bloat are those with a lot of
updates/deletes.  That is where you can expect to end up with pages in
use that are nearly empty.

If it's nearly insert-only, then pages would only be nearly empty if
they sit next to pages that filled completely (and hence needed to be
split).  That doesn't seem like a major problem ;-).

> My Sybase experience with extreme bloat was that it was caused by a
> three-field clustered index in a very long short-row table over time
> (~a year).  This job doesn't use clustered indexes.

PostgreSQL doesn't have clustered indexes of that sort, so that
problem won't emerge ;-).

> I thought we had to do vacuum full to avoid transaction ID
> wraparound/reset issues?  We do have a lot of transactions, a whole
> lot.  Are you saying that most admins avoid VACUUM FULL as much as
> possible?  What about XID?

No, you don't need to use VACUUM FULL to avoid ID wraparound.  Plain,
NOT-FULL vacuum, which does not block things, handles that perfectly
well.  No need to relate FULL vacuum with that.
--
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://linuxdatabases.info/info/advocacy.html
There's a  new language called C+++.   The only problem  is every time
you try to compile your modem disconnects.

Re: Catching up Production from Warm Standby after maintenance - Please help

From:
Michael Monnerie
Date:
On Tuesday, 07 July 2009, Scott Whitney wrote:
> max_fsm_pages = 1500000                 # min max_fsm_relations*16, 6 bytes each
> max_fsm_relations = 70000               # min 100, ~70 bytes each
>
> It's quite possible that those settings are too low, but I've
> honestly no idea how to estimate these other than trial and error.
> I've not had the time to really brush up on that level of tuning.

Doesn't
# VACUUM ANALYZE VERBOSE;
show that info at the end? Or is that only for the current db, not
for the server overall?

INFO:  free space map contains 32526 pages in 68 relations
DETAIL:  A total of 31648 page slots are in use (including overhead).
31648 page slots are required to track all free space.
Current limits are:  150000 page slots, 1000 relations, using 984 kB.

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4

