Обсуждение: Template zero xid issue

Поиск

Список

Период

Сортировка

Template zero xid issue

От

"Keaton Adams"

Дата:

06 августа 2007 г., 15:43:45

Our template 0 database has an XID issue:

postgres=# select datname, age(datfrozenxid) from pg_database;

datname | age

--------------+------------

postgres | 1073965753

template1 | 1073892377

template0 | 2146706522

(7 rows)

When we try a vacuum full, it skips template 0:

[root@p02c06d130 mxl]# /mxl/var/pgsql/bin/vacuumdb -a -z -U postgres

vacuumdb: vacuuming database "postgres"

VACUUM

vacuumdb: vacuuming database "template1"

VACUUM

Suggestions for how to fix this issue?

Thanks,

Keaton

Re: Template zero xid issue

От

Tom Lane

Дата:

06 августа 2007 г., 15:51:00

"Keaton Adams" <kadams@mxlogic.com> writes:
> Our template 0 database has an XID issue:

No, it doesn't.  template0 never needs vacuuming.

            regards, tom lane

Re: Template zero xid issue

От

"Keaton Adams"

Дата:

06 августа 2007 г., 16:14:27

Our production DB shut down because of it:

<2007-08-06 09:44:08 MDT>LOG:  background writer process (PID 24439)
exited with exit code 0
<2007-08-06 09:44:08 MDT>LOG:  terminating any other active server
processes
<2007-08-06 09:44:08 MDT>LOG:  all server processes terminated;
reinitializing
<2007-08-06 09:44:08 MDT>LOG:  database system was shut down at
2007-08-06 09:44:08 MDT
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>LOG:  checkpoint record is at 1A7/4C987A2C
<2007-08-06 09:44:08 MDT>LOG:  redo record is at 1A7/4C987A2C; undo
record is at 0/0; shutdown TRUE
<2007-08-06 09:44:08 MDT>LOG:  next transaction ID: 2146747335; next
OID: 677503791
<2007-08-06 09:44:08 MDT>LOG:  next MultiXactId: 2937; next
MultiXactOffset: 5901
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>LOG:  database system is ready
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>LOG:  transaction ID wrap limit is 2147484146,
limited by database "template0"
<2007-08-06 09:44:08 MDT>WARNING:  database "template0" must be vacuumed
within 736811 transactions
<2007-08-06 09:44:08 MDT>HINT:  To avoid a database shutdown, execute a
full-database VACUUM in "template0".
<2007-08-06 09:44:08 MDT>LOG:  shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>LOG:  database system is shut down



-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, August 06, 2007 9:51 AM
To: Keaton Adams
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Template zero xid issue

"Keaton Adams" <kadams@mxlogic.com> writes:
> Our template 0 database has an XID issue:

No, it doesn't.  template0 never needs vacuuming.

            regards, tom lane

Re: Template zero xid issue

От

"Keaton Adams"

Дата:

06 августа 2007 г., 16:26:19

So it is odd that over 14,000 XIDs are being consumed on template zero
in just over one minute?

bash-3.00$ date; psql -dpostgres -frun.sql
Mon Aug  6 10:18:12 MDT 2007
   datname    |    age
--------------+------------
 postgres     | 1074342864
 .
 .
 .
 template1    | 1074269488
 template0    | 1074078525
(7 rows)


bash-3.00$ date; psql -dpostgres -frun.sql
Mon Aug  6 10:19:27 MDT 2007
   datname    |    age
--------------+------------
 postgres     | 1074357079
 .
 .
 .
 template1    | 1074283703
 template0    | 1074092740
(7 rows)


What would be using template0 at this rate?

-Keaton



-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Keaton Adams
Sent: Monday, August 06, 2007 10:14 AM
To: Tom Lane
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Template zero xid issue

Our production DB shut down because of it:

<2007-08-06 09:44:08 MDT>LOG:  background writer process (PID 24439)
exited with exit code 0
<2007-08-06 09:44:08 MDT>LOG:  terminating any other active server
processes
<2007-08-06 09:44:08 MDT>LOG:  all server processes terminated;
reinitializing
<2007-08-06 09:44:08 MDT>LOG:  database system was shut down at
2007-08-06 09:44:08 MDT
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>LOG:  checkpoint record is at 1A7/4C987A2C
<2007-08-06 09:44:08 MDT>LOG:  redo record is at 1A7/4C987A2C; undo
record is at 0/0; shutdown TRUE
<2007-08-06 09:44:08 MDT>LOG:  next transaction ID: 2146747335; next
OID: 677503791
<2007-08-06 09:44:08 MDT>LOG:  next MultiXactId: 2937; next
MultiXactOffset: 5901
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>LOG:  database system is ready
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>LOG:  transaction ID wrap limit is 2147484146,
limited by database "template0"
<2007-08-06 09:44:08 MDT>WARNING:  database "template0" must be vacuumed
within 736811 transactions
<2007-08-06 09:44:08 MDT>HINT:  To avoid a database shutdown, execute a
full-database VACUUM in "template0".
<2007-08-06 09:44:08 MDT>LOG:  shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>FATAL:  the database system is shutting down
<2007-08-06 09:44:08 MDT>LOG:  database system is shut down



-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, August 06, 2007 9:51 AM
To: Keaton Adams
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Template zero xid issue

"Keaton Adams" <kadams@mxlogic.com> writes:
> Our template 0 database has an XID issue:

No, it doesn't.  template0 never needs vacuuming.

            regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

Re: Template zero xid issue

От

Tom Lane

Дата:

06 августа 2007 г., 16:30:21

"Keaton Adams" <kadams@mxlogic.com> writes:
> Our production DB shut down because of it:

[ squint... ]  Which PG version is this exactly?  Is template0 marked as
datallowconn in pg_database, or not?

            regards, tom lane

Re: Template zero xid issue

От

"Keaton Adams"

Дата:

06 августа 2007 г., 16:36:20

Sorry, I should have mentioned that to begin with: psql 8.1.4


mxl=# select datname, datallowconn from pg_database;
   datname    | datallowconn
--------------+--------------
 postgres     | t
 .
 .
 .
 template1    | t
 template0    | f
(7 rows)

It's set to false.

mxl=# \c template0;
FATAL:  database "template0" is not currently accepting connections
Previous connection kept


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, August 06, 2007 10:30 AM
To: Keaton Adams
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Template zero xid issue

"Keaton Adams" <kadams@mxlogic.com> writes:
> Our production DB shut down because of it:

[ squint... ]  Which PG version is this exactly?  Is template0 marked as
datallowconn in pg_database, or not?

            regards, tom lane

Re: Template zero xid issue

От

Tom Lane

Дата:

06 августа 2007 г., 16:36:20

"Keaton Adams" <kadams@mxlogic.com> writes:
> So it is odd that over 14,000 XIDs are being consumed on template zero
> in just over one minute?

You misunderstand what you are looking at --- that's the cluster-wide
XID consumption rate, not any one database's.

            regards, tom lane

Re: Template zero xid issue

От

"Joshua D. Drake"

Дата:

06 августа 2007 г., 16:50:43

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Keaton Adams wrote:
> Sorry, I should have mentioned that to begin with: psql 8.1.4

1. Upgrade to the actual stable release, which is 8.1.9
2. IIRC there is a bug in 8.1.4 that causes template0 to increment when
it shouldn't. Update the template0 to allow connections, vacuum full,
dissallow connections, and then upgrade.

Sincerely,

Joshua D. Drake

>
>
> mxl=# select datname, datallowconn from pg_database;
>    datname    | datallowconn
> --------------+--------------
>  postgres     | t
>  .
>  .
>  .
>  template1    | t
>  template0    | f
> (7 rows)
>
> It's set to false.
>
> mxl=# \c template0;
> FATAL:  database "template0" is not currently accepting connections
> Previous connection kept
>
>
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Monday, August 06, 2007 10:30 AM
> To: Keaton Adams
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Template zero xid issue
>
> "Keaton Adams" <kadams@mxlogic.com> writes:
>> Our production DB shut down because of it:
>
> [ squint... ]  Which PG version is this exactly?  Is template0 marked as
> datallowconn in pg_database, or not?
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org/
>


- --

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGt1HXATb/zqfZUUQRAuOHAJ9jSKZXU9CzLWcgy2DYSRXLMTsqTQCgnw4V
jVsSNPjay0w6MHxGWsdu8no=
=oJhU
-----END PGP SIGNATURE-----

Re: Template zero xid issue

От

Tom Lane

Дата:

06 августа 2007 г., 16:51:17

"Keaton Adams" <kadams@mxlogic.com> writes:
> Sorry, I should have mentioned that to begin with: psql 8.1.4

Hmm, you know that's pretty old don't you?  The latest release in
that branch is 8.1.9 (and there'll be an 8.1.10 before long).

> It's set to false.

> mxl=3D# \c template0;
> FATAL:  database "template0" is not currently accepting connections
> Previous connection kept

That's what it should be, but nonconnectable databases are supposed to
be ignored in figuring the XID wraparound limit, so I don't see how you
got the warning mentioning template0.

Do you have autovacuum enabled?  I think there are some relevant
post-8.1.4 changes in autovac.

            regards, tom lane

Re: Template zero xid issue

От

Tom Lane

Дата:

06 августа 2007 г., 17:02:31

"Joshua D. Drake" <jd@commandprompt.com> writes:
> 2. IIRC there is a bug in 8.1.4 that causes template0 to increment when
> it shouldn't. Update the template0 to allow connections, vacuum full,
> dissallow connections, and then upgrade.

That should be "vacuum freeze" but otherwise this is the correct
recovery process.  However, before destroying the evidence I'd like to
find out how he got into this state ...

            regards, tom lane

Re: Template zero xid issue

От

"Keaton Adams"

Дата:

06 августа 2007 г., 18:20:38

We needed the DB up right away; one of the operations guys connected
through a standalone backend to vacuum template0 and now the DB is up
again:

"sudo -u postgres /mxl/var/pgsql/bin/postgres -D /mxl/var/pgsql/data -O
-P template0

    Once I got in, I just ran a VACUUM VERBOSE ANALYZE; to "reset"
the XID counter."

So just to be clear on this statement:

"You misunderstand what you are looking at --- that's the cluster-wide
XID consumption rate, not any one database's."

In the docs it says that "the value stored in the datfrozenxid field is
the freeze cutoff XID that was used by the last database-wide vacuum
operation. All normal XIDs older than this cutoff XID are guaranteed to
have been replaced by FrozenXID within that database." The query to
check is:

select datname, age(datfrozenxid) from pg_database;

"The age column measures the number of transactions from the cutoff XID
to the current transaction's XID.  When the age approaches two billion,
the database must be vacuumed again to avoid risk of wraparound
failures."

So, if XID is database instance (cluster-wide), that would explain why
the actual increment of the age value is the same across all DB's, even
though the actual XID numbers are different per db?

bash-3.00$ date; psql -dpostgres -frun.sql
Mon Aug  6 11:20:03 MDT 2007
   datname    |    age
--------------+------------
 postgres     | 1075005030
 mxl          | 1075004446
 .
 .
 .
 template1    | 1074931654
 template0    | 1074740691
(7 rows)

bash-3.00$ date; psql -dpostgres -frun.sql
Mon Aug  6 11:23:53 MDT 2007
   datname    |    age
--------------+------------
 postgres     | 1075046568
 mxl          | 1075045984
 .
 .
 .
 template1    | 1074973192
 template0    | 1074782229
(7 rows)

Postgres increment: 41,538
Mxl increment: 41,538
Template 1 increment: 41,538
Template 0 increment: 41,538

But, the actual reset of the FrozenXID needs to take place at the
individual DB level, and that's where the full-db vacuum comes into
play, correct?

And the bug in 8.1.4 is allowing the FrozenXID on template0 to increment
when it shouldn't be doing so, and since the normal vacuum ignores a
read-only db such as template0, that is why we had to force a vacuum
after the DB shut down?

Do I have this right?

Thanks again,

Keaton

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, August 06, 2007 11:02 AM
To: Joshua D. Drake
Cc: Keaton Adams; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Template zero xid issue

"Joshua D. Drake" <jd@commandprompt.com> writes:
> 2. IIRC there is a bug in 8.1.4 that causes template0 to increment
when
> it shouldn't. Update the template0 to allow connections, vacuum full,
> dissallow connections, and then upgrade.

That should be "vacuum freeze" but otherwise this is the correct
recovery process.  However, before destroying the evidence I'd like to
find out how he got into this state ...

            regards, tom lane

Re: Template zero xid issue

От

Tom Lane

Дата:

06 августа 2007 г., 19:11:13

"Keaton Adams" <kadams@mxlogic.com> writes:
> Do I have this right?

It's not clear.  You didn't answer the question about whether you were
running autovac, but even if you were I'm not sure why you got the
complaint about template0.  AFAICS the only way for template0 to be
installed as oldest_datname is if vac_truncate_clog() is invoked in
template0, allowing it to be the initial selection before the loop
starts.  Which could happen if autovac is on, since it will try to
vacuum template0, and in 8.1.4 it neglects to use the FREEZE option.
But even then, if we just finished a DB-wide vacuum on template0,
it should not have had the oldest datfrozenxid; and certainly not one
old enough to provoke complaints.

Actually there's a whole lot that isn't clear about this.  Your first
report showed a warning --- not an error --- about template0, so it's
not at all clear that the "database system is shutting down" messages
had anything to do with that at all.  And you later posted some queries
that weren't drawing any such message.  What was the exact sequence of
events here?

Alvaro: you have any thoughts about this?  I'm suspicious that we ought
to change 8.1's vac_truncate_clog() to not include the current database
automatically, since it might be not-datallowconn; but I don't see how
that potential bug could actually be exposed as a fault.

            regards, tom lane

Re: Template zero xid issue

От

"Keaton Adams"

Дата:

06 августа 2007 г., 19:49:12

Auto vacuum is not enabled:

#-----------------------------------------------------------------------
----
# AUTOVACUUM PARAMETERS
#-----------------------------------------------------------------------
----

#autovacuum = off                       # enable autovacuum subprocess?
#autovacuum_naptime = 60                # time between autovacuum runs,
in secs
#autovacuum_vacuum_threshold = 1000     # min # of tuple updates before
                                        # vacuum
#autovacuum_analyze_threshold = 500     # min # of tuple updates before
                                        # analyze
#autovacuum_vacuum_scale_factor = 0.4   # fraction of rel size before
                                        # vacuum
#autovacuum_analyze_scale_factor = 0.2  # fraction of rel size before
                                        # analyze
#autovacuum_vacuum_cost_delay = -1      # default vacuum cost delay for
                                        # autovac, -1 means use
                                        # vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1      # default vacuum cost limit for
                                        # autovac, -1 means use
                                        # vacuum_cost_limit

Instead, the vacuumdb command is run on the DBs to perform a full DB
vacuum every Saturday. The DB template0 is skipped by the vacuumdb
process, according to the output:

/xyz/var/pgsql/bin/vacuumdb -a -z -U postgres
vacuumdb: vacuuming database "postgres"
VACUUM
vacuumdb: vacuuming database "mxl"
VACUUM
vacuumdb: vacuuming database "template1"
VACUUM

1. Ops has a job that monitors datfrozenxid. An event was triggered this
morning because the count on template0 was approaching the limit.  I
don't have the original output, but it looked something like this:

mxl=# select datname, age(datfrozenxid) from pg_database;
   datname    |    age
--------------+------------
 postgres     | 1478038975
 mxl_survey   | 1478038263
 mxl          | 1478036889
 template1    | 1478010730
 template0    | 2146747335

2. As we were attempting to figure out why this was the case, Postgres
stopped accepting commands, as reported in the log:

<2007-08-06 09:42:50 MDT>ERROR:  database is not accepting commands to
avoid wraparound data loss in database "template0"
<2007-08-06 09:42:50 MDT>HINT:  Stop the postmaster and use a standalone
backend to vacuum database "template0".

3. One of the admins initiated a shutdown request to the DB.  The
shutdown was NOT automatically done by Postgres as I had originally
stated, but it became necessary nonetheless:

<2007-08-06 09:44:05 MDT>LOG:  received fast shutdown request
<2007-08-06 09:44:05 MDT>LOG:  aborting any active transactions
<2007-08-06 09:44:05 MDT>FATAL:  terminating connection due to
administrator command
<2007-08-06 09:44:05 MDT>FATAL:  terminating connection due to
administrator command
<2007-08-06 09:44:05 MDT>FATAL:  terminating connection due to
administrator command

4. During the shutdown we continued to receive these messages in the
log:

<2007-08-06 09:44:08 MDT>LOG:  transaction ID wrap limit is 2147484146,
limited by database "template0"
<2007-08-06 09:44:08 MDT>WARNING:  database "template0" must be vacuumed
within 736811 transactions
<2007-08-06 09:44:08 MDT>HINT:  To avoid a database shutdown, execute a
full-database VACUUM in "template0".

5. Once the DB was down, a standalone backend was used to execute a
vacuum against template0:

sudo -u postgres /mxl/var/pgsql/bin/postgres -D /mxl/var/pgsql/data -O
-P template0

>VACUUM VERBOSE ANALYZE;

6. The DB started successfully and normal operations resumed.

We do have a second 8.1.4 DB running the same type of transaction load
and the datfrozenxid query shows template0 having a larger 'age' value
than any of the other databases in the cluster:

postgres=# select datname, age(datfrozenxid) from pg_database;
   datname    |    age
--------------+------------
 postgres     | 1108184564
 mxl          | 1108182478
 template1    | 1108156319
 template0    | 1497605405

So before it grows into the 2.1 billion value we'll need to do a manual
reset of the FrozenXID on this DB as well.

Of course, we are looking into an 8.1.9 upgrade to avoid this problem in
the future.

Thanks,

Keaton

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, August 06, 2007 1:11 PM
To: Keaton Adams
Cc: Joshua D. Drake; Alvaro Herrera; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Template zero xid issue

"Keaton Adams" <kadams@mxlogic.com> writes:
> Do I have this right?

It's not clear.  You didn't answer the question about whether you were
running autovac, but even if you were I'm not sure why you got the
complaint about template0.  AFAICS the only way for template0 to be
installed as oldest_datname is if vac_truncate_clog() is invoked in
template0, allowing it to be the initial selection before the loop
starts.  Which could happen if autovac is on, since it will try to
vacuum template0, and in 8.1.4 it neglects to use the FREEZE option.
But even then, if we just finished a DB-wide vacuum on template0,
it should not have had the oldest datfrozenxid; and certainly not one
old enough to provoke complaints.

Actually there's a whole lot that isn't clear about this.  Your first
report showed a warning --- not an error --- about template0, so it's
not at all clear that the "database system is shutting down" messages
had anything to do with that at all.  And you later posted some queries
that weren't drawing any such message.  What was the exact sequence of
events here?

Alvaro: you have any thoughts about this?  I'm suspicious that we ought
to change 8.1's vac_truncate_clog() to not include the current database
automatically, since it might be not-datallowconn; but I don't see how
that potential bug could actually be exposed as a fault.

            regards, tom lane

Re: Template zero xid issue

От

Tom Lane

Дата:

06 августа 2007 г., 22:11:14

"Keaton Adams" <kadams@mxlogic.com> writes:
> Auto vacuum is not enabled:

Hmph.  That shoots down the one semi-plausible idea I had about how you
got into this state.  It seems that the unexpected limit_datname value
must have come from flatfiles.c, but that *certainly* won't pick up
any database with datallowconn = false.  Is it possible that someone
was manually tinkering with the content of pg_database.datallowconn?

I see several other hard-to-explain aspects about the log excerpts you
sent, notably the key warning itself:

> <2007-08-06 09:44:08 MDT>WARNING:  database "template0" must be vacuumed
> within 736811 transactions

8.1 was designed to clamp down *hard* at 1000000 transactions from the
wraparound point, so how did you get to consume almost 300000 more
transactions beyond that limit?

Could you send me the whole postmaster log from the point of the first
sign of trouble, rather than just excerpts?  I suspect there might be
useful evidence you omitted.

            regards, tom lane

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Template zero xid issue