Обсуждение: Vacuum return codes (vacuum as db integrity check?)

Поиск
Список
Период
Сортировка

Vacuum return codes (vacuum as db integrity check?)

От
"Stephane Charette"
Дата:
General PostgreSQL administration question:

I'm trying to use "vacuumdb" as a way to perform a database integrity
check before starting my db-enabled application.  I used to run it as
"vacuumdb foo" and if the return code was anything other than zero, I
knew the database was most likely hosed.  (At which point I would
destroy everything and reload from backup.)

Now I've made a change and I find myself using the command "vacuumdb
-f foo" to perform a full vacuum.  However, I've noticed that vacuum
full seems to return non-zero return values much more often.  (Meaning
the database is corrupt?)  Is this known?  Are the possible return
values published somewhere without having to dig through all of the
source code?

(This is PostgrSQL v7.2 compiled for Linux, running on an older kernel
2.2.x.)

BTW -- is this the "right" thing to do for a database integrity check?
Is there a better way to do this than a vacuum?

Stephane Charette


Re: Vacuum return codes (vacuum as db integrity check?)

От
Tom Lane
Дата:
"Stephane Charette" <stephanecharette@telus.net> writes:
> Now I've made a change and I find myself using the command "vacuumdb
> -f foo" to perform a full vacuum.  However, I've noticed that vacuum
> full seems to return non-zero return values much more often.

This is not a very useful statement.  What would be useful is to see the
error messages you are getting.  (Look in the postmaster log, if your
script is discarding stderr.)

            regards, tom lane

Re: Vacuum return codes (vacuum as db integrity check?)

От
"Stephane Charette"
Дата:
To re-state the reason behind my posting:

>>[...]
>>I'm trying to use "vacuumdb" as a way to perform a database integrity
>>check before starting my db-enabled application.
>>[...]

>>Now I've made a change and I find myself using the command "vacuumdb
>>-f foo" to perform a full vacuum.  However, I've noticed that vacuum
>>full seems to return non-zero return values much more often.

...to which Tom Lane (thank you!) replied:

>This is not a very useful statement.  What would be useful is to see the
>error messages you are getting.  (Look in the postmaster log, if your
>script is discarding stderr.)

Unfortunately, I don't have the situation repro'd at the very moment.
I will look the next time this happens.  However, I do have some other
related questions:

1) Where is the postmaster log kept?  I've looked, but I'm cannot seem
to find it.

2) Does a non-zero return value from vacuumdb necessarily mean the
database is corrupted?

3) If vacuumdb returns non-zero, does it mean we should blow away the
database and restore from latest backup?  Or is there a typical "fixdb"
application that people run?

4) Are the exit codes in src/bin/pgsql/settings.h the only exit codes
that can be used in the source?  Or are there others defined in another
file somewhere else?

5) Is vacuumdb the "right" thing to do for a database integrity check?

6) Is there a better way to do a database integrity check than a
vacuumdb?

And perhaps most importantly,

7) When power fails on a box that is running a 7.2 PostgreSQL database,
do *you* ever find that the database is corrupt when the box comes back
up?

(A few technical details:  db version is 7.2; Linux kernel 2.2.14;
postmaster is running with the following command-line:
"/usr/local/pgsql/bin/postmaster -D /foo/data -S"; haven't changed any
of the default tuning values in the p*.conf files.)

Thanks in advance for any assistance you can provide,

Stephane Charette


Re: Vacuum return codes (vacuum as db integrity check?)

От
Tom Lane
Дата:
"Stephane Charette" <stephanecharette@telus.net> writes:
> 1) Where is the postmaster log kept?  I've looked, but I'm cannot seem
> to find it.

I'm talking about the postmaster's stderr output.  Look to see what your
postmaster startup script does with stderr ... if it routes it to
/dev/null, you'll need to change the script.

> 2) Does a non-zero return value from vacuumdb necessarily mean the
> database is corrupted?

Impossible to tell without seeing the error messages.

> 5) Is vacuumdb the "right" thing to do for a database integrity check?

Not particularly.  I'd consider a successful pg_dumpall run to be a more
thorough check (or at least an equally good, and quite different, one).

> 7) When power fails on a box that is running a 7.2 PostgreSQL database,
> do *you* ever find that the database is corrupt when the box comes back
> up?

IIRC, there were some problems with sequences going backwards after a
crash in 7.2.  If you are on 7.2.x for x < 3, you are running a version
with serious known bugs; you should update before complaining too much
...

            regards, tom lane

Re: Vacuum return codes (vacuum as db integrity check?)

От
"Stephane Charette"
Дата:
Tom Lane wrote:

>IIRC, there were some problems with sequences going backwards after a
>crash in 7.2.  If you are on 7.2.x for x < 3, you are running a version
>with serious known bugs; you should update before complaining too much
>...

Crazy question, but how do I determine the version number?

If I run "postmaster --version", I get told "postmaster (PostgreSQL)
7.2".

Does that really mean "7.2"?  We were under the impression that we'd
upgraded these boxes to 7.2.1 earlier this summer, so we're wondering
if the minor version number is perhaps not being displayed.  All of the
shell scripts in .../pgsql/bin/ contain the word "7.2" -- no mention of
minor version numbers such as "7.2.0", "7.2.1", etc.

(Next time we'll skip the binaries and download & compile the source
ourself so we know for certain what we're using!)

Best regards,

Stephane Charette



Re: Vacuum return codes (vacuum as db integrity check?)

От
Tom Lane
Дата:
"Stephane Charette" <stephanecharette@telus.net> writes:
> If I run "postmaster --version", I get told "postmaster (PostgreSQL)
> 7.2".

> Does that really mean "7.2"?

Yes, it does.  You could double-check with "select version()" though.

            regards, tom lane

Uninitialized pages notices

От
Timothy D McKernan
Дата:
Our client is running a large-ish database, and has infrequently been
getting the following messages:

Nov  6 04:17:09 dolidb-n1 logger: NOTICE:  Rel pg_type: Uninitialized
page 6 - fixing
Nov  6 04:17:09 dolidb-n1 logger: NOTICE:  Rel pg_type: Uninitialized
page 7 - fixing
Nov  6 04:17:09 dolidb-n1 logger: NOTICE:  Rel pg_type: Uninitialized
page 8 - fixing
Nov  6 04:17:09 dolidb-n1 logger: NOTICE:  Rel pg_type: Uninitialized
page 9 - fixing
Nov  6 04:17:09 dolidb-n1 logger: NOTICE:  Rel pg_type: Uninitialized
page 10 - fixing

This goes on for about 2000 lines.  A google search found Tom Lane
telling someone with this symptom that they might have a serious
hardware problem.  However, _that_ person was getting these messages on
a consistent basis, whereas we haven't seen this message in about 4
months, and then all of a sudden the other night during a nightly vacuum
it popped up.

I'd like to know what this means to our system - what causes an
uninitialized page?  What could cause it to be so infrequent?

Here are our system details:
postgre: 7.2.3 (~4GB in size)
os: Redhat 7.1sbe (Seawolf)
kernel: 2.4.9-12smp
ram: ~1GB ECC

Thanks,
Tim


--
Dyrect Media Group
P.O. Box 486
6000 Goodrich Rd
Clarence Center, NY  14032-0486

OFFICE: 716-504-1141 ext 208
CELL: 716-510-2451
AIM: somecallmetim100


Re: Uninitialized pages notices

От
Tom Lane
Дата:
Timothy D McKernan <tdm4@dyrectmedia.com> writes:
> Nov  6 04:17:09 dolidb-n1 logger: NOTICE:  Rel pg_type: Uninitialized
> page 6 - fixing

> I'd like to know what this means to our system - what causes an
> uninitialized page?  What could cause it to be so infrequent?

"Uninitialized" means "page contains zeroes" (or at least a few critical
page-header fields contain zeroes, which they should never do).  Usually
I take this as an indication of hardware problems.  But:

> Here are our system details:
> postgre: 7.2.3 (~4GB in size)
> os: Redhat 7.1sbe (Seawolf)
> kernel: 2.4.9-12smp
> ram: ~1GB ECC

SMP?  Are you actually using SMP hardware?  I seem to recall that the
2.4 Linux kernels weren't stable on SMP machines till 2.4.15 or
thereabouts.  In any case, RedHat 7.1 is pretty long in the tooth.
Perhaps an OS update would make life better.

            regards, tom lane

Re: Vacuum return codes (vacuum as db integrity check?)

От
Timothy D McKernan
Дата:
Our client is running a large-ish database, and has infrequently been
getting the following messages:

Nov  6 04:17:09 dolidb-n1 logger: NOTICE:  Rel pg_type: Uninitialized
page 6 - fixing
Nov  6 04:17:09 dolidb-n1 logger: NOTICE:  Rel pg_type: Uninitialized
page 7 - fixing
Nov  6 04:17:09 dolidb-n1 logger: NOTICE:  Rel pg_type: Uninitialized
page 8 - fixing
Nov  6 04:17:09 dolidb-n1 logger: NOTICE:  Rel pg_type: Uninitialized
page 9 - fixing
Nov  6 04:17:09 dolidb-n1 logger: NOTICE:  Rel pg_type: Uninitialized
page 10 - fixing

This goes on for about 2000 lines.  A google search found Tom Lane
telling someone with this symptom that they might have a serious
hardware problem.  However, _that_ person was getting these messages on
a consistent basis, whereas we haven't seen this message in about 4
months, and then all of a sudden the other night during a nightly vacuum
it popped up.

I'd like to know what this means to our system - what causes an
uninitialized page?  What could cause it to be so infrequent?

Here are our system details:
postgre: 7.2.3 (~4GB in size)
os: Redhat 7.1sbe (Seawolf)
kernel: 2.4.9-12smp
ram: ~1GB ECC

Thanks,
Tim

--
Dyrect Media Group
P.O. Box 486
6000 Goodrich Rd
Clarence Center, NY  14032-0486

OFFICE: 716-504-1141 ext 208
CELL: 716-510-2451
AIM: somecallmetim100



Re: Vacuum return codes (vacuum as db integrity check?)

От
"Stephane Charette"
Дата:
>> Now I've made a change and I find myself using the command "vacuumdb
>> -f foo" to perform a full vacuum.  However, I've noticed that vacuum
>> full seems to return non-zero return values much more often.
>
>This is not a very useful statement.  What would be useful is to see the
>error messages you are getting.  (Look in the postmaster log, if your
>script is discarding stderr.)

Ok -- I redirected stdout/stderr for postmaster and vacuumdb, and the
problem finally re-occurred a few minutes ago.

This is what happens:

- database is started using "postmaster -D /foo >/tmp/dblog 2>&1 &"

- pg_dumpall is run which results in a return code of zero (Tom Lane
mentionned a few days ago that pg_dumpall might be a better "database
integrity check" than running vacuumdb)

- vacuumdb is started using "vacuumdb -f log >/tmp/vacuumdblog 2>&1"
- vacuumdb results in a return code of 1!  (I'm testing how valid our
previous attempts at "integrity checks" might be)

The error received when we run "vacuumdb" is:

-> ERROR:  Cannot insert a duplicate key into unique index
pg_class_oid_index
-> vacuumdb: vacuum  log failed

The error logged by postmaster is exactly the same:

-> ERROR:  Cannot insert a duplicate key into unique index
pg_class_oid_index

Now in the past, when vacuumdb returns non-zero return codes, we've
considered the databasebase to be hosed, and thus, would blow it away
and rebuild it.  Many days of data would sometimes be lost.  However,
pg_dumpall shows that we seem to still have access to the data.

Looking up "vacuum cannot insert a duplicate key into unique index" on
usenet returns quite a few postings, but no many suggestions or
solutions.

In our case, we are using 7.2 on a linux 2.2.14 kernel.  Binaries were
downloaded via RPM directly from RedHat.

My questions now would be:

1) How serious is the vacuumdb error?
2) How do we fix it?
3) Is the database hosed?

Thanks in advance,

Stephane Charette


Re: Vacuum return codes (vacuum as db integrity check?)

От
Tom Lane
Дата:
"Stephane Charette" <stephanecharette@telus.net> writes:
> The error received when we run "vacuumdb" is:

> -> ERROR:  Cannot insert a duplicate key into unique index
> pg_class_oid_index

That's pretty interesting.  My first thought is a corrupt index.
Please try reindexing pg_class.  You'll need to do this in a standalone
backend --- read the REINDEX reference page carefully.

> In our case, we are using 7.2 on a linux 2.2.14 kernel.  Binaries were
> downloaded via RPM directly from RedHat.

7.2?  You should be using 7.2.3.

            regards, tom lane

Re: Uninitialized pages notices

От
Timothy D McKernan
Дата:
Tom Lane wrote:
> Timothy D McKernan <tdm4@dyrectmedia.com> writes:
>
>>Nov  6 04:17:09 dolidb-n1 logger: NOTICE:  Rel pg_type: Uninitialized
>>page 6 - fixing
>
>
>>I'd like to know what this means to our system - what causes an
>>uninitialized page?  What could cause it to be so infrequent?
>
>
> "Uninitialized" means "page contains zeroes" (or at least a few critical
> page-header fields contain zeroes, which they should never do).  Usually
> I take this as an indication of hardware problems.  But:
>
>
>>Here are our system details:
>>postgre: 7.2.3 (~4GB in size)
>>os: Redhat 7.1sbe (Seawolf)
>>kernel: 2.4.9-12smp
>>ram: ~1GB ECC
>
>
> SMP?  Are you actually using SMP hardware?  I seem to recall that the
> 2.4 Linux kernels weren't stable on SMP machines till 2.4.15 or
> thereabouts.  In any case, RedHat 7.1 is pretty long in the tooth.
> Perhaps an OS update would make life better.

Yes, we have a dual processor board.  Per your suggestion we'll be
upgrading the system soon and watching the results of that.

Thanks,
Tim


>
>             regards, tom lane
>


--
Dyrect Media Group
P.O. Box 486
6000 Goodrich Rd
Clarence Center, NY  14032-0486

OFFICE: 716-504-1141 ext 208
CELL: 716-510-2451
AIM: somecallmetim100