Обсуждение: duplicate primary index in bayes db from SpamAssassin

Поиск
Список
Период
Сортировка

duplicate primary index in bayes db from SpamAssassin

От
Michael Monnerie
Дата:
Dear list, I don't know whether pgsql-admin or the SA list is the
correct one to ask, so I start here. I have a bayes Postgres database
for SpamAssassin, which I use shared from 2 servers to have the same
content. I think this could be a locking issue or whatever, I don't
care. BUT: I have lots of "duplicate index" elements now, preventing a
restore or cleanup. Here's the dump and diff if anyone cares:
http://zmi.at/x/bayes_dup_idx.dump.bz2 (upload currently running)
http://zmi.at/x/bayes_dup_idx.diff.bz2

Could the problem be caused by any settings to the DB, or because of the
stored procedures? Normally there should be no way this can happen, so
what can cause this?

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0676/846 914 666                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: www.keyserver.net                   Key-ID: 1C1209B4

Вложения

Re: duplicate primary index in bayes db from SpamAssassin

От
Tom Lane
Дата:
Michael Monnerie <michael.monnerie@it-management.at> writes:
> care. BUT: I have lots of "duplicate index" elements now, preventing a
> restore or cleanup.

What PG version is this?

            regards, tom lane

Re: duplicate primary index in bayes db from SpamAssassin

От
Alvaro Herrera
Дата:
Michael Monnerie wrote:
> Dear list, I don't know whether pgsql-admin or the SA list is the
> correct one to ask, so I start here. I have a bayes Postgres database
> for SpamAssassin, which I use shared from 2 servers to have the same
> content. I think this could be a locking issue or whatever, I don't
> care.

What do you mean "shared from 2 servers"?  Are these two servers using a
shared network mount with the database files?  Or do you mean that these
two servers connect as clients to a single Postgres server?

What versions of things are all these running?  Please be as specific as
humanly possible.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: duplicate primary index in bayes db from SpamAssassin

От
Michael Monnerie
Дата:
On Montag, 14. April 2008 Tom Lane wrote:
> What PG version is this?

Sorry:
# select version();
                                                  version
------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.1.11 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2
20061115 (prerelease) (SUSE Linux)

It's the RPM installed from openSUSE 10.2

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0676/846 914 666                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: www.keyserver.net                   Key-ID: 1C1209B4

Вложения

Re: duplicate primary index in bayes db from SpamAssassin

От
Michael Monnerie
Дата:
On Montag, 14. April 2008 Alvaro Herrera wrote:
> What do you mean "shared from 2 servers"?  Are these two servers
> using a shared network mount with the database files?  Or do you mean
> that these two servers connect as clients to a single Postgres
> server?

Two servers have SA running, and connect via different users (who have
the same access rights) to the same Postgres server running the bayes
db.

> What versions of things are all these running?  Please be as specific
> as humanly possible.

# select version();
                                                  version
------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.1.11 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2
20061115 (prerelease) (SUSE Linux)

It's the RPM installed from openSUSE 10.2

The SA versions are 3.2.0 from one server and 3.2.4 from the other.

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0676/846 914 666                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: www.keyserver.net                   Key-ID: 1C1209B4

Вложения

Re: duplicate primary index in bayes db from SpamAssassin

От
Michael Monnerie
Дата:
On Montag, 14. April 2008 Tom Lane wrote:
> What PG version is this?

I've looked a little more into history, and PostgreSQL was 8.1.5
originally from CD, then 8.1.9 and now 8.1.11.

I'm currently investigating a way to delete duplicates from the table.
Is there any chance to do that? A
SELECT * from bayes_token;
shows no dups, as there's a primary index... at least it works here.

This is the bayes_token table and its index:
CREATE TABLE bayes_token (
    id integer DEFAULT 0 NOT NULL,
    token bytea DEFAULT ''::bytea NOT NULL,
    spam_count integer DEFAULT 0 NOT NULL,
    ham_count integer DEFAULT 0 NOT NULL,
    atime integer DEFAULT 0 NOT NULL
);
ALTER TABLE ONLY bayes_token
    ADD CONSTRAINT bayes_token_pkey PRIMARY KEY (id, token);

Could I create a new db without indices, restore the data there, and
then run a "delete from.. where duplicates"? Sometimes there's 3x the
same content in the primary index, but I'd need to delete all except
the one with the highest spam_count.

I just found I even have a duplicate in bayes_vars:

COPY bayes_vars (id, username, spam_count, ham_count, token_count,
last_expire, last_atime_delta, last_expire_reduce, oldest_token_age,
newest_token_age) FROM stdin;
1       vscan   194393  517531  2602114 1206030039      2764800 15304
1203265204      1206057801
1       vscan   194398  517535  2602799 1206030039      2764800 15304
1203265204      1206064729
4       vscan   18305   25403   2042983 1208143427      1382400 13268
1206501543      1208216540
\.

Definition:
CREATE TABLE bayes_vars (
    id serial NOT NULL,
    username character varying(200) DEFAULT ''::character varying NOT
NULL,
    spam_count integer DEFAULT 0 NOT NULL,
    ham_count integer DEFAULT 0 NOT NULL,
    token_count integer DEFAULT 0 NOT NULL,
    last_expire integer DEFAULT 0 NOT NULL,
    last_atime_delta integer DEFAULT 0 NOT NULL,
    last_expire_reduce integer DEFAULT 0 NOT NULL,
    oldest_token_age integer DEFAULT 2147483647 NOT NULL,
    newest_token_age integer DEFAULT 0 NOT NULL
);
ALTER TABLE ONLY bayes_vars
    ADD CONSTRAINT bayes_vars_pkey PRIMARY KEY (id);

But here, I could see the dups with SELECT, and even delete one record:

# select * from bayes_vars;
 id | username | spam_count | ham_count | token_count | last_expire |
last_atime_delta | last_expire_reduce | oldest_token_age |
newest_token_age

----+----------+------------+-----------+-------------+-------------+------------------+--------------------+------------------+------------------
  1 | vscan    |     194393 |    517531 |     2602114 |  1206030039 |
2764800 |              15304 |       1203265204 |       1206057801
  1 | vscan    |     194398 |    517535 |     2602799 |  1206030039 |
2764800 |              15304 |       1203265204 |       1206064729
  4 | vscan    |      18375 |     25828 |     2050196 |  1208229525 |
1382400 |              13268 |       1206501543 |       1208240610
(3 Zeilen)

bayes_pg_v1=# delete from bayes_vars where spam_count =194393;
DELETE 1
bayes_pg_v1=# select * from bayes_vars;
 id | username | spam_count | ham_count | token_count | last_expire |
last_atime_delta | last_expire_reduce | oldest_token_age |
newest_token_age

----+----------+------------+-----------+-------------+-------------+------------------+--------------------+------------------+------------------
  1 | vscan    |     194398 |    517535 |     2602799 |  1206030039 |
2764800 |              15304 |       1203265204 |       1206064729
  4 | vscan    |      18375 |     25829 |     2050215 |  1208229525 |
1382400 |              13268 |       1206501543 |       1208240637
(2 Zeilen)

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0676/846 914 666                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: www.keyserver.net                   Key-ID: 1C1209B4

Вложения

Re: duplicate primary index in bayes db from SpamAssassin

От
Michael Monnerie
Дата:
On Montag, 14. April 2008 Michael Monnerie wrote:
> > What do you mean "shared from 2 servers"?  Are these two servers
> > using a shared network mount with the database files?  Or do you
> > mean that these two servers connect as clients to a single Postgres
> > server?
>
> Two servers have SA running, and connect via different users (who
> have the same access rights) to the same Postgres server running the
> bayes db.

SA=SpamAssassin, I should possibly explain. There are stored procedures
in the DB which get called, could the problem be connected to this?

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0676/846 914 666                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: www.keyserver.net                   Key-ID: 1C1209B4

Вложения

Re: duplicate primary index in bayes db from SpamAssassin

От
Michael Monnerie
Дата:
On Montag, 14. April 2008 Michael Monnerie wrote:
> problem with duplicate primary index entries...

I just got a warning in my Log:

Apr 16 01:52:19 db1.zmi.at postgres[21103]: [3-1] DB=bayes_pg_v1_dupidx
U=postgres H=[local] 48053eaf.526f VACUUM:WARNUNG:
Index »bayes_token_pkey« enthält 2539650 Zeilenversionen, aber Tabelle
enthält 4752337 Zeilenversionen
(Index contains 2539650 rows, but table has 4752337 rows)

BTW: I know that the DB language cannot be changed without recreating,
but can I change the language of the log messages?

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0676/846 914 666                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: www.keyserver.net                   Key-ID: 1C1209B4

Вложения

Re: duplicate primary index in bayes db from SpamAssassin

От
Michael Monnerie
Дата:
On Mittwoch, 16. April 2008 Michael Monnerie wrote:
> problem with duplicate primary index entries...

I created the bayes db fresh, named it bayes_pg_v2 and re-relearned all
ham/spam from 3 concurrent connections into the db. No problem so far.

But I renamed the old, broken db and tried a
and got this:
INFO:  »bayes_token«: 20567514 entfernbare, 4752337 nicht entfernbare
Zeilenversionen in 186274 Seiten gefunden
DETAIL:  0 tote Zeilenversionen können noch nicht entfernt werden.
Nicht entfernbare Zeilenversionen sind zwischen 56 und 56 Bytes lang.
Es gibt 0 unbenutzte Itemzeiger.
Gesamter freier Platz (einschließlich entfernbare Zeilenversionen) ist
1154820852 Bytes.
33480 Seiten sind leer oder werden leer werden, einschließlich 0 am Ende
der Tabelle.
186149 Seiten mit 1154819352 freien Bytes sind mögliche Ziele zum
Verschieben.

It looks like autovacuum - which is on - could not be run, and the db
growed a lot, and now a VACUUM cannot finish because of the duplicate
index:
INFO:  Index »bayes_token_pkey« enthält 2539650 Zeilenversionen in 76273
Seiten
DETAIL:  0 Indexzeilenversionen wurde entfernt.
11727 Indexseiten wurden gelöscht, 11727 sind gegenwärtig
wiederverwendbar.
CPU 10.44s/2.49u sec elapsed 172.59 sec.
WARNUNG:  Index »bayes_token_pkey« enthält 2539650 Zeilenversionen, aber
Tabelle enthält 4752337 Zeilenversionen
TIP:  Bauen Sie den Index mit REINDEX neu.

But REINDEX doesn't work of course, because of the duplicates...

I'd like to drop that broken db, or should I keep it for analyzation?

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0676/846 914 666                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: www.keyserver.net                   Key-ID: 1C1209B4

Вложения

Re: duplicate primary index in bayes db from SpamAssassin

От
Tom Lane
Дата:
Michael Monnerie <michael.monnerie@it-management.at> writes:
> I'd like to drop that broken db, or should I keep it for analyzation?

Looking back over the thread, it seems that the problems you saw could
be explained by the VACUUM-related bugs we fixed in 8.1.9 and 8.1.10;
although obviously this theory requires assuming that the dups were
there before you updated to 8.1.11.  You didn't say much about time
frame so I don't know if that's plausible.

This latest bit with many fewer index entries than table entries is just
weird, though.  The 8.1.10 VACUUM bug could have led to having *more*
index entries than table entries, but not the other way around.

Before you zap the database, would you be willing to make the actual
database files (not a dump, but the stored files) available?  I'd
be interested to take a closer look at that index.

            regards, tom lane

Re: duplicate primary index in bayes db from SpamAssassin

От
Michael Monnerie
Дата:
On Donnerstag, 17. April 2008 Tom Lane wrote:
> Before you zap the database, would you be willing to make the actual
> database files (not a dump, but the stored files) available?  I'd
> be interested to take a closer look at that index.

Yes, could you please give me the quick hint how to find out which
SELECT I must do to find the correct directory to a database? I saw it
once, but... many other things got stored in my brain in between, so I
VACUUMed that information out ;-)

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0676/846 914 666                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: www.keyserver.net                   Key-ID: 1C1209B4

Вложения

Re: duplicate primary index in bayes db from SpamAssassin

От
Michael Monnerie
Дата:
On Donnerstag, 17. April 2008 Tom Lane wrote:
> This latest bit with many fewer index entries than table entries is
> just weird, though.  The 8.1.10 VACUUM bug could have led to having
> *more* index entries than table entries, but not the other way
> around.

Somehow I can think of it like that: SA didn't find user "vscan"
anymore, so created a new entry for it. And then it created all the new
learned ham/spam tokens again despite they were there already. Must
have to do with not finding it from the index or so.

It would be nice if pg_dump would FAIL on such a broken database. Like
this, I would have become informed about the problem. The bayes db
isn't quite interactive, and SA seems to ignore errors. The reason I
found it was that our SPAM filter started to let spam through which did
not happen for years (yes, we have *very hard* anti spam settings, our
customers love it that way - no spam). So I got aware that bayes got
stupid and then I found the mess...

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0676/846 914 666                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: www.keyserver.net                   Key-ID: 1C1209B4

Вложения

Re: duplicate primary index in bayes db from SpamAssassin

От
Tom Lane
Дата:
Michael Monnerie <michael.monnerie@it-management.at> writes:
> On Donnerstag, 17. April 2008 Tom Lane wrote:
>> Before you zap the database, would you be willing to make the actual
>> database files (not a dump, but the stored files) available? �I'd
>> be interested to take a closer look at that index.

> Yes, could you please give me the quick hint how to find out which
> SELECT I must do to find the correct directory to a database? I saw it
> once, but... many other things got stored in my brain in between, so I
> VACUUMed that information out ;-)

SELECT oid, datname FROM pg_database;

The stuff I need to look at is under $PGDATA/base/OID/.
Please also send the results of

SELECT relfilenode, relname FROM pg_class;

in the busted database --- that'll save trying to claw it out of
pg_class by hand.

            regards, tom lane

Re: duplicate primary index in bayes db from SpamAssassin

От
Tom Lane
Дата:
I wrote:
> This latest bit with many fewer index entries than table entries is just
> weird, though.  The 8.1.10 VACUUM bug could have led to having *more*
> index entries than table entries, but not the other way around.
> Before you zap the database, would you be willing to make the actual
> database files (not a dump, but the stored files) available?  I'd
> be interested to take a closer look at that index.

I finally got a chance to do some analysis, and what I find is multiple
instances of index corruption.  There are 18 index pages that are all
zeroes:

Bogus pd_special pointer on page 65663
Bogus pd_special pointer on page 65847
Bogus pd_special pointer on page 65932
Bogus pd_special pointer on page 65948
Bogus pd_special pointer on page 65956
Bogus pd_special pointer on page 65967
Bogus pd_special pointer on page 65974
Bogus pd_special pointer on page 66066
Bogus pd_special pointer on page 66456
Bogus pd_special pointer on page 66775
Bogus pd_special pointer on page 67979
Bogus pd_special pointer on page 68126
Bogus pd_special pointer on page 68501
Bogus pd_special pointer on page 68565
Bogus pd_special pointer on page 68854
Bogus pd_special pointer on page 70321
Bogus pd_special pointer on page 70326
Bogus pd_special pointer on page 75066

It is possible to have a zero page in an index, but only if a backend
extended the index by one page and then crashed before it could write
a WAL record about the added page.  Unless you've been having a whole
lot of PANIC crashes you didn't mention, there's something else going
on here.

I also found a bunch of places where the sibling links between pages
are wrong/inconsistent:

block 1174 has left link 27874, but block 27874 has right link 66918
block 1309 has right link 59817, but block 59817 has left link 46980
block 2476 has right link 27885, but block 27885 has left link 71959
block 4489 has right link 59819, but block 59819 has left link 61853
block 7591 has right link 25594, but block 25594 has left link 27816
block 20986 has right link 1174, but block 1174 has left link 27874
block 24616 has left link 1309, but block 1309 has right link 59817
block 25416 has left link 5155, but block 5155 has right link 59809
block 27484 has left link 4489, but block 4489 has right link 59819
block 27816 has left link 7591, but block 7591 has right link 25594
block 43322 has left link 2476, but block 2476 has right link 27885
block 59809 has right link 25416, but block 25416 has left link 5155
block 59810 has left link 15449, but block 15449 has right link 39432
block 59810 has right link 23744, but block 23744 has left link 40043
block 59811 has left link 10693, but block 10693 has right link 24259
block 59811 has right link 24259, but block 24259 has left link 10693
block 59812 has left link 17191, but block 17191 has right link 68457
block 59812 has right link 24704, but block 24704 has left link 68457
block 59813 has left link 20391, but block 20391 has right link 21998
block 59813 has right link 27337, but block 27337 has left link 52006
block 59814 has left link 6918, but block 6918 has right link 17551
block 59814 has right link 17551, but block 17551 has left link 6918
block 59815 has left link 16727, but block 16727 has right link 32141
block 59815 has right link 22466, but block 22466 has left link 32141

THe fact that blocks between 59809 and 59819 are involved in the
majority of these cases is interesting.  I'm not sure what to make
of it, but it suggests that some single event messed up a number of
block splits or deletions, rather than that there's an infrequent
bug in those operations.

There are also odd inconsistencies between the tree levels: there are
64142 blocks in tree level 0 but only 64136 downlinks in level 1, and
there are four places where a downlink's next key fails to match the
high key of the child page.  So this index has definitely suffered
multiple events where a split or deletion didn't work right.

On the whole I'm thinking that you may have an intermittent hardware
problem, especially since we've not seen comparable symptoms reported
by anyone else.

            regards, tom lane

Re: duplicate primary index in bayes db from SpamAssassin

От
Michael Monnerie
Дата:
On Mittwoch, 30. April 2008 Tom Lane wrote:
> I finally got a chance to do some analysis, and what I find is
> multiple instances of index corruption.  There are 18 index pages
> that are all zeroes:
> [snip]
> On the whole I'm thinking that you may have an intermittent hardware
> problem, especially since we've not seen comparable symptoms reported
> by anyone else.

We've had a bad crash once where the server couldn't write it's buffers.
The RAID controller also caches writes, and the server is in VMware,
which I guess adds to the problem as the whole client machine also has
caches and so on. I now remember we had a (very NOT important) database
that we simply deleted and recreated as it's only for a web-app without
data in it, but didn't see any other problems at that time.

So probably this is this kind of "should never ever happen" except your
machine goes bye-bye badly. What makes me a bit worried is that pg_dump
did not report an error at all, but made it's backup as if everything
is OK. Could that be changed maybe so that in case of a problem it at
least cries out loudly "broken db found" or so? Or even better it
should not create a backup file but say "to save this broken db please
recall pg_dump with --ignore-broken-db". We would have found it earlier
if the backup wouldn't have worked.
It's always good to have that kind of extra warnings, better than
oversee problems.

Thanks for your analyzation.

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0676/846 914 666                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: www.keyserver.net                   Key-ID: 1C1209B4

Вложения

Re: duplicate primary index in bayes db from SpamAssassin

От
Tom Lane
Дата:
Michael Monnerie <michael.monnerie@it-management.at> writes:
> So probably this is this kind of "should never ever happen" except your
> machine goes bye-bye badly. What makes me a bit worried is that pg_dump
> did not report an error at all, but made it's backup as if everything
> is OK. Could that be changed maybe so that in case of a problem it at
> least cries out loudly "broken db found" or so?

pg_dump intentionally does *not* look at the contents of indexes (at
least not any on user tables).  I think this is a good thing because
it makes it more likely that you can dump a damaged database.  pg_dump
is not supposed to be some kind of fsck-like tool.

            regards, tom lane

Re: duplicate primary index in bayes db from SpamAssassin

От
Michael Monnerie
Дата:
On Mittwoch, 7. Mai 2008 Tom Lane wrote:
> pg_dump intentionally does *not* look at the contents of indexes (at
> least not any on user tables).  I think this is a good thing because
> it makes it more likely that you can dump a damaged database.
>  pg_dump is not supposed to be some kind of fsck-like tool.

OK, I see that normally you should get that info out of the postgresql
log. The problem with this application is, that often it happens to try
an insert on already existing primary keys, which makes you overlook
the problem is real. Maybe something can be improved here, but I don't
know what would be the best way. Maybe a big log line like
YOUR DATABASE xxx SEEMS TO BE DAMAGED - PLEASE CHECK
such a line would jump into the eyes even on a big log, and it makes you
nervous enough to really check things.

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0676/846 914 666                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: www.keyserver.net                   Key-ID: 1C1209B4

Вложения