Обсуждение: pg_dumpall problem - duplicated users

Поиск
Список
Период
Сортировка

pg_dumpall problem - duplicated users

От
Przemysław Nogaj
Дата:
Hi,

few days ago we made vacuum of all databases on our server. After that
we canot dump databases:

# pg_dumpall -i -U postgres > test.sql
pg_dump: query to obtain list of schemas failed: ERROR:  More than one
tuple returned by a subselect used as an expression.
pg_dumpall: pg_dump failed on database "alibi", exiting


There are duplicated system users, pg_catalogs, pg_temp1, pg_toast and
public...


template1=> select * from pg_catalog.pg_shadow where usesysid = 1;
 usename  | usesysid | usecreatedb | usesuper | usecatupd
|               passwd                | valuntil | useconfig
----------+----------+-------------+----------+-----------+-------------------------------------+----------+-----------
 postgres |        1 | t           | t        | t
|                                     |          |
 postgres |        1 | t           | t        | t         |
md5c084502ed11efa9d3d96d29717a5e555 |          |
(2 rows)


Any suggestions how to solve the problem? There are many databases on
the server we can't loose. Applications can connect to databases - all
works fine, hoewer we must make dumps...


Thanks for any help,
Greetings

Przemek

Re: pg_dumpall problem - duplicated users

От
"Bartek GRUbY"
Дата:
Hi,
i have similar problem - can't make dumps of any db with the same error message (and all applications using this pg
serverworks fine) . But in my case, i think, only user 'postgres' is duplicated: 

template1=> select * from pg_catalog.pg_shadow where usename = 'postgres';
 usename  | usesysid | usecreatedb | usesuper | usecatupd |               passwd                | valuntil | useconfig
----------+----------+-------------+----------+-----------+-------------------------------------+----------+-----------
 postgres |        1 | t           | t        | t         |                                     |          |
 postgres |        1 | t           | t        | t         | md5c084502ed11efa9d3d96d29717a5e555 |          |
(2 rows)

template1=> select * from pg_namespace ;
  nspname   | nspowner | nspacl
------------+----------+--------
 pg_catalog |        1 | {=U}
 pg_toast   |        1 | {=}
 public     |        1 | {=UC}
 pg_temp_1  |        1 |
(4 rows)

And every change i made to user postgres has effect to the one with passwd set. I can delete only him too. Someone can
help?Im running out of time so every advice will be appreciated. Maybe someone know how to delete whole pg_shadow table
andcreate users on db with no users at all (cause pg_shadow will be empty :). 

Thanks for any (and quick :) help,
greetings,

Bartek

Re: pg_dumpall problem - duplicated users

От
Tom Lane
Дата:
"Bartek GRUbY" <grubby@epf.pl> writes:
> i have similar problem - can't make dumps of any db with the same error message (and all applications using this pg
serverworks fine) . But in my case, i think, only user 'postgres' is duplicated: 

> template1=> select * from pg_catalog.pg_shadow where usename = 'postgres';
>  usename  | usesysid | usecreatedb | usesuper | usecatupd |               passwd                | valuntil |
useconfig
>
----------+----------+-------------+----------+-----------+-------------------------------------+----------+-----------
>  postgres |        1 | t           | t        | t         |                                     |          |
>  postgres |        1 | t           | t        | t         | md5c084502ed11efa9d3d96d29717a5e555 |          |
> (2 rows)

What PG version are you using exactly?  Could we see the system columns
for those rows, ie,

    select ctid,xmin,xmax,cmin,xmax,* from pg_shadow where usename = 'postgres';

            regards, tom lane

Re: pg_dumpall problem - duplicated users

От
Tom Lane
Дата:
Bartosz Nowak <grubby@go2.pl> writes:
> postgres=> select ctid,xmin,xmax,cmin,xmax,* from pg_shadow where
> usename = 'postgres';
>  ctid  |    xmin    | xmax  | cmin  | xmax  | usename  | usesysid |
> usecreatedb | usesuper | usecatupd |               passwd
> | valuntil | useconfig
>
--------+------------+-------+-------+-------+----------+----------+-------------+----------+-----------+-------------------------------------+----------+-----------


>  (0,1) |          1 | 50469 | 50469 | 50469 | postgres |        1 |
> t           | t        | t         |
> |          |
> (1,25) | 2559800612 |     0 |     0 |     0 | postgres |        1 |
> t           | t        | t         | md5c084502ed11efa9d3d96d29717a5e555
> |          |
> (2 rows)

Hmm --- clearly, that second xmin is corrupt.  I'd venture that this
is the result of a rolled-back (crashed?) ALTER USER SET PASSWORD
operation, in which somehow the new tuple's xmin got clobbered ... and
more than likely, its XMIN_COMMITTED bit got set at the same time.

You could manually delete either row, probably better to zap the second
one:
    delete from pg_shadow where ctid = '(1,25)';
and then things should be OK.  A dump and reload wouldn't be a bad idea
though, since there may be other corruption elsewhere that you haven't
noticed yet.

            regards, tom lane

Re: pg_dumpall problem - duplicated users

От
GRUbY
Дата:
Tom Lane napisał(a):

>"Bartek GRUbY" <grubby@epf.pl> writes:
>
>
>>i have similar problem - can't make dumps of any db with the same error message (and all applications using this pg
serverworks fine) . But in my case, i think, only user 'postgres' is duplicated: 
>>
>>
>
>
>
>>template1=> select * from pg_catalog.pg_shadow where usename = 'postgres';
>> usename  | usesysid | usecreatedb | usesuper | usecatupd |               passwd                | valuntil |
useconfig

>>----------+----------+-------------+----------+-----------+-------------------------------------+----------+-----------
>> postgres |        1 | t           | t        | t         |                                     |          |
>> postgres |        1 | t           | t        | t         | md5c084502ed11efa9d3d96d29717a5e555 |          |
>>(2 rows)
>>
>>
>
>What PG version are you using exactly?  Could we see the system columns
>for those rows, ie,
>
>    select ctid,xmin,xmax,cmin,xmax,* from pg_shadow where usename = 'postgres';
>
>            regards, tom lane
>
>
I'm using exactly  PostgreSQL 7.3.9 on i686-pc-linux-gnu, compiled by
GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3) :]

If it helps:

postgres=> select ctid,xmin,xmax,cmin,xmax,* from pg_shadow where
usename = 'postgres';
  ctid  |    xmin    | xmax  | cmin  | xmax  | usename  | usesysid |
usecreatedb | usesuper | usecatupd |               passwd
| valuntil | useconfig

--------+------------+-------+-------+-------+----------+----------+-------------+----------+-----------+-------------------------------------+----------+-----------
  (0,1) |          1 | 50469 | 50469 | 50469 | postgres |        1 |
t           | t        | t         |
|          |
 (1,25) | 2559800612 |     0 |     0 |     0 | postgres |        1 |
t           | t        | t         | md5c084502ed11efa9d3d96d29717a5e555
|          |
(2 rows)


If U have any ideas or more questions please write. Like I said, my
deadline is near :[ Now i'm thinking about writing a script to backup
all data and schemas from all dbs by myself and reinstallin PG... but it
is a lot of work and i cant check if i did everything right for sure :(

Greetings,
Bartek


Re: pg_dumpall problem - duplicated users

От
Tom Lane
Дата:
Bartosz Nowak <grubby@go2.pl> writes:
> Tom Lane napisa�(a):
>> You could manually delete either row, probably better to zap the second
>> one:
>> delete from pg_shadow where ctid = '(1,25)';

> Heh... i wish it was that easy - i tried it already :] When i delete the
> 'second' postgres user (with passwd set) PG is acting like there is no
> postgres account at all:
> ...
> And i cannot modify row of 'first' postgres user (without passwd set)
> with or without 'second' present:
> mw=> delete from pg_shadow where ctid = '(0,1)' ;
> DELETE 0

Ugh.  That's looking more like you have a transaction ID wraparound
problem.  How long has it been since pg_shadow was last vacuumed?

You could try a "VACUUM FREEZE pg_shadow" and see if the rows act any
more normally after that.  (Better take a filesystem-level backup
first, so you can get out of it if that makes things worse.)

            regards, tom lane

Re: pg_dumpall problem - duplicated users

От
GRUbY
Дата:
Tom Lane napisał(a):

> Bartosz Nowak <grubby@go2.pl> writes:
>
>
>> postgres=> select ctid,xmin,xmax,cmin,xmax,* from pg_shadow where
>> usename = 'postgres';
>> ctid  |    xmin    | xmax  | cmin  | xmax  | usename  | usesysid |
>> usecreatedb | usesuper | usecatupd |
>> passwd                | valuntil | useconfig
>>
--------+------------+-------+-------+-------+----------+----------+-------------+----------+-----------+-------------------------------------+----------+-----------

>>
>
>
>
>
>> (0,1) |          1 | 50469 | 50469 | 50469 | postgres |        1 |
>> t           | t        | t
>> |                                     |          |
>> (1,25) | 2559800612 |     0 |     0 |     0 | postgres |        1 |
>> t           | t        | t         |
>> md5c084502ed11efa9d3d96d29717a5e555 |          |
>> (2 rows)
>>
>
>
> Hmm --- clearly, that second xmin is corrupt.  I'd venture that this
> is the result of a rolled-back (crashed?) ALTER USER SET PASSWORD
> operation, in which somehow the new tuple's xmin got clobbered ... and
> more than likely, its XMIN_COMMITTED bit got set at the same time.
>
> You could manually delete either row, probably better to zap the second
> one:
>     delete from pg_shadow where ctid = '(1,25)';
> and then things should be OK.  A dump and reload wouldn't be a bad idea
> though, since there may be other corruption elsewhere that you haven't
> noticed yet.
>
>             regards, tom lane
>
>
Heh... i wish it was that easy - i tried it already :] When i delete the
'second' postgres user (with passwd set) PG is acting like there is no
postgres account at all:

pg_dumpall -i -U mw > test.sql
pg_dump: SQL command failed
pg_dump: Error message from server: ERROR:  pg_class_aclcheck: invalid
user id 1
pg_dump: The command was: SELECT (SELECT usename FROM pg_user WHERE
usesysid = datdba) as dba, pg_encoding_to_char(encoding) as encoding,
datpath FROM pg_database WHERE datname = 'alibi'
pg_dumpall: pg_dump failed on database "alibi", exiting

pg_dumpall -i -U postgres > test.sql
pg_dumpall: could not connect to database "template1": FATAL:  user
"postgres" does not exist

psql -U postgres
psql: FATAL:  user "postgres" does not exist

And i cannot modify row of 'first' postgres user (without passwd set)
with or without 'second' present:

mw=> delete from pg_shadow where ctid = '(0,1)' ;
DELETE 0

mw=> update pg_catalog.pg_shadow set passwd='test' where
usename='postgres' and ctid = '(0,1)';
UPDATE 0

So further help will be welcomed :) And thank U for trying to help me.

Greetings,
Bartek

Re: pg_dumpall problem - duplicated users

От
GRUbY
Дата:
Tom Lane napisał(a):

>Bartosz Nowak <grubby@go2.pl> writes:
>
>
>>Tom Lane napisał(a):
>>
>>
>>>You could manually delete either row, probably better to zap the second
>>>one:
>>>delete from pg_shadow where ctid = '(1,25)';
>>>
>>>
>
>
>
>>Heh... i wish it was that easy - i tried it already :] When i delete the
>>'second' postgres user (with passwd set) PG is acting like there is no
>>postgres account at all:
>>...
>>And i cannot modify row of 'first' postgres user (without passwd set)
>>with or without 'second' present:
>>mw=> delete from pg_shadow where ctid = '(0,1)' ;
>>DELETE 0
>>
>>
>
>Ugh.  That's looking more like you have a transaction ID wraparound
>problem.  How long has it been since pg_shadow was last vacuumed?
>
>You could try a "VACUUM FREEZE pg_shadow" and see if the rows act any
>more normally after that.  (Better take a filesystem-level backup
>first, so you can get out of it if that makes things worse.)
>
>            regards, tom lane
>
>
We vacuum whole base regularly, and lately it was about week ago (then
the problem occured). Since pg_dump don't work we are making
filesystem-level backups, and still im experimenting on local copy of
this db for safety.

VACUUM FREEZE pg_shadow changed the pg_shadow table:

select ctid,xmin,xmax,cmin,xmax,* from pg_shadow where usename = 'postgres';
 ctid  | xmin | xmax  | cmin  | xmax  | usename  | usesysid |
usecreatedb | usesuper | usecatupd |               passwd
| valuntil | useconfig

-------+------+-------+-------+-------+----------+----------+-------------+----------+-----------+-------------------------------------+----------+-----------
 (0,1) |    1 | 50469 | 50469 | 50469 | postgres |        1 |
t           | t        | t  |
|          |
 (1,4) |    2 |     0 |     0 |     0 | postgres |        1 |
t           | t        | t  | md5c084502ed11efa9d3d96d29717a5e555
|          |
(2 rows)

But rows still act unnormally :[, i.e.:

update pg_catalog.pg_shadow set passwd='test' where usename='postgres'
and ctid = '(0,1)';
UPDATE 0
delete from pg_shadow where ctid = '(0,1)' ;
DELETE 0
delete from pg_shadow where usesysid = 1 and passwd !=
'md5c084502ed11efa9d3d96d29717a5e555' ;
DELETE 0
delete from pg_shadow where usesysid = 1 ;
DELETE 1
select ctid,xmin,xmax,cmin,xmax,* from pg_shadow where usename =
'postgres';  ctid  | xmin | xmax  | cmin  | xmax  | usename  | usesysid
| usecreatedb | usesuper | usecatupd | passwd | valuntil | useconfig

-------+------+-------+-------+-------+----------+----------+-------------+----------+-----------+--------+----------+-----------
 (0,1) |    1 | 50469 | 50469 | 50469 | postgres |        1 |
t           | t        | t  |        |          |
(1 row)

update pg_catalog.pg_shadow set passwd='test' where usename='postgres' ;
UPDATE 0

pg_dumpall -i -U mw > test.sql
pg_dump: SQL command failed
pg_dump: Error message from server: ERROR:  pg_class_aclcheck: invalid
user id 1
pg_dump: The command was: select (select usename from pg_user where
usesysid = datdba) as dba, encoding, datpath from pg_database where
datname = 'alibi'
pg_dumpall: pg_dump failed on alibi, exiting

Greetings,
Bartek

--
Pozdrawiam,
GRUbY


Re: pg_dumpall problem - duplicated users

От
Tom Lane
Дата:
Bartosz Nowak <grubby@go2.pl> writes:
>   Tom Lane napisa�(a):
>>> Yeah. We need to look at the contents of the infomask for these rows.

>  Item   1 -- Length:   95  Offset: 8096 (0x1fa0)  Flags: USED
>   XID: min (1)  CMIN|XMAX: 50469  CMAX|XVAC: 0
>   Block Id: 0  linp Index: 29   Attributes: 8   Size: 24
>   infomask: 0x0501 (HASNULL|XMIN_COMMITTED|XMAX_COMMITTED)
>   t_bits: [0]: 0x1f

>   1fa0: 01000000 25c50000 00000000 00000000  ....%...........
>   1fb0: 1d000800 0105181f 706f7374 67726573  ........postgres

Well, you are definitely suffering from transaction ID wraparound.
Other rows in the table bear XIDs as high as 2559800779, so your
current transaction counter is at least that much.  Transaction 50469,
which obsoleted this row, was a *long* time ago.

The reason that this row is suddenly being seen as valid, rather than
deleted, is that the XID counter has wrapped around and so 50469
is now seen as "in the future" not "in the past" --- that is, the code
thinks the row is deleted, but by a concurrent transaction that started
later than our own transaction.  So it's valid, but you can't delete it
because it's already deleted.

I can't see any evidence of data corruption.  I think you just forgot
to vacuum pg_shadow for a very long time.  This row should have been
physically deleted long ago --- but evidently VACUUM was never run
across pg_shadow, until it was too late.

While you could perhaps manually repair this one row with a hex editor,
I don't think that will get you out of trouble.  The odds are good that
there are similar problems elsewhere, including both system catalogs and
user tables (ie, if you weren't vacuuming pg_shadow, what else weren't
you vacuuming?).

I'm not sure that you are going to be able to get out of this without
losing data.  Here's what I'd try (but bear in mind that I just made up
this procedure on the spot):

1. Stop the postmaster and take a physical dump of the $PGDATA tree,
if you don't have one already.  This will at least let you get back to
where you are now if the rest doesn't work.

2. Restart the postmaster, but don't let anyone else in (for safety
I'd suggest modifying pg_hba.conf to guarantee this).  You're going
to be down for a little bit :-(

3. Do a VACUUM FREEZE (database-wide) in *each* database, or at least
each one you care about recovering.  *Don't* use FULL.

4. Stop the postmaster again.  Use pg_controldata to note the current
transaction counter ("latest checkpoint's NextXID").  Then use
pg_resetxlog with the -x switch to deliberately back up the XID counter
to something less than 2 billion.

5. Restart the postmaster, and try to pg_dump everything.  Also do
whatever sanity tests you can think of --- look for missing and
duplicated rows in particular.

6. If you think you have a good dump, initdb and reload the dump,
and you're back in business.  If it didn't work, well, revert to
your last backup :-(

After you get out of your immediate problem, you had better look at
your vacuuming procedures and find out why you are in this fix.  See
http://www.postgresql.org/docs/7.3/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

            regards, tom lane

Re: pg_dumpall problem - duplicated users

От
Tom Lane
Дата:
Bartosz Nowak <grubby@go2.pl> writes:
> By the way: isn't that situation a bug in PG? What about a situation
> when this row was obsoleted just before XID counter wrap around and
> there is no way to do vacuum between those facts? The same situation
> would appear.

No, because XID space is circular: you always have exactly 2 billion
transactions before any newly-created row wraps around.  (I strongly
suggest you read the manual page I pointed you to.)  The bug is that
PG didn't force you to vacuum the table within that time.  8.1 will have
code that warns increasingly stridently and eventually shuts down the
server before wraparound can occur...

> Why should i note the current trans. counter?

See below.  Also, if it's not something a bit past 2 billion then this
whole theory is wrong and you don't want to proceed.

> And shouldn't i set XID counter to the max possible value (2^32 = over 4 billion) to avoid some transactions to be
seenas "in the future" ? 

No.  The VACUUM FREEZE will replace every XID in the "immediate past" 2
billion transactions with FrozenXid, which will make that stuff safe
against backing up the XID counter, and then you want to back up so that
the oldest transactions are seen as "in the past" again.  You need to
use a starting XID enough less than 2 billion that you have time to
validate and dump the data before those oldest transactions go out of
range again ... but not more than 2 billion less than where it is when
you shut down, else you will have some unfrozen rows "in the future".

At least that's my theory about how to get out of this.  Like I said,
it's untried.

            regards, tom lane