Re: pg_upgrade bug found!

From: bricklen
Subject: Re: pg_upgrade bug found!
Msg-id: BANLkTim6w+tX9mBRGtLqDfGxsgmuJJvbBQ@mail.gmail.com
In response to: Re: pg_upgrade bug found!  (bricklen <bricklen@gmail.com>)
Responses: Re: pg_upgrade bug found!  (Stephen Frost <sfrost@snowman.net>)
List: pgsql-hackers
On Fri, Apr 8, 2011 at 7:20 PM, bricklen <bricklen@gmail.com> wrote:
> On Fri, Apr 8, 2011 at 7:11 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> bricklen,
>>
>> * bricklen (bricklen@gmail.com) wrote:
>>> I looked deeper into our backup archives, and it appears that I do
>>> have the clog file reference in the error message "DETAIL:  Could not
>>> open file "pg_clog/04BE": No such file or directory."
>>
>> Great!  And there's no file in pg_clog which matches that name (or
>> exist which are smaller in value), right?
>>
>>> It exists in an untouched backup directory that I originally made when
>>> I set up the backup and ran pg_upgrade. I'm not sure if it is from
>>> version 8.4 or 9.0.2 though. Is it safe to just copy it into my
>>> production pg_clog dir and restart?
>>
>> It should be, provided you're not overwriting any files or putting a
>> clog file in place which is greater than the other clog files in that
>> directory.
>
> It appears that there are no files lower.
>
> Missing clog: 04BE
>
> production pg_clog dir:
> ls -lhrt 9.0/data/pg_clog
> total 38M
> -rw------- 1 postgres postgres 256K Jan 25 21:04 04BF
> -rw------- 1 postgres postgres 256K Jan 26 12:35 04C0
> -rw------- 1 postgres postgres 256K Jan 26 20:58 04C1
> -rw------- 1 postgres postgres 256K Jan 27 13:02 04C2
> -rw------- 1 postgres postgres 256K Jan 28 01:00 04C3
> ...
>
> old backup pg_clog dir (possibly v8.4)
> ...
> -rw------- 1 postgres postgres 256K Jan 23 21:11 04BB
> -rw------- 1 postgres postgres 256K Jan 24 08:56 04BC
> -rw------- 1 postgres postgres 256K Jan 25 06:32 04BD
> -rw------- 1 postgres postgres 256K Jan 25 10:58 04BE
> -rw------- 1 postgres postgres 256K Jan 25 20:44 04BF
> -rw------- 1 postgres postgres 8.0K Jan 25 20:54 04C0
>
>
> So, if I have this right, my steps to take are:
> - copy the backup 04BE to production pg_clog dir
> - restart the database
> - run Bruce's script
>
> Does that sound right? Has anyone else experienced this? I'm leery of
> testing this on my production db, as our last pg_dump was from early
> this morning, so I apologize for being so cautious.
>
> Thanks,
>
> Bricklen

What I've tested and current status:

When I saw the announcement a few hours ago, I started setting up a
9.0.3 hot standby. I brought it live a few minutes ago.
- copied over the 04BE clog from the original backup
- restarted the standby cluster
- ran the script against the main database

That turned up a bunch of other transactions whose clog files were missing:

psql:pg_upgrade_tmp.sql:539: ERROR:  could not access status of
transaction 1248683931
DETAIL:  Could not open file "pg_clog/04A6": No such file or directory.

psql:pg_upgrade_tmp.sql:540: ERROR:  could not access status of
transaction 1249010987
DETAIL:  Could not open file "pg_clog/04A7": No such file or directory.

psql:pg_upgrade_tmp.sql:541: ERROR:  could not access status of
transaction 1250325059
DETAIL:  Could not open file "pg_clog/04A8": No such file or directory.

psql:pg_upgrade_tmp.sql:542: ERROR:  could not access status of
transaction 1252759918
DETAIL:  Could not open file "pg_clog/04AA": No such file or directory.

psql:pg_upgrade_tmp.sql:543: ERROR:  could not access status of
transaction 1254527871
DETAIL:  Could not open file "pg_clog/04AC": No such file or directory.

psql:pg_upgrade_tmp.sql:544: ERROR:  could not access status of
transaction 1256193334
DETAIL:  Could not open file "pg_clog/04AD": No such file or directory.

psql:pg_upgrade_tmp.sql:556: ERROR:  could not access status of
transaction 1268739471
DETAIL:  Could not open file "pg_clog/04B9": No such file or directory.
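
As a cross-check on the errors above, the file name in each DETAIL line
appears to follow directly from the transaction ID. A minimal sketch,
assuming a stock build (8 kB block size, 2 status bits per transaction,
32 pages per clog segment, which gives the 256K files seen in the
listings); clog_segment_for_xid is a hypothetical helper name, not part
of PostgreSQL:

```python
# Assumed defaults for a stock PostgreSQL build; adjust if BLCKSZ differs.
BLCKSZ = 8192             # bytes per clog page
XACTS_PER_BYTE = 4        # 2 status bits per transaction
PAGES_PER_SEGMENT = 32    # 32 pages * 8 kB = one 256K segment file
XACTS_PER_SEGMENT = BLCKSZ * XACTS_PER_BYTE * PAGES_PER_SEGMENT  # 1048576


def clog_segment_for_xid(xid):
    """Return the pg_clog file name expected to hold this xid's status."""
    return format(xid // XACTS_PER_SEGMENT, "04X")


# Transaction IDs taken from the error messages above.
for xid in (1248683931, 1249010987, 1268739471):
    print(xid, "->", "pg_clog/" + clog_segment_for_xid(xid))
# 1248683931 -> pg_clog/04A6
# 1249010987 -> pg_clog/04A7
# 1268739471 -> pg_clog/04B9
```
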

I checked, and found that each one of those files exists in the
original backup location.

- scp'd those files to the hot standby clog directory,
- pg_ctl stop -m fast
- pg_ctl start
- ran the script

I hit more missing clog file errors like those above, and repeated the
scp + bounce + script process 4 or 5 more times until no more missing
clog file messages surfaced.
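
The collect-and-copy part of that loop could be scripted roughly as
follows. This is only a sketch under assumptions: it takes the captured
psql output as text, pulls out the segment names from the "Could not
open file" lines, and copies each one from a backup pg_clog directory
into the target directory without ever overwriting an existing file.
The function names and directory arguments are hypothetical, and the
restart and script run still have to be done separately.

```python
import re
import shutil
from pathlib import Path

# Matches the DETAIL line format shown in the errors above.
MISSING_RE = re.compile(r'Could not open file "pg_clog/([0-9A-F]{4})"')


def missing_segments(log_text):
    """Return the distinct clog segment names reported missing, sorted."""
    return sorted(set(MISSING_RE.findall(log_text)))


def copy_missing(log_text, backup_clog, target_clog):
    """Copy each missing segment from backup_clog into target_clog.

    Existing files in target_clog are never overwritten. Raises
    FileNotFoundError if a missing segment is not in the backup either.
    Returns the list of segment names actually copied.
    """
    copied = []
    for name in missing_segments(log_text):
        src = Path(backup_clog) / name
        dst = Path(target_clog) / name
        if dst.exists():
            continue  # already present, leave it alone
        if not src.is_file():
            raise FileNotFoundError(f"{name} not found in {backup_clog}")
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        copied.append(name)
    return copied
```

Permissions and ownership (postgres:postgres, mode 0600 in the listings)
are not handled here and would need checking after the copy.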

Now, is this safe to run against my production database?

Those steps again, to run against prod:

- cp the clog files from the original backup dir to my production pg_clog dir
- bounce the database
- run the script against all databases in the cluster
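
Before the copy step, the two conditions Stephen gave in the quoted
reply (don't overwrite any file, and don't add a clog file greater than
the ones already in the directory) can be checked mechanically. A
rough sketch, assuming segment names are plain 4-digit hex and ignoring
transaction ID wraparound; safe_to_add is a hypothetical name:

```python
def safe_to_add(candidate, existing):
    """Check one clog segment name against the names already in pg_clog.

    Returns (ok, reason). A candidate is rejected if a file with that
    name is already present, or if it is greater than every existing
    segment. Wraparound of transaction IDs is NOT considered here.
    """
    if candidate in existing:
        return False, "would overwrite an existing file"
    if existing and int(candidate, 16) > max(int(n, 16) for n in existing):
        return False, "greater than all existing clog files"
    return True, "ok"


# Names from the production pg_clog listing quoted earlier (truncated there).
prod = ["04BF", "04C0", "04C1", "04C2", "04C3"]
print(safe_to_add("04BE", prod))  # (True, 'ok')
print(safe_to_add("04BF", prod))  # (False, 'would overwrite an existing file')
```
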

Anyone have any suggestions or changes before I commit myself to this
course of action?

Thanks,

Bricklen

