Re: [GENERAL] pg_upgrade problem

Поиск
Список
Период
Сортировка
От daveg
Тема Re: [GENERAL] pg_upgrade problem
Дата
Msg-id 20110906001650.GI24583@sonic.net
обсуждение исходный текст
Ответ на Re: [GENERAL] pg_upgrade problem  (Bruce Momjian <bruce@momjian.us>)
Ответы Re: [GENERAL] pg_upgrade problem
Список pgsql-hackers
Sorry I missed your reply, catching up now.

On Wed, Aug 31, 2011 at 09:56:59PM -0400, Bruce Momjian wrote:
> daveg wrote:
> > On Mon, Aug 29, 2011 at 07:49:24PM +0200, hubert depesz lubaczewski wrote:
> > > On Mon, Aug 29, 2011 at 06:54:41PM +0200, hubert depesz lubaczewski wrote:
> > > vacuumdb: vacuuming of database "etsy_v2" failed: ERROR:  could not access status of transaction 3429738606
> > > DETAIL:  Could not open file "pg_clog/0CC6": No such file or directory.
> > > 
> > > Interestingly.
> > > 
> > > In old dir there is pg_clog directory with files:
> > > 0AC0 .. 0DAF (including 0CC6, size 262144)
> > > but new pg_clog has only:
> > > 0D2F .. 0DB0
> > > 
> > > File content - nearly all files that exist in both places are the same, with exception of 2 newest ones in new
datadir:
> > > 3c5122f3e80851735c19522065a2d12a  0DAF
> > > 8651fc2b9fa3d27cfb5b496165cead68  0DB0
> > > 
> > > 0DB0 doesn't exist in old, and 0DAF has different md5sum: 7d48996c762d6a10f8eda88ae766c5dd
...
> > I had this same thing happen this Saturday just past and my client had to
> > restore the whole 2+ TB instance from the previous days pg_dumps.
...
> > After running pg_upgrade apparently successfully and analyzeing all the

Update: reviewing the logs I see some of the analyzes hit the "could not
access status of transaction" error too.

> > tables we restarted the production workload and started getting errors:
> > 
> > 2011-08-27 04:18:34.015  12337  c06  postgres  ERROR:  could not access status of transaction 2923961093
> > 2011-08-27 04:18:34.015  12337  c06  postgres  DETAIL:  Could not open file "pg_clog/0AE4": No such file or
directory.
> > 2011-08-27 04:18:34.015  12337  c06  postgres  STATEMENT:  analyze public.b_pxx;
> > 
> > On examination the pg_clog directory contained on two files timestamped
> > after the startup of the new cluster with 9.0.4. Other hosts that upgraded
> > successfully had numerous files in pg_clog dating back a few days. So it
> > appears that all the clog files went missing during the upgrade somehow.
> > a
> > This happened upgrading from 8.4.7 to 9.0.4, with a brief session in between
> > at 8.4.8. We have upgraded several hosts to 9.0.4 successfully previously.
> 
> I have posted this fix to the hackers email list, but I found it only
> affected old 8.3 servers, not old 8.4.X, so I am confused by your bug
> report.
> 
> I have tested 8.4.X to 9.0.4 and found pg_upgrade preserves toast
> relfrozenxids properly in that case.
> 
> Can you tell me what table is showing this error?  Does it happen during
> vacuum?  Can you run a vacuum verbose to see what it is throwing the
> error on?  Thanks.

This was upgrading from 8.4.8 to 9.0.4. I don't have the running cluster
anymore, but I do have tar.gz archives of it and could probably find
2TB free somewhere to restore it to if there is something useful to extract.

However, I don't think this was toast related. Most of our rows are short and 
have only int, float, and short text columns. These errors hit over 60
different tables mostly during the analyzes we ran immediately after the
upgrade. It also hit during select, insert and delete statements. We did not
run the db more than a few minutes as the damage was so extensive.

As far as I can tell pg_upgrade never copied any pg_clog files from the
old cluster to the new cluster. I wish I had detected that before running
the remove_old_cluster.sh script.

-dg

-- 
David Gould       daveg@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.


В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Fetter
Дата:
Сообщение: Re: Couple document fixes
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: [GENERAL] pg_upgrade problem