Обсуждение: Aborted Transaction During Vacuum

Поиск
Список
Период
Сортировка

Aborted Transaction During Vacuum

От
"G. Anthony Reina"
Дата:
I've been getting a strange error from the vacuum command. When I try
'vacuum verbose analyze' the vacuum goes through the tables fine until
just after finishing one particular table. Then I get the error:

NOTICE: AbortTransaction and not in in-progress state
pqReadData() -- backend closed the channel unexpectedly.       This probably means the backend terminated abnormally
  before or while processing the request.
 
We have lost the connection to the backend, so further processing is
impossible. Terminating.


When I try to vacuum the tables individually, I get no problems with
aborted backends.

Does anyone know what is going on here?

-Tony

PostgreSQL 6.5.1 on RH Linux 6.0, PII/400 MHz, 512 Meg RAM, 18 Gig HD





Re: [HACKERS] Aborted Transaction During Vacuum

От
Tom Lane
Дата:
"G. Anthony Reina" <reina@nsi.edu> writes:
> I've been getting a strange error from the vacuum command. When I try
> 'vacuum verbose analyze' the vacuum goes through the tables fine until
> just after finishing one particular table. Then I get the error:

> NOTICE: AbortTransaction and not in in-progress state
> pqReadData() -- backend closed the channel unexpectedly.

Interesting.  Is there any additional message appearing in the
postmaster log?  Is a core file being generated?  (look in the
data/base/ subdirectory for the database in question)  If there
is a corefile, a debugger backtrace from it would be helpful.
        regards, tom lane


Re: [HACKERS] Aborted Transaction During Vacuum

От
"G. Anthony Reina"
Дата:
Tom Lane wrote:

>
> Interesting.  Is there any additional message appearing in the
> postmaster log?  Is a core file being generated?  (look in the
> data/base/ subdirectory for the database in question)  If there
> is a corefile, a debugger backtrace from it would be helpful.
>
>                         regards, tom lane

Tom,
   I tried the 'vacuum verbose analyze' again today. I get the same error
with the AbortTransaction. There is a core file generated but no pg_vlock
file. The core is over 1 Gig in size (38 Megs gzipped) so I'm not so sure
you'd want to get that  (you can have it if you want it though!). Plus,
there seems to be nothing written to the postmaster.log file (I re-started
the postmaster before the vacuum using 'nohup postmaster -i -B 15000 -o -F
> postmaster.log&').
   Oliver Elphick (Oliver.Elphick@lfix.co.uk) told me that this sounds
like like a bug you patched for him a few days ago:

> It does sound very like the bug that was found a few days back where the

> pg_vlock file gets deleted by  a mistaken call to vc_abort().  This only

> gets called in the analyze code.

We're probably talking about the same bug. When I try just 'vacuum verbose'
without the analyze, the vacuum completes just fine. So it must be within
the analyze code.

-Tony




Re: [HACKERS] Aborted Transaction During Vacuum

От
Tom Lane
Дата:
"G. Anthony Reina" <reina@nsi.edu> writes:
>     I tried the 'vacuum verbose analyze' again today. I get the same error
> with the AbortTransaction. There is a core file generated but no pg_vlock
> file. The core is over 1 Gig in size (38 Megs gzipped) so I'm not so sure
> you'd want to get that  (you can have it if you want it though!).

It wouldn't do me any good anyway without a copy of the executable and a
copy of gdb built for whatever platform you are on.  I was hoping you
could run gdb on the corefile there and just send the backtrace output
("gdb postgres-executable core-file" and then say "bt").

>     Oliver Elphick (Oliver.Elphick@lfix.co.uk) told me that this sounds
> like like a bug you patched for him a few days ago:

I doubt it's the same bug --- the error message generated by the other
bug was different...
        regards, tom lane


Re: [HACKERS] Aborted Transaction During Vacuum

От
"Oliver Elphick"
Дата:
Tom Lane wrote: >>     Oliver Elphick (Oliver.Elphick@lfix.co.uk) told me that this sounds >> like like a bug you
patchedfor him a few days ago: > >I doubt it's the same bug --- the error message generated by the other >bug was
different...

The error message I posted was the log output at level 3, which he doesn't
seem to be running; if you switch to logging at that level, look for
messages that mention pg_vlock.

As far as I recall, the only error message he has listed so far is from
psql, which merely records the unlink failure at the end of the vacuum.
Everything else looks remarkably like the vc_abort bug.

--      Vote against SPAM: http://www.politik-digital.de/spam/                ========================================
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver              PGP key from public servers; key
ID32B8FAA1                ========================================    "Watch ye therefore, and pray always, that ye may
be     accounted worthy to escape all these things that shall     come to pass, and to stand before the Son of man."
                             Luke 21:36 
 




Re: [HACKERS] Aborted Transaction During Vacuum

От
Tom Lane
Дата:
"Oliver Elphick" <olly@lfix.co.uk> writes:
> As far as I recall, the only error message he has listed so far is from
> psql, which merely records the unlink failure at the end of the vacuum.
> Everything else looks remarkably like the vc_abort bug.

But the vc_abort problem didn't cause a backend coredump --- it just
reported an error and failed to complete the vacuum, no?
        regards, tom lane


Re: [HACKERS] Aborted Transaction During Vacuum

От
"Oliver Elphick"
Дата:
Tom Lane wrote: >"Oliver Elphick" <olly@lfix.co.uk> writes: >> As far as I recall, the only error message he has listed
sofar is from >> psql, which merely records the unlink failure at the end of the vacuum. >> Everything else looks
remarkablylike the vc_abort bug. > >But the vc_abort problem didn't cause a backend coredump --- it just >reported an
errorand failed to complete the vacuum, no? >
 

I got a coredump too; I never mentioned it, because I found the proximate
cause and curing that made it go away.  When unlink failed in vc_shutdown,
it called ELOG and a segfault occurred a little later.

I forgot about that when your patch fixed the original problem.

--      Vote against SPAM: http://www.politik-digital.de/spam/                ========================================
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver              PGP key from public servers; key
ID32B8FAA1                ========================================    "Watch ye therefore, and pray always, that ye may
be     accounted worthy to escape all these things that shall     come to pass, and to stand before the Son of man."
                             Luke 21:36 
 




Re: [HACKERS] Aborted Transaction During Vacuum

От
"G. Anthony Reina"
Дата:
Tom Lane wrote:

> But the vc_abort problem didn't cause a backend coredump --- it just
> reported an error and failed to complete the vacuum, no?
>
>                         regards, tom lane

Here's the error message again:

NOTICE:  --Relation ex_ellipse_proc--
NOTICE:  --Relation ex_ellipse_proc--
NOTICE:  Pages 2545: Changed 0, Reapped 0, Empty 0, New 0; Tup 30535: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0,
MinLen 660, MaxLen 660; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail.
Pages 0/0. Elapsed 1/0 sec.
NOTICE:  Pages 2545: Changed 0, Reapped 0, Empty 0, New 0; Tup 30535: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0,
MinLen 660, MaxLen 660; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail.
Pages 0/0. Elapsed 1/0 sec.
NOTICE:  Index pkex_ellipse_proc: Pages 138; Tuples 30535. Elapsed 0/0 sec.
NOTICE:  Index pkex_ellipse_proc: Pages 138; Tuples 30535. Elapsed 0/0 sec.
NOTICE:  --Relation ex_ellipse_cell--
NOTICE:  --Relation ex_ellipse_cell--
NOTICE:  Pages 370: Changed 0, Reapped 0, Empty 0, New 0; Tup 6109: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0,
MinLen 80, MaxLen 2736; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail.
Pages 0/0. Elapsed 0/0 sec.
NOTICE:  Pages 370: Changed 0, Reapped 0, Empty 0, New 0; Tup 6109: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0,
MinLen 80, MaxLen 2736; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail.
Pages 0/0. Elapsed 0/0 sec.
NOTICE:  Index pkex_ellipse_cell: Pages 42; Tuples 6109. Elapsed 0/0 sec.
NOTICE:  Index pkex_ellipse_cell: Pages 42; Tuples 6109. Elapsed 0/0 sec.
NOTICE:  --Relation ex_ellipse_opto--
NOTICE:  --Relation ex_ellipse_opto--
NOTICE:  Pages 26356: Changed 0, Reapped 0, Empty 0, New 0; Tup 71475: Vac
0, Keep/VTL 0/0, Crash 0, UnUsed 0,
MinLen 1760, MaxLen 6108; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail.
Pages 0/0. Elapsed 3/2 sec.
NOTICE:  Pages 26356: Changed 0, Reapped 0, Empty 0, New 0; Tup 71475: Vac
0, Keep/VTL 0/0, Crash 0, UnUsed 0,
MinLen 1760, MaxLen 6108; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail.
Pages 0/0. Elapsed 3/2 sec.
NOTICE:  Index pkex_ellipse_opto: Pages 357; Tuples 71475. Elapsed 0/0 sec.
NOTICE:  Index pkex_ellipse_opto: Pages 357; Tuples 71475. Elapsed 0/0 sec.
NOTICE:  --Relation ex_ellipse_opto_proc--
NOTICE:  --Relation ex_ellipse_opto_proc--
NOTICE:  Pages 14742: Changed 0, Reapped 0, Empty 0, New 0; Tup 30535: Vac
0, Keep/VTL 0/0, Crash 0, UnUsed 0,
MinLen 1816, MaxLen 5900; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail.
Pages 0/0. Elapsed 1/1 sec.
NOTICE:  Pages 14742: Changed 0, Reapped 0, Empty 0, New 0; Tup 30535: Vac
0, Keep/VTL 0/0, Crash 0, UnUsed 0,
MinLen 1816, MaxLen 5900; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail.
Pages 0/0. Elapsed 1/1 sec.
NOTICE:  Index pkex_ellipse_opto_proc: Pages 138; Tuples 30535. Elapsed 0/0
sec.
NOTICE:  Index pkex_ellipse_opto_proc: Pages 138; Tuples 30535. Elapsed 0/0
sec.
ERROR:  vacuum: can't destroy lock file!
NOTICE:  AbortTransaction and not in in-progress state
NOTICE:  AbortTransaction and not in in-progress state
pqReadData() -- backend closed the channel unexpectedly.       This probably means the backend terminated abnormally
  before or while processing the request.
 
We have lost the connection to the backend, so further processing is
impossible.  Terminating.
[postgres@bigred ~]$



-Tony




Re: [HACKERS] Aborted Transaction During Vacuum

От
Tom Lane
Дата:
"Oliver Elphick" <olly@lfix.co.uk> writes:
>> But the vc_abort problem didn't cause a backend coredump --- it just
>> reported an error and failed to complete the vacuum, no?

> I got a coredump too; I never mentioned it, because I found the proximate
> cause and curing that made it go away.  When unlink failed in vc_shutdown,
> it called ELOG and a segfault occurred a little later.

Ah, I wish I'd known that.  So what Tony is seeing is exactly the same
behavior you observed.  OK, I feel better now --- I thought the coredump
was probably some platform-specific misbehavior that only Tony was seeing.

We still need to figure out what is causing it, because I can see no
reason for a coredump after vc_shutdown elog()s.  Something is being
clobbered that should not be.  But it sounds like installing the
vc_abort patch will get Tony on his feet, and then we can look for the
secondary bug at our leisure.
        regards, tom lane