Re: The lightbulb just went on...

From The Hermit Hacker
Subject Re: The lightbulb just went on...
Date
Msg-id Pine.BSF.4.21.0010162151320.342-100000@thelab.hub.org
In response to The lightbulb just went on...  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: The lightbulb just went on...
List pgsql-hackers
Something to force a v7.0.3 ... ?

On Mon, 16 Oct 2000, Tom Lane wrote:

> ... with a blinding flash ...
> 
> The VACUUM funnies I was complaining about before may or may not be real
> bugs, but they are not what's biting Alfred.  None of them can lead to
> the observed crashes AFAICT.
> 
> What's biting Alfred is the code that moves a tuple update chain, lines
> 1541 ff in REL7_0_PATCHES.  This sets up a pointer to a source tuple in
> "tuple".  Then it gets the destination page it plans to move the tuple
> to, and applies vc_vacpage to that page if it hasn't been done already.
> But when we're moving a tuple chain, *it is possible for the destination
> page to be the same as the source page*.  Since vc_vacpage applies
> PageRepairFragmentation, all the live tuples on the page may get moved.
> Afterwards, tuple.t_data is out of date and pointing at some random
> chunk of some other tuple.  The subsequent copy of the tuple copies
> garbage, which explains Alfred's several crashes in constructing index
> entries for the copied tuple (all of which bombed out from the
> index-build calls at lines 1634 ff, i.e., for tuples being moved as part
> of a chain).  Once in a while, the obsolete pointer will be pointing at
> the real header of a different tuple --- perhaps even the place where we
> are about to put the copy.  This improbable case explains the one
> observed Assert crash in which a copied tuple's HEAP_MOVED_IN bit
> mysteriously got turned off.  Reason: it was cleared through the
> old-tuple pointer just after being set via the new-tuple one.
> 
> Proof that this is happening can be seen in the core dumps for Alfred's
> index-construction-crash cases: tuple.t_data does not point at the same
> place that the tuple.ip_posid'th page line item points at.  This could
> only happen if the page was reshuffled since the tuple pointer was set
> up.  The explanation for the Assert crash is a bit of a leap of faith,
> but I feel confident that it's right.
> 
> The solution is to do everything we're going to do with the source
> tuple, especially copying it and updating its state, *before* we apply
> vc_vacpage to the destination page.  Then we don't care if the source
> gets moved during vc_vacpage.
> 
> I will prepare a patch along this line and send it to Alfred for
> testing.
> 
>             regards, tom lane
> 
> 
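
For readers who want to see the hazard in miniature, here is a rough sketch of the ordering problem described in the quoted message. It is not the vacuum code: ToyPage, toy_repair_fragmentation, toy_make_page and toy_move_tuple are hypothetical names invented for illustration, and the "page" is just a byte array with a few line pointers. The only point it tries to show is that a pointer taken into a page goes stale once the page is compacted, and that copying the source before compacting avoids the problem.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_PAGE_SIZE 64
#define TOY_MAX_ITEMS 3

/* Hypothetical toy page: off[i] < 0 marks a dead item, otherwise
 * off[i]/len[i] locate item i inside data[]. */
typedef struct {
    int  off[TOY_MAX_ITEMS];
    int  len[TOY_MAX_ITEMS];
    char data[TOY_PAGE_SIZE];
} ToyPage;

/* Toy stand-in for PageRepairFragmentation: slide live items to the
 * front of data[] and update their line pointers. */
static void toy_repair_fragmentation(ToyPage *p)
{
    char tmp[TOY_PAGE_SIZE];
    int  next = 0;

    for (int i = 0; i < TOY_MAX_ITEMS; i++) {
        if (p->off[i] < 0)
            continue;
        memcpy(tmp + next, p->data + p->off[i], (size_t) p->len[i]);
        p->off[i] = next;
        next += p->len[i];
    }
    memset(p->data, 0, sizeof p->data);
    memcpy(p->data, tmp, (size_t) next);
}

/* One dead item followed by two live ones, so compaction moves both. */
static ToyPage toy_make_page(void)
{
    ToyPage p;

    memset(&p, 0, sizeof p);
    p.off[0] = -1;                      /* dead: bytes 0..7 are reclaimable */
    memcpy(p.data, "garbage", 8);
    p.off[1] = 8;  p.len[1] = 7; memcpy(p.data + 8,  "tupleA", 7);
    p.off[2] = 16; p.len[2] = 7; memcpy(p.data + 16, "tupleB", 7);
    return p;
}

/* Copy item 1 out of the page. With copy_first == 0 the page is
 * compacted before the copy (the problematic order); with
 * copy_first != 0 the copy happens first (the proposed order).
 * Returns 1 if the saved pointer still agreed with the line pointer
 * at the moment of the copy, 0 otherwise. */
static int toy_move_tuple(int copy_first, char *dst, size_t n)
{
    ToyPage     p = toy_make_page();
    const char *src = p.data + p.off[1];   /* like setting tuple.t_data */
    int         consistent;

    assert(n > 0);
    if (!copy_first)
        toy_repair_fragmentation(&p);      /* source page == destination page */
    consistent = (src == p.data + p.off[1]);
    snprintf(dst, n, "%s", src);           /* the "copy of the tuple" */
    if (copy_first)
        toy_repair_fragmentation(&p);      /* safe: source already copied */
    return consistent;
}
```

Calling toy_move_tuple(0, buf, sizeof buf) leaves "upleB" in buf, a chunk of the neighbouring item, and returns 0 because the saved pointer no longer matches the line pointer; that mismatch is the same kind of evidence the quoted message reads out of the core dumps. Calling it with copy_first set to 1 leaves "tupleA" in buf and returns 1.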

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 


