Re: The lightbulb just went on...
From | The Hermit Hacker |
---|---|
Subject | Re: The lightbulb just went on... |
Date | |
Msg-id | Pine.BSF.4.21.0010162151320.342-100000@thelab.hub.org |
In reply to | The lightbulb just went on... (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: The lightbulb just went on... |
List | pgsql-hackers |
Something to force a v7.0.3 ... ?

On Mon, 16 Oct 2000, Tom Lane wrote:

> ... with a blinding flash ...
>
> The VACUUM funnies I was complaining about before may or may not be real
> bugs, but they are not what's biting Alfred. None of them can lead to
> the observed crashes AFAICT.
>
> What's biting Alfred is the code that moves a tuple update chain, lines
> 1541 ff in REL7_0_PATCHES. This sets up a pointer to a source tuple in
> "tuple". Then it gets the destination page it plans to move the tuple
> to, and applies vc_vacpage to that page if it hasn't been done already.
> But when we're moving a tuple chain, *it is possible for the destination
> page to be the same as the source page*. Since vc_vacpage applies
> PageRepairFragmentation, all the live tuples on the page may get moved.
> Afterwards, tuple.t_data is out of date and pointing at some random
> chunk of some other tuple. The subsequent copy of the tuple copies
> garbage, which explains Alfred's several crashes in constructing index
> entries for the copied tuple (all of which bombed out from the
> index-build calls at lines 1634 ff, ie, for tuples being moved as part
> of a chain). Once in a while, the obsolete pointer will be pointing at
> the real header of a different tuple --- perhaps even the place where we
> are about to put the copy. This improbable case explains the one
> observed Assert crash in which a copied tuple's HEAP_MOVED_IN bit
> mysteriously got turned off. Reason: it was cleared through the
> old-tuple pointer just after being set via the new-tuple one.
>
> Proof that this is happening can be seen in the core dumps for Alfred's
> index-construction-crash cases: tuple.t_data does not point at the same
> place that the tuple.ip_posid'th page line item points at. This could
> only happen if the page was reshuffled since the tuple pointer was set
> up. The explanation for the Assert crash is a bit of a leap of faith,
> but I feel confident that it's right.
>
> The solution is to do everything we're going to do with the source
> tuple, especially copying it and updating its state, *before* we apply
> vc_vacpage to the destination page. Then we don't care if the source
> gets moved during vc_vacpage.
>
> I will prepare a patch along this line and send it to Alfred for
> testing.
>
> regards, tom lane
>

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
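
For anyone following along, here is a rough sketch of the ordering problem Tom describes. It is a toy model and not the actual vacuum code: ToyPage, toy_init, toy_compact, copy_after_compact and copy_before_compact are all hypothetical names made up for illustration, and toy_compact only stands in for what PageRepairFragmentation does to live tuples on a page. The point is just that a raw pointer taken before compaction can end up aimed at a different tuple, while the line item still finds the right one.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE 64
#define TOY_MAX_ITEMS 4
#define TOY_TUPLE_LEN 8   /* every toy tuple is 7 chars plus a NUL */

/* Hypothetical toy page: line items store offsets into data[]. */
typedef struct {
    int  used[TOY_MAX_ITEMS];   /* 1 if the line item refers to a live tuple */
    int  off[TOY_MAX_ITEMS];    /* offset of that tuple inside data[] */
    char data[TOY_PAGE_SIZE];
} ToyPage;

/* One dead tuple at the front, followed by two live ones. */
void toy_init(ToyPage *p)
{
    memset(p, 0, sizeof *p);
    memcpy(p->data + 0,  "garbage", TOY_TUPLE_LEN);   /* item 0: dead */
    memcpy(p->data + 8,  "tuple-A", TOY_TUPLE_LEN);   /* item 1: live */
    memcpy(p->data + 16, "tuple-B", TOY_TUPLE_LEN);   /* item 2: live */
    p->off[0] = 0;
    p->off[1] = 8;  p->used[1] = 1;
    p->off[2] = 16; p->used[2] = 1;
}

/* Stand-in for page defragmentation: pack live tuples to the front
 * of the page and update their line items to the new offsets. */
void toy_compact(ToyPage *p)
{
    char packed[TOY_PAGE_SIZE];
    int  next = 0;

    memset(packed, 0, sizeof packed);
    for (int i = 0; i < TOY_MAX_ITEMS; i++) {
        if (!p->used[i])
            continue;
        memcpy(packed + next, p->data + p->off[i], TOY_TUPLE_LEN);
        p->off[i] = next;
        next += TOY_TUPLE_LEN;
    }
    memcpy(p->data, packed, TOY_PAGE_SIZE);
}

/* Problem order: take the pointer, compact the same page, then copy
 * through the now-stale pointer. */
void copy_after_compact(ToyPage *p, int item, char *out)
{
    assert(item >= 0 && item < TOY_MAX_ITEMS && p->used[item]);
    const char *tuple = p->data + p->off[item];   /* plays the role of tuple.t_data */
    toy_compact(p);                               /* source page == destination page */
    memcpy(out, tuple, TOY_TUPLE_LEN);            /* copies whatever lives there now */
}

/* Proposed order: finish with the source tuple first, then compact. */
void copy_before_compact(ToyPage *p, int item, char *out)
{
    assert(item >= 0 && item < TOY_MAX_ITEMS && p->used[item]);
    const char *tuple = p->data + p->off[item];
    memcpy(out, tuple, TOY_TUPLE_LEN);            /* copy while the pointer is still good */
    toy_compact(p);                               /* the source may move; we no longer care */
}
```

Calling copy_after_compact on item 1 of a freshly initialised page hands back the bytes of the neighbouring tuple, because the old offset now holds it, while copy_before_compact hands back the tuple that was asked for. In the toy, the stale pointer and the line item disagreeing after compaction is the same symptom Tom reports seeing in the core dumps.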