Re: hung backends stuck in spinlock heavy endless loop

From Peter Geoghegan
Subject Re: hung backends stuck in spinlock heavy endless loop
Date
Msg-id CAM3SWZRQAE3H0p+C1y5sWJAcjask=M7Hzc2v3-Viqv7u9LHZmw@mail.gmail.com
In reply to Re: hung backends stuck in spinlock heavy endless loop  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On Fri, Jan 16, 2015 at 6:21 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> It looks very much like that a page has for some reason been moved to a
> different block number. And that's exactly what Peter found out in his
> investigation too; an index page was mysteriously copied to a different
> block with identical content.

What I found suspicious was that the spuriously identical pages were
not physically adjacent, but logically adjacent: the good page, whose
content was spuriously copied to the bad page, had the bad page as its
B-Tree right link. It also seems likely that the small catalog index
on pg_class(oid) was well cached in shared_buffers. So I agree that
it's unlikely that this is actually a hardware or filesystem problem.
Beyond that, if I had to guess, I'd say that the problem is more
likely to be in the B-Tree code than in the buffer manager or
elsewhere. The "logically adjacent" thing is probably not an artifact
of the order in which the pages were accessed, since it appears there
was a downlink to the bad page, and that downlink was not added
recently. Also, this logical adjacency is unlikely to be mere
coincidence: Postgres seemed to break this way fairly consistently.
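
To make the "logically adjacent" pattern concrete, here is a minimal toy
sketch, not Postgres code. It models index pages as plain Python objects
and flags any page whose right sibling is an exact copy of it. The names
ToyPage and duplicated_right_siblings, the block numbers, and the keys
are all made up for illustration; real pages would have to be read with
something like the pageinspect extension.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ToyPage:
    """Hypothetical stand-in for a B-Tree page; not the on-disk format."""
    keys: Tuple[int, ...]          # stand-in for the page's index tuples
    right_link: Optional[int]      # block number of right sibling, None if rightmost


def duplicated_right_siblings(pages: Dict[int, ToyPage]) -> List[Tuple[int, int]]:
    """Return (block, right_block) pairs where the right sibling is an exact copy."""
    suspects = []
    for blkno in sorted(pages):
        page = pages[blkno]
        right = page.right_link
        # Skip rightmost pages and pages that point at themselves.
        if right is not None and right != blkno and pages.get(right) == page:
            suspects.append((blkno, right))
    return suspects


# Made-up layout: block 3's right link is block 9 (not physically adjacent),
# and block 9 holds an exact copy of block 3, right link included.
pages = {
    1: ToyPage(keys=(10, 20), right_link=3),
    3: ToyPage(keys=(30, 40), right_link=9),
    9: ToyPage(keys=(30, 40), right_link=9),
}

print(duplicated_right_siblings(pages))
```

In this toy layout the only pair reported is blocks 3 and 9. Because the
copy in block 9 carries the same right link as the original, it points
back at itself, so in this model a scan that keeps moving right from
block 9 would never leave it.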

Does anyone have a better developed sense of where the ultimate
problem here is than I do? I guess I've never thought too much about
how the system fails when a catalog index is this thoroughly broken.

-- 
Peter Geoghegan


