Re: dsa_allocate could not find 4 free pages

From: Mark Dilger
Subject: Re: dsa_allocate could not find 4 free pages
Msg-id: 25884857-F310-4C10-AC97-3C85A5F2D8FD@gmail.com
In reply to: Re: dsa_allocate could not find 4 free pages (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-hackers
> On Dec 5, 2017, at 4:07 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>
> On Wed, Dec 6, 2017 at 9:35 AM, Mark Dilger <hornschnorter@gmail.com> wrote:
>>> On Dec 5, 2017, at 11:25 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>>> Does the plan have multiple Gather nodes with Parallel Bitmap Heap Scan?
>>
>> This was encountered and logged by a Java client.  The only data I got was:
>>
>> org.postgresql.util.PSQLException: ERROR: dsa_allocate could not find 4 free pages
>>  Where: parallel worker
>
> This means that the DSA area is corrupted.  Presumably
> get_best_segment(area, 4) returned a segment that wasn't actually good
> for 4 pages, either because it was incorrectly binned or because its
> free space btree was corrupted.  Another path would be that
> make_new_segment(area, 4) returned a segment that couldn't find 4
> pages, but that seems unlikely.
>
>> [query plan with one Gather and no Parallel Bitmap Heap Scan]
>
> I'm not sure why this plan would ever call dsa_allocate().
>
>> [query plan with no Gather but plenty of Bitmap Heap Scans]
>
> And this one certainly can't.  I guess you must sometimes get a
> different variation that has Gather nodes and uses Parallel Bitmap
> Heap Scan.

Yes, I can believe that the plan is sometimes different.  This error has
occurred several times now, but it is still rather infrequent, so either the
plan that triggers it is rare, or the bug is intermittent even with the same
plan being chosen, or perhaps both.
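Thomas's first hypothesis (a segment selected by its bin or free-space metadata that cannot actually supply the requested pages) can be illustrated with a toy model. This is a hypothetical sketch, not PostgreSQL's dsa.c: ToySegment, toy_largest_run, toy_find_run and toy_allocate are invented names, and binning is reduced to one cached "largest free run" per segment, which the search trusts before it consults the authoritative page map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model only: hypothetical names and layout, not PostgreSQL's dsa.c. */
#define TOY_PAGES 16

typedef struct ToySegment
{
    bool   used[TOY_PAGES]; /* authoritative page map */
    size_t binned_run;      /* cached largest free run; stands in for the bin */
} ToySegment;

/* Length of the longest run of free pages, read from the page map. */
static size_t
toy_largest_run(const ToySegment *seg)
{
    size_t best = 0, run = 0;

    for (size_t i = 0; i < TOY_PAGES; i++)
    {
        run = seg->used[i] ? 0 : run + 1;
        if (run > best)
            best = run;
    }
    return best;
}

/* First page of the first run of npages free pages, or -1 if there is none. */
static long
toy_find_run(const ToySegment *seg, size_t npages)
{
    size_t run = 0;

    for (size_t i = 0; i < TOY_PAGES; i++)
    {
        run = seg->used[i] ? 0 : run + 1;
        if (run == npages)
            return (long) (i + 1 - npages);
    }
    return -1;
}

/*
 * Choose a segment by its cached metadata, then ask the page map for the run.
 * If the metadata overstates the free space, the search "succeeds" but the
 * page map cannot deliver, which is the shape of the reported error.
 */
static long
toy_allocate(ToySegment *segs, size_t nsegs, size_t npages)
{
    assert(npages > 0 && npages <= TOY_PAGES);

    for (size_t s = 0; s < nsegs; s++)
    {
        long start;

        if (segs[s].binned_run < npages)
            continue;           /* too small according to its metadata: skip */

        start = toy_find_run(&segs[s], npages);
        if (start < 0)
        {
            fprintf(stderr, "could not find %zu free pages\n", npages);
            return -1;
        }
        for (size_t i = 0; i < npages; i++)
            segs[s].used[(size_t) start + i] = true;
        segs[s].binned_run = toy_largest_run(&segs[s]); /* re-bin after use */
        return start;
    }
    return -1;                  /* a real allocator would try a new segment here */
}
```

For example, a segment with pages 3, 6, 9 and 12 in use has no free run longer than 3 pages; if its cached value still claims 4, toy_allocate() picks it for a 4-page request and then fails, even though a healthy segment would have been fine.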

>  Then the question is whether the es_query_dsa multiple
> Gather bug can explain this: for example, if dsa_free(wrong_dsa_area,
> p) was called, perhaps it could produce this type of corruption.
> Otherwise we have a different bug.  Any clues on how to reproduce the
> problem would be very welcome.
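The dsa_free(wrong_dsa_area, p) hypothesis can also be sketched as a toy model. The names here (ToyArea, toy_pointer, toy_alloc_page, toy_free_page) are hypothetical, and the model assumes only that a pointer is relative to an area and is decoded against whichever area it is handed, so nothing stops a pointer from one area being freed into another. A single free-page counter stands in for the area's real free-space metadata.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not PostgreSQL's dsa.c. */
#define TOY_SEGMENTS 2
#define TOY_PAGES 8

typedef uint64_t toy_pointer;   /* high 32 bits: segment index, low 32: page */

typedef struct ToyArea
{
    bool   used[TOY_SEGMENTS][TOY_PAGES];
    size_t free_pages;          /* summary metadata the area trusts */
} ToyArea;

static toy_pointer
toy_make_pointer(uint32_t segment, uint32_t page)
{
    return ((toy_pointer) segment << 32) | page;
}

/* Allocate the first free page; returns false if the area has none. */
static bool
toy_alloc_page(ToyArea *area, toy_pointer *result)
{
    for (uint32_t s = 0; s < TOY_SEGMENTS; s++)
        for (uint32_t p = 0; p < TOY_PAGES; p++)
            if (!area->used[s][p])
            {
                area->used[s][p] = true;
                area->free_pages--;
                *result = toy_make_pointer(s, p);
                return true;
            }
    return false;
}

/*
 * Free a page.  The pointer is decoded relative to the area passed in, so a
 * pointer that belongs to a different area is accepted without complaint.
 */
static void
toy_free_page(ToyArea *area, toy_pointer ptr)
{
    uint32_t s = (uint32_t) (ptr >> 32);
    uint32_t p = (uint32_t) (ptr & 0xFFFFFFFFu);

    assert(s < TOY_SEGMENTS && p < TOY_PAGES);
    area->used[s][p] = false;   /* may clobber a live page owned by someone else */
    area->free_pages++;         /* drifts if the page was not really ours */
}
```

Freeing area A's first page into area B marks B's own live page at the same encoding as free (and leaks A's page). B then hands that page out a second time, and once both owners free it, B's free-page count exceeds its capacity. Drift of that kind is one way metadata could come to promise pages that the page map cannot supply.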

I have written (and rewritten, and rewritten) a TAP test in the hope of
getting a test case that reproduces this reliably (or even once), but
without luck so far.  I will keep trying.

mark