Re: infinite loop in parallel hash joins / DSA / get_best_segment

From Thomas Munro
Subject Re: infinite loop in parallel hash joins / DSA / get_best_segment
Msg-id CAEepm=2R24dengvkjWw7a=c2pDvEkAXSH5q0=nrFZpw1gkj50Q@mail.gmail.com
In response to infinite loop in parallel hash joins / DSA / get_best_segment  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: infinite loop in parallel hash joins / DSA / get_best_segment  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
On Mon, Sep 17, 2018 at 10:38 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> While performing some benchmarks on REL_11_STABLE (at 444455c2d9), I've
> repeatedly hit an apparent infinite loop on TPC-H query 4. I don't know
> what exactly are the triggering conditions, but the symptoms are these:
>
> 1) A "parallel worker" process is consuming 100% CPU, with perf
> reporting a profile like this:
>
>     34.66%  postgres          [.] get_segment_by_index
>     29.44%  postgres          [.] get_best_segment
>     29.22%  postgres          [.] unlink_segment.isra.2
>      6.66%  postgres          [.] fls
>      0.02%  [unknown]         [k] 0xffffffffb10014b0
>
> So all the time seems to be spent within get_best_segment.
>
> 2) The backtrace looks like this (full backtrace attached):
>
>     #0  0x0000561a748c4f89 in get_segment_by_index
>     #1  0x0000561a748c5653 in get_best_segment
>     #2  0x0000561a748c67a9 in dsa_allocate_extended
>     #3  0x0000561a7466ddb4 in ExecParallelHashTupleAlloc
>     #4  0x0000561a7466e00a in ExecParallelHashTableInsertCurrentBatch
>     #5  0x0000561a7466fe00 in ExecParallelHashJoinNewBatch
>     #6  ExecHashJoinImpl
>     #7  ExecParallelHashJoin
>     #8  ExecProcNode
>     ...
>
> 3) The infinite loop seems to be pretty obvious - after setting
> breakpoint on get_segment_by_index we get this:
>
> Breakpoint 1, get_segment_by_index (area=0x560c03626e58, index=3) ...
> (gdb) c
> Continuing.
>
> Breakpoint 1, get_segment_by_index (area=0x560c03626e58, index=3) ...
> (gdb) c
> Continuing.
>
> Breakpoint 1, get_segment_by_index (area=0x560c03626e58, index=3) ...
> (gdb) c
> Continuing.
>
> That is, we call the function with the same index over and over.
>
> Why is that? Well:
>
> (gdb) print *area->segment_maps[3].header
> $1 = {magic = 216163851, usable_pages = 512, size = 2105344, prev = 3,
> next = 3, bin = 0, freed = false}
>
> So, we loop forever.
>
> I don't know what exactly are the triggering conditions here. I've only
> ever observed the issue on TPC-H with scale 16GB, partitioned lineitem
> table and work_mem set to 8MB and query #4. And it seems I can reproduce
> it pretty reliably.
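
To illustrate the symptom reported above: a header whose prev and next both point back at its own index (3 here) is enough to make any walk that only stops at an end-of-list sentinel spin forever. The following is a minimal sketch of that situation, not the actual dsa.c code. The struct, the sentinel value and the function name are all hypothetical, and the step-count guard is added only so the example terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical end-of-list sentinel, standing in for "no segment". */
#define SEGMENT_INDEX_NONE ((size_t) -1)

/* Hypothetical, simplified stand-in for a segment header: index links only. */
typedef struct
{
    size_t prev;
    size_t next;
} toy_segment_header;

/*
 * Follow 'next' links from 'start' until the sentinel is reached.
 *
 * Returns the number of segments visited, or -1 if the walk is about to
 * take more steps than there are segments, which can only happen when the
 * list contains a cycle (for example a segment linked to itself).
 */
static int
walk_segments(const toy_segment_header *segs, size_t nsegs, size_t start)
{
    int     visited = 0;
    size_t  index = start;

    assert(start < nsegs);

    while (index != SEGMENT_INDEX_NONE)
    {
        /* Without this guard, a self-linked segment loops forever. */
        if ((size_t) visited >= nsegs)
            return -1;

        visited++;
        index = segs[index].next;
    }

    return visited;
}
```

With a four-entry array in which segments 0, 1 and 2 form a proper chain and segment 3 has prev = next = 3, walking from index 0 returns 3, while walking from index 3 returns -1 instead of revisiting the same index indefinitely as in the gdb session quoted above.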

Urgh.  Thanks Tomas.  I will investigate.

-- 
Thomas Munro
http://www.enterprisedb.com

