Thread: infinite loop in parallel hash joins / DSA / get_best_segment
Hi,

While performing some benchmarks on REL_11_STABLE (at 444455c2d9), I've
repeatedly hit an apparent infinite loop on TPC-H query 4. I don't know
what exactly the triggering conditions are, but the symptoms are these:

1) A "parallel worker" process is consuming 100% CPU, with perf
reporting a profile like this:

  34.66%  postgres  [.] get_segment_by_index
  29.44%  postgres  [.] get_best_segment
  29.22%  postgres  [.] unlink_segment.isra.2
   6.66%  postgres  [.] fls
   0.02%  [unknown] [k] 0xffffffffb10014b0

So all the time seems to be spent within get_best_segment.

2) The backtrace looks like this (full backtrace attached):

  #0  0x0000561a748c4f89 in get_segment_by_index
  #1  0x0000561a748c5653 in get_best_segment
  #2  0x0000561a748c67a9 in dsa_allocate_extended
  #3  0x0000561a7466ddb4 in ExecParallelHashTupleAlloc
  #4  0x0000561a7466e00a in ExecParallelHashTableInsertCurrentBatch
  #5  0x0000561a7466fe00 in ExecParallelHashJoinNewBatch
  #6  ExecHashJoinImpl
  #7  ExecParallelHashJoin
  #8  ExecProcNode
  ...

3) The infinite loop seems to be pretty obvious - after setting a
breakpoint on get_segment_by_index we get this:

  Breakpoint 1, get_segment_by_index (area=0x560c03626e58, index=3) ...
  (gdb) c
  Continuing.

  Breakpoint 1, get_segment_by_index (area=0x560c03626e58, index=3) ...
  (gdb) c
  Continuing.

  Breakpoint 1, get_segment_by_index (area=0x560c03626e58, index=3) ...
  (gdb) c
  Continuing.

That is, we call the function with the same index over and over.

Why is that? Well:

  (gdb) print *area->segment_maps[3].header
  $1 = {magic = 216163851, usable_pages = 512, size = 2105344, prev = 3,
  next = 3, bin = 0, freed = false}

The segment's prev and next links both point back at segment 3 itself,
so we loop forever.

I don't know what exactly the triggering conditions are here. I've only
ever observed the issue on TPC-H with scale 16GB, a partitioned lineitem
table, work_mem set to 8MB, and query #4. And it seems I can reproduce
it pretty reliably.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments
On Mon, Sep 17, 2018 at 10:38 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> While performing some benchmarks on REL_11_STABLE (at 444455c2d9), I've
> repeatedly hit an apparent infinite loop on TPC-H query 4. I don't know
> what exactly the triggering conditions are, but the symptoms are these:
>
> 1) A "parallel worker" process is consuming 100% CPU, with perf
> reporting a profile like this:
>
>   34.66%  postgres  [.] get_segment_by_index
>   29.44%  postgres  [.] get_best_segment
>   29.22%  postgres  [.] unlink_segment.isra.2
>    6.66%  postgres  [.] fls
>    0.02%  [unknown] [k] 0xffffffffb10014b0
>
> So all the time seems to be spent within get_best_segment.
>
> 2) The backtrace looks like this (full backtrace attached):
>
>   #0  0x0000561a748c4f89 in get_segment_by_index
>   #1  0x0000561a748c5653 in get_best_segment
>   #2  0x0000561a748c67a9 in dsa_allocate_extended
>   #3  0x0000561a7466ddb4 in ExecParallelHashTupleAlloc
>   #4  0x0000561a7466e00a in ExecParallelHashTableInsertCurrentBatch
>   #5  0x0000561a7466fe00 in ExecParallelHashJoinNewBatch
>   #6  ExecHashJoinImpl
>   #7  ExecParallelHashJoin
>   #8  ExecProcNode
>   ...
>
> 3) The infinite loop seems to be pretty obvious - after setting a
> breakpoint on get_segment_by_index we get this:
>
>   Breakpoint 1, get_segment_by_index (area=0x560c03626e58, index=3) ...
>   (gdb) c
>   Continuing.
>
>   Breakpoint 1, get_segment_by_index (area=0x560c03626e58, index=3) ...
>   (gdb) c
>   Continuing.
>
> That is, we call the function with the same index over and over.
>
> Why is that? Well:
>
>   (gdb) print *area->segment_maps[3].header
>   $1 = {magic = 216163851, usable_pages = 512, size = 2105344, prev = 3,
>   next = 3, bin = 0, freed = false}
>
> So, we loop forever.
>
> I don't know what exactly the triggering conditions are here. I've only
> ever observed the issue on TPC-H with scale 16GB, a partitioned lineitem
> table, work_mem set to 8MB, and query #4. And it seems I can reproduce
> it pretty reliably.

Urgh.  Thanks Tomas.  I will investigate.

--
Thomas Munro
http://www.enterprisedb.com
On Mon, Sep 17, 2018 at 10:42 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Mon, Sep 17, 2018 at 10:38 AM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> > While performing some benchmarks on REL_11_STABLE (at 444455c2d9), I've
> > repeatedly hit an apparent infinite loop on TPC-H query 4. I don't know
> > what exactly the triggering conditions are, but the symptoms are these:
> >
> > ...
>
> Urgh.  Thanks Tomas.  I will investigate.

Thanks very much to Tomas for giving me access to his benchmarking
machine, where this could be reproduced. Tomas was doing performance
testing with no assertions, but with a cassert build I was able to hit
an assertion failure after a while and eventually figure out what was
going wrong.

The problem is that the 'segment bins' (linked lists that group segments
by the largest contiguous run of free pages) can become corrupted when
segments become completely free, are returned to the operating system,
and the same segment slot (index number) is then recycled, given the
right sequence of allocations, frees, and timing. There is an LWLock
that protects segment slot and bin manipulations, but there is a kind
of ABA problem where one backend can finish up looking at the defunct
former inhabitant of a slot in which another backend has recently
created a new segment. There is handling for that in the form of
freed_segment_counter, a kind of generation/invalidation signal, but
a couple of paths fail to check it at the right times.

With the attached draft patch, Tomas's benchmark script runs happily
for long periods. A bit more study required with fresh eyes, tomorrow.

--
Thomas Munro
http://www.enterprisedb.com
Attachments
On Mon, Sep 17, 2018 at 9:12 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Mon, Sep 17, 2018 at 10:42 AM Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
> > On Mon, Sep 17, 2018 at 10:38 AM Tomas Vondra
> > <tomas.vondra@2ndquadrant.com> wrote:
> > > While performing some benchmarks on REL_11_STABLE (at 444455c2d9), I've
> > > repeatedly hit an apparent infinite loop on TPC-H query 4.
> >
> > ...
>
> With the attached draft patch, Tomas's benchmark script runs happily
> for long periods. A bit more study required with fresh eyes, tomorrow.

Here is a tidied-up version that I propose to back-patch to 10 & 11.

--
Thomas Munro
http://www.enterprisedb.com
Attachments
On Tue, Sep 18, 2018 at 11:35 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Mon, Sep 17, 2018 at 9:12 PM Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
> > With the attached draft patch, Tomas's benchmark script runs happily
> > for long periods. A bit more study required with fresh eyes, tomorrow.
>
> Here is a tidied-up version that I propose to back-patch to 10 & 11.

Done. I figured reviews would be unlikely to show up soon for a dsa.c
patch, but of course post-commit review is very welcome. I thought it
best to have this fix in REL_11_STABLE being tested for as long as
possible before any potential RC1 tag, especially if Tomas plans to
continue his "big join" testing on that branch.

--
Thomas Munro
http://www.enterprisedb.com