Discussion: [BUGS] BUG #14657: Server process segmentation fault in v10,May 10th dev snapshot
[BUGS] BUG #14657: Server process segmentation fault in v10,May 10th dev snapshot
From
sveinn.sveinsson@gmail.com
Date:
The following bug has been logged on the website:
Bug reference: 14657
Logged by: Sveinn Sveinsson
Email address: sveinn.sveinsson@gmail.com
PostgreSQL version: Unsupported/Unknown
Operating system: Linux x86_64
Description:
The following causes a segmentation fault in the v10 May 10th development
snapshot:
create table test (sd timestamp,anb varchar(16)) partition by range (sd);
create table test_1 partition of test for values from ('2017-01-01') to ('2017-02-01');
create index test_1_a on test_1 (anb,sd);
insert into test values ('2017-01-01','12345');
select min(sd), max(sd) from test where anb='12345';
The server log file shows:
2017-05-16 09:36:13.243 UTC [1503] LOG: server process (PID 3474) was terminated by signal 11: Segmentation fault
2017-05-16 09:36:13.243 UTC [1503] DETAIL: Failed process was running: select min(sd), max(sd) from test where anb='12345';
2017-05-16 09:36:13.244 UTC [1503] LOG: terminating any other active server processes
2017-05-16 09:36:13.245 UTC [3463] WARNING: terminating connection because of crash of another server process
The stack trace is (no debug info):
Program terminated with signal 11, Segmentation fault.
#0 0x000000000061ab1b in list_nth ()
(gdb) bt
#0 0x000000000061ab1b in list_nth ()
#1 0x00000000005e4081 in ExecLockNonLeafAppendTables ()
#2 0x00000000005f4d52 in ExecInitMergeAppend ()
#3 0x00000000005e0365 in ExecInitNode ()
#4 0x00000000005f35a7 in ExecInitLimit ()
#5 0x00000000005e00f3 in ExecInitNode ()
#6 0x00000000005dd207 in standard_ExecutorStart ()
#7 0x00000000006f96d2 in PortalStart ()
#8 0x00000000006f5c7f in exec_simple_query ()
#9 0x00000000006f6fac in PostgresMain ()
#10 0x0000000000475cdc in ServerLoop ()
#11 0x0000000000692ffa in PostmasterMain ()
#12 0x0000000000476600 in main ()
Regards,
Sveinn Sveinsson.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Wed, May 17, 2017 at 7:41 PM, <sveinn.sveinsson@gmail.com> wrote:
> (gdb) bt
> #0 0x000000000061ab1b in list_nth ()
> #1 0x00000000005e4081 in ExecLockNonLeafAppendTables ()
> #2 0x00000000005f4d52 in ExecInitMergeAppend ()
> #3 0x00000000005e0365 in ExecInitNode ()
> #4 0x00000000005f35a7 in ExecInitLimit ()
> #5 0x00000000005e00f3 in ExecInitNode ()
> #6 0x00000000005dd207 in standard_ExecutorStart ()
> #7 0x00000000006f96d2 in PortalStart ()
> #8 0x00000000006f5c7f in exec_simple_query ()
> #9 0x00000000006f6fac in PostgresMain ()
> #10 0x0000000000475cdc in ServerLoop ()
> #11 0x0000000000692ffa in PostmasterMain ()
> #12 0x0000000000476600 in main ()
Seems like the issue is that the plans under multiple subroots are
pointing to the same partitioned_rels.
If I am not getting it wrong, in "set_plan_refs(PlannerInfo *root, Plan
*plan, int rtoffset)" the rtoffset is specific to the subroot. Now, the
problem is that set_plan_refs, called for different subroots, updates
the same partitioned_rels list and makes its values completely wrong,
which ultimately makes ExecLockNonLeafAppendTables access an
out-of-bounds "rte" index.
set_plan_refs
{
    [clipped]
    case T_MergeAppend:
        {
            [clipped]
            foreach(l, splan->partitioned_rels)
            {
                lfirst_int(l) += rtoffset;
I think the solution should be for create_merge_append_path to make a
copy of the partitioned_rels list?
The attached patch fixes the problem, but I am not completely sure about the fix.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
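The aliasing described in this message can be illustrated outside PostgreSQL with a small C sketch. Everything in it (SketchPlan, sketch_set_plan_refs, and the plain int arrays standing in for the integer List) is hypothetical and only mimics the quoted loop; it is not the planner's code. Two plan nodes share one index array, so applying the offset through both of them shifts the shared entries twice, while private copies are shifted once each.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a plan node; not PostgreSQL's MergeAppend. */
typedef struct SketchPlan
{
    int *partitioned_rels;   /* range-table indexes; may alias another plan's array */
    int  nrels;
} SketchPlan;

/* Mimics the quoted loop: shift every index by rtoffset, in place. */
void
sketch_set_plan_refs(SketchPlan *plan, int rtoffset)
{
    assert(plan != NULL);
    for (int i = 0; i < plan->nrels; i++)
        plan->partitioned_rels[i] += rtoffset;
}

/* Both plans point at one array, so the offset is applied twice. */
int
sketch_shared_index(int rtoffset)
{
    int        shared[1] = {1};
    SketchPlan first = {shared, 1};
    SketchPlan second = {shared, 1};

    sketch_set_plan_refs(&first, rtoffset);
    sketch_set_plan_refs(&second, rtoffset);
    return second.partitioned_rels[0];
}

/* Each plan owns a private copy, so each index is shifted exactly once. */
int
sketch_copied_index(int rtoffset)
{
    int        original[1] = {1};
    int        copy_a[1];
    int        copy_b[1];
    SketchPlan first = {copy_a, 1};
    SketchPlan second = {copy_b, 1};

    memcpy(copy_a, original, sizeof(original));
    memcpy(copy_b, original, sizeof(original));
    sketch_set_plan_refs(&first, rtoffset);
    sketch_set_plan_refs(&second, rtoffset);
    return second.partitioned_rels[0];
}
```

With an offset of 3, sketch_shared_index(3) yields 7 and sketch_copied_index(3) yields 4. A doubly shifted value like 7 is the kind of index that can point past the end of a short range table.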
Attachments
Re: [BUGS] BUG #14657: Server process segmentation fault in v10, May10th dev snapshot
From
Amit Langote
Date:
On 2017/05/18 2:14, Dilip Kumar wrote:
> On Wed, May 17, 2017 at 7:41 PM, <sveinn.sveinsson@gmail.com> wrote:
>> (gdb) bt
>> #0 0x000000000061ab1b in list_nth ()
>> #1 0x00000000005e4081 in ExecLockNonLeafAppendTables ()
>> #2 0x00000000005f4d52 in ExecInitMergeAppend ()
>> #3 0x00000000005e0365 in ExecInitNode ()
>> #4 0x00000000005f35a7 in ExecInitLimit ()
>> #5 0x00000000005e00f3 in ExecInitNode ()
>> #6 0x00000000005dd207 in standard_ExecutorStart ()
>> #7 0x00000000006f96d2 in PortalStart ()
>> #8 0x00000000006f5c7f in exec_simple_query ()
>> #9 0x00000000006f6fac in PostgresMain ()
>> #10 0x0000000000475cdc in ServerLoop ()
>> #11 0x0000000000692ffa in PostmasterMain ()
>> #12 0x0000000000476600 in main ()
Thanks for the test case Sveinn and thanks Dilip for analyzing.
> Seems like the issue is that the plans under multiple subroots are
> pointing to the same partitioned_rels.
That's correct.
> If I am not getting it wrong, in "set_plan_refs(PlannerInfo *root, Plan
> *plan, int rtoffset)" the rtoffset is specific to the subroot. Now, the
> problem is that set_plan_refs, called for different subroots, updates
> the same partitioned_rels list and makes its values completely wrong,
> which ultimately makes ExecLockNonLeafAppendTables access an
> out-of-bounds "rte" index.
Yes.
> set_plan_refs
> {
> [clipped]
> case T_MergeAppend:
> {
> [clipped]
>
> foreach(l, splan->partitioned_rels)
> {
> lfirst_int(l) += rtoffset;
>
>
> I think the solution should be for create_merge_append_path to make a
> copy of the partitioned_rels list?
Yes, partitioned_rels should be copied.
> Attached patch fixes the problem but I am not completely sure about the fix.
Thanks for creating the patch, although I think a better fix would be to
make get_partitioned_child_rels() do the list_copy. That way, any other
users of partitioned_rels will not suffer the same issue. Attached patch
implements that, along with a regression test.
Added to the open items.
Thanks,
Amit
Attachments
Re: [BUGS] BUG #14657: Server process segmentation fault in v10, May10th dev snapshot
From
Amit Langote
Date:
Oops, forgot to cc -hackers. Patch attached again.
Thanks,
Amit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Attachments
On Thu, May 18, 2017 at 7:19 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Thanks for creating the patch, although I think a better fix would be to
> make get_partitioned_child_rels() do the list_copy. That way, any other
> users of partitioned_rels will not suffer the same issue. Attached patch
> implements that, along with a regression test.
Correct! This is a generic fix.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: [BUGS] BUG #14657: Server process segmentation fault in v10, May10th dev snapshot
From
Sveinn Sveinsson
Date:
The patch fixed the problem, thanks a lot.
Regards,
Sveinn.
On Thu, May 18, 2017 at 7:23 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Oops, forgot to cc -hackers. Patch attached again.
Maybe we should add a comment explaining why the copy is needed.
We still have the same copy shared across multiple append paths, and
set_plan_refs would change it underneath those. That may not be a
problem right now, but it may become one in the future. Another option,
which consumes a bit less memory, is to make the copy at planning time,
only if the path gets selected as the cheapest path.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Fri, May 19, 2017 at 6:07 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
> We still have the same copy shared across multiple append paths and
> set_plan_refs would change it underneath those. May not be a
> problem right now but may be a problem in the future.
I agree. I think it's better for the path-creation functions to copy
the list, so that there is no surprising sharing of substructure.
set_plan_refs() obviously expects this data to be unshared, and this
seems like the best way to ensure that's true in all cases. Committed
that way.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
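Copying the list inside the path-creation function, the approach that was committed according to this message, might look roughly like the following in standalone C. SketchPath, sketch_create_path and sketch_apply_offset are hypothetical names, and malloc/memcpy stand in for PostgreSQL's list_copy(); this is a sketch of the idea under those assumptions, not the committed patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a path node that owns its index array. */
typedef struct SketchPath
{
    int *partitioned_rels;
    int  nrels;
} SketchPath;

/* Copy the caller's index array so the new path shares nothing with it. */
SketchPath
sketch_create_path(const int *partitioned_rels, int nrels)
{
    SketchPath path;
    size_t     nbytes;

    assert(partitioned_rels != NULL && nrels > 0);
    nbytes = sizeof(int) * (size_t) nrels;
    path.nrels = nrels;
    path.partitioned_rels = malloc(nbytes);
    if (path.partitioned_rels == NULL)
        abort();
    memcpy(path.partitioned_rels, partitioned_rels, nbytes);
    return path;
}

/* In-place offset adjustment, in the spirit of the loop quoted earlier. */
void
sketch_apply_offset(SketchPath *path, int rtoffset)
{
    for (int i = 0; i < path->nrels; i++)
        path->partitioned_rels[i] += rtoffset;
}

/* Returns 1 if two paths built from one source list stay independent. */
int
sketch_paths_independent(int rtoffset)
{
    int        source[2] = {1, 2};
    SketchPath a = sketch_create_path(source, 2);
    SketchPath b = sketch_create_path(source, 2);
    int        ok;

    sketch_apply_offset(&a, rtoffset);
    sketch_apply_offset(&b, rtoffset);
    ok = a.partitioned_rels[1] == 2 + rtoffset &&
         b.partitioned_rels[1] == 2 + rtoffset &&
         source[1] == 2;
    free(a.partitioned_rels);
    free(b.partitioned_rels);
    return ok;
}
```

The cost is one extra copy per created path, which is the memory trade-off raised earlier in the thread; in exchange, no later in-place adjustment can affect another path or the source list.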