Обсуждение: Crash in 9.4 Beta when partially collapsing left outer joins

Поиск
Список
Период
Сортировка

Crash in 9.4 Beta when partially collapsing left outer joins

От
lists@benjamindsmith.com
Дата:
I think this is the first time I've ever reported a PG crash, which is notable
since I've been using PG for over 10 years. ;)

Using the 9.4 Beta RPMs on CentOS 6.X/64, we're experiencing a reproducible
crash when running a query with a left outer join, partially collapsed.

TRAP: FailedAssertion("!(!restriction_is_or_clause((RestrictInfo *) orarg))",
File: "indxpath.c", Line: 1213)
< 2014-09-08 14:21:33.179 PDT >LOG:  server process (PID 19957) was terminated
by signal 6: Aborted
< 2014-09-08 14:21:33.179 PDT >DETAIL:  Failed process was running: SELECT
                students.id
        FROM students
        LEFT OUTER JOIN enrollments ON
                (
                enrollments.students_id = students.id
                )
        WHERE
                 (
                students.id = 5008
                OR
                      (
                                (
                                students.birthcity = 'Chico'
                                OR students.birthcity IS NULL
                                )
                        AND enrollments.start < 20141219
                        )
                );
< 2014-09-08 14:21:33.179 PDT >LOG:  terminating any other active server
processes
< 2014-09-08 14:21:33.179 PDT >WARNING:  terminating connection because of
crash of another server process
< 2014-09-08 14:21:33.179 PDT >DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.


Here's the smallest query I could find that it would crash on. Run on a blank
database, the following will reliably crash postgres:

CREATE TABLE students
    (id SERIAL PRIMARY KEY,
    birthcity VARCHAR DEFAULT NULL);

CREATE TABLE enrollments
    (students_id INTEGER NOT NULL REFERENCES students(id),
    start INTEGER);

SELECT
    students.id
FROM students
LEFT OUTER JOIN enrollments ON
    (
    enrollments.students_id = students.id
    )
WHERE
    (
    students.id = 5008
    OR
        (
            (
            students.birthcity = 'Chico'
            OR students.birthcity IS NULL
            )
        AND enrollments.start < 20141219
        )
    );

-----------------------
Other environment stuff:

[root@db1 pgsql]# rpm -qa | grep postg
postgresql94-libs-9.4beta2-1PGDG.rhel6.x86_64
postgresql94-server-9.4beta2-1PGDG.rhel6.x86_64
postgresql94-devel-9.4beta2-1PGDG.rhel6.x86_64
postgresql92-libs-9.2.9-1PGDG.rhel6.x86_64
postgresql94-9.4beta2-1PGDG.rhel6.x86_64
postgresql94-contrib-9.4beta2-1PGDG.rhel6.x86_64

[root@db1 pgsql]# uname -a
Linux db1.schoolpathways.com 2.6.32-431.23.3.el6.x86_64 #1 SMP Thu Jul 31
17:20:51 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux


What other information should I provide? We have the machine available if
necessary.

Benjamin Smith


Re: Crash in 9.4 Beta when partially collapsing left outer joins

От
Tom Lane
Дата:
lists@benjamindsmith.com writes:
> Using the 9.4 Beta RPMs on CentOS 6.X/64, we're experiencing a reproducible
> crash when running a query with a left outer join, partially collapsed.

The test case crashes as described for me.  Will take a look tomorrow.
Thanks for the report!

            regards, tom lane


Re: Crash in 9.4 Beta when partially collapsing left outer joins

От
Michael Paquier
Дата:
On Tue, Sep 9, 2014 at 6:36 AM,  <lists@benjamindsmith.com> wrote:
> What other information should I provide? We have the machine available if
> necessary.
This can be reproduced without especially LEFT OUTER JOIN, and system
crashes as long as index path is taken in planner, and that WHERE
clause uses a given combination of OR and AND like the one in the
query given. Here is a more simple example:
create table aa (a int);
create index aai on aa(a);
select a1.a from aa a1, aa a2 where a1.a = 0 or (a1.a = 0 or a1.a = 1)
and a2.a = 0;
Some bisecting is showing as well that the commit at the origin of the
regression is f343a88.
Regards,
--
Michael


Re: Crash in 9.4 Beta when partially collapsing left outer joins

От
Michael Paquier
Дата:
On Tue, Sep 9, 2014 at 2:43 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Some bisecting is showing as well that the commit at the origin of the
> regression is f343a88.
The failure is caused by an assertion not happy since this commit:
    frame #4: 0x0000000101d20670
postgres`generate_bitmap_or_paths(root=0x00007fd61d004d48,
rel=0x00007fd61c033a58, clauses=0x00007fd61d010200,
other_clauses=0x0000000000000000) + 480 at indxpath.c:1213
    frame #5: 0x0000000101d1fc37
postgres`create_index_paths(root=0x00007fd61d004d48,
rel=0x00007fd61c033a58) + 1255 at indxpath.c:314
    frame #6: 0x0000000101d1146b
postgres`set_plain_rel_pathlist(root=0x00007fd61d004d48,
rel=0x00007fd61c033a58, rte=0x00007fd61c033c88) + 75 at allpaths.c:397

While reading the code of this commit, I noticed that
extract_or_clause has added some logic for nested OR clauses: it
extracts their content and adds them directly to the list of
subclauses that are then used by generate_bitmap_or_paths, triggering
the assertion failure reported by the trace above.
The logic for nested OR is correct by reading it, hence why not simply
removing the assertion failing? The attached patch 1 does so.

Another approach would consist in removing the nested OR part and keep
the old assertion logic, like in the patch 2 attached, but this seems
like a no-go as f343a88 has actually improved nested OR tracking.
Thoughts?
Note: I added as well a regression tests in patch 1 as this is IMO the
correct approach, if that's considered as correct of course :)
--
Michael

Вложения

Re: Crash in 9.4 Beta when partially collapsing left outer joins

От
Tom Lane
Дата:
Michael Paquier <michael.paquier@gmail.com> writes:
> The logic for nested OR is correct by reading it, hence why not simply
> removing the assertion failing? The attached patch 1 does so.

The reason for the assert is that there should never be an OR directly
underneath an OR in the planner after eval_const_expressions has flattened
such cases.  Evidently commit f343a88 failed to preserve AND/OR flatness
in some cases :-(.  That code should be taught to do so, rather than
lobotomizing this assertion.  Lack of flatness causes optimization
inefficiencies, which is why we don't want to just allow it.

            regards, tom lane


Re: Crash in 9.4 Beta when partially collapsing left outer joins

От
Michael Paquier
Дата:
On Tue, Sep 9, 2014 at 10:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> The logic for nested OR is correct by reading it, hence why not simply
>> removing the assertion failing? The attached patch 1 does so.
>
> The reason for the assert is that there should never be an OR directly
> underneath an OR in the planner after eval_const_expressions has flattened
> such cases.  Evidently commit f343a88 failed to preserve AND/OR flatness
> in some cases :-(.  That code should be taught to do so, rather than
> lobotomizing this assertion.  Lack of flatness causes optimization
> inefficiencies, which is why we don't want to just allow it.
Ah, OK, I just saw your commit. so the trick is to add the arguments
of subclause in case of an OR clause found to have a correct
flattening here... Thanks!
--
Michael


Re: Crash in 9.4 Beta when partially collapsing left outer joins

От
Tom Lane
Дата:
Michael Paquier <michael.paquier@gmail.com> writes:
> On Tue, Sep 9, 2014 at 10:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The reason for the assert is that there should never be an OR directly
>> underneath an OR in the planner after eval_const_expressions has flattened
>> such cases.  Evidently commit f343a88 failed to preserve AND/OR flatness
>> in some cases :-(.  That code should be taught to do so, rather than
>> lobotomizing this assertion.  Lack of flatness causes optimization
>> inefficiencies, which is why we don't want to just allow it.

> Ah, OK, I just saw your commit. so the trick is to add the arguments
> of subclause in case of an OR clause found to have a correct
> flattening here... Thanks!

Right.  If you look again at that code in orclauses.c, you'll notice that
it is itself assuming AND/OR flatness in its input.  We could discard
that assumption, but it would just mean moving complexity from the places
that currently have to preserve flatness to other places.  For instance,
right now we suppose that all "top level" WHERE clauses are in the
top-level AND list ... if we had to check for sub-AND clauses and recurse
into those, it would make life complicated in numerous places.

I do wonder, having seen this bug, if there's someplace we could add
assertions to check for AND/OR flatness that'd be more certainly hit
by a violation.

            regards, tom lane