Обсуждение: Bug (8.4beta): FailedAssertion("!(bms_is_subset(relids, qualscope))", File: "initsplan.c", Line: 915)

Поиск
Список
Период
Сортировка
I noticed the following bug when testing an application (openbravo 2.40) on postgresql 8.4:

Environment:
8.4beta
Package from: https://launchpad.net/~pitti/+archive/postgresql
recompiled for ubuntu intrepid

The following query does trigger the FailedAssertion:

SELECT ad_field.name As Name, ad_field_trl.name as columnname
FROM ad_field left join ad_field_trl on ad_field.ad_field_id = ad_field_trl.ad_field_id
and ad_field_trl.ad_language = 'en_US',
ad_column
WHERE ad_field.ad_column_id = ad_column.ad_column_id
and ad_tab_id = to_number(1) and isParent='Y'
and exists(select 1 from ad_column c, ad_field f where c.ad_column_id = f.ad_column_id and c.iskey='Y'
and ad_tab_id=to_number(1) and UPPER(c.columnname) = UPPER(ad_column.columnname));

The minimum needed table-structure and function definition (to_number) are attached.

The original usecase did have to_number(?) via jdbc-preparedstatement and passing the parameter via setString, thus
usingthe to_number(text) function. But the same assertion does also happen with the query shown above.. 

Feel free to ask for any more needed information.

Regards,
Stefan


Вложения
Stefan Huehner <stefan@huehner.org> writes:
> I noticed the following bug when testing an application (openbravo 2.40) on postgresql 8.4:

Thank you for the report, but I do not see any problem when trying the
test case here.  Do you have any nondefault planner parameter settings?

            regards, tom lane
Tom Lane wrote:
> Stefan Huehner <stefan@huehner.org> writes:
>> I noticed the following bug when testing an application (openbravo 2.40) on postgresql 8.4:
>
> Thank you for the report, but I do not see any problem when trying the
> test case here.  Do you have any nondefault planner parameter settings?

hmm weird - the testcase crashes for me as well on 8.4B1:


(gdb) bt
#0  0xb7f69424 in __kernel_vsyscall ()
#1  0xb7df4640 in raise () from /lib/i686/cmov/libc.so.6
#2  0xb7df6018 in abort () from /lib/i686/cmov/libc.so.6
#3  0x0832cace in ExceptionalCondition (
     conditionName=0x845d954 "!(bms_is_subset(relids, qualscope))",
     errorType=0x8362014 "FailedAssertion", fileName=0x845d836
"initsplan.c", lineNumber=915)
     at assert.c:57
#4  0x08215586 in distribute_qual_to_rels (root=0xa29bd9c,
clause=0xa2aad0c, is_deduced=0 '\0',
     below_outer_join=0 '\0', jointype=JOIN_INNER, qualscope=0xa2ad254,
ojscope=0x0,
     outerjoin_nonnullable=0x0) at initsplan.c:915
#5  0x08215a28 in deconstruct_recurse (root=0xa29bd9c, jtnode=0xa2a8a18,
below_outer_join=0 '\0',
     qualscope=0xbf8823e8, inner_join_rels=0xbf8823e0) at initsplan.c:336
#6  0x08215b55 in deconstruct_recurse (root=0xa29bd9c, jtnode=0xa2a971c,
below_outer_join=0 '\0',
     qualscope=0xbf882468, inner_join_rels=0xbf88249c) at initsplan.c:394
#7  0x0821596b in deconstruct_recurse (root=0xa29bd9c, jtnode=0xa2a9784,
below_outer_join=0 '\0',
     qualscope=0xbf8824a0, inner_join_rels=0xbf88249c) at initsplan.c:305
#8  0x082161d5 in deconstruct_jointree (root=0xa29bd9c) at initsplan.c:238


Stefan
On Wed, May 06, 2009 at 10:33:11AM -0400, Tom Lane wrote:
> Stefan Huehner <stefan@huehner.org> writes:
> > I noticed the following bug when testing an application (openbravo 2.40) on postgresql 8.4:
>
> Thank you for the report, but I do not see any problem when trying the
> test case here.  Do you have any nondefault planner parameter settings?

None that i know of. I did use some prepackaed 8.4beta for ubuntu and i will try to reproduce the problem on a
unmodifiedself-compiled version: 

For reference the configure options used by the package:
  --mandir=\$${prefix}/share/postgresql/$(MAJOR_VER)/man \
  --with-docdir=\$${prefix}/share/doc/postgresql-doc-$(MAJOR_VER) \
  --sysconfdir=/etc/postgresql-common \
  --datadir=\$${prefix}/share/postgresql/$(MAJOR_VER) \
  --bindir=\$${prefix}/lib/postgresql/$(MAJOR_VER)/bin \
  --includedir=\$${prefix}/include/postgresql/ \
  --enable-nls \
  --enable-integer-datetimes \
  --enable-thread-safety \
  --enable-debug \
  --enable-cassert \
  --disable-rpath \
  --with-tcl \
  --with-perl \
  --with-python \
  --with-pam \
  --with-krb5 \
  --with-gssapi \
  --with-openssl \
  --with-libxml \
  --with-libxslt \
  --with-ldap \
  --with-ossp-uuid \
  --with-gnu-ld \
  --with-tclconfig=/usr/lib/tcl$(TCL_VER) \
  --with-tkconfig=/usr/lib/tk$(TCL_VER) \
  --with-includes=/usr/include/tcl$(TCL_VER) \
  --with-system-tzdata=/usr/share/zoneinfo \
  --with-pgport=5432


and all non-comment: postgresql.conf options:

data_directory = '/var/lib/postgresql/8.4/main'         # use data in another directory
datestyle = 'iso, mdy'
default_text_search_config = 'pg_catalog.english'
external_pid_file = '/var/run/postgresql/8.4-main.pid'          # write an extra PID file
hba_file = '/etc/postgresql/8.4/main/pg_hba.conf'       # host-based authentication file
ident_file = '/etc/postgresql/8.4/main/pg_ident.conf'   # ident configuration file
lc_messages = 'en_US.UTF-8'                     # locale for system error message
lc_monetary = 'en_US.UTF-8'                     # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'                      # locale for number formatting
lc_time = 'en_US.UTF-8'                         # locale for time formatting
log_error_verbosity=verbose
log_line_prefix = '%t '                 # special values:
max_connections = 100                   # (change requires restart)
port = 5434                             # (change requires restart)
shared_buffers = 32MB                   # min 128kB
ssl = false                             # (change requires restart)
unix_socket_directory = '/var/run/postgresql'           # (change requires restart)

Regards,
Stefan
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> Tom Lane wrote:
>> Thank you for the report, but I do not see any problem when trying the
>> test case here.  Do you have any nondefault planner parameter settings?

> hmm weird - the testcase crashes for me as well on 8.4B1:

I was trying it on HEAD ... but I don't see any post-beta1 changes
in the cvs log that look like they might have fixed this ...

            regards, tom lane
Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> Tom Lane wrote:
>>> Thank you for the report, but I do not see any problem when trying the
>>> test case here.  Do you have any nondefault planner parameter settings?
>
>> hmm weird - the testcase crashes for me as well on 8.4B1:
>
> I was trying it on HEAD ... but I don't see any post-beta1 changes
> in the cvs log that look like they might have fixed this ...

confirmed - it does not crash on -HEAD for me as well but the plan
generated by EXPLAIN looks kinda funny:

                 QUERY PLAN
------------------------------------------
  Result  (cost=0.00..0.01 rows=1 width=0)
    One-Time Filter: false
(2 rows)


Stefan
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> Tom Lane wrote:
>> I was trying it on HEAD ... but I don't see any post-beta1 changes
>> in the cvs log that look like they might have fixed this ...

> confirmed - it does not crash on -HEAD for me as well but the plan
> generated by EXPLAIN looks kinda funny:

>                  QUERY PLAN
> ------------------------------------------
>   Result  (cost=0.00..0.01 rows=1 width=0)
>     One-Time Filter: false
> (2 rows)

Oh!  What is happening is that to_number(1) is being reduced to constant
NULL, whereupon it concludes that ad_tab_id=to_number(1) is constant
NULL, ergo the EXISTS can never succeed, ergo the entire WHERE is
constant false.  I suppose the change I made here
http://archives.postgresql.org/pgsql-committers/2009-04/msg00329.php
to improve constant-join-qual handling is what is preventing the
assertion failure, though I'm still not quite sure why (I'd better look
closer to see if there is still some form of the bug lurking).

Anyway I think this is an object lesson in why ignoring "WHEN OTHERS"
errors is dangerous.  to_number(integer) is defined as

CREATE FUNCTION to_number(integer) RETURNS numeric
    LANGUAGE plpgsql IMMUTABLE
    AS $_$
BEGIN
RETURN to_number($1, 'S99999999999999D999999');
EXCEPTION
  WHEN OTHERS THEN
    RETURN NULL;
END;
$_$;

and what is actually happening inside there is

regression=# select to_number(1, 'S99999999999999D999999');
ERROR:  function to_number(integer, unknown) does not exist
LINE 1: select to_number(1, 'S99999999999999D999999');
               ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

which is probably not what the author expects, but the WHEN OTHERS
exception is hiding it.

            regards, tom lane
Tom Lane wrote:

> Oh!  What is happening is that to_number(1) is being reduced to constant
> NULL, whereupon it concludes that ad_tab_id=to_number(1) is constant
> NULL, ergo the EXISTS can never succeed, ergo the entire WHERE is
> constant false.

> CREATE FUNCTION to_number(integer) RETURNS numeric
>     LANGUAGE plpgsql IMMUTABLE
>     AS $_$
> BEGIN
> RETURN to_number($1, 'S99999999999999D999999');
> EXCEPTION
>   WHEN OTHERS THEN
>     RETURN NULL;
> END;
> $_$;

How can the executor "see through" a plpgsql function definition?  Or do
you mean that this reduction is being done at execution time?  (hmm ...
this is what a one-time filter is, now that I think about it?)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera wrote:
> Tom Lane wrote:
>
>> Oh!  What is happening is that to_number(1) is being reduced to constant
>> NULL, whereupon it concludes that ad_tab_id=to_number(1) is constant
>> NULL, ergo the EXISTS can never succeed, ergo the entire WHERE is
>> constant false.
>
>> CREATE FUNCTION to_number(integer) RETURNS numeric
>>     LANGUAGE plpgsql IMMUTABLE
>>     AS $_$
>> BEGIN
>> RETURN to_number($1, 'S99999999999999D999999');
>> EXCEPTION
>>   WHEN OTHERS THEN
>>     RETURN NULL;
>> END;
>> $_$;
>
> How can the executor "see through" a plpgsql function definition?  Or do
> you mean that this reduction is being done at execution time?  (hmm ...
> this is what a one-time filter is, now that I think about it?)

Note that the function is immutable, and the argument in the query is a
constant. The planner simply runs the function and replaces the function
call with the result.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
Heikki Linnakangas wrote:

>> How can the executor "see through" a plpgsql function definition?  Or do
>> you mean that this reduction is being done at execution time?  (hmm ...
>> this is what a one-time filter is, now that I think about it?)
>
> Note that the function is immutable, and the argument in the query is a
> constant. The planner simply runs the function and replaces the function
> call with the result.

The planner runs the function?  Ok, neat -- I didn't know it did that.
(In my question above I meant "how can the _planner_ see through the
definition"; I was thinking that the planner had no way to run the
function).

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Heikki Linnakangas wrote:
>> Note that the function is immutable, and the argument in the query is a
>> constant. The planner simply runs the function and replaces the function
>> call with the result.

> The planner runs the function?  Ok, neat -- I didn't know it did that.

See eval_const_expressions.  This is absolutely no different from
reducing 1=1 to TRUE.

            regards, tom lane
On Wed, May 06, 2009 at 01:26:05PM -0400, Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> > Tom Lane wrote:

> Oh!  What is happening is that to_number(1) is being reduced to constant
> NULL, whereupon it concludes that ad_tab_id=to_number(1) is constant
> NULL, ergo the EXISTS can never succeed, ergo the entire WHERE is
> constant false.  I suppose the change I made here
> http://archives.postgresql.org/pgsql-committers/2009-04/msg00329.php
> to improve constant-join-qual handling is what is preventing the
> assertion failure, though I'm still not quite sure why (I'd better look
> closer to see if there is still some form of the bug lurking).
>
> Anyway I think this is an object lesson in why ignoring "WHEN OTHERS"
> errors is dangerous.  to_number(integer) is defined as
>
> CREATE FUNCTION to_number(integer) RETURNS numeric
>     LANGUAGE plpgsql IMMUTABLE
>     AS $_$
> BEGIN
> RETURN to_number($1, 'S99999999999999D999999');
> EXCEPTION
>   WHEN OTHERS THEN
>     RETURN NULL;
> END;
> $_$;
>
> and what is actually happening inside there is
>
> regression=# select to_number(1, 'S99999999999999D999999');
> ERROR:  function to_number(integer, unknown) does not exist
> LINE 1: select to_number(1, 'S99999999999999D999999');
>                ^
> HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
>
> which is probably not what the author expects, but the WHEN OTHERS
> exception is hiding it.

This could be a side effect of my try to find a minimal testcase. I did try to give the minimum needed functions.
In the original app there are more overloaded to_number function which perhaps i did omit...

I will recheck this.. indepdently i agree that hiding all possible exceptions is quite a bad idea...

Regards,
Stefan
I wrote:
> I suppose the change I made here
> http://archives.postgresql.org/pgsql-committers/2009-04/msg00329.php
> to improve constant-join-qual handling is what is preventing the
> assertion failure, though I'm still not quite sure why (I'd better look
> closer to see if there is still some form of the bug lurking).

After tracing through this, that change isn't actually related at all.
The bug was introduced last June (so it's in 8.3.recent too) and was
fixed here:
http://archives.postgresql.org/pgsql-committers/2009-04/msg00203.php
The actual issue can be boiled down to approximately this:

    select ... from a left join b on something
    where false and exists(select 1 from c where something-else)

(IOW, the critical constant-false is actually the occurrence of
"ad_tab_id = to_number(1)" in the outer WHERE, not the one in the
EXISTS as I'd supposed.)  What happens is that when
distribute_qual_to_rels is given the constant-false qual to process,
it correctly determines that this should be pushed to the top of the
join tree; which is not where it is syntactically, because at this point
we've rewritten the whole thing to

    select ... from
        (select ... from a left join b on something
         where false)
        semi join c on something-else

if you pretend that SQL has SEMI JOIN as a native join type.
So this means we execute these lines:

                /* if not below outer join, push it to top of tree */
                if (!below_outer_join)
                    relids =
                        get_relids_in_jointree((Node *) root->parse->jointree,
                                               false);

after which, relids is no longer a subset of qualscope (since qualscope
is only a+b whereas the entire join tree is a+b+c).  So if the
check_outerjoin_delay call a few lines further down happens to return
true, the "Assert(bms_is_subset(relids, qualscope))" after that will
fail.

It formerly was, and now is again, the case that check_outerjoin_delay
can only return true if it enlarges the relids set, which of course is
impossible if relids is already the whole join tree.  So that's why the
bug wasn't seen before, even though the logic has been like this for a
long time.  The patch I applied last June made it possible for
check_outerjoin_delay to return true if there were any outer join within
the passed-in relids set (which is why the LEFT JOIN is a critical part
of the example).  While that patch turned out to be wrong, it seems like
this code is a bit on the fragile side.  What I'm thinking is that when
we set relids to the whole jointree, we ought to simultaneously set
qualscope to the same value; that's only one extra assignment and it
will prevent any future recurrence of the crash if check_outerjoin_delay
gets changed again.  This behavior is semantically sensible because what
this code is doing is pretending that the qual appeared at the top
syntactic level to begin with; if it had been there then qualscope
would match.

Kind of a long-winded explanation of what will be a one-line patch,
but there you have it.

            regards, tom lane
On Wed, May 06, 2009 at 03:47:46PM -0400, Tom Lane wrote:
> I wrote:
> > I suppose the change I made here
> > http://archives.postgresql.org/pgsql-committers/2009-04/msg00329.php
> > to improve constant-join-qual handling is what is preventing the
> > assertion failure, though I'm still not quite sure why (I'd better look
> > closer to see if there is still some form of the bug lurking).
>

<snipp>

> Kind of a long-winded explanation of what will be a one-line patch,
> but there you have it.

Hi,

i did retest the original failing query on current head.

Sepcifically:

git clone up to and including the on liner patch by you:

86a4abb3a187bf2cc548aedd58125274ac724b1c
Tweak distribute_qual_to_rels so that when we decide a pseudoconstant ...

and cannot reproduce the failure anymore in this version. :)

Thanks for the timely investigation of the issue

Regards,
Stefan