Обсуждение: is it a known issue or just a bug?

Поиск
Список
Период
Сортировка

is it a known issue or just a bug?

От
Hans-Jürgen Schönig
Дата:
Folks,

Last week one of my students confronted me with a nice little SQL 
statement which made me call gdb ...

Consider the following scenario:

[hs@fedora bug]$ cat q1.sql
create temporary sequence seq_ab;

select * from (Select nextval('seq_ab') as nv,                      * from    ( select 
t_product.id,t_text.value,t_price.price                                from    t_product,t_price,t_text
      where   t_product.id = t_price.product_id                                and t_product.name = t_text.id
                    and t_text.lang='de'                                and t_price.typ = 'default'
  order by price desc ) as t ) as u                --      WHERE nv <= 1                ;
 
[hs@fedora bug]$ psql test < q1.sql
CREATE SEQUENCE nv | id |  value  | price
----+----+---------+-------  1 |  3 | Banane  |    12  2 |  1 | T-Shirt |    10  3 |  2 | Apfel   |     7
(3 rows)

this query returns the right result.
however, when uncommenting the WHERE clause things look different:

[hs@fedora bug]$ cat q2.sql
create temporary sequence seq_ab;

select * from (Select nextval('seq_ab') as nv,                      * from    ( select 
t_product.id,t_text.value,t_price.price                                from    t_product,t_price,t_text
      where   t_product.id = t_price.product_id                                and t_product.name = t_text.id
                    and t_text.lang='de'                                and t_price.typ = 'default'
  order by price desc ) as t ) as u                        WHERE nv <= 1                ;
 
[hs@fedora bug]$ psql test < q2.sql
CREATE SEQUENCE nv | id |  value  | price
----+----+---------+-------  4 |  1 | T-Shirt |    10
(1 row)

Obviously nv = 4 is wrong ...
Looking at the execution plan of the second query the problem seems 
quite obvious:
                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------- Subquery Scan
t (cost=69.24..69.26 rows=1 width=68)   ->  Sort  (cost=69.24..69.25 rows=1 width=68)         Sort Key: t_price.price
     ->  Hash Join  (cost=22.51..69.23 rows=1 width=68)               Hash Cond: ("outer".name = "inner".id)
  Join Filter: (nextval('seq_ab'::text) <= 1)               ->  Nested Loop  (cost=0.00..46.68 rows=5 width=40)
           ->  Seq Scan on t_price  (cost=0.00..22.50 rows=5 
 
width=36)                           Filter: (typ = 'default'::text)                     ->  Index Scan using
t_product_pkeyon t_product 
 
(cost=0.00..4.82 rows=1 width=8)                           Index Cond: (t_product.id = "outer".product_id)
->  Hash  (cost=22.50..22.50 rows=5 width=36)                     ->  Seq Scan on t_text  (cost=0.00..22.50 rows=5 
 
width=36)                           Filter: (lang = 'de'::text)
(14 rows)


nextval() is called again when processing the WHERE clause.
this was fine if nextval() would return the same thing again and again 
(which is not the job of nextval).
if the planner materialized the subquery things would materialize the 
subquery in case of unstable functions things would work in this case.

I know I temp table would easily fix this query and it is certainly not 
the best query I have ever seen but still it seems like a bug and I just  wanted to know whether it is a know issue or
not.
Looking at the code I did not quite know whether this is something which 
should / can be fixed or not.

here is the data:
------------------------------------------------------
CREATE TABLE t_text (        id      int4,        lang    text,        value   text
);

CREATE TABLE t_group (        id      int4,        name    int4,           -- mehrsprachig in t_text        valid
boolean
);

INSERT INTO t_group VALUES (1, 1, 't');
INSERT INTO t_text  VALUES (1, 'de', 'Obst');
INSERT INTO t_text  VALUES (1, 'en', 'Fruits');

INSERT INTO t_group VALUES (2, 2, 't');
INSERT INTO t_text  VALUES (2, 'de', 'Kleidung');
INSERT INTO t_text  VALUES (2, 'en', 'Clothes');


CREATE UNIQUE INDEX idx_group_id ON t_group (id);

CREATE TABLE t_product (        id              int4,        name            int4,   -- mehrsprachig in t_text
active         boolean,        PRIMARY KEY (id)
 
);

INSERT INTO t_product VALUES (1, 3, 't');
INSERT INTO t_text  VALUES (3, 'de', 'T-Shirt');
INSERT INTO t_text  VALUES (3, 'en', 'T-Shirt');

INSERT INTO t_product VALUES (2, 4, 't');
INSERT INTO t_text  VALUES (4, 'de', 'Apfel');
INSERT INTO t_text  VALUES (4, 'en', 'Apple');

INSERT INTO t_product VALUES (3, 5, 't');
INSERT INTO t_text  VALUES (5, 'de', 'Banane');
INSERT INTO t_text  VALUES (5, 'en', 'Banana');


CREATE TABLE t_product_group (        product_id      int4    REFERENCES t_product(id)
     ON UPDATE CASCADE                                        ON DELETE CASCADE,        group_id        int4
REFERENCESt_group(id)                                        ON UPDATE CASCADE
ONDELETE CASCADE
 
);

INSERT INTO t_product_group VALUES (2, 1);
INSERT INTO t_product_group VALUES (3, 1);
INSERT INTO t_product_group VALUES (1, 2);

CREATE TABLE t_price (        id              int4,        typ             text,        price           numeric,
product_id     int4    REFERENCES t_product(id)                                        ON UPDATE CASCADE
                       ON DELETE CASCADE,        PRIMARY KEY (id)
 
);

INSERT INTO t_price VALUES (1, 'default', '10', 1);
INSERT INTO t_price VALUES (2, 'sonder', '20', 1);
INSERT INTO t_price VALUES (3, 'spezial', '30', 1);
INSERT INTO t_price VALUES (4, 'default', '7', 2);
INSERT INTO t_price VALUES (5, 'default', '12', 3);

Best regards,
    Hans

-- 
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/720/10 1234567 or +43/660/816 40 77
www.cybertec.at, www.postgresql.at, kernel.cybertec.at




Re: is it a known issue or just a bug?

От
Tom Lane
Дата:
Hans-Jürgen Schönig <postgres@cybertec.at> writes:
> Consider the following scenario:

> select * from (Select nextval('seq_ab') as nv,
>                        * from    ( select 
> t_product.id,t_text.value,t_price.price
>                                  from    t_product,t_price,t_text
>                          where   t_product.id = t_price.product_id
>                                  and t_product.name = t_text.id
>                                  and t_text.lang='de'
>                                  and t_price.typ = 'default'
>                          order by price desc ) as t ) as u
>                  WHERE nv <= 1
>                  ;

I don't think there's any very clean way to fix this sort of problem in
general.  We could make this particular example work if

(1) we prevented a subquery containing volatile functions in its
targetlist from being flattened into the parent query, and

(2) we prevented outer WHERE clauses from being pushed down into a
subquery when they reference subquery outputs containing volatile
functions.

There has been some recent discussion about doing (1) but I think we
forgot about the necessity to also do (2); otherwise you'd end up with

select * from (Select nextval('seq_ab') as nv,                      ...                      WHERE nextval('seq_ab') <=
1            ) as u                ;
 

which is hardly any better.

Now those things are both doable but where it really falls down is when
you join the subselect to some other table.  Short of materializing the
subselect there'd be no way to guarantee single evaluation of any one
row in the subselect.

I'd be willing to do (1) and (2) but not to force materialization; the
performance hit for that just seems unacceptable.
        regards, tom lane


Re: is it a known issue or just a bug?

От
Josh Berkus
Дата:
Tom,

> I don't think there's any very clean way to fix this sort of problem in
> general.  We could make this particular example work if

Frankly, I don't think there *is* any safe way to use volatile functions in
subqueries -- I certainly avoid it, except now() and random() which as
discussed are special cases.    Perhaps a WARNING is in order?

--
Josh Berkus
Aglio Database Solutions
San Francisco


Re: is it a known issue or just a bug?

От
Hans-Jürgen Schönig
Дата:
Josh Berkus wrote:
> Tom,
> 
> 
>>I don't think there's any very clean way to fix this sort of problem in
>>general.  We could make this particular example work if
> 
> 
> Frankly, I don't think there *is* any safe way to use volatile functions in 
> subqueries -- I certainly avoid it, except now() and random() which as 
> discussed are special cases.    Perhaps a WARNING is in order? 
> 

Personally I like Josh's idea. A warning would be a nice thing.

I have just looked at Oracle and SQL Server to find out what those 
systems are doing ...

Oracle has a very interesting concept *g*:

ORA-02287 sequence number not allowed here:
The usage of a sequence is limited and it can be used only in few areas 
of PL/SQL and SQL coding.

The following are the cases where you can't use a sequence:
For a SELECT Statement:  1. In a WHERE clause  2. In a GROUP BY or ORDER BY clause  3. In a DISTINCT clause  4. Along
witha UNION or INTERSECT or MINUS  5. In a sub-query
 

--------------------------

SQL Server does not support sequences the way we know it so it is hard 
to compare.

I did not have time to test DB2 yet.
Thanks a lot,
    Hans


-- 
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/720/10 1234567 or +43/660/816 40 77
www.cybertec.at, www.postgresql.at, kernel.cybertec.at




Re: is it a known issue or just a bug?

От
Tom Lane
Дата:
Hans-Jürgen Schönig <postgres@cybertec.at> writes:
> Josh Berkus wrote:
>> Frankly, I don't think there *is* any safe way to use volatile functions in 
>> subqueries -- I certainly avoid it, except now() and random() which as 
>> discussed are special cases.    Perhaps a WARNING is in order? 

> Personally I like Josh's idea. A warning would be a nice thing.

From the planner's perspective, it would have to warn about any volatile
function, which would probably be overly chatty --- remember that the
default marking for user-defined functions is "volatile".

This default may also be a good reason not to put in the anti-flattening
defenses I suggested before, because it would mean that even slight
sloppiness in the definition of a user function could cripple subquery
optimization.  I'm not sure that that's a strong argument, but it's
something to think about.

It'd be easy enough to put in the anti-flattening defenses (checks (1)
and (2) in my prior message) but I've got mixed emotions about whether
this is really a good thing to do.  Any opinions out there?
        regards, tom lane


Re: is it a known issue or just a bug?

От
Josh Berkus
Дата:
Tom,

> It'd be easy enough to put in the anti-flattening defenses (checks (1)
> and (2) in my prior message) but I've got mixed emotions about whether
> this is really a good thing to do.  Any opinions out there?

If my opinion wasn't clear, I was suggesting adding a WARNING and not doing 
anything about flattening.   I can't say that, in 5 years of developing 
applications in Postgres, that this has ever been a problem for me personally 
-- from my perspective the persons reporting the issue needs to re-code their 
query, it's not what sequences were meant for.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco