Обсуждение: Outer where pushed down

Поиск
Список
Период
Сортировка

Outer where pushed down

От
Gaetano Mendola
Дата:
Hi all,

consider this view:

CREATE OR REPLACE VIEW v_current_connection AS
SELECT ul.id_user
FROM   user_login ul,      current_connection cc
WHERE ul.id_user = cc.id_user;


And this is the explain on a usage of that view:

# explain select * from v_current_connection_test where sp_connected_test(id_user) = FALSE;
                 QUERY PLAN
 
----------------------------------------------------------------------------------------------------------Hash Join
(cost=42.79..1325.14rows=451 width=5)  Hash Cond: ("outer".id_user = "inner".id_user)  ->  Seq Scan on user_login ul
(cost=0.00..1142.72rows=27024 width=4)        Filter: (sp_connected_test(id_user) = false)  ->  Hash
(cost=40.49..40.49rows=919 width=5)        ->  Index Scan using idx_connected on current_connection cc
(cost=0.00..40.49rows=919 width=5)              Index Cond: (connected = true)
 
(7 rows)

apart my initial surprise to see that function applied at rows not returned by the view
( Tom Lane explained me that the planner is able to push down the outer condition )
why postgres doesn't apply that function at table current_connection given the fact are extimated
only 919 vs 27024 rows?


redefining the view:

CREATE OR REPLACE VIEW v_current_connection AS
SELECT cc.id_user
FROM   user_login ul,      current_connection cc
WHERE ul.id_user = cc.id_user;

then I obtain the "desidered" plan.


# explain select * from v_current_connection_test where sp_connected_test(id_user ) = FALSE;
                  QUERY PLAN
 
----------------------------------------------------------------------------------------------------------Hash Join
(cost=46.23..1193.47rows=452 width=5)  Hash Cond: ("outer".id_user = "inner".id_user)  ->  Seq Scan on user_login ul
(cost=0.00..872.48rows=54048 width=4)  ->  Hash  (cost=45.08..45.08 rows=460 width=5)        ->  Index Scan using
idx_connectedon current_connection cc  (cost=0.00..45.08 rows=460 width=5)              Index Cond: (connected = true)
           Filter: (sp_connected_test(id_user) = false)
 
(7 rows)



Is not possible in any way push postgres to apply that function to the right table ?
Shall I rewrite the views figuring out wich column is better to expose ?



Regards
Gaetano Mendola

















Re: Outer where pushed down

От
Tom Lane
Дата:
Gaetano Mendola <mendola@bigfoot.com> writes:
> CREATE OR REPLACE VIEW v_current_connection AS
> SELECT ul.id_user
> FROM   user_login ul,
>        current_connection cc
> WHERE ul.id_user = cc.id_user;

> # explain select * from v_current_connection_test where sp_connected_test(id_user) = FALSE;

> why postgres doesn't apply that function at table current_connection given the fact are extimated
> only 919 vs 27024 rows?

Because the condition is on a field of the other table.

You seem to wish that the planner would use "ul.id_user = cc.id_user"
to decide that "sp_connected_test(ul.id_user)" can be rewritten as
"sp_connected_test(cc.id_user)", but in general this is not safe.
The planner has little idea of what the datatype-specific semantics
of equality are, and none whatsoever what the semantics of your
function are.  As a real-world example: IEEE-standard floating
point math considers that +0 and -0 are different bit patterns.
They compare as equal, but it's very easy to come up with user-defined
functions that will yield different results for the two inputs.
So the proposed transformation is definitely unsafe for float8.
        regards, tom lane


Re: Outer where pushed down

От
Gaetano Mendola
Дата:
Tom Lane wrote:
> Gaetano Mendola <mendola@bigfoot.com> writes:
>> CREATE OR REPLACE VIEW v_current_connection AS
>> SELECT ul.id_user
>> FROM   user_login ul,
>>        current_connection cc
>> WHERE ul.id_user = cc.id_user;
> 
>> # explain select * from v_current_connection_test where sp_connected_test(id_user) = FALSE;
> 
>> why postgres doesn't apply that function at table current_connection given the fact are extimated
>> only 919 vs 27024 rows?
> 
> Because the condition is on a field of the other table.
> 
> You seem to wish that the planner would use "ul.id_user = cc.id_user"
> to decide that "sp_connected_test(ul.id_user)" can be rewritten as
> "sp_connected_test(cc.id_user)", but in general this is not safe.
> The planner has little idea of what the datatype-specific semantics
> of equality are, and none whatsoever what the semantics of your
> function are.  As a real-world example: IEEE-standard floating
> point math considers that +0 and -0 are different bit patterns.
> They compare as equal, but it's very easy to come up with user-defined
> functions that will yield different results for the two inputs.
> So the proposed transformation is definitely unsafe for float8.

And what about to define for each type when this is safe and let
the planner make his best choice ?
Rewriting that view the execution time passed from 4 secs to 1 sec,
that is not bad if the planner can do it autonomously. In this very
example I can decide if it's better expose one column or the other
one but in other cases not...


Regards
Gaetano Mendola