Обсуждение: Outer where pushed down
Hi all,
consider this view:
CREATE OR REPLACE VIEW v_current_connection AS
SELECT ul.id_user
FROM user_login ul, current_connection cc
WHERE ul.id_user = cc.id_user;
And this is the explain on a usage of that view:
# explain select * from v_current_connection_test where sp_connected_test(id_user) = FALSE;
QUERY PLAN
----------------------------------------------------------------------------------------------------------Hash Join
(cost=42.79..1325.14rows=451 width=5) Hash Cond: ("outer".id_user = "inner".id_user) -> Seq Scan on user_login ul
(cost=0.00..1142.72rows=27024 width=4) Filter: (sp_connected_test(id_user) = false) -> Hash
(cost=40.49..40.49rows=919 width=5) -> Index Scan using idx_connected on current_connection cc
(cost=0.00..40.49rows=919 width=5) Index Cond: (connected = true)
(7 rows)
apart my initial surprise to see that function applied at rows not returned by the view
( Tom Lane explained me that the planner is able to push down the outer condition )
why postgres doesn't apply that function at table current_connection given the fact are extimated
only 919 vs 27024 rows?
redefining the view:
CREATE OR REPLACE VIEW v_current_connection AS
SELECT cc.id_user
FROM user_login ul, current_connection cc
WHERE ul.id_user = cc.id_user;
then I obtain the "desidered" plan.
# explain select * from v_current_connection_test where sp_connected_test(id_user ) = FALSE;
QUERY PLAN
----------------------------------------------------------------------------------------------------------Hash Join
(cost=46.23..1193.47rows=452 width=5) Hash Cond: ("outer".id_user = "inner".id_user) -> Seq Scan on user_login ul
(cost=0.00..872.48rows=54048 width=4) -> Hash (cost=45.08..45.08 rows=460 width=5) -> Index Scan using
idx_connectedon current_connection cc (cost=0.00..45.08 rows=460 width=5) Index Cond: (connected = true)
Filter: (sp_connected_test(id_user) = false)
(7 rows)
Is not possible in any way push postgres to apply that function to the right table ?
Shall I rewrite the views figuring out wich column is better to expose ?
Regards
Gaetano Mendola
Gaetano Mendola <mendola@bigfoot.com> writes:
> CREATE OR REPLACE VIEW v_current_connection AS
> SELECT ul.id_user
> FROM user_login ul,
> current_connection cc
> WHERE ul.id_user = cc.id_user;
> # explain select * from v_current_connection_test where sp_connected_test(id_user) = FALSE;
> why postgres doesn't apply that function at table current_connection given the fact are extimated
> only 919 vs 27024 rows?
Because the condition is on a field of the other table.
You seem to wish that the planner would use "ul.id_user = cc.id_user"
to decide that "sp_connected_test(ul.id_user)" can be rewritten as
"sp_connected_test(cc.id_user)", but in general this is not safe.
The planner has little idea of what the datatype-specific semantics
of equality are, and none whatsoever what the semantics of your
function are. As a real-world example: IEEE-standard floating
point math considers that +0 and -0 are different bit patterns.
They compare as equal, but it's very easy to come up with user-defined
functions that will yield different results for the two inputs.
So the proposed transformation is definitely unsafe for float8.
regards, tom lane
Tom Lane wrote: > Gaetano Mendola <mendola@bigfoot.com> writes: >> CREATE OR REPLACE VIEW v_current_connection AS >> SELECT ul.id_user >> FROM user_login ul, >> current_connection cc >> WHERE ul.id_user = cc.id_user; > >> # explain select * from v_current_connection_test where sp_connected_test(id_user) = FALSE; > >> why postgres doesn't apply that function at table current_connection given the fact are extimated >> only 919 vs 27024 rows? > > Because the condition is on a field of the other table. > > You seem to wish that the planner would use "ul.id_user = cc.id_user" > to decide that "sp_connected_test(ul.id_user)" can be rewritten as > "sp_connected_test(cc.id_user)", but in general this is not safe. > The planner has little idea of what the datatype-specific semantics > of equality are, and none whatsoever what the semantics of your > function are. As a real-world example: IEEE-standard floating > point math considers that +0 and -0 are different bit patterns. > They compare as equal, but it's very easy to come up with user-defined > functions that will yield different results for the two inputs. > So the proposed transformation is definitely unsafe for float8. And what about to define for each type when this is safe and let the planner make his best choice ? Rewriting that view the execution time passed from 4 secs to 1 sec, that is not bad if the planner can do it autonomously. In this very example I can decide if it's better expose one column or the other one but in other cases not... Regards Gaetano Mendola