Обсуждение: Optimizer problem in 8.1.6
Maybe this is already solved in more advanced releases, but just in case. VIEW active_users: SELECT * FROM users WHERE active AND ((field IS NULL) OR (NOT field)); Table users has index on text field login. EXPLAIN SELECT * from active_users where login='xxx'; QUERY PLAN ---------------------------------------------------------------------------------- Index Scan using active_users on users u (cost=0.00..5.97 rows=1 width=131) Index Cond: ("login" = 'xxx'::text) Filter: (active AND ((field1 IS NULL) OR (NOT field1))) So far, everything OK. Now, combined (sorry for the convoluted query, it is build automatically by an app). EXPLAIN SELECT DISTINCT p.id FROM partes_tecnicos p, rel_usr_sector_parte_tecnico r, active_users u WHERE ((r.id_parte_tecnico=p.id AND r.id_usr=u.id AND u.login='xxx' AND r.id_sector=p.id_sector_actual AND p.id_cola_por_ambito=1) OR p.id_cola_por_ambito=1) AND p.id_situacion!=6; ----- Unique (cost=1016.84..22057814.97 rows=219 width=4) -> Nested Loop (cost=1016.84..19607287.64 rows=980210931 width=4) -> Nested Loop (cost=8.07..2060.25 rows=100959 width=4) -> Index Scan using partes_tecnicos_pkey on partes_tecnicos p (cost=0.00..33.00 rows=219 width=4) Filter: ((id_cola_por_ambito = 1) AND (id_situacion <> 6)) -> Materialize (cost=8.07..12.68 rows=461 width=0) -> Seq Scan on rel_usr_sector_parte_tecnico r (cost=0.00..7.61 rows=461 width=0) -> Materialize (cost=1008.77..1105.86 rows=9709 width=0) -> Seq Scan on users u (cost=0.00..999.06 rows=9709 width=0) Filter: (active AND ((field1 IS NULL) OR (NOT field1))) Notice the seq. scan on users. It is solved using: EXPLAIN SELECT DISTINCT p.id FROM partes_tecnicos p, pt.rel_usr_sector_parte_tecnico r, (SELECT id FROM active_users WHERE login='xxx') u WHERE ((r.id_parte_tecnico=p.id AND r.id_usr=u.id AND r.id_sector=p.id_sector_actual AND p.id_cola_por_ambito=1) OR p.id_cola_por_ambito=1 ) AND p.id_situacion!=6; ------------------------------------------------------------------------------------------------------------- Unique (cost=18.65..2323.23 rows=219 width=4) -> Nested Loop (cost=18.65..2070.83 rows=100959 width=4) -> Index Scan using partes_tecnicos_pkey on partes_tecnicos p (cost=0.00..33.00 rows=219 width=4) Filter: ((id_cola_por_ambito = 1) AND (id_situacion <> 6)) -> Materialize (cost=18.65..23.26 rows=461 width=0) -> Nested Loop (cost=0.00..18.19 rows=461 width=0) -> Index Scan using active_users on users u (cost=0.00..5.97 rows=1 width=0) Index Cond: ("login" = 'xxx'::text) Filter: (active AND ((field1 IS NULL) OR (NOT field1))) -> Seq Scan on rel_usr_sector_parte_tecnico r (cost=0.00..7.61 rows=461 width=0) (10 rows) Thanks! Fernando.
On Jun 22, 2007, at 10:16 , Fernando Schapachnik wrote: > EXPLAIN SELECT DISTINCT p.id Can you provide EXPLAIN ANALYZE? I suspect that when you rewrote the query it changed how the planner took into account the statistics. If your statistics are off, perhaps this changes how the planner rewrites the query. Michael Glaesemann grzm seespotcode net
On Jun 22, 2007, at 10:16 , Fernando Schapachnik wrote: > Maybe this is already solved in more advanced releases, but just in > case. > > VIEW active_users: > SELECT * FROM users WHERE active AND ((field IS NULL) OR (NOT field)); You could also rewrite this as SELECT * FROM users WHERE active AND field IS NOT TRUE; IS NOT TRUE returns true for false and null: SELECT true IS NOT TRUE AS "true" , false IS NOT TRUE AS "false" , null IS NOT TRUE AS "null"; true | false | null ------+-------+------ f | t | t This might be a little easier to read. Michael Glaesemann grzm seespotcode net
Fernando Schapachnik <fernando@mecon.gov.ar> writes: > Now, combined (sorry for the convoluted query, it is build > automatically by an app). > EXPLAIN SELECT DISTINCT p.id > FROM partes_tecnicos p, > rel_usr_sector_parte_tecnico r, active_users u > WHERE ((r.id_parte_tecnico=p.id AND r.id_usr=u.id AND > u.login='xxx' AND r.id_sector=p.id_sector_actual AND > p.id_cola_por_ambito=1) > OR p.id_cola_por_ambito=1) > AND p.id_situacion!=6; Is this query really what you want to do? Because the OR overrides all the join conditions, meaning that rows having p.id_cola_por_ambito=1 AND p.id_situacion!=6 must produce Cartesian products against every row in each of the other tables. I think your SQL-building app is broken. regards, tom lane
En un mensaje anterior, Tom Lane escribió: > Fernando Schapachnik <fernando@mecon.gov.ar> writes: > > Now, combined (sorry for the convoluted query, it is build > > automatically by an app). > > > EXPLAIN SELECT DISTINCT p.id > > FROM partes_tecnicos p, > > rel_usr_sector_parte_tecnico r, active_users u > > WHERE ((r.id_parte_tecnico=p.id AND r.id_usr=u.id AND > > u.login='xxx' AND r.id_sector=p.id_sector_actual AND > > p.id_cola_por_ambito=1) > > OR p.id_cola_por_ambito=1) > > AND p.id_situacion!=6; > > Is this query really what you want to do? Because the OR overrides all > the join conditions, meaning that rows having p.id_cola_por_ambito=1 > AND p.id_situacion!=6 must produce Cartesian products against every > row in each of the other tables. > > I think your SQL-building app is broken. Yes, yes, we found this while working on improving the query. I just wanted to point out that the optimizer was doing a sequential scan in a situation it could unfould de active_users definition, add the login='xxx' clause, and use the index on the users table. Thanks. Fernando.
Fernando Schapachnik <fernando@mecon.gov.ar> writes: > En un mensaje anterior, Tom Lane escribi�: >> Is this query really what you want to do? Because the OR overrides all >> the join conditions, meaning that rows having p.id_cola_por_ambito=1 >> AND p.id_situacion!=6 must produce Cartesian products against every >> row in each of the other tables. > Yes, yes, we found this while working on improving the query. I just > wanted to point out that the optimizer was doing a sequential scan > in a situation it could unfould de active_users definition, add the > login='xxx' clause, and use the index on the users table. What for, when the query requires fetching the entire table anyway? regards, tom lane
En un mensaje anterior, Tom Lane escribió: > Fernando Schapachnik <fernando@mecon.gov.ar> writes: > > Now, combined (sorry for the convoluted query, it is build > > automatically by an app). > > > EXPLAIN SELECT DISTINCT p.id > > FROM partes_tecnicos p, > > rel_usr_sector_parte_tecnico r, active_users u > > WHERE ((r.id_parte_tecnico=p.id AND r.id_usr=u.id AND > > u.login='xxx' AND r.id_sector=p.id_sector_actual AND > > p.id_cola_por_ambito=1) > > OR p.id_cola_por_ambito=1) > > AND p.id_situacion!=6; > > Is this query really what you want to do? Because the OR overrides all > the join conditions, meaning that rows having p.id_cola_por_ambito=1 > AND p.id_situacion!=6 must produce Cartesian products against every > row in each of the other tables. A rewritten query still exhibits the same behavior: VACUUM verbose ANALYZE users; [...] INFO: analyzing "users" INFO: "users": scanned 778 of 778 pages, containing 22320 live rows and 3 dead rows; 3000 rows in sample, 22320 estimated total rows EXPLAIN ANALYZE SELECT DISTINCT p.id FROM partes_tecnicos p WHERE p.id IN (SELECT r.id_parte_tecnico FROM rel_usr_sector_parte_tecnico r, active_users u WHERE (r.id_usr=u.id AND u.login='xxx' AND r.id_sector=p.id_sector_actual AND p.id_cola_por_ambito=1) OR p.id_cola_por_ambito=1) AND p.id_situacion!=6; Unique (cost=0.00..19045387.60 rows=177 width=4) (actual time=0.331..997.593 rows=209 loops=1) -> Index Scan using partes_tecnicos_pkey on partes_tecnicos p (cost=0.00..19045387.16 rows=177 width=4) (actual time=0.323..995.797 rows=209 loops=1) Filter: ((id_situacion <> 6) AND (subplan)) SubPlan -> Result (cost=8.07..90878.33 rows=4493367 width=4) (actual time=0.028..3.250 rows=178 loops=254) One-Time Filter: ($0 = 1) -> Nested Loop (cost=8.07..90878.33 rows=4493367 width=4) (actual time=0.025..2.393 rows=216 loops=209) -> Seq Scan on users u (cost=0.00..1002.92 rows=9747 width=0) (actual time=0.009..0.009 rows=1 loops=209) Filter: (active AND ((field1 IS NULL) OR (NOT field1))) -> Materialize (cost=8.07..12.68 rows=461 width=4) (actual time=0.004..0.800 rows=216 loops=209) -> Seq Scan on rel_usr_sector_parte_tecnico r (cost=0.00..7.61 rows=461 width=4) (actual time=0.008..2.128 rows=488 loops=1) Total runtime: 998.552 ms (12 rows) Notice again the seq scan on users instead of using the index and the very off estimate. Thanks. Fernando.
Fernando Schapachnik <fernando@mecon.gov.ar> writes: > A rewritten query still exhibits the same behavior: > EXPLAIN ANALYZE SELECT DISTINCT p.id > FROM partes_tecnicos p > WHERE > p.id IN > (SELECT r.id_parte_tecnico FROM > rel_usr_sector_parte_tecnico r, active_users u > WHERE (r.id_usr=u.id AND u.login='xxx' AND > r.id_sector=p.id_sector_actual AND > p.id_cola_por_ambito=1) > OR p.id_cola_por_ambito=1) > AND p.id_situacion!=6; [ shrug... ] This is still telling the system to perform a Cartesian-product join when p.id_cola_por_ambito=1. A sane formulation of the query might look like EXPLAIN ANALYZE SELECT DISTINCT p.id FROM partes_tecnicos p WHERE (p.id_cola_por_ambito=1 OR p.id IN (SELECT r.id_parte_tecnico FROM rel_usr_sector_parte_tecnico r, active_users u WHERE (r.id_usr=u.id AND u.login='xxx' AND r.id_sector=p.id_sector_actual))) AND p.id_situacion!=6; ie, get the constant term out of the sub-select. This is not exactly the same thing though --- in particular, what do you intend should happen if p.id has no matches whatsoever in r.id_parte_tecnico, yet p.id_cola_por_ambito=1? regards, tom lane
En un mensaje anterior, Michael Glaesemann escribió: > > On Jun 22, 2007, at 10:16 , Fernando Schapachnik wrote: > > >EXPLAIN SELECT DISTINCT p.id > > Can you provide EXPLAIN ANALYZE? I suspect that when you rewrote the > query it changed how the planner took into account the statistics. If > your statistics are off, perhaps this changes how the planner > rewrites the query. Sure. The DB is VACUUM'ed daily, and the users database only received a few updates per day. This is from the rewrote one: --------------------------------------------------------------------------------------------------------------------------------------------------------- Unique (cost=18.65..2838.38 rows=268 width=4) (actual time=0.265..1503.554 rows=209 loops=1) -> Nested Loop (cost=18.65..2529.51 rows=123548 width=4) (actual time=0.257..1127.666 rows=101992 loops=1) -> Index Scan using partes_tecnicos_pkey on partes_tecnicos p (cost=0.00..39.89 rows=268 width=4) (actual time=0.025..2.115 rows=209 loops=1) Filter: ((id_cola_por_ambito = 1) AND (id_situacion <> 6)) -> Materialize (cost=18.65..23.26 rows=461 width=0) (actual time=0.005..1.817 rows=488 loops=209) -> Nested Loop (cost=0.00..18.19 rows=461 width=0) (actual time=0.209..5.670 rows=488 loops=1) -> Index Scan using active_users on users u (cost=0.00..5.97 rows=1 width=0) (actual time=0.141..0.147 rows=1 loops=1) Index Cond: ("login" = 'xxx'::text) Filter: (active AND ((field1 IS NULL) OR (NOT field1))) -> Seq Scan on rel_usr_sector_parte_tecnico r (cost=0.00..7.61 rows=461 width=0) (actual time=0.053..1.995 rows=488 loops=1) Total runtime: 1504.500 ms (11 rows) The original one is taking a *lot* of time (more than an hour by now). Thanks! Fernando.