Обсуждение: Optimizer problem in 8.1.6

Поиск
Список
Период
Сортировка

Optimizer problem in 8.1.6

От
Fernando Schapachnik
Дата:
Maybe this is already solved in more advanced releases, but just in
case.

VIEW active_users:
SELECT * FROM users WHERE active AND ((field IS NULL) OR (NOT field));

Table users has index on text field login.

EXPLAIN SELECT * from active_users where login='xxx';
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Index Scan using active_users on users u  (cost=0.00..5.97 rows=1
width=131)
   Index Cond: ("login" = 'xxx'::text)
   Filter: (active AND ((field1 IS NULL) OR (NOT field1)))

So far, everything OK.

Now, combined (sorry for the convoluted query, it is build
automatically by an app).

EXPLAIN SELECT DISTINCT p.id
FROM partes_tecnicos p,
rel_usr_sector_parte_tecnico r, active_users u
WHERE ((r.id_parte_tecnico=p.id AND r.id_usr=u.id AND
u.login='xxx' AND r.id_sector=p.id_sector_actual AND
p.id_cola_por_ambito=1)
OR p.id_cola_por_ambito=1)
AND p.id_situacion!=6;

-----
 Unique  (cost=1016.84..22057814.97 rows=219 width=4)
   ->  Nested Loop  (cost=1016.84..19607287.64 rows=980210931 width=4)
         ->  Nested Loop  (cost=8.07..2060.25 rows=100959 width=4)
               ->  Index Scan using partes_tecnicos_pkey on
partes_tecnicos p  (cost=0.00..33.00 rows=219 width=4)
                     Filter: ((id_cola_por_ambito = 1) AND
(id_situacion <> 6))
               ->  Materialize  (cost=8.07..12.68 rows=461 width=0)
                     ->  Seq Scan on rel_usr_sector_parte_tecnico r
(cost=0.00..7.61 rows=461 width=0)
         ->  Materialize  (cost=1008.77..1105.86 rows=9709 width=0)
               ->  Seq Scan on users u  (cost=0.00..999.06
rows=9709 width=0)
                     Filter: (active AND ((field1 IS NULL) OR
(NOT field1)))

Notice the seq. scan on users.

It is solved using:

EXPLAIN SELECT DISTINCT p.id
FROM partes_tecnicos p, pt.rel_usr_sector_parte_tecnico r,
(SELECT id FROM active_users WHERE
login='xxx') u
WHERE ((r.id_parte_tecnico=p.id AND r.id_usr=u.id
AND r.id_sector=p.id_sector_actual AND p.id_cola_por_ambito=1)
OR p.id_cola_por_ambito=1
) AND p.id_situacion!=6;


-------------------------------------------------------------------------------------------------------------
 Unique  (cost=18.65..2323.23 rows=219 width=4)
   ->  Nested Loop  (cost=18.65..2070.83 rows=100959 width=4)
         ->  Index Scan using partes_tecnicos_pkey on partes_tecnicos
p  (cost=0.00..33.00 rows=219 width=4)
               Filter: ((id_cola_por_ambito = 1) AND (id_situacion <>
6))
         ->  Materialize  (cost=18.65..23.26 rows=461 width=0)
               ->  Nested Loop  (cost=0.00..18.19 rows=461 width=0)
                     ->  Index Scan using active_users on users u
(cost=0.00..5.97 rows=1 width=0)
                           Index Cond: ("login" = 'xxx'::text)
                           Filter: (active AND ((field1 IS NULL)
OR (NOT field1)))
                     ->  Seq Scan on rel_usr_sector_parte_tecnico r
(cost=0.00..7.61 rows=461 width=0)
(10 rows)


Thanks!

Fernando.

Re: Optimizer problem in 8.1.6

От
Michael Glaesemann
Дата:
On Jun 22, 2007, at 10:16 , Fernando Schapachnik wrote:

> EXPLAIN SELECT DISTINCT p.id

Can you provide EXPLAIN ANALYZE? I suspect that when you rewrote the
query it changed how the planner took into account the statistics. If
your statistics are off, perhaps this changes how the planner
rewrites the query.

Michael Glaesemann
grzm seespotcode net



Re: Optimizer problem in 8.1.6

От
Michael Glaesemann
Дата:
On Jun 22, 2007, at 10:16 , Fernando Schapachnik wrote:

> Maybe this is already solved in more advanced releases, but just in
> case.
>
> VIEW active_users:
> SELECT * FROM users WHERE active AND ((field IS NULL) OR (NOT field));

You could also rewrite this as

SELECT *
FROM users
WHERE active
     AND field IS NOT TRUE;

IS NOT TRUE returns true for false and null:

SELECT true IS NOT TRUE AS "true"
     , false IS NOT TRUE AS "false"
     , null IS NOT TRUE AS "null";
true | false | null
------+-------+------
f    | t     | t

This might be a little easier to read.

Michael Glaesemann
grzm seespotcode net



Re: Optimizer problem in 8.1.6

От
Tom Lane
Дата:
Fernando Schapachnik <fernando@mecon.gov.ar> writes:
> Now, combined (sorry for the convoluted query, it is build
> automatically by an app).

> EXPLAIN SELECT DISTINCT p.id
> FROM partes_tecnicos p,
> rel_usr_sector_parte_tecnico r, active_users u
> WHERE ((r.id_parte_tecnico=p.id AND r.id_usr=u.id AND
> u.login='xxx' AND r.id_sector=p.id_sector_actual AND
> p.id_cola_por_ambito=1)
> OR p.id_cola_por_ambito=1)
> AND p.id_situacion!=6;

Is this query really what you want to do?  Because the OR overrides all
the join conditions, meaning that rows having p.id_cola_por_ambito=1
AND p.id_situacion!=6 must produce Cartesian products against every
row in each of the other tables.

I think your SQL-building app is broken.

            regards, tom lane

Re: Optimizer problem in 8.1.6

От
Fernando Schapachnik
Дата:
En un mensaje anterior, Tom Lane escribió:
> Fernando Schapachnik <fernando@mecon.gov.ar> writes:
> > Now, combined (sorry for the convoluted query, it is build
> > automatically by an app).
>
> > EXPLAIN SELECT DISTINCT p.id
> > FROM partes_tecnicos p,
> > rel_usr_sector_parte_tecnico r, active_users u
> > WHERE ((r.id_parte_tecnico=p.id AND r.id_usr=u.id AND
> > u.login='xxx' AND r.id_sector=p.id_sector_actual AND
> > p.id_cola_por_ambito=1)
> > OR p.id_cola_por_ambito=1)
> > AND p.id_situacion!=6;
>
> Is this query really what you want to do?  Because the OR overrides all
> the join conditions, meaning that rows having p.id_cola_por_ambito=1
> AND p.id_situacion!=6 must produce Cartesian products against every
> row in each of the other tables.
>
> I think your SQL-building app is broken.

Yes, yes, we found this while working on improving the query. I just
wanted to point out that the optimizer was doing a sequential scan
in a situation it could unfould de active_users definition, add the
login='xxx' clause, and use the index on the users table.

Thanks.

Fernando.

Re: Optimizer problem in 8.1.6

От
Tom Lane
Дата:
Fernando Schapachnik <fernando@mecon.gov.ar> writes:
> En un mensaje anterior, Tom Lane escribi�:
>> Is this query really what you want to do?  Because the OR overrides all
>> the join conditions, meaning that rows having p.id_cola_por_ambito=1
>> AND p.id_situacion!=6 must produce Cartesian products against every
>> row in each of the other tables.

> Yes, yes, we found this while working on improving the query. I just
> wanted to point out that the optimizer was doing a sequential scan
> in a situation it could unfould de active_users definition, add the
> login='xxx' clause, and use the index on the users table.

What for, when the query requires fetching the entire table anyway?

            regards, tom lane

Re: Optimizer problem in 8.1.6

От
Fernando Schapachnik
Дата:
En un mensaje anterior, Tom Lane escribió:
> Fernando Schapachnik <fernando@mecon.gov.ar> writes:
> > Now, combined (sorry for the convoluted query, it is build
> > automatically by an app).
>
> > EXPLAIN SELECT DISTINCT p.id
> > FROM partes_tecnicos p,
> > rel_usr_sector_parte_tecnico r, active_users u
> > WHERE ((r.id_parte_tecnico=p.id AND r.id_usr=u.id AND
> > u.login='xxx' AND r.id_sector=p.id_sector_actual AND
> > p.id_cola_por_ambito=1)
> > OR p.id_cola_por_ambito=1)
> > AND p.id_situacion!=6;
>
> Is this query really what you want to do?  Because the OR overrides all
> the join conditions, meaning that rows having p.id_cola_por_ambito=1
> AND p.id_situacion!=6 must produce Cartesian products against every
> row in each of the other tables.

A rewritten query still exhibits the same behavior:

VACUUM verbose ANALYZE users;
[...]
INFO:  analyzing "users"
INFO:  "users": scanned 778 of 778 pages, containing 22320 live
rows and 3 dead rows; 3000 rows in sample, 22320 estimated total rows

EXPLAIN ANALYZE SELECT DISTINCT p.id
FROM partes_tecnicos p
WHERE
p.id IN
        (SELECT r.id_parte_tecnico FROM
        rel_usr_sector_parte_tecnico r, active_users u
        WHERE (r.id_usr=u.id AND u.login='xxx' AND
        r.id_sector=p.id_sector_actual AND
            p.id_cola_por_ambito=1)
OR p.id_cola_por_ambito=1)
AND p.id_situacion!=6;

 Unique  (cost=0.00..19045387.60 rows=177 width=4) (actual
time=0.331..997.593 rows=209 loops=1)
   ->  Index Scan using partes_tecnicos_pkey on partes_tecnicos p
(cost=0.00..19045387.16 rows=177 width=4) (actual time=0.323..995.797
rows=209 loops=1)
         Filter: ((id_situacion <> 6) AND (subplan))
         SubPlan
           ->  Result  (cost=8.07..90878.33 rows=4493367 width=4)
(actual time=0.028..3.250 rows=178 loops=254)
                 One-Time Filter: ($0 = 1)
                 ->  Nested Loop  (cost=8.07..90878.33 rows=4493367
width=4) (actual time=0.025..2.393 rows=216 loops=209)
                       ->  Seq Scan on users u  (cost=0.00..1002.92
rows=9747 width=0) (actual time=0.009..0.009 rows=1 loops=209)
                             Filter: (active AND ((field1 IS
NULL) OR (NOT field1)))
                       ->  Materialize  (cost=8.07..12.68 rows=461
width=4) (actual time=0.004..0.800 rows=216 loops=209)
                             ->  Seq Scan on
rel_usr_sector_parte_tecnico r  (cost=0.00..7.61 rows=461 width=4)
(actual time=0.008..2.128 rows=488 loops=1)
 Total runtime: 998.552 ms
(12 rows)

Notice again the seq scan on users instead of using the index and the
very off estimate.

Thanks.

Fernando.


Re: Optimizer problem in 8.1.6

От
Tom Lane
Дата:
Fernando Schapachnik <fernando@mecon.gov.ar> writes:
> A rewritten query still exhibits the same behavior:

> EXPLAIN ANALYZE SELECT DISTINCT p.id
> FROM partes_tecnicos p
> WHERE
> p.id IN
>         (SELECT r.id_parte_tecnico FROM
>         rel_usr_sector_parte_tecnico r, active_users u
>         WHERE (r.id_usr=u.id AND u.login='xxx' AND
>         r.id_sector=p.id_sector_actual AND
>             p.id_cola_por_ambito=1)
> OR p.id_cola_por_ambito=1)
> AND p.id_situacion!=6;

[ shrug... ]  This is still telling the system to perform a
Cartesian-product join when p.id_cola_por_ambito=1.

A sane formulation of the query might look like

EXPLAIN ANALYZE SELECT DISTINCT p.id
FROM partes_tecnicos p
WHERE
(p.id_cola_por_ambito=1 OR
 p.id IN
        (SELECT r.id_parte_tecnico FROM
        rel_usr_sector_parte_tecnico r, active_users u
        WHERE (r.id_usr=u.id AND u.login='xxx' AND
        r.id_sector=p.id_sector_actual)))
AND p.id_situacion!=6;

ie, get the constant term out of the sub-select.  This is not exactly
the same thing though --- in particular, what do you intend should
happen if p.id has no matches whatsoever in r.id_parte_tecnico,
yet p.id_cola_por_ambito=1?

            regards, tom lane

Re: Optimizer problem in 8.1.6

От
Fernando Schapachnik
Дата:
En un mensaje anterior, Michael Glaesemann escribió:
>
> On Jun 22, 2007, at 10:16 , Fernando Schapachnik wrote:
>
> >EXPLAIN SELECT DISTINCT p.id
>
> Can you provide EXPLAIN ANALYZE? I suspect that when you rewrote the
> query it changed how the planner took into account the statistics. If
> your statistics are off, perhaps this changes how the planner
> rewrites the query.

Sure. The DB is VACUUM'ed daily, and the users database only received
a few updates per day.

This is from the rewrote one:



---------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=18.65..2838.38 rows=268 width=4) (actual
time=0.265..1503.554 rows=209 loops=1)
   ->  Nested Loop  (cost=18.65..2529.51 rows=123548 width=4) (actual
time=0.257..1127.666 rows=101992 loops=1)
         ->  Index Scan using partes_tecnicos_pkey on partes_tecnicos
p  (cost=0.00..39.89 rows=268 width=4) (actual time=0.025..2.115
rows=209 loops=1)
               Filter: ((id_cola_por_ambito = 1) AND (id_situacion <>
6))
         ->  Materialize  (cost=18.65..23.26 rows=461 width=0) (actual
time=0.005..1.817 rows=488 loops=209)
               ->  Nested Loop  (cost=0.00..18.19 rows=461 width=0)
(actual time=0.209..5.670 rows=488 loops=1)
                     ->  Index Scan using active_users on users u
(cost=0.00..5.97 rows=1 width=0) (actual time=0.141..0.147 rows=1
loops=1)
                           Index Cond: ("login" =
'xxx'::text)
                           Filter: (active AND ((field1 IS NULL)
OR (NOT field1)))
                     ->  Seq Scan on rel_usr_sector_parte_tecnico r
(cost=0.00..7.61 rows=461 width=0) (actual time=0.053..1.995 rows=488
loops=1)
 Total runtime: 1504.500 ms
(11 rows)


The original one is taking a *lot* of time (more than an hour by now).

Thanks!

Fernando.