Re: Inefficient queryplan for query with intersectable

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Inefficient queryplan for query with intersectable
Дата
Msg-id 26436.1125096976@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Inefficient queryplan for query with intersectable  (Arjen van der Meijden <acmmailing@tweakers.net>)
Ответы Re: Inefficient queryplan for query with intersectable  (Arjen van der Meijden <acmmailing@tweakers.net>)
Список pgsql-performance
Arjen van der Meijden <acmmailing@tweakers.net> writes:
> As said, it chooses sequential scans or "the wrong index plans" over a
> perfectly good plan that is just not selected when the parameters are
> "too well tuned" or sequential scanning of the table is allowed.

I think some part of the problem comes from using inconsistent
datatypes.  For instance, it seems very odd that the thing is not
using a hash or something to handle

 t_0.Cat2 IN (SELECT 545 UNION SELECT ID FROM cat WHERE ParentID = 545)

seeing that it correctly guesses there are only going to be about 8 rows
in the union.  Part of the reason is that cat2 is smallint, whereas the
output of the union must be at least int, maybe wider depending on the
datatype of cat.id (which you did not show us); so the comparison isn't
hashable.  Even a smallint vs int comparison would be mergejoinable,
though, so I'm really wondering what cat.id is.

Another big part of the problem comes from poor result size estimation.
I'm not sure you can eliminate that entirely given the multiple
conditions on different columns (which'd require cross-column statistics
to really do well, which we do not have).  But you could avoid
constructs like

    WHERE ... t_1.recordtimestamp >=
      (SELECT max_date - 60 FROM last_dates WHERE table_name = 'pricetracker')

The planner is basically going to throw up its hands and make a default
guess on the selectivity of this; it's not smart enough to decide that
the sub-select probably represents a constant.  What I'd do with this
is to define a function marked STABLE for the sub-select result, perhaps
something like

create function get_last_date(tabname text, offsetdays int)
returns timestamp as $$
SELECT max_date - $2 FROM last_dates WHERE table_name = $1
$$ language sql strict stable;

(I'm guessing as to datatypes and the amount of parameterization you
need.)  Then write the query like

    WHERE ... t_1.recordtimestamp >= get_last_date('pricetracker', 60)

In this formulation the planner will be able to make a reasonable guess
about how many rows will match ... at least if your statistics are up
to date ...

            regards, tom lane

В списке pgsql-performance по дате отправления:

Предыдущее
От: asif ali
Дата:
Сообщение: Re: Weird performance drop after VACUUM
Следующее
От: Philip Hallstrom
Дата:
Сообщение: Re: Weird performance drop after VACUUM