Re: how do functions affect query plan?

Поиск

Список

Период

Сортировка

От	David G Johnston
Тема	Re: how do functions affect query plan?
Дата	15 мая 2014 г. 05:43:29
Msg-id	1400132604367-5803996.post@n5.nabble.com обсуждение
Ответ на	how do functions affect query plan? (常超 <chang-chao@hotmail.com>)
Ответы	Re: how do functions affect query plan?
Список	pgsql-performance

Дерево обсуждения

常超 wrote
> Hi,all
> I have a table to save received measure data.
>
>
> CREATE TABLE measure_data
> (
>   id serial NOT NULL,
>   telegram_id integer NOT NULL,
>   measure_time timestamp without time zone NOT NULL,
>   item_id integer NOT NULL,
>   val double precision,
>   CONSTRAINT measure_data_pkey PRIMARY KEY (id)
> );
>
> CREATE INDEX index_measure_data_telegram_id ON measure_data USING btree
> (telegram_id);
>
>
> in my scenario,a telegram contains measure data for multiple data items
> and timestamps,
> BTW,another table is for telegram.
>
> The SQL I used in my application is
>   select * from measure_data where telegram_id in(1,2,...,n)
> and this query used the index_measure_data_telegram_id index,as expected.
>
> In order to see the performance of my query ,
> I used the following query to search the measure data for randomly 30
> telegrams.
>
>
> explain analyze
> SELECT md.*
>   FROM measure_data md
>   where telegram_id in
>     (
>          SELECT distinct
>          trunc((132363-66484) * random() + 66484)
>          FROM generate_series(1,30) as s(telegram_id)
>     )
>   ;
>
> the 132363 and 66484 are the max and min of the telegram id,separately.
>
> What surprised me is that index is not used,instead,a seq scan is
> performed on measure_data.
> Although,intuitively,in this case,it is much wiser to use the index.
> Would you please give some clue to why this happened?
>
> "Hash Semi Join  (cost=65.00..539169.32 rows=10277280 width=28) (actual
> time=76.454..17177.054 rows=9360 loops=1)"
> "  Hash Cond: ((md.telegram_id)::double precision = (trunc(((65879::double
> precision * random()) + 66484::double precision))))"
> "  ->  Seq Scan on measure_data md  (cost=0.00..356682.60 rows=20554560
> width=28) (actual time=0.012..13874.809 rows=20554560 loops=1)"
> "  ->  Hash  (cost=52.50..52.50 rows=1000 width=8) (actual
> time=0.062..0.062 rows=30 loops=1)"
> "        Buckets: 1024  Batches: 1  Memory Usage: 2kB"
> "        ->  HashAggregate  (cost=22.50..42.50 rows=1000 width=0) (actual
> time=0.048..0.053 rows=30 loops=1)"
> "              ->  Function Scan on generate_series s  (cost=0.00..20.00
> rows=1000 width=0) (actual time=0.020..0.034 rows=30 loops=1)"
> "Total runtime: 17177.527 ms"

The planner expects to need to return half the table when you provide 1,000
distinct telegram_ids, which is best handled by scanning the whole table
sequentially and tossing out invalid data.

I am curious if the plan will be different if you added a LIMIT 30 to the
sub-query.

The root of the problem is the planner has no way of knowing whether
generate_series is going to return 1 or 1,000,000 rows so by default it (and
all functions) are assumed (by the planner) to return 1,000 rows.  By adding
an explicit limit you can better inform the planner as to how many rows you
are going to be passing up to the parent query and it will hopefully, with
knowledge of only 30 distinct values, use the index.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/how-do-functions-affect-query-plan-tp5803993p5803996.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: how do functions affect query plan?