Re: Postgresql is using seqscan when is should use indexes.

Поиск
Список
Период
Сортировка
От Russell Smith
Тема Re: Postgresql is using seqscan when is should use indexes.
Дата
Msg-id 200411082304.04359.mr-russ@pws.com.au
обсуждение исходный текст
Ответ на Postgresql is using seqscan when is should use indexes.  (Andreas Åkre Solberg <Andreas.Solberg@uninett.no>)
Список pgsql-performance
On Mon, 8 Nov 2004 09:40 pm, Andreas Åkre Solberg wrote:
> We have two tables, dst_port_hour and dst_port_day, which should be
> very similar, they both have about 50.000.000 rows. In both tables we
> have an index for period_id.
>
> We run postgresql  7.4.5 on a dedicated Debian server, with dual Intel
> Xeon 3GHz and 4GB memory.
>
> The problem is that on the dst_port_day table, postgresql is using
> seqscan, and not the index when it should. Forcing the use of the index
> by setting enable_seqscan to false, makes the query lighthening fast.
> When using seqscan, the query takes several minutes. The planner
> calculates the cost for Index scan to be much more than sequence scan.
>
> Why is our query planner misbehaving?
>
> Here are the exaplain analyze output with and without index-force:
>
>
> SET enable_seqscan=false;
>
> stager=>  explain analyze SELECT cur.portnr FROM dst_port_day cur WHERE
> cur.period_id='2779' GROUP BY cur.portnr  ORDER BY SUM(cur.octets) DESC
>   LIMIT 5;
>
dst_port_day cur  (cost=0.00..2019931.14 rows=546150 width=12) (actual time=0.038..303.801 rows=48072 loops=1)

The guess of the number of rows returned by the index scan is out by a factor of 10.  500k rows is greater than 1% of
the rows, so I think the planner is likely to choose a sequence scan at this amount, unless you have tuned things like
random page cost.

What is the selectivity like on that column?
Have you analyzed recently?

If so, you should probably increase the statistics on that column
See ALTER TABLE SET STATISTICS in the manual.

>             QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Limit  (cost=2022664.62..2022664.63 rows=5 width=12) (actual time=831.772..831.816 rows=5 loops=1)
>     ->  Sort  (cost=2022664.62..2022664.82 rows=80 width=12) (actual time=831.761..831.774 rows=5 loops=1)
>           Sort Key: sum(octets)
>           ->  HashAggregate  (cost=2022661.89..2022662.09 rows=80 width=12) (actual time=587.036..663.991 rows=16396
loops=1)
>                 ->  Index Scan using dst_port_day_period_id_key on dst_port_day cur  (cost=0.00..2019931.14
rows=546150width=12) (actual time=0.038..303.801 rows=48072 loops=1) 
>                       Index Cond: (period_id = 2779)
>   Total runtime: 836.362 ms
> (7 rows)
>
>
>
> SET enable_seqscan=true;
>
> stager=>   explain analyze SELECT cur.portnr FROM dst_port_day cur WHERE cur.period_id='2779' GROUP BY cur.portnr
ORDERBY SUM(cur.octets) DESC  LIMIT 5; 
>
> QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------------------
>   Limit  (cost=1209426.88..1209426.89 rows=5 width=12) (actual time=299053.006..299053.053 rows=5 loops=1)
>     ->  Sort  (cost=1209426.88..1209427.08 rows=80 width=12) (actual time=299052.995..299053.008 rows=5 loops=1)
>           Sort Key: sum(octets)
>           ->  HashAggregate  (cost=1209424.15..1209424.35 rows=80 width=12) (actual time=298803.273..298881.020
rows=16396loops=1) 
>                 ->  Seq Scan on dst_port_day cur  (cost=0.00..1206693.40 rows=546150 width=12) (actual
time=298299.508..298526.544rows=48072 loops=1) 
>                       Filter: (period_id = 2779)
>   Total runtime: 299057.643 ms
> (7 rows)
>

Regards

Russell Smith

В списке pgsql-performance по дате отправления:

Предыдущее
От: Alvaro Nunes Melo
Дата:
Сообщение: Re: Better Hardware, worst Results
Следующее
От: Dawid Kuroczko
Дата:
Сообщение: ext3 journalling type