single table - fighting a seq scan

Поиск
Список
Период
Сортировка
От Radoslav Nedyalkov
Тема single table - fighting a seq scan
Дата
Msg-id CANhtRiZMKJEHHK-Y6HEkYKHNULWshbks37EVDj=wLYdhZ0_rGA@mail.gmail.com
обсуждение исходный текст
Ответы Re: single table - fighting a seq scan  (Michael Lewis <mlewis@entrata.com>)
Список pgsql-general
Hi Forum, 
I'm scratching my head around the following case:

te is a 80M rows, 100GB table. It is a bare simple select over indexed attribute of it.

EXPLAIN SELECT te.id FROM te WHERE te.current_pid IN (240900026,
 240900027,
 240900028,
 -- 200 entries ...

 Gather  (cost=1000.00..61517367.85 rows=3870 width=8)
   Workers Planned: 2
   ->  Parallel Seq Scan on te  (cost=0.00..61515980.85 rows=1612 width=8)
         Filter: (current_pid = ANY ('{240900026,240900027,...240901129}'::bigint[]))
Execution time is about 5 minutes


Reducing number of current_pids to 100 changes the plan and it does index scan. (101 still does seq scan)

 Index Scan using te_current_pid_idx on te  (cost=0.57..731.26 rows=3832 width=8) (actual time=0.566..1.667 rows=600 loops=1)
   Index Cond: (current_pid = ANY ('{240900026,240900027,...240900194}'::bigint[]))
 Planning Time: 3.152 ms
 Execution Time: 1.732 ms


Selecting 200 pids rewritten with CTE goes for index too.

EXPLAIN ANALYZE
WITH cte as (
select * from unnest(ARRAY[
240900026,
 240900027,
 240900028,
...
 240901129
]))
SELECT te.id FROM te join cte on te.current_pid = cte.unnest;

                                                                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.58..1097.83 rows=3847 width=8) (actual time=0.882..14.927 rows=1468 loops=1)
   CTE cte
     ->  Function Scan on unnest  (cost=0.00..1.00 rows=100 width=4) (actual time=0.025..0.043 rows=205 loops=1)
   ->  CTE Scan on cte  (cost=0.00..2.00 rows=100 width=4) (actual time=0.027..0.083 rows=205 loops=1)
   ->  Index Scan using te_current_pid_idx on te  (cost=0.57..10.57 rows=38 width=16) (actual time=0.011..0.071 rows=7 loops=205)
         Index Cond: (current_pid = cte.unnest)
 Planning Time: 2.022 ms
 Execution Time: 15.044 ms



I tried random_page_cost=1, a couple of combinations with very low
cpu_index_tuple_cost and cpu_operator_cost. Only managed to get an index scan for a few more IN entries.
Did analyze. Bumped stats target for current_pid to 5000. Did not help.

I'm out of ideas. What is the right approach to solve this ?
Thank You!

Rado

В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: how to "explain" some ddl
Следующее
От: Marc Millas
Дата:
Сообщение: Re: how to "explain" some ddl