single table - fighting a seq scan

Поиск

Список

Период

Сортировка

От	Radoslav Nedyalkov
Тема	single table - fighting a seq scan
Дата	14 июля 2020 г. 20:29:03
Msg-id	CANhtRiZMKJEHHK-Y6HEkYKHNULWshbks37EVDj=wLYdhZ0_rGA@mail.gmail.com обсуждение исходный текст
Ответы	Re: single table - fighting a seq scan (Michael Lewis <mlewis@entrata.com>)
Список	pgsql-general

Дерево обсуждения

Hi Forum,

I'm scratching my head around the following case:

te is a 80M rows, 100GB table. It is a bare simple select over indexed attribute of it.

EXPLAIN SELECT te.id FROM te WHERE te.current_pid IN (240900026,
240900027,
240900028,
-- 200 entries ...

Gather (cost=1000.00..61517367.85 rows=3870 width=8)
Workers Planned: 2
-> Parallel Seq Scan on te (cost=0.00..61515980.85 rows=1612 width=8)
Filter: (current_pid = ANY ('{240900026,240900027,...240901129}'::bigint[]))
Execution time is about 5 minutes

Reducing number of current_pids to 100 changes the plan and it does index scan. (101 still does seq scan)

Index Scan using te_current_pid_idx on te (cost=0.57..731.26 rows=3832 width=8) (actual time=0.566..1.667 rows=600 loops=1)
Index Cond: (current_pid = ANY ('{240900026,240900027,...240900194}'::bigint[]))
Planning Time: 3.152 ms
Execution Time: 1.732 ms

Selecting 200 pids rewritten with CTE goes for index too.

EXPLAIN ANALYZE
WITH cte as (
select * from unnest(ARRAY[
240900026,
240900027,
240900028,
...
240901129
]))
SELECT te.id FROM te join cte on te.current_pid = cte.unnest;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=1.58..1097.83 rows=3847 width=8) (actual time=0.882..14.927 rows=1468 loops=1)
CTE cte
-> Function Scan on unnest (cost=0.00..1.00 rows=100 width=4) (actual time=0.025..0.043 rows=205 loops=1)
-> CTE Scan on cte (cost=0.00..2.00 rows=100 width=4) (actual time=0.027..0.083 rows=205 loops=1)
-> Index Scan using te_current_pid_idx on te (cost=0.57..10.57 rows=38 width=16) (actual time=0.011..0.071 rows=7 loops=205)
Index Cond: (current_pid = cte.unnest)
Planning Time: 2.022 ms
Execution Time: 15.044 ms

I tried random_page_cost=1, a couple of combinations with very low
cpu_index_tuple_cost and cpu_operator_cost. Only managed to get an index scan for a few more IN entries.

Did analyze. Bumped stats target for current_pid to 5000. Did not help.

I'm out of ideas. What is the right approach to solve this ?

Thank You!

Rado

В списке pgsql-general по дате отправления:

Предыдущее

От: Tom Lane
Дата: 14 июля 2020 г., 20:05:14
Сообщение: Re: how to "explain" some ddl

Следующее

От: Marc Millas
Дата: 14 июля 2020 г., 21:43:51
Сообщение: Re: how to "explain" some ddl

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

single table - fighting a seq scan

Предыдущее

Следующее