Why Postgres doesn't use TID scan?

Поиск

Список

Период

Сортировка

От	Vladimir Ryabtsev
Тема	Why Postgres doesn't use TID scan?
Дата	18 декабря 2018 г. 04:16:08
Msg-id	CAMqTPq=hNg0GYFU0X+xmuKy8R2ARk1+A_uQpS+Mnf71MYpBKzg@mail.gmail.com обсуждение исходный текст
Ответы	Re: Why Postgres doesn't use TID scan?
Список	pgsql-performance

Дерево обсуждения

I want to clean a large log table by chunks. I write such a query:

delete from categorization.log

where ctid in (

select ctid from categorization.log

where timestamp < now() - interval '2 month'

limit 1000

)

But I am getting the following weird plan:

[Plan 1]

Delete on log (cost=74988058.17..77101421.77 rows=211334860 width=36)

-> Merge Semi Join (cost=74988058.17..77101421.77 rows=211334860 width=36)

Merge Cond: (log.ctid = "ANY_subquery".ctid)

-> Sort (cost=74987967.33..76044641.63 rows=422669720 width=6)

Sort Key: log.ctid

-> Seq Scan on log (cost=0.00..8651368.20 rows=422669720 width=6)

-> Sort (cost=90.83..93.33 rows=1000 width=36)

Sort Key: "ANY_subquery".ctid

-> Subquery Scan on "ANY_subquery" (cost=0.00..41.00 rows=1000 width=36)

-> Limit (cost=0.00..31.00 rows=1000 width=6)

-> Seq Scan on log log_1 (cost=0.00..11821391.10 rows=381284367 width=6)

Filter: ("timestamp" < (now() - '2 mons'::interval))

And it takes infinity to complete (with any number in LIMIT from 1 to 1000).

However if I extract CTIDs manually:

select array_agg(ctid) from (

select ctid from s.log

where timestamp < now() - interval '2 month'

limit 5

) v

and substitute the result inside the DELETE query, it does basic TID scan and completes in just milliseconds:

explain

delete from s.log

where ctid = any('{"(3020560,1)","(3020560,2)","(3020560,3)","(3020560,4)","(3020560,5)"}'::tid[])

[Plan 2]

Delete on log (cost=0.01..20.06 rows=5 width=6)

-> Tid Scan on log (cost=0.01..20.06 rows=5 width=6)

TID Cond: (ctid = ANY ('{"(3020560,1)","(3020560,2)","(3020560,3)","(3020560,4)","(3020560,5)"}'::tid[]))

In case the table's definition helps:

CREATE TABLE s.log

(

article_id bigint NOT NULL,

topic_id integer NOT NULL,

weight double precision NOT NULL,

cat_system character varying(50) NOT NULL,

lang character varying(5) NOT NULL,

is_final boolean NOT NULL,

comment character varying(50),

"timestamp" timestamp without time zone DEFAULT now()

)

Number of rows ~ 423M

n_live_tup = 422426725

last_vacuum = 2018-10-22

Postgres version(): PostgreSQL 10.3 (Ubuntu 10.3-1.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit

Why does this query want to use Seq Scan and Sort on a 423M rows table?

How to fix this (reduce it to Plan 2)?

Vlad

В списке pgsql-performance по дате отправления:

Предыдущее

От: Mariel Cherkassky
Дата: 16 декабря 2018 г., 14:58:45
Сообщение: Re: pgbench results arent accurate

Следующее

От: Tom Lane
Дата: 18 декабря 2018 г., 04:40:37
Сообщение: Re: Why Postgres doesn't use TID scan?

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Why Postgres doesn't use TID scan?

Предыдущее

Следующее