Re: Delete tables difference involves seq scan

Поиск
Список
Период
Сортировка
От Danylo Hlynskyi
Тема Re: Delete tables difference involves seq scan
Дата
Msg-id CANZg+yfVLZ7edD8OjDoz3+soWsG=NQOe7DBZvo=ZXr-O1FFquw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Delete tables difference involves seq scan  (Danylo Hlynskyi <abcz2.uprola@gmail.com>)
Список pgsql-performance
I was able to speedup original query a lot by using CTE. It still uses seq scan on `diff` table, but looks like it does this once: explain with diff as (select id from drug_refills_diff) delete from drug_refills where id in (select id from diff); QUERY PLAN --------------------------------------------------------------------------------------------------- Delete on drug_refills (cost=989844.94..990366.86 rows=456888836 width=46) CTE diff -> Seq Scan on drug_refills_diff (cost=0.00..720404.88 rows=11975088 width=16) -> Nested Loop (cost=269440.05..269961.98 rows=456888836 width=46) -> HashAggregate (cost=269439.48..269441.48 rows=200 width=56) Group Key: diff.id -> CTE Scan on diff (cost=0.00..239501.76 rows=11975088 width=56) -> Index Scan using drug_refills_pkey on drug_refills (cost=0.57..2.59 rows=1 width=22) Index Cond: (id = diff.id) (9 rows) 2017-12-01 15:17 GMT+02:00 Danylo Hlynskyi : > Oh, sorry, this happens on Postgresql 9.6.6. I've checked that on > Postgresql 10.0 query plan from setup (1) > now uses two seq scans, like in setup (2). > > > 2017-12-01 15:03 GMT+02:00 Danylo Hlynskyi : > >> Hello. I want to remove rows from first table, that exist in second >> (equality is done using PK). However I experience seq scan on second table, >> which counters my intuition - I think it should be index-only. Because >> tables are large, performance of query is very bad. >> >> However I got mixed results when trying to reproduce this behavior on >> syntetic tables. Here I'll show 3 different plans, which I got for the same >> query. >> >> 1. Setup is: >> --------------------------- >> create table diff (id uuid constraint diff_pkey primary key); >> create table origin (id uuid constraint origin_pkey primary key); >> --------------------------- >> >> The query generates correct plan, which performs only index scans: >> >> explain delete from origin where exists (select id from diff where >> origin.id = diff.id); >> QUERY PLAN >> ------------------------------------------------------------ >> ------------------------------- >> Delete on origin (cost=0.30..105.56 rows=1850 width=12) >> -> Merge Semi Join (cost=0.30..105.56 rows=1850 width=12) >> Merge Cond: (origin.id = diff.id) >> -> Index Scan using origin_pkey on origin (cost=0.15..38.90 >> rows=1850 width=22) >> -> Index Scan using diff_pkey on diff (cost=0.15..38.90 >> rows=1850 width=22) >> (5 rows) >> >> 2. Setup is: >> -------------------------------- >> create table origin (id uuid constraint origin_pkey primary key, data >> jsonb); >> create table diff (id uuid constraint diff_pkey primary key, data jsonb); >> -------------------------------- >> >> The query generates plan with two seq scans: >> >> explain delete from origin where exists (select id from diff where >> origin.id = diff.id); >> QUERY PLAN >> ------------------------------------------------------------ >> --------------- >> Delete on origin (cost=34.08..69.49 rows=1070 width=12) >> -> Hash Semi Join (cost=34.08..69.49 rows=1070 width=12) >> Hash Cond: (origin.id = diff.id) >> -> Seq Scan on origin (cost=0.00..20.70 rows=1070 width=22) >> -> Hash (cost=20.70..20.70 rows=1070 width=22) >> -> Seq Scan on diff (cost=0.00..20.70 rows=1070 width=22) >> (6 rows) >> >> 3. My real `origin` table has 26 fields and 800 billion rows, real `diff` >> table has 12 million rows and the query generates plan with nested loop and >> seq scan on `diff` table: >> >> explain delete from drug_refills origin where exists (select id from >> drug_refills_diff diff where origin.id = diff.id); >> QUERY PLAN >> ------------------------------------------------------------ >> ---------------------------------------------- >> Delete on drug_refills origin (cost=0.57..22049570.11 rows=11975161 >> width=12) >> -> Nested Loop (cost=0.57..22049570.11 rows=11975161 width=12) >> -> Seq Scan on drug_refills_diff diff (cost=0.00..720405.61 >> rows=11975161 width=22) >> -> Index Scan using drug_refills_pkey on drug_refills origin >> (cost=0.57..1.77 rows=1 width=22) >> Index Cond: (id = diff.id) >> (5 rows) >> >> I have run ANALYZE on both tables, but it didn't help. Here are column >> types in origin and diff (same schema), if that matters: >> >> uuid >> timestamp with time zone >> timestamp with time zone >> character varying(255) >> character varying(255) >> character varying(1024) >> numeric(10,4) >> integer >> numeric(14,8) >> numeric(14,8) >> numeric(14,8) >> numeric(14,8) >> numeric(14,8) >> character varying(16) >> character varying(16) >> character varying(16) >> character varying(16) >> character varying(16) >> character varying(16) >> date >> jsonb >> text[] >> uuid >> uuid >> uuid >> uuid >> >> >

В списке pgsql-performance по дате отправления:

Предыдущее
От: Danylo Hlynskyi
Дата:
Сообщение: Re: Delete tables difference involves seq scan
Следующее
От: Roman Konoval
Дата:
Сообщение: Bad plan for ltree predicate <@