Plan for update ... where a is not distinct from b

Поиск
Список
Период
Сортировка
От Peter J. Holzer
Тема Plan for update ... where a is not distinct from b
Дата
Msg-id 20171128085016.ia7x25ty5krlykeh@hjp.at
обсуждение исходный текст
Ответы Re: Plan for update ... where a is not distinct from b  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Plan for update ... where a is not distinct from b  (Laurenz Albe <laurenz.albe@cybertec.at>)
Список pgsql-general
[PostgreSQL 9.5.10 on x86_64-pc-linux-gnu, compiled by gcc (Debian
4.9.2-10) 4.9.2, 64-bit]

I noticed that an update was taking a long time and found this:

UPDATE public.facttable_imf_ifs p
SET [...lots of columns...]
FROM cleansing.cls_imf_ifs_facttable_imf_ifs c, cleansing.cleansing_change_type ct
WHERE   (p.macrobondtimeseries is not distinct from c.macrobondtimeseries) AND (p.date is not distinct from c.date)
ANDc.cleansing_change_type_id = ct.cleansing_change_type_id   AND ct.cleansing_change_type_desc_short IN
('UPDATED_NEW')
;


╔════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║                                                                       QUERY PLAN
                                ║ 

╟────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ Update on facttable_imf_ifs p  (cost=1.09..1978353972070.75 rows=7969398 width=1498)
                                ║ 
║   ->  Nested Loop  (cost=1.09..1978353972070.75 rows=7969398 width=1498)
                                ║ 
║         Join Filter: ((NOT ((p.macrobondtimeseries)::text IS DISTINCT FROM (c.macrobondtimeseries)::text)) AND (NOT
(p.dateIS DISTINCT FROM c.date))) ║ 
║         ->  Seq Scan on facttable_imf_ifs p  (cost=0.00..1071317.45 rows=20727045 width=51)
                                ║ 
║         ->  Materialize  (cost=1.09..2028790.72 rows=5454160 width=1472)
                                ║ 
║               ->  Hash Join  (cost=1.09..2001519.92 rows=5454160 width=1472)
                                ║ 
║                     Hash Cond: (c.cleansing_change_type_id = ct.cleansing_change_type_id)
                                ║ 
║                     ->  Seq Scan on cls_imf_ifs_facttable_imf_ifs c  (cost=0.00..1824258.62 rows=32724962 width=1470)
                                ║ 
║                     ->  Hash  (cost=1.07..1.07 rows=1 width=8)
                                ║ 
║                           ->  Seq Scan on cleansing_change_type ct  (cost=0.00..1.07 rows=1 width=8)
                                ║ 
║                                 Filter: ((cleansing_change_type_desc_short)::text = 'UPDATED_NEW'::text)
                                ║ 

╚════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝

Am I correct to assume that the result of Materialize is a flat table
(in memory) without any indexes and that the nested loop has to scan
that for each of the 20 million rows of the target table?

That's going to take a long time ...

Replacing the "is not distinct from" with "=" (which is possible in this
case because both columns are not null (and indeed the primary key), gives me this plan:


╔════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║                                                         QUERY PLAN
    ║ 

╟────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ Update on facttable_imf_ifs p  (cost=2611816.89..4435860.27 rows=7969425 width=1498)
    ║ 
║   ->  Merge Join  (cost=2611816.89..4435860.27 rows=7969425 width=1498)
    ║ 
║         Merge Cond: (((p.macrobondtimeseries)::text = (c.macrobondtimeseries)::text) AND (p.date = c.date))
    ║ 
║         ->  Index Scan using facttable_imf_ifs_pkey on facttable_imf_ifs p  (cost=0.56..1541107.94 rows=20727117
width=51)║ 
║         ->  Sort  (cost=2611811.12..2625446.52 rows=5454160 width=1472)
    ║ 
║               Sort Key: c.macrobondtimeseries, c.date
    ║ 
║               ->  Hash Join  (cost=1.09..2001519.92 rows=5454160 width=1472)
    ║ 
║                     Hash Cond: (c.cleansing_change_type_id = ct.cleansing_change_type_id)
    ║ 
║                     ->  Seq Scan on cls_imf_ifs_facttable_imf_ifs c  (cost=0.00..1824258.62 rows=32724962 width=1470)
    ║ 
║                     ->  Hash  (cost=1.07..1.07 rows=1 width=8)
    ║ 
║                           ->  Seq Scan on cleansing_change_type ct  (cost=0.00..1.07 rows=1 width=8)
    ║ 
║                                 Filter: ((cleansing_change_type_desc_short)::text = 'UPDATED_NEW'::text)
    ║ 

╚════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝

which obviously looks a lot nicer, but even replacing
(A is not distinct from B)
with the equivalent
(A = B or A is null and B is null)
gives a different plan:


╔═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║                                                                                                          QUERY PLAN

╟───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ Update on facttable_imf_ifs p  (cost=5192.25..28394567337.17 rows=7969813 width=1498)
║   ->  Nested Loop  (cost=5192.25..28394567337.17 rows=7969813 width=1498)
║         ->  Hash Join  (cost=1.09..2001519.92 rows=5454160 width=1472)
║               Hash Cond: (c.cleansing_change_type_id = ct.cleansing_change_type_id)
║               ->  Seq Scan on cls_imf_ifs_facttable_imf_ifs c  (cost=0.00..1824258.62 rows=32724962 width=1470)
║               ->  Hash  (cost=1.07..1.07 rows=1 width=8)
║                     ->  Seq Scan on cleansing_change_type ct  (cost=0.00..1.07 rows=1 width=8)
║                           Filter: ((cleansing_change_type_desc_short)::text = 'UPDATED_NEW'::text)
║         ->  Bitmap Heap Scan on facttable_imf_ifs p  (cost=5191.16..5205.65 rows=1 width=51)
║               Recheck Cond: (((macrobondtimeseries)::text = (c.macrobondtimeseries)::text) OR (macrobondtimeseries IS
NULL))
║               Filter: ((((macrobondtimeseries)::text = (c.macrobondtimeseries)::text) OR ((macrobondtimeseries IS
NULL)AND (c.macrobondtimeseries IS NULL))) AND ((date = c.date) OR ((date IS NULL) AND (c.date IS NULL)))) 
║               ->  BitmapOr  (cost=5191.16..5191.16 rows=866 width=0)
║                     ->  Bitmap Index Scan on facttable_imf_ifs_pkey  (cost=0.00..7.06 rows=866 width=0)
║                           Index Cond: ((macrobondtimeseries)::text = (c.macrobondtimeseries)::text)
║                     ->  Bitmap Index Scan on facttable_imf_ifs_pkey  (cost=0.00..2.07 rows=1 width=0)
║                           Index Cond: (macrobondtimeseries IS NULL)

╚═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
       hp

--   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp@hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>

В списке pgsql-general по дате отправления:

Предыдущее
От: "Ivan E. Panchenko"
Дата:
Сообщение: Re: jsonb and where clause?
Следующее
От: Bjorn T Johansen
Дата:
Сообщение: Re: jsonb and where clause?