Re: Long Running Update - My Solution

Поиск
Список
Период
Сортировка
От Harry Mantheakis
Тема Re: Long Running Update - My Solution
Дата
Msg-id 4E09A389.80604@riskcontrollimited.com
обсуждение исходный текст
Ответ на Re: Long Running Update - My Solution  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Список pgsql-performance
Hello Kevin

 > If you use EXPLAIN with both statements...

Yes, the plans are indeed very different.

Here is the statement, set to update up to 100,000 records, which took
about 5 seconds to complete:


UPDATE
   table_A
SET
   field_1 = table_B.field_1
, field_2 = table_B.field_2
FROM
   table_B
WHERE
   table_B.tb_id >= 0
AND
   table_B.tb_id <= 100000
AND
   table_B.tb_id = table_A.ta_id
;


The query plan for the above is:


Nested Loop  (cost=0.00..2127044.47 rows=73620 width=63)
   ->  Index Scan using table_B_pkey on table_B  (cost=0.00..151830.75
rows=73620 width=20)
         Index Cond: ((tb_id >= 0) AND (tb_id <= 100000))
   ->  Index Scan using table_A_pkey on table_A  (cost=0.00..26.82
rows=1 width=47)
         Index Cond: (table_A.ta_id = table_B.tb_id)


Now, if I change the first AND clause to update 1M records, as follows:


table_B.id <= 1000000


I get the following - quite different - query plan:


Hash Join  (cost=537057.49..8041177.88 rows=852150 width=63)
   Hash Cond: (table_A.ta_id = table_B.tb_id)
   ->  Seq Scan on table_A  (cost=0.00..3294347.71 rows=145561171 width=47)
   ->  Hash  (cost=521411.62..521411.62 rows=852150 width=20)
         ->  Bitmap Heap Scan on table_B  (cost=22454.78..521411.62
rows=852150 width=20)
               Recheck Cond: ((tb_id >= 0) AND (tb_id <= 1000000))
               ->  Bitmap Index Scan on table_B_pkey
(cost=0.00..22241.74 rows=852150 width=0)
                     Index Cond: ((tb_id >= 0) AND (tb_id <= 1000000))


Note: When I tried updating 1M records, the command was still running
after 25 minutes before I killed it.

The sequential scan in the later plan looks expensive, and (I think)
supports what others have since mentioned, namely that when the
optimizer moves to using sequential scans (working off the disk) things
get a lot slower.

For me, the penny has finally dropped on why I should use EXPLAIN for
bulk operations.

Thanks too, to Greg Smith, Robert Klemme and Thomas for all the feedback.

Kind regards

Harry Mantheakis
London, UK




В списке pgsql-performance по дате отправления:

Предыдущее
От: Greg Smith
Дата:
Сообщение: Re: Long Running Update - My Solution
Следующее
От: Craig McIlwee
Дата:
Сообщение: Slow performance when querying millions of rows