Re: Adding new field to big table

Тема: Re: Adding new field to big table
Дата: ,
(см: обсуждение, исходный текст)
Ответ на: Re: Adding new field to big table  (Jeff Janes)
Список: pgsql-performance

Скрыть дерево обсуждения

Adding new field to big table  ("acanada", )
 Re: Adding new field to big table  (Jeff Janes, )
  Re: Adding new field to big table  ("acanada", )
   Re: Adding new field to big table  (Jeff Janes, )
    Re: Adding new field to big table  ("acanada", )
     Re: Adding new field to big table  (Jeff Janes, )
      Re: Adding new field to big table  (, )

Jeff I think adding the new table is the best way to handle this issue.

From:        Jeff Janes <>
To:        acanada <>,
Cc:        postgres performance list <>
Date:        03/18/2014 02:05 AM
Subject:        Re: [PERFORM] Adding new field to big table
Sent by:        

On Monday, March 17, 2014, acanada <> wrote:

Jeff and Jeffrey thank you for your tips.
This is the explain of the query:
x=> explain update document as d set "svmConfidence" = st.svmconfidence from svm_confidence_id_tmp as st where =;
                                               QUERY PLAN                                                
 Update on document d  (cost=4204242.82..61669685.86 rows=124515592 width=284)
   ->  Hash Join  (cost=4204242.82..61669685.86 rows=124515592 width=284)
         Hash Cond: ( =
         ->  Seq Scan on document d  (cost=0.00..8579122.97 rows=203066697 width=270)
         ->  Hash  (cost=1918213.92..1918213.92 rows=124515592 width=18)
               ->  Seq Scan on svm_confidence_id_tmp st  (cost=0.00..1918213.92 rows=124515592 width=18)
(6 rows)

It's not using the index, most of the rows are beeing updated.
I'm trying with the CTAS solution.

Once this hash join spills to disk, the performance is going to get very bad.  The problem is that the outer table is going to get split into batches and written to disk.  If this were just a select, that would not be a problem because when it reads each batch back in, that is all it needs to do as the temp file contains all the necessary info.  But with an update, each batch that it reads back in and matches to the inner side, it then needs to back to the physical table to do the update, using the ctid saved in the batches to find the table tuple.  So in effect this adds a nested loop from the hashed copy of the table to the real copy, and the locality of reference between those is poor when there are many batches.  I'm pretty sure that the extra cost of doing this look up is not taken into account by the planner.  But, if it chooses a different plan than a hash join, that other plan might also have the same problem.

Some things for you to consider, other than CTAS:

1) Are you analyzing your temporary table before you do the update?  That might switch it to a different plan.

2) Make work_mem as large as you can stand, just for the one session that runs the update, to try to avoid spilling to disk.

3) If you set enable_hashjoin off temporarily in this session, what plan do you get?

0) Why are you creating the temporary table?  You must have some process that comes up with the value for the new column to put in the temporary table, why not just stick it directly into the original table?

Some things for the PostgreSQL hackers to consider:

1) When the hash spills to disk, it seems to write to disk the entire row that is going to be updated (except for the one column which is going to be overwritten) plus that tuple's ctid.  It doesn't seem like this is necessary, it should only need to write the ctid and the join key (and perhaps any quals?).  Since it has to visit the old row anyway to set its cmax, it can pull out the rest of the data to make the new tuple while it is there.  If it wrote a lot less data to the temp tables it could make a lot less batches for the same work_mem, and here the cost is directly proportional to the number of batches.  (Also, for the table not being updated, it writes the ctid to temp space when there seems to be no use for it.)

2) Should the planner account for the scattered reads needed to join to the original table on ctid for update from whatever materialized version of the table is created?  Of course all other plans would also need to be similarly instrumented.  I think most of them would have the same problem as the hash_join.  The one type I can think of that doesn't would be a merge join in which there is strong correlation between the merge key and the ctid order on the table to be updated.

3) It seems like the truly efficient way to run such an update on a very large data set would be join the two tables (hash or merge), then sort the result on the ctid of the updated table, then do a "merge join" between that sorted result and the physical table.  I don't think such a method currently is known to the planner, is it?  How hard would it be to make it?

The example I have been using is:

alter table pgbench_accounts add filler2 text;
create table foo as select aid, md5(aid::text) from pgbench_accounts;
explain (verbose) update pgbench_accounts set filler2 =md5 from foo where pgbench_accounts.aid=foo.aid;



-- Confidentiality Notice -- This email message, including all the attachments, is for the sole use of the intended recipient(s) and contains confidential information. Unauthorized use or disclosure is prohibited. If you are not the intended recipient, you may not use, disclose, copy or disseminate this information. If you are not the intended recipient, please contact the sender immediately by reply email and destroy all copies of the original message, including attachments.

В списке pgsql-performance по дате сообщения:

От: Elanchezhiyan Elango
Сообщение: Checkpoints and slow queries
От: Andreas Joseph Krogh
Сообщение: Optimize query for listing un-read messages