Discussion: Possible future performance improvement: sort updates/deletes by ctid


Possible future performance improvement: sort updates/deletes by ctid

From:
Tom Lane
Date:
We've had a couple of discussions recently revolving around the
inefficiency of using hashjoin/hashaggregation output to update a target
table, because of the resulting very random access pattern.  I believe
this same mechanism is underlying the slowness of Stephen Denne's
"alternate query" described here:
http://archives.postgresql.org/pgsql-performance/2008-01/msg00227.php

I made up the attached doubtless-oversimplified test case to model what
he was seeing.  It's cut down about 4x from the table size he describes,
but the UPDATE still takes forever --- I gave up waiting after 2 hours,
when it had deleted about a fifth of its hashjoin temp files, suggesting
that the total runtime would be about 10 hours.

A brute force idea for fixing this is to sort the intended update or
delete operations of an UPDATE/DELETE command according to the target
table's ctid, which is available for free anyway since the executor top
level must have it to perform the operation.  I made up an even more
brute force patch (also attached) that forces that to happen for every
UPDATE or DELETE --- obviously we'd not want that for real, it's just
for crude performance testing.  With that patch, I got the results

                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=6075623.03..6085623.05 rows=4000008 width=618) (actual time=2078726.637..3371944.124 rows=4000000 loops=1)
   Sort Key: df.ctid
   Sort Method:  external merge  Disk: 2478992kB
   ->  Hash Join  (cost=123330.50..1207292.72 rows=4000008 width=618) (actual time=20186.510..721120.455 rows=4000000 loops=1)
         Hash Cond: (df.document_id = d.id)
         ->  Seq Scan on document_file df  (cost=0.00..373334.08 rows=4000008 width=614) (actual time=11.775..439993.807 rows=4000000 loops=1)
         ->  Hash  (cost=57702.00..57702.00 rows=4000200 width=8) (actual time=19575.885..19575.885 rows=4000000 loops=1)
               ->  Seq Scan on document d  (cost=0.00..57702.00 rows=4000200 width=8) (actual time=0.039..14335.615 rows=4000000 loops=1)
 Total runtime: 3684037.097 ms

or just over an hour runtime --- still not exactly speedy, but it
certainly compares favorably to the estimated 10 hours for unsorted
updates.
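
(A purely illustrative aside, not part of the original test: ctid is the
physical heap address of a row, so visiting rows in ctid order means visiting
heap pages sequentially.  On the freshly loaded document_file table from the
attached script, that ordering is easy to see:)

select ctid, document_id
from document_file
order by ctid
limit 5;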

This is with default shared_buffers (32MB) and work_mem (1MB);
a more aggressive work_mem would have meant fewer hash batches and fewer
sort runs and hence better performance in both cases, but with the
majority of the runtime going into the sort step here, I think that the
sorted update would benefit much more.
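
(For anyone reproducing the comparison, the knobs in question can be raised as
sketched here; the 256MB and 512MB figures are arbitrary examples, not values
tested in this thread:)

set work_mem = '256MB';   -- per-session; the runs above used the 1MB default
-- shared_buffers cannot be changed per session: raising it (say to 512MB)
-- means editing postgresql.conf and restarting the server.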

Nowhere near a workable patch of course, but seems like food for
thought.

            regards, tom lane

drop table if exists document;
drop table if exists document_file ;

create table document (document_type_id int, id int primary key);
create table document_file (document_type_id int, document_id int primary key,
   filler char(600));

insert into document_file select x,x,'z' from generate_series(1,4000000) x;
insert into document select x,x from generate_series(1,4000000) x;

analyze document_file;
analyze document;

set enable_mergejoin = false;

explain analyze UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
Index: src/backend/optimizer/prep/preptlist.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/optimizer/prep/preptlist.c,v
retrieving revision 1.88
diff -c -r1.88 preptlist.c
*** src/backend/optimizer/prep/preptlist.c    1 Jan 2008 19:45:50 -0000    1.88
--- src/backend/optimizer/prep/preptlist.c    30 Jan 2008 03:06:30 -0000
***************
*** 32,37 ****
--- 32,38 ----
  #include "optimizer/var.h"
  #include "parser/analyze.h"
  #include "parser/parsetree.h"
+ #include "parser/parse_clause.h"
  #include "parser/parse_coerce.h"


***************
*** 103,108 ****
--- 104,120 ----
              tlist = list_copy(tlist);

          tlist = lappend(tlist, tle);
+
+         /*
+          * Force the query result to be sorted by CTID, for better update
+          * speed.  (Note: we expect parse->sortClause to be NIL here,
+          * but this code will do no harm if it's not.)
+          */
+         parse->sortClause = addTargetToSortList(NULL, tle,
+                                                 parse->sortClause, tlist,
+                                                 SORTBY_DEFAULT,
+                                                 SORTBY_NULLS_DEFAULT,
+                                                 NIL, false);
      }

      /*

Re: Possible future performance improvement: sort updates/deletes by ctid

From:
"Stephen Denne"
Date:
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> doubtless-oversimplified
It looks equivalent.

> With that patch, I got the results
...
>    ->  Hash Join  (cost=123330.50..1207292.72 rows=4000008 width=618) (actual time=20186.510..721120.455 rows=4000000 loops=1)

The plan from here is equivalent to the query plan that I had.
In an update query, does the actual time = 721120 mean that after 12 minutes it had completed figuring out what to
update, and what to?

> This is with default shared_buffers (32MB) and work_mem (1MB);

I had tried a few larger settings, and though I had fewer temp files created, they still took longer than I was
willing to wait to process.
I did figure out that contention with the background writer or checkpoint processing probably wasn't a large
contributor.

How hard is it to match, recognise potential benefit, and rewrite the query from

UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id
FROM document AS d WHERE d.id = document_id;

to

UPDATE ONLY document_file AS df SET document_type_id =
(SELECT d.document_type_id FROM document AS d WHERE d.id = document_id);

The second form is several orders of magnitude faster for me.
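
(As an illustrative way to compare the two forms on the same data, not
something quoted from the original message, each can be run under EXPLAIN
ANALYZE inside a transaction that is then rolled back:)

begin;
explain analyze UPDATE ONLY document_file AS df SET document_type_id =
    (SELECT d.document_type_id FROM document AS d WHERE d.id = document_id);
rollback;   -- EXPLAIN ANALYZE really performs the update, so discard it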

Stephen Denne.




Re: Possible future performance improvement: sort updates/deletes by ctid

From:
Tom Lane
Date:
"Stephen Denne" <Stephen.Denne@datamail.co.nz> writes:
> How hard is it to match, recognise potential benefit, and rewrite the query from

> UPDATE ONLY document_file AS df SET document_type_id = 
>         d.document_type_id FROM document AS d WHERE d.id = document_id;

> to

> UPDATE ONLY document_file AS df SET document_type_id = 
> (SELECT d.document_type_id FROM document AS d WHERE d.id = document_id);

> Which is several orders of magnitude faster for me.

At the planner level that would be entirely the wrong way to go about
it, because that's forcing the equivalent of a nestloop join, which is
very unlikely to be faster for the numbers of rows that we're talking
about here.  The reason it looks faster to you is that the benefits of
updating the document_file rows in ctid order outweigh the costs of the
dumb join strategy ... but what we want to achieve here is to have both
benefits, or at least to give the planner the opportunity to make a
cost-driven decision about what to do.
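
(A minimal sketch of that cost comparison, offered for illustration only: the
same enable_* switches the test script already uses can disable the hash and
merge strategies, so the planner's nestloop estimate can be seen next to its
preferred plan.)

explain UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id
    FROM document AS d WHERE d.id = document_id;   -- planner's preferred join

set enable_hashjoin = false;
set enable_mergejoin = false;   -- leaves only the nestloop strategy
explain UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id
    FROM document AS d WHERE d.id = document_id;
reset enable_hashjoin;
reset enable_mergejoin;
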
        regards, tom lane


Re: Possible future performance improvement: sort updates/deletes by ctid

From:
"Stephen Denne"
Date:
> At the planner level that would be entirely the wrong way to go about
> it, because that's forcing the equivalent of a nestloop join, which is
> very unlikely to be faster for the numbers of rows that we're talking
> about here.  The reason it looks faster to you is that the benefits of
> updating the document_file rows in ctid order outweigh the costs of the
> dumb join strategy ... but what we want to achieve here is to have both
> benefits, or at least to give the planner the opportunity to make a
> cost-driven decision about what to do.

Ok.

Here are some more data points, using a smaller table, v8.2.6:


Seq Scan on document_file df  (cost=0.00..208480.85 rows=25101 width=662) (actual time=0.239..773.834 rows=25149 loops=1)
  SubPlan
    ->  Index Scan using pk_document_id on document d  (cost=0.00..8.27 rows=1 width=4) (actual time=0.011..0.015 rows=1 loops=25149)
          Index Cond: (id = $0)
Total runtime: 4492.363 ms



vs


Hash Join  (cost=1048.85..6539.32 rows=25149 width=666) (actual time=575.079..1408.363 rows=25149 loops=1)
  Hash Cond: (df.document_id = d.id)
  ->  Seq Scan on document_file df  (cost=0.00..4987.49 rows=25149 width=662) (actual time=60.724..824.195 rows=25149 loops=1)
  ->  Hash  (cost=734.49..734.49 rows=25149 width=8) (actual time=40.271..40.271 rows=25149 loops=1)
        ->  Seq Scan on document d  (cost=0.00..734.49 rows=25149 width=8) (actual time=0.055..22.559 rows=25149 loops=1)
Total runtime: 34961.504 ms


These are fairly repeatable for me after doing a vacuum full analyze of the two tables.


Have I simply not tuned postgres so that it knows it has everything on a single old IDE drive, not split over a few
sets of raided SSD drives, hence random_page_cost should perhaps be larger than 4.0? Would that make the second
estimate larger than the first estimate?
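
(A minimal way to test that hunch, sketched here for illustration only with an
arbitrary value, is to raise the setting for the session and re-plan both
forms of the update:)

set random_page_cost = 8.0;   -- example value; the default is 4.0
explain UPDATE ONLY document_file AS df SET document_type_id =
    (SELECT d.document_type_id FROM document AS d WHERE d.id = document_id);
explain UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id
    FROM document AS d WHERE d.id = document_id;
reset random_page_cost;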

Stephen Denne.




Re: Possible future performance improvement: sort updates/deletes by ctid

From:
Bruce Momjian
Date:
Added to TODO:

* Sort large UPDATE/DELETEs so it is done in heap order
 http://archives.postgresql.org/pgsql-hackers/2008-01/msg01119.php


---------------------------------------------------------------------------

Tom Lane wrote:
> We've had a couple of discussions recently revolving around the
> inefficiency of using hashjoin/hashaggregation output to update a target
> table, because of the resulting very random access pattern.  I believe
> this same mechanism is underlying the slowness of Stephen Denne's
> "alternate query" described here:
> http://archives.postgresql.org/pgsql-performance/2008-01/msg00227.php

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +