Performance cost of a sort-merge join

Поиск
Список
Период
Сортировка
От Yang Zhang
Тема Performance cost of a sort-merge join
Дата
Msg-id 9066fa251002220836o2097d19fr8a0af442388d3415@mail.gmail.com
обсуждение исходный текст
Ответы Re: Performance cost of a sort-merge join  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
Hi, I have the given tables and query, but I'm confused why the cost
of the query is so high. I've left it running over night. By
comparison, a "select * from metarelcloud_transactionlog order by
transactionid" takes on the order of seconds/minutes (at least in
MySQL). As far as I can tell, the complex query below basically just
consists of two sorts (three if PG doesn't in fact reuse the leaf
sorts). Why the enormous estimated page IO count (cost) on the second
sort? There are roughly 30 tuples per transactionid, so the join
shouldn't produce a vastly exploded dataset. Thanks in advance.

tpcc=# \d metarelcloud_graph
      Table "public.metarelcloud_graph"
  Column  |         Type          | Modifiers
----------+-----------------------+-----------
 tableid1 | character varying(20) | not null
 tupleid1 | integer               | not null
 tableid2 | character varying(20) | not null
 tupleid2 | integer               | not null
 node1    | integer               | not null
 node2    | integer               | not null
 weight   | integer               | not null
Indexes:
    "metarelcloud_graph_pkey" PRIMARY KEY, btree (tableid1, tupleid1,
tableid2, tupleid2)

tpcc=# \d metarelcloud_transactionlog
                                       Table
"public.metarelcloud_transactionlog"
       Column        |         Type          |
       Modifiers

---------------------+-----------------------+--------------------------------------------------------------------------
 id                  | integer               | not null default
nextval('metarelcloud_transactionlog_id_seq'::regclass)
 transactionid       | integer               | not null
 queryid             | smallint              | not null
 tableid             | character varying(30) | not null
 tupleid             | integer               | not null
 querytype           | character varying     | not null
 graphpartition      | smallint              |
 replicatedpartition | smallint              |
 justifiedpartition  | smallint              |
 hashpartition       | smallint              |
 nodeid              | integer               |
 manualpartition     | smallint              |
Indexes:
    "metarelcloud_transactionlog_pkey" PRIMARY KEY, btree (id)
Check constraints:
    "metarelcloud_transactionlog_graphpartition_check" CHECK
(graphpartition >= 0)
    "metarelcloud_transactionlog_hashpartition_check" CHECK (hashpartition >= 0)
    "metarelcloud_transactionlog_justifiedpartition_check" CHECK
(justifiedpartition >= 0)
    "metarelcloud_transactionlog_manualpartition_check" CHECK
(manualpartition >= 0)
    "metarelcloud_transactionlog_querytype_check" CHECK
(querytype::text = ANY (ARRAY['select'::character varying,
'insert'::character varying, 'delete'::character varying,
'update'::character varying]::text[]))
    "metarelcloud_transactionlog_replicatedpartition_check" CHECK
(replicatedpartition >= 0)

tpcc=# analyze metarelcloud_transactionlog;
ANALYZE

tpcc=# explain insert into metarelcloud_graph (node1, node2, tableid1,
tupleid1, tableid2, tupleid2, weight)
select 0, 0, a.tableid, a.tupleid, b.tableid, b.tupleid, count(*)
from metarelcloud_transactionlog a, metarelcloud_transactionlog b
where a.transactionid = b.transactionid
          and (a.tableid, a.tupleid) < (b.tableid, b.tupleid)
group by a.tableid, a.tupleid, b.tableid, b.tupleid;
                                                          QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan "*SELECT*"  (cost=968062444010.30..1088362355018.20
rows=2673331355731 width=180)
   ->  GroupAggregate  (cost=968062444010.30..1041579056292.91
rows=2673331355731 width=26)
         ->  Sort  (cost=968062444010.30..974745772399.63
rows=2673331355731 width=26)
               Sort Key: a.tableid, a.tupleid, b.tableid, b.tupleid
               ->  Merge Join  (cost=16817274.69..160416950669.79
rows=2673331355731 width=26)
                     Merge Cond: (a.transactionid = b.transactionid)
                     Join Filter: (ROW((a.tableid)::text, a.tupleid) <
ROW((b.tableid)::text, b.tupleid))
                     ->  Sort  (cost=8408637.34..8534662.95
rows=50410244 width=17)
                           Sort Key: a.transactionid
                           ->  Seq Scan on metarelcloud_transactionlog
a  (cost=0.00..925543.44 rows=50410244 width=17)
                     ->  Materialize  (cost=8408637.34..9038765.39
rows=50410244 width=17)
                           ->  Sort  (cost=8408637.34..8534662.95
rows=50410244 width=17)
                                 Sort Key: b.transactionid
                                 ->  Seq Scan on
metarelcloud_transactionlog b  (cost=0.00..925543.44 rows=50410244
width=17)
(14 rows)

--
Yang Zhang
http://www.mit.edu/~y_z/

В списке pgsql-general по дате отправления:

Предыдущее
От: Allan Kamau
Дата:
Сообщение: Re: helo
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Performance cost of a sort-merge join