slower merge join on sorted data chosen over nested loop

Поиск
Список
Период
Сортировка
От Kevin Grittner
Тема slower merge join on sorted data chosen over nested loop
Дата
Msg-id s34574e9.084@gwmta.wicourts.gov
обсуждение исходный текст
Ответы Re: slower merge join on sorted data chosen over nested loop  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
In both the 8.1beta2 and using a build from this morning's
dev snapshot, this query ran slower than expected:

select count(*) from "DbTranRepository" AS "dtr" inner join "DbTranLogRecord" AS "dtlr"   on (    "dtlr"."countyNo"
 = "dtr"."countyNo"       and "dtlr"."tranImageSeqNo" = "dtr"."tranImageSeqNo"       and "dtlr"."tableName"      =
'Cal'     ) where "dtr"."countyNo" = 40   and "dtr"."timestampValue" between '2005-09-30 00:00:00' and '2005-10-01
00:00:00';

Fresh restore from dump.  ANALYZE run.  No other activity.
Autovacuum was disabled prior to the restore and test.
8GB dual Xeon SuSE 9.3 system hitting RAID 5.


With the OS cache flushed by dd:
Aggregate  (cost=176218.64..176218.65 rows=1 width=0) (actual time=32081.623..32081.626 rows=1 loops=1)  ->  Merge Join
(cost=175627.39..176218.09 rows=218 width=0) (actual time=31774.573..32071.776 rows=3007 loops=1)        Merge Cond:
("outer"."?column3?"= "inner"."?column3?")        ->  Sort  (cost=48913.06..48975.72 rows=25067 width=17) (actual
time=5828.984..5954.706rows=39690 loops=1)              Sort Key: (dtr."tranImageSeqNo")::numeric              ->
BitmapHeap Scan on "DbTranRepository" dtr  (cost=297.07..47081.47 rows=25067 width=17) (actual time=69.056..5560.895
rows=39690loops=1)                    Recheck Cond: ((("countyNo")::smallint = 40) AND (("timestampValue")::timestamp
withtime zone >= '2005-09-30 00:00:00-05'::timestamp with time zone) AND (("timestampValue")::timestamp with time zone
<='2005-10-01 00:00:00-05'::timestamp with time zone))                    ->  Bitmap Index Scan on
"DbTranRepository_timestamp" (cost=0.00..297.07 rows=25067 width=0) (actual time=52.675..52.675 rows=39690 loops=1)
                    Index Cond: ((("countyNo")::smallint = 40) AND (("timestampValue")::timestamp with time zone >=
'2005-09-3000:00:00-05'::timestamp with time zone) AND (("timestampValue")::timestamp with time zone <= '2005-10-01
00:00:00-05'::timestampwith time zone))        ->  Sort  (cost=126714.34..126945.94 rows=92640 width=17) (actual
time=25130.457..25551.291rows=109605 loops=1)              Sort Key: (dtlr."tranImageSeqNo")::numeric              ->
BitmapHeap Scan on "DbTranLogRecord" dtlr  (cost=1037.84..119071.84 rows=92640 width=17) (actual
time=203.528..23588.122rows=114113 loops=1)                    Recheck Cond: ((40 = ("countyNo")::smallint) AND
(("tableName")::text= 'Cal'::text))                    ->  Bitmap Index Scan on "DbTranLogRecord_tableName"
(cost=0.00..1037.84rows=92640 width=0) (actual time=146.412..146.412 rows=114113 loops=1)
IndexCond: ((40 = ("countyNo")::smallint) AND (("tableName")::text = 'Cal'::text))Total runtime: 32091.211 ms 


With the OS cache populated by previous run of this plan:
Aggregate  (cost=176218.64..176218.65 rows=1 width=0) (actual time=4510.590..4510.593 rows=1 loops=1)  ->  Merge Join
(cost=175627.39..176218.09rows=218 width=0) (actual time=4203.451..4500.785 rows=3007 loops=1)        Merge Cond:
("outer"."?column3?"= "inner"."?column3?")        ->  Sort  (cost=48913.06..48975.72 rows=25067 width=17) (actual
time=686.388..812.160rows=39690 loops=1)              Sort Key: (dtr."tranImageSeqNo")::numeric              ->  Bitmap
HeapScan on "DbTranRepository" dtr  (cost=297.07..47081.47 rows=25067 width=17) (actual time=30.432..427.463 rows=39690
loops=1)                   Recheck Cond: ((("countyNo")::smallint = 40) AND (("timestampValue")::timestamp with time
zone>= '2005-09-30 00:00:00-05'::timestamp with time zone) AND (("timestampValue")::timestamp with time zone <=
'2005-10-0100:00:00-05'::timestamp with time zone))                    ->  Bitmap Index Scan on
"DbTranRepository_timestamp" (cost=0.00..297.07 rows=25067 width=0) (actual time=19.089..19.089 rows=39690 loops=1)
                    Index Cond: ((("countyNo")::smallint = 40) AND (("timestampValue")::timestamp with time zone >=
'2005-09-3000:00:00-05'::timestamp with time zone) AND (("timestampValue")::timestamp with time zone <= '2005-10-01
00:00:00-05'::timestampwith time zone))        ->  Sort  (cost=126714.34..126945.94 rows=92640 width=17) (actual
time=2705.007..3124.966rows=109605 loops=1)              Sort Key: (dtlr."tranImageSeqNo")::numeric              ->
BitmapHeap Scan on "DbTranLogRecord" dtlr  (cost=1037.84..119071.84 rows=92640 width=17) (actual time=96.046..1195.302
rows=114113loops=1)                    Recheck Cond: ((40 = ("countyNo")::smallint) AND (("tableName")::text =
'Cal'::text))                   ->  Bitmap Index Scan on "DbTranLogRecord_tableName"  (cost=0.00..1037.84 rows=92640
width=0)(actual time=54.917..54.917 rows=114113 loops=1)                          Index Cond: ((40 =
("countyNo")::smallint)AND (("tableName")::text = 'Cal'::text))Total runtime: 4520.130 ms 


Without the option to use merge join, it picked a plan which ran in
one-third to one-half the time, depending on caching.


dtr=> set enable_mergejoin=off


With the OS cache flushed by dd:
Aggregate  (cost=298741.48..298741.49 rows=1 width=0) (actual time=14289.207..14289.210 rows=1 loops=1)  ->  Nested
Loop (cost=0.00..298740.94 rows=218 width=0) (actual time=143.436..14277.124 rows=3007 loops=1)        ->  Index Scan
using"DbTranRepository_timestamp" on "DbTranRepository" dtr  (cost=0.00..49419.45 rows=25067 width=17) (actual
time=33.625..11510.723rows=39690 loops=1)              Index Cond: ((("countyNo")::smallint = 40) AND
(("timestampValue")::timestampwith time zone >= '2005-09-30 00:00:00-05'::timestamp with time zone) AND
(("timestampValue")::timestampwith time zone <= '2005-10-01 00:00:00-05'::timestamp with time zone))        ->  Index
Scanusing "DbTranLogRecordPK" on "DbTranLogRecord" dtlr  (cost=0.00..9.93 rows=1 width=17) (actual time=0.062..0.062
rows=0loops=39690)              Index Cond: ((40 = (dtlr."countyNo")::smallint) AND ((dtlr."tranImageSeqNo")::numeric =
("outer"."tranImageSeqNo")::numeric))             Filter: (("tableName")::text = 'Cal'::text)Total runtime: 14289.305
ms


With the OS cache populated by previous run of this plan:
Aggregate  (cost=298741.48..298741.49 rows=1 width=0) (actual time=1476.593..1476.596 rows=1 loops=1)  ->  Nested Loop
(cost=0.00..298740.94rows=218 width=0) (actual time=1.373..1464.720 rows=3007 loops=1)        ->  Index Scan using
"DbTranRepository_timestamp"on "DbTranRepository" dtr  (cost=0.00..49419.45 rows=25067 width=17) (actual
time=0.083..412.268rows=39690 loops=1)              Index Cond: ((("countyNo")::smallint = 40) AND
(("timestampValue")::timestampwith time zone >= '2005-09-30 00:00:00-05'::timestamp with time zone) AND
(("timestampValue")::timestampwith time zone <= '2005-10-01 00:00:00-05'::timestamp with time zone))        ->  Index
Scanusing "DbTranLogRecordPK" on "DbTranLogRecord" dtlr  (cost=0.00..9.93 rows=1 width=17) (actual time=0.019..0.019
rows=0loops=39690)              Index Cond: ((40 = (dtlr."countyNo")::smallint) AND ((dtlr."tranImageSeqNo")::numeric =
("outer"."tranImageSeqNo")::numeric))             Filter: (("tableName")::text = 'Cal'::text)Total runtime: 1476.681 ms 


listen_addresses = '*'          # what IP interface(s) to listen on;
max_connections = 100                   # note: increasing max_connections costs
shared_buffers = 20000                  # min 16 or max_connections*2, 8KB each
work_mem = 10240                        # min 64, size in KB
max_fsm_pages = 100000                  # min max_fsm_relations*16, 6 bytes each
wal_buffers = 20                        # min 4, 8KB each
effective_cache_size = 917504           # typically 8KB each
random_page_cost = 2                    # units are one sequential page fetch
log_line_prefix = '[%m] %p %q<%u %d %r> '                       # Special values:
stats_start_collector = on
stats_row_level = on
autovacuum = false                      # enable autovacuum subprocess?
autovacuum_naptime = 10         # time between autovacuum runs, in secs
autovacuum_vacuum_threshold = 1 # min # of tuple updates before
autovacuum_analyze_threshold = 1        # min # of tuple updates before
autovacuum_vacuum_scale_factor = 0.2    # fraction of rel size before
autovacuum_analyze_scale_factor = 0.1   # fraction of rel size before
lc_messages = 'C'                       # locale for system error message
lc_monetary = 'C'                       # locale for monetary formatting
lc_numeric = 'C'                        # locale for number formatting
lc_time = 'C'                           # locale for time formatting
sql_inheritance = off

If random_page_cost is reduced to 1.2, index scans are used
instead of bitmap index scans.  That makes little difference for
the run time with this query -- at least when its data is cached.

           Table "public.DbTranRepository"     Column      |          Type          | Modifiers
------------------+------------------------+-----------countyNo         | "CountyNoT"            | not
nulltranImageSeqNo  | "TranImageSeqNoT"      | not nulltimestampValue   | "TimestampT"           | not
nulltransactionImage| "ImageT"               |status           | character(1)           | not nullqueryName        |
"QueryNameT"          |runDuration      | numeric(15,0)          |userId           | "UserIdT"
|functionalArea  | character varying(50)  |sourceRef        | character varying(255) |url              | "URLT"
       |tranImageSize    | numeric(15,0)          | 
Indexes:   "DbTranRepositoryPK" PRIMARY KEY, btree ("countyNo", "tranImageSeqNo")   "DbTranRepository_timestamp" btree
("countyNo","timestampValue")   "DbTranRepository_userId" btree ("countyNo", "userId", "timestampValue") 

        Table "public.DbTranLogRecord"    Column     |       Type        | Modifiers
----------------+-------------------+-----------countyNo       | "CountyNoT"       | not nulltranImageSeqNo |
"TranImageSeqNoT"| not nulllogRecordSeqNo | "LogRecordSeqNoT" | not nulloperation      | "OperationT"      | not
nulltableName     | "TableNameT"      | not null 
Indexes:   "DbTranLogRecordPK" PRIMARY KEY, btree ("countyNo", "tranImageSeqNo", "logRecordSeqNo")
"DbTranLogRecord_tableName"btree ("countyNo", "tableName", operation) 

Need any other info?
-Kevin



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: COPY FROM with CSV header
Следующее
От: Tom Lane
Дата:
Сообщение: Vote needed: revert beta2 changes or not?