Обсуждение: optimizer tuning/forcing correct index use
Greetings,
I have a query that sometimes refuses to use a sensible index.
Some info first:
Version:        PostgreSQL v 7.2
OS:     SuSE Linux 7.2 w/Kernel 2.4.14
HW:     Dual Athlon MP1800, 768MB RAM       DB exists on a single fast SCSI drive.
DB was loaded with a copy of our production system data (currently
running on a SQLServer database).  'vacuumdb -a -z' was run after
load, no rows have been added or deleted since vacuum.
Here is the query:
explain analyze
select  t.tb_order_number, f.account_number, f.clearing_account_id,  f.symbol_full_name, f.fill_quantity,
f.buy_or_sell, f.trader_username, f.fill_price, f.fill_quantity, f.fill_ts,  f.last_update_ts, s.symbol_type,
s.base_symbol_full_name, s.symbol_name, ca.clearing_account_number, ca.clearing_firm,  co.tick_value
 
from  fill f,  tb_order t,  symbol s,  clearing_account ca,  contract co
where  f.fill_ts > '2002-02-06 00:00:00' and  f.fill_ts <= '2002-02-08 23:59:59' and  t.order_id = f.order_id and
s.full_name= f.symbol_full_name and  co.contract_name = s.contract_name and  ca.clearing_account_id =
f.clearing_account_idand  s.symbol_name not in ('ISLD:TESTA', 'ISLD:TESTB', 'ISLD:TESTC');
 
Total number of rows in the tables of the join:
fill            : 7674725
tb_order        : 20059204
symbol            : 559
clearing_account    : 57
contract        : 13
When the time range (defined by fill_ts) is under 48 hours, the
following execution plan is used:
Nested Loop  (cost=35.98..477477.54 rows=37403 width=170) (actual time=6.38..7868.65 rows=143582 loops=1) ->  Hash Join
(cost=35.98..256958.14 rows=37403 width=152) (actual time=6.21..3937.04 rows=143582 loops=1)       ->  Hash Join
(cost=1.71..256229.53rows=45364 width=90) (actual time=0.50..2685.68 rows=143582 loops=1)             ->  Index Scan
usingfill_ak2 on fill f  (cost=0.00..255314.58 rows=69239 width=66) (actual time=0.26..1875.38 rows=143582 loops=1)
       ->  Hash  (cost=1.57..1.57 rows=57 width=24) (actual time=0.20..0.20 rows=0 loops=1)                   ->  Seq
Scanon clearing_account ca  (cost=0.00..1.57 rows=57 width=24) (actual time=0.05..0.14 rows=57 loops=1)       ->  Hash
(cost=32.89..32.89rows=547 width=62) (actual time=5.66..5.66 rows=0 loops=1)             ->  Hash Join
(cost=1.16..32.89rows=547 width=62) (actual time=0.21..4.73 rows=559 loops=1)                   ->  Seq Scan on symbol
s (cost=0.00..20.78 rows=547 width=48) (actual time=0.05..2.43 rows=559 loops=1)                   ->  Hash
(cost=1.13..1.13rows=13 width=14) (actual time=0.11..0.11 rows=0 loops=1)                         ->  Seq Scan on
contractco  (cost=0.00..1.13 rows=13 width=14) (actual time=0.06..0.08 rows=13 loops=1) ->  Index Scan using order_pk
ontb_order t  (cost=0.00..5.88 rows=1 width=18) (actual time=0.02..0.02 rows=1 loops=143582)
 
Total runtime: 7934.64 msec
Somewhere above 48 hours the plan changes to this:
Nested Loop  (cost=35.98..644155.97 rows=56104 width=170) (actual time=48504.60..128512.95 rows=196483 loops=1) ->
HashJoin  (cost=35.98..313376.24 rows=56104 width=152) (actual time=48450.25..121546.48 rows=196483 loops=1)       ->
HashJoin  (cost=1.71..312300.45 rows=68045 width=90) (actual time=48444.62..119822.68 rows=196483 loops=1)
-> Seq Scan on fill f  (cost=0.00..310928.88 rows=103859 width=66) (actual time=48444.38..118737.56 rows=196483
loops=1)            ->  Hash  (cost=1.57..1.57 rows=57 width=24) (actual time=0.19..0.19 rows=0 loops=1)
  ->  Seq Scan on clearing_account ca  (cost=0.00..1.57 rows=57 width=24) (actual time=0.05..0.13 rows=57 loops=1)
->  Hash  (cost=32.89..32.89 rows=547 width=62) (actual time=5.57..5.57 rows=0 loops=1)             ->  Hash Join
(cost=1.16..32.89rows=547 width=62) (actual time=0.21..4.64 rows=559 loops=1)                   ->  Seq Scan on symbol
s (cost=0.00..20.78 rows=547 width=48) (actual time=0.05..2.34 rows=559 loops=1)                   ->  Hash
(cost=1.13..1.13rows=13 width=14) (actual time=0.11..0.11 rows=0 loops=1)                         ->  Seq Scan on
contractco  (cost=0.00..1.13 rows=13 width=14) (actual time=0.07..0.09 rows=13 loops=1) ->  Index Scan using order_pk
ontb_order t  (cost=0.00..5.88 rows=1 width=18) (actual time=0.03..0.03 rows=1 loops=196483)
 
Total runtime: 128598.23 msec
The critical part is driving the fill table query from a sequential scan
rather than a scan of fill_ak2.
I believe I understand why the optimizer changes the plan: it thinks
that the larger time range will make the fill_ak2 index scan a more
expensive option than a sequential scan of the fill table.  Can
someone recommend a way for me to show the optimizer the error of its
ways?  (short of using 'set enable_seqscan = no', that is).
Incidentally, in our current SQL server database I use a hint to tell
the optimizer specifically what index to use.  When using Oracle
databases in the past, I've had to do the same thing.  I understand
the PostgreSQL developers are against this approach, preferring to
make the optimizer smarter instead.  I agree with this in principal,
however, it seems to me that until the optimizer is perfect, this type
of feature is needed.  Comments?
-K
			
		On Tuesday 19 March 2002 09:28 am, Tom Lane wrote: > Kelly Burkhart <kelly@tradebotsystems.com> writes: > > The critical part is driving the fill table query from a sequential scan > > rather than a scan of fill_ak2. > > Have you done an ANALYZE or VACUUM ANALYZE lately? If so, what do you > get from > > select * from pg_stats where tablename = 'fill'; > select * from pg_class where relname = 'fill'; > > Offhand I am guessing that the table is fairly well ordered by fill_ts > and the planner is underestimating the effects of this. There is a > provision in there to try to account for data ordering, but it's new > code in 7.2 and doubtless still needs refinement. I've attached the results of those queries. The table was loaded in one week chunks, mostly but not entirely in order. 11/2001-3/2002 was loaded first in order, then 1/2001-10/2001 was loaded some time later. Each dump file is probably in order, if you trust that SQL Server used the index order in the query (which you probably can). In the future, when/if this database becomes production, each week we will nuke all rows older than 1 year (or maybe 2 years... who knows). So after a point, the db should grow little. -K
Kelly Burkhart <kelly@tradebotsystems.com> writes:
>> Offhand I am guessing that the table is fairly well ordered by fill_ts
>> and the planner is underestimating the effects of this.  There is a
>> provision in there to try to account for data ordering, but it's new
>> code in 7.2 and doubtless still needs refinement.
fill      | fill_ts               |         0 |         8 |     152655 | {"2001-10-22 15:28:07-05","2001-10-22
15:28:16-05","2001-10-2212:43:28-05","2001-08-13 08:49:19-05","2001-08-13 08:49:41-05","2001-09-25
16:13:41-05","2001-10-1009:04:33-05","2001-10-22 14:50:05-05","2001-10-31 14:05:43-06","2002-01-07 13:35:48-06"} |
{0.002,0.002,0.001,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667}| {"2001-01-10
14:13:01-06","2001-07-1013:05:01-05","2001-09-10 09:26:08-05","2001-10-15 09:01:54-05","2001-11-02
09:49:58-06","2001-11-2809:36:33-06","2001-12-21 08:38:03-06","2002-01-15 09:34:59-06","2002-02-04
11:34:20-06","2002-02-2509:39:55-06","2002-03-08 14:43:10-06"} |   -0.546947
 
Hmm.  So the correlation of fill_ts with physical position is actually
negative, according to the analyze results.  Still, -0.54 represents
rather strong correlation which would reduce the cost of the index scan.
There was some discussion a couple weeks ago on the pgsql-bugs list about
changing the equation the planner uses to estimate the effects of
correlation order.  Are you interested in experimenting?  I previously
said:
: If you look in cost_index (see approx. lines 270-340 in
: src/backend/optimizer/path/costsize.c) you'll see that it computes
: access cost estimates for both the perfectly sequential case and
: the perfectly uncorrelated case, and then tries to interpolate
: between them.  I have reasonable faith in both of the endpoint
: estimation methods, but very little in the interpolation equation ---
: it was chosen on the spur of the moment and hasn't really been tested.
: 
: It might be interesting to replace csquared with just
: fabs(indexCorrelation) to see if the results are better.
        regards, tom lane
			
		Kelly Burkhart <kelly@tradebotsystems.com> writes:
> The critical part is driving the fill table query from a sequential scan
> rather than a scan of fill_ak2.
Have you done an ANALYZE or VACUUM ANALYZE lately?  If so, what do you
get from
select * from pg_stats where tablename = 'fill';
select * from pg_class where relname = 'fill';
Offhand I am guessing that the table is fairly well ordered by fill_ts
and the planner is underestimating the effects of this.  There is a
provision in there to try to account for data ordering, but it's new
code in 7.2 and doubtless still needs refinement.
        regards, tom lane
			
		On Tuesday 19 March 2002 11:32 am, Tom Lane wrote: <snip> > > Hmm. So the correlation of fill_ts with physical position is actually > negative, according to the analyze results. Still, -0.54 represents > rather strong correlation which would reduce the cost of the index scan. > > There was some discussion a couple weeks ago on the pgsql-bugs list about > changing the equation the planner uses to estimate the effects of > correlation order. Are you interested in experimenting? I previously > > said: > : If you look in cost_index (see approx. lines 270-340 in > : src/backend/optimizer/path/costsize.c) you'll see that it computes > : access cost estimates for both the perfectly sequential case and > : the perfectly uncorrelated case, and then tries to interpolate > : between them. I have reasonable faith in both of the endpoint > : estimation methods, but very little in the interpolation equation --- > : it was chosen on the spur of the moment and hasn't really been tested. > : > : It might be interesting to replace csquared with just > : fabs(indexCorrelation) to see if the results are better. in costsize.c:334 I've changed: csquared = indexCorrelation * indexCorrelation; to: csquared = fabs(indexCorrelation); It looks like the estimated cost of the fill index scan is lower, but not enough to change the plan. I'll be happy to run more tests if you can think of other things to try. -K