Обсуждение: New Optimizer Behaviour In 7.0b1
I have been playing a bit with this new release. There are lots of new possible plans, which is really great. I have been using the query shown below to study optimizer changes. select d0.d0f1, count(f.f1) from dim0 d0, fact1 f where d0.d0key = f.d0key and d0.d0f1 between '1999-11-01' and '1999-12-01' group by d0.d0f1 Table setup is : Table "fact1" size 300000 rowsAttribute | Type | Modifier -----------+---------+----------d0key | integer |d1key | integer |f1 | integer | Index: fact1_q1 on d0key Table "dim0" size 900 rowsAttribute | Type | Modifier -----------+-------------+----------d0key | integer |d0f1 | timestamp |d0f2 | varchar(20) |d0f3 | varchar(20) | Indices: dim0_pk on d0key, dim0_q1 on d0f1 Explain is : Aggregate (cost=12205.78..12372.44 rows=3333 width=20) -> Group (cost=12205.78..12289.11 rows=33333 width=20) -> Sort (cost=12205.78..12205.78 rows=33333 width=20) -> Hash Join (cost=21.75..9371.33 rows=33333 width=20) -> Seq Scan on fact1 f (cost=0.00..4765.00 rows=300000 width=8) -> Hash (cost=21.50..21.50 rows=100 width=12) -> Seq Scanon dim0 d0 (cost=0.00..21.50 rows=100 width=12) Initially this ran fairly slowly : 8-10s , the query scans about 9000 out the 300000 in the big table(fact1). A bit of tweeking with the set variables : ( these are new -see src/backend/commands/variable.c ) set cpu_tuple_cost = '0.6'; set enable_hashjoin = 'off'; set enable_mergejoin = 'off'; gave a new plan : Aggregate (cost=0.00..18476945.83 rows=3333 width=20) -> Group (cost=0.00..18476862.50 rows=33333 width=20) -> Nested Loop (cost=0.00..18476779.16 rows=33333 width=20) -> Index Scan using dim0_q1 on dim0 d0 (cost=0.00..81.98 rows=100 width=12) -> Index Scan using fact1_q1 on fact1 f (cost=0.00..4016.97 rows=1500 width=8) which is devestatiingly fast... about 1 s. Note that the table order is reversed and that the index on the big table ( fact1) is used. However it seems a bit on the brutal side to have to coerce the optimizer this way ( after all hash joins are generally good), is there any way to get a reasonably sensible use of indexes without such desperate measures ? P.s : I realize that this is beta 1..... I am impressed, I have had no problems relinking php4 and subsequently apache for use with this release - seems like a very good quality beta 1. well done guys! Mark (markir@ihug.co.nz,mark.kirkwood@hnz.co.nz ) P.p.s : hopefully this is not going to appear twice on this list, my first send bounced.
Mark Kirkwood <markir@ihug.co.nz> writes:
> I have been using the query shown below to study optimizer changes.
> [ results snipped ]
> However it seems a bit on the brutal side to have to coerce the
> optimizer this way ( after all hash joins are generally good), is
> there any way to get a reasonably sensible use of indexes without such
> desperate measures ?
Obviously we'd like to get the optimizer to do the right thing without
being beaten over the head ;-).  As you see, we're not there yet.
I will be the first to say that the 7.0 optimizer is still in a pretty
crude state: I have made drastic changes to its model of plan costs,
and have not yet had much time to tune the results.  I do appreciate
reports about cases it gets wrong.
In this case I guess the first question to ask is whether its
selectivity estimates are any good.  It seems to be estimating that your
"d0.d0f1 between '1999-11-01' and '1999-12-01'" clause will select about
100 of the 900 rows in dim0; is that anywhere near right?  Also, in the
nested-loop plan we can see that it thinks about 1500 rows from fact1
will match on "d0.d0key = f.d0key" against any given selected row from
dim0; is that on the mark?  Finally, is the estimate that the total
number of joined rows (before GROUP BY) is about 33333 any good?
If you have not done VACUUM ANALYZE recently on these two tables,
it'd be worth trying that to see if it brings the estimates any
closer to reality.
        regards, tom lane
			
		Tom Lane wrote: > In this case I guess the first question to ask is whether its > selectivity estimates are any good. It seems to be estimating that your > "d0.d0f1 between '1999-11-01' and '1999-12-01'" clause will select about > 100 of the 900 rows in dim0; is that anywhere near right? Also, in the > nested-loop plan we can see that it thinks about 1500 rows from fact1 > will match on "d0.d0key = f.d0key" against any given selected row from > dim0; is that on the mark? Finally, is the estimate that the total > number of joined rows (before GROUP BY) is about 33333 any good? > > If you have not done VACUUM ANALYZE recently on these two tables, > it'd be worth trying that to see if it brings the estimates any > closer to reality. > > regards, tom lane Tom, Here is the row data for comparison with the selectivity estimates: select count(*) from dim0 d0 where d0.d0f1 between '1999-11-01' and '1999-12-01' 31 rows select count(*) from fact1 where d0key = <value> 3000 rows total number of joined rows before group by 9000 rows ( i.e : there are only 3 distinct d0key values in fact1 for the "month" , and each one has 3000 rows ) It looks like the estimate on the big table ( fact1 ) are right order of magnitude, but the small table ( dim0 ) ones are too high (and presumably ) throwing the rest off I did a vacuum analyze of these tables again, just in case....( no change to the plans) Cheers Mark
Mark Kirkwood <markir@ihug.co.nz> writes:
> It looks like the estimate on the big table ( fact1 ) are right order of
> magnitude, but the small table ( dim0 ) ones are too high (and presumably )
> throwing the rest off
At least part of the problem is that the selectivity code doesn't know
how to deal with date/time datatypes :-(.  I was waiting for Thomas to
commit the great date/time reorganization before I wrote that code.
But he's done now, so I've gone in and fixed convert_to_scalar to know
about all the surviving date/time types.
If you like, you can pick up the updated version of 
src/backend/utils/adt/selfuncs.c from the CVS server, or grab a
snapshot tarball dated later than this message, and see whether
the selectivity estimates get any better.
        regards, tom lane
			
		Tom Lane wrote: > > If you like, you can pick up the updated version of > src/backend/utils/adt/selfuncs.c from the CVS server, or grab a > snapshot tarball dated later than this message, and see whether > the selectivity estimates get any better. > > regards, tom lane I will give one of those a go.... Cheers Mark
>I am searching after the date formatting solution, but didn't found a
>correct one.
>
>So example ORACLE has
>    - to_char function to make any kind of date format from a date
type:
>to_char(now,'yyyy.mm.dd')
>    - to_date to makes a date type from any kind of string: to_date
>('1999.11.23','yyyy.mm.dd')
>(sniped...)
I have wrestled a bit with this myself - being an Oracle dba ( but i'm
alright now....),
the date <-> char conversion is very different is Postgresql.
However I have found that the "date_part" and "interval" functions seem
to do the job ok,
sometimes combined with the casting operator  "::datetime".
e.g : ( for release 7.0 )
date -> number
db1=> select date_part('year',now());date_part
-----------     2000
and  char -> interval
db1=> select interval('4 hours'::interval);?column?
----------04:00
and char -> datetime via interval
db1=> select ('1999-01-01 '||interval('4 hours'::interval))::datetime;       ?column?
------------------------1999-01-01 04:00:00+13
I hope this helps a bit...
Cheers
Mark
			
		
On Thu, 2 Mar 2000, Mark Kirkwood wrote:
> >I am searching after the date formatting solution, but didn't found a
> >correct one.
> >
> >So example ORACLE has
> >    - to_char function to make any kind of date format from a date
> type:
> >to_char(now,'yyyy.mm.dd')
> >    - to_date to makes a date type from any kind of string: to_date
> >('1999.11.23','yyyy.mm.dd')
> >(sniped...)
> 
> I have wrestled a bit with this myself - being an Oracle dba ( but i'm
> alright now....),
> the date <-> char conversion is very different is Postgresql.
> 
> However I have found that the "date_part" and "interval" functions seem
> to do the job ok,
> sometimes combined with the casting operator  "::datetime".
> 
> e.g : ( for release 7.0 )
> date -> number
The release 7.0 has to_char/to_date/to_timestamp functions too,
and it is very compatible with Oracle's to_char(). See the PostgreSQL
documentation for more details.
                    Karel