Обсуждение: Superfluous merge/sort

Поиск
Список
Период
Сортировка

Superfluous merge/sort

От
Anuradha Ratnaweera
Дата:
Question in brief: does the planner/optimizer take into account the
foreign key constraints?

If the answer is "no", please stop reading here.

Here comes the details.  Let me give a simple example to illustrate the
situation.

1. We have two tables t1 and t2.

   create table t1 (
       id integer primary key,
       dummy integer
   );

   create table t2 (
       id integer,
       dummy integer
   );

2. We create indexes on all the non-pkey fields.

   create index t1_dummy_idx on t1(dummy);
   create index t2_id_idx on t2(id);
   create index t2_dummy_idx on t2(dummy);

3. We make t2(id) a foreign key of t1(id).

   alter table t2 add constraint t2_fkey foreign key (id) references t1(id);

4. Populate "t1" with unique "id"s from 0 to 19999 with a dummy value.

   copy "t1" from stdin;
   0            654
   1            86097
   ...
   19998        93716
   19999        9106
   \.

5. Populate "t2" with 50000 "id"s with a normal distribution.

   copy "t2" from stdin;
   8017         98659
   11825        5946
   ...
   8202         35994
   8436         19729
   \.

Now we are ready to go ...


- First query is to find the "dummy" values with highest frequency.

  => explain select dummy from t2 group by dummy order by count(*) desc limit 10;
  NOTICE:  QUERY PLAN:

  Limit  (cost=2303.19..2303.19 rows=10 width=4)
    ->  Sort  (cost=2303.19..2303.19 rows=5000 width=4)
          ->  Aggregate  (cost=0.00..1996.00 rows=5000 width=4)
                ->  Group  (cost=0.00..1871.00 rows=50000 width=4)
                      ->  Index Scan using t2_dummy_idx on t2  (cost=0.00..1746.00 rows=50000 width=4)

  EXPLAIN


- Second query is esseitially the same, but we do a merge with "t1" on
  the foreign key (just for the sake of illustrating the point).

  => explain select t2.dummy from t1, t2 where t1.id = t2.id group by t2.dummy order by count(*) desc limit 10;
  NOTICE:  QUERY PLAN:

  Limit  (cost=7643.60..7643.60 rows=10 width=12)
    ->  Sort  (cost=7643.60..7643.60 rows=5000 width=12)
          ->  Aggregate  (cost=7086.41..7336.41 rows=5000 width=12)
                ->  Group  (cost=7086.41..7211.41 rows=50000 width=12)
                      ->  Sort  (cost=7086.41..7086.41 rows=50000 width=12)
                            ->  Merge Join  (cost=0.00..3184.00 rows=50000 width=12)
                                  ->  Index Scan using t1_pkey on t1  (cost=0.00..638.00 rows=20000 width=4)
                                  ->  Index Scan using t2_id_idx on t2  (cost=0.00..1746.00 rows=50000 width=8)

  EXPLAIN


Does this mean that the planner/optimizer doesn't take into account the
foreign key constraint?

    Anuradha

--

Debian GNU/Linux (kernel 2.4.21-pre4)

Don't worry about the world coming to an end today.  It's already tomorrow
in Australia.
        -- Charles Schulz


Re: Superfluous merge/sort

От
Stephan Szabo
Дата:
On Tue, 25 Feb 2003, Anuradha Ratnaweera wrote:

> Question in brief: does the planner/optimizer take into account the
> foreign key constraints?
>
> If the answer is "no", please stop reading here.

Not really.  However, as a note, from t1,t2 where t1.id=t2.id is not
necessarily an identity even with the foreign key due to NULLs.


Re: Superfluous merge/sort

От
Anuradha Ratnaweera
Дата:
On Tue, Feb 25, 2003 at 07:56:14AM -0800, Stephan Szabo wrote:
>
> On Tue, 25 Feb 2003, Anuradha Ratnaweera wrote:
>
> > Question in brief: does the planner/optimizer take into account the
> > foreign key constraints?
> >
> > If the answer is "no", please stop reading here.
>
> Not really.  However, as a note, from t1,t2 where t1.id=t2.id is not
> necessarily an identity even with the foreign key due to NULLs.

"not null" doesn't make a difference, either :-(

    Anuradha

--

Debian GNU/Linux (kernel 2.4.21-pre4)

It's not Camelot, but it's not Cleveland, either.
        -- Kevin White, Mayor of Boston


Re: Superfluous merge/sort

От
Stephan Szabo
Дата:
On Wed, 26 Feb 2003, Anuradha Ratnaweera wrote:

> On Tue, Feb 25, 2003 at 07:56:14AM -0800, Stephan Szabo wrote:
> >
> > On Tue, 25 Feb 2003, Anuradha Ratnaweera wrote:
> >
> > > Question in brief: does the planner/optimizer take into account the
> > > foreign key constraints?
> > >
> > > If the answer is "no", please stop reading here.
> >
> > Not really.  However, as a note, from t1,t2 where t1.id=t2.id is not
> > necessarily an identity even with the foreign key due to NULLs.
>
> "not null" doesn't make a difference, either :-(

No, but the two queries you gave aren't equivalent without a not null
constraint and as such treating the second as the first is simply wrong
without it. ;)

The big thing is that checking this would be a cost to all queries (or at
least any queries with joins).  You'd probably have to come up with a
consistent set of rules on when the optimization applies (*) and then show
that there's a reasonable way to check for the case that's significantly
not expensive (right now I think it'd involve looking at the constraint
table, making sure that all columns of the constraint are referenced and
only in simple ways).

(*) - I haven't done enough checking to say that the following is
  sufficient, but it'll give an idea:
 Given t1 and t2 where t2 is the foreign key table and t1 is the
  primary key table in a foreign key constraint, a select that has no
  column references to t1 other than to the key fields of the foreign key
  directly in the where clause where the condition is simply
  t1.pcol = t2.fcol (or reversed) and all key fields of the constraint
  are so referenced then there exist two possible optimizations
   if all of the foreign key constraint columns in t2 are marked as
    not null, the join to t1 is redundant and it and the conditions
     that reference it can be simply removed
   otherwise, the join to t1 and the conditions that reference may be
    replaced with a set of conditions (t2.fcol1 is not null [and t2.fcol2
    is not null ...]) anded to any other where clause elements