Обсуждение: Query planner and foreign key constraints

Поиск
Список
Период
Сортировка

Query planner and foreign key constraints

От
Christian Schröder
Дата:
Hi list,
in our PostgreSQL 8.2.9 database I have these tables:

    create table table1 (
        key1 char(12),
        key2 integer,
        primary key (key1, key2)
    );

    create table table2 (
        key1 char(12),
        key2 integer,
        key3 varchar(20),
        primary key (key1, key2, key3),
        foreign key (key1, key2) references table1 (key1, key2)
    );

Table1 has 896576 rows. Table2 has 1630788 rows. The statistics target
of the columns key1 and key2 in both tables has been set to 1000. Both
tables have been analyzed.
When I join both tables using key1 and key2 there will be exactly
1630788 rows because for each row in table2 there *must* exist a row in
table1. But the query planner doesn't think so:

# explain analyze select * from table1 inner join table2 using (key1, key2);
                                                                  QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..94916.58 rows=39560 width=44) (actual
time=0.103..7105.960 rows=1630788 loops=1)
   Merge Cond: ((table1.key1 = table2.key1) AND (table1.key2 = table2.key2))
   ->  Index Scan using table1_pkey on table1  (cost=0.00..22677.65
rows=896576 width=20) (actual time=0.038..689.059 rows=896576 loops=1)
   ->  Index Scan using table2_pkey on table2  (cost=0.00..59213.16
rows=1630788 width=44) (actual time=0.049..1108.220 rows=1630788 loops=1)
 Total runtime: 7525.492 ms
(5 rows)

You can also find the query plan at
http://explain-analyze.info/query_plans/2648-query-plan-1371.

What can I do to make the query planner realize that the join will have
1630788 rows? This join is part of a view which I then use in other
joins and this wrong assumption leads to really bad performance.

Regards,
    Christian

--
Deriva GmbH                         Tel.: +49 551 489500-42
Financial IT and Consulting         Fax:  +49 551 489500-91
Hans-Böckler-Straße 2                  http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer



Re: Query planner and foreign key constraints

От
"Filip Rembiałkowski"
Дата:


2008/12/30 Christian Schröder <cs@deriva.de>
Hi list,
in our PostgreSQL 8.2.9 database I have these tables:

  create table table1 (
      key1 char(12),
      key2 integer,
      primary key (key1, key2)
  );

  create table table2 (
      key1 char(12),
      key2 integer,
      key3 varchar(20),
      primary key (key1, key2, key3),
      foreign key (key1, key2) references table1 (key1, key2)
  );

Table1 has 896576 rows. Table2 has 1630788 rows. The statistics target of the columns key1 and key2 in both tables has been set to 1000. Both tables have been analyzed.
When I join both tables using key1 and key2 there will be exactly 1630788 rows because for each row in table2 there *must* exist a row in table1. But the query planner doesn't think so:

# explain analyze select * from table1 inner join table2 using (key1, key2);
                                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Merge Join  (cost=0.00..94916.58 rows=39560 width=44) (actual time=0.103..7105.960 rows=1630788 loops=1)
 Merge Cond: ((table1.key1 = table2.key1) AND (table1.key2 = table2.key2))
 ->  Index Scan using table1_pkey on table1  (cost=0.00..22677.65 rows=896576 width=20) (actual time=0.038..689.059 rows=896576 loops=1)
 ->  Index Scan using table2_pkey on table2  (cost=0.00..59213.16 rows=1630788 width=44) (actual time=0.049..1108.220 rows=1630788 loops=1)
Total runtime: 7525.492 ms
(5 rows)

You can also find the query plan at http://explain-analyze.info/query_plans/2648-query-plan-1371.

What can I do to make the query planner realize that the join will have 1630788 rows? This join is part of a view which I then use in other joins and this wrong assumption leads to really bad performance.

just a guess, but - did you try to declare NOT NULL on FK columns?

your assumption that "for each row in table2 there *must* exist a row in table1" will be enforced then.

maybe the planner will make use of this ...


--
Filip Rembiałkowski

Re: Query planner and foreign key constraints

От
Christian Schröder
Дата:
Filip Rembiałkowski wrote:
>
>       create table table1 (
>           key1 char(12),
>           key2 integer,
>           primary key (key1, key2)
>       );
>
>       create table table2 (
>           key1 char(12),
>           key2 integer,
>           key3 varchar(20),
>           primary key (key1, key2, key3),
>           foreign key (key1, key2) references table1 (key1, key2)
>       );
>
>
> just a guess, but - did you try to declare NOT NULL on FK columns?
>
> your assumption that "for each row in table2 there *must* exist a row
> in table1" will be enforced then.
>
> maybe the planner will make use of this ...
All columns are implictly declared "not null" because they are part of
the primary key of the tables:

# \d table1
       Table "public.table1"
 Column |     Type      | Modifiers
--------+---------------+-----------
 key1   | character(12) | not null
 key2   | integer       | not null
Indexes:
    "table1_pkey" PRIMARY KEY, btree (key1, key2)

# \d table2
           Table "public.table2"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 key1   | character(12)         | not null
 key2   | integer               | not null
 key3   | character varying(20) | not null
Indexes:
    "table2_pkey" PRIMARY KEY, btree (key1, key2, key3)
Foreign-key constraints:
    "table2_key1_fkey" FOREIGN KEY (key1, key2) REFERENCES table1(key1,
key2)

Regards,
    Christian

--
Deriva GmbH                         Tel.: +49 551 489500-42
Financial IT and Consulting         Fax:  +49 551 489500-91
Hans-Böckler-Straße 2                  http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer



Re: Query planner and foreign key constraints

От
Christian Schröder
Дата:
Christian Schröder wrote:
> in our PostgreSQL 8.2.9 database I have these tables:
>
>    create table table1 (
>        key1 char(12),
>        key2 integer,
>        primary key (key1, key2)
>    );
>
>    create table table2 (
>        key1 char(12),
>        key2 integer,
>        key3 varchar(20),
>        primary key (key1, key2, key3),
>        foreign key (key1, key2) references table1 (key1, key2)
>    );
>
> Table1 has 896576 rows. Table2 has 1630788 rows. The statistics target
> of the columns key1 and key2 in both tables has been set to 1000. Both
> tables have been analyzed.
> When I join both tables using key1 and key2 there will be exactly
> 1630788 rows because for each row in table2 there *must* exist a row
> in table1. But the query planner doesn't think so:
>
> # explain analyze select * from table1 inner join table2 using (key1,
> key2);
>                                                                  QUERY
> PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------

>
> Merge Join  (cost=0.00..94916.58 rows=39560 width=44) (actual
> time=0.103..7105.960 rows=1630788 loops=1)
>   Merge Cond: ((table1.key1 = table2.key1) AND (table1.key2 =
> table2.key2))
>   ->  Index Scan using table1_pkey on table1  (cost=0.00..22677.65
> rows=896576 width=20) (actual time=0.038..689.059 rows=896576 loops=1)
>   ->  Index Scan using table2_pkey on table2  (cost=0.00..59213.16
> rows=1630788 width=44) (actual time=0.049..1108.220 rows=1630788 loops=1)
> Total runtime: 7525.492 ms
> (5 rows)
>
> You can also find the query plan at
> http://explain-analyze.info/query_plans/2648-query-plan-1371.
>
> What can I do to make the query planner realize that the join will
> have 1630788 rows? This join is part of a view which I then use in
> other joins and this wrong assumption leads to really bad performance.
I have not yet found any solution. My queries still take several minutes
to complete. :-(
No ideas at all?

Regards,
    Christian

--
Deriva GmbH                         Tel.: +49 551 489500-42
Financial IT and Consulting         Fax:  +49 551 489500-91
Hans-Böckler-Straße 2                  http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer


Re: Query planner and foreign key constraints

От
Christian Schröder
Дата:
Christian Schröder wrote:
> When I join both tables using key1 and key2 there will be exactly
> 1630788 rows because for each row in table2 there *must* exist a row
> in table1. But the query planner doesn't think so:
>
> # explain analyze select * from table1 inner join table2 using (key1,
> key2);
>                                                                  QUERY
> PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------

>
> Merge Join  (cost=0.00..94916.58 rows=39560 width=44) (actual
> time=0.103..7105.960 rows=1630788 loops=1)
>   Merge Cond: ((table1.key1 = table2.key1) AND (table1.key2 =
> table2.key2))
>   ->  Index Scan using table1_pkey on table1  (cost=0.00..22677.65
> rows=896576 width=20) (actual time=0.038..689.059 rows=896576 loops=1)
>   ->  Index Scan using table2_pkey on table2  (cost=0.00..59213.16
> rows=1630788 width=44) (actual time=0.049..1108.220 rows=1630788 loops=1)
> Total runtime: 7525.492 ms
> (5 rows)
A wrong assumption about the number of rows in a join seems to be one
major cause for the long running queries that we experience. Here is
another example (part of a larger query):

->  Hash Join  (cost=18.73..369864.68 rows=33583 width=24) (actual
time=2.994..9636.586 rows=883068 loops=1)
      Hash Cond: ((b."ID_ISSUER_GROUP" = ic."ID_ISSUER_GROUP") AND
(substr((b."ISIN")::text, 1, 2) = (ic."ID_COUNTRY")::text))
      ->  Seq Scan on "Z_BASE" b  (cost=0.00..265745.99 rows=883099
width=20) (actual time=0.048..5380.554 rows=883099 loops=1)
      ->  Hash  (cost=9.89..9.89 rows=589 width=14) (actual
time=2.793..2.793 rows=589 loops=1)
            ->  Seq Scan on "ISSUER_CODES" ic  (cost=0.00..9.89 rows=589
width=14) (actual time=0.047..1.151 rows=589 loops=1)

This join has about 25 times more rows than the query planner thinks. In
my naive thinking, all further planning is simply wild guessing ...

What can I do to address this issue? I tried to create a functional
index on substr(b."ISIN", 1, 2), but execution time became even worse.
(I cancelled the query after several minutes.) Is there any way to tell
the query planner about the (non-)selectivity of the hash condition?
Would it help to increase the statistics target of one of the columns?

Regards,
    Christian

--
Deriva GmbH                         Tel.: +49 551 489500-42
Financial IT and Consulting         Fax:  +49 551 489500-91
Hans-Böckler-Straße 2                  http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer