Обсуждение: bizarre query performance question

Поиск
Список
Период
Сортировка

bizarre query performance question

От
"H. William Connors II"
Дата:
I have two fairly simple tables as described below.  The relationship
between them is through assignment_id.  The problem is when I try to
join these two tables the planner does a sequential scan on
fa_assignment_detail and the query takes forever to resolve.  I've run
the usual vacuum and analyze commands with no changes.  I'm not sure how
long the query actually takes to resolve as its been running for over 30
minutes now (FYI this is on a 8 core IBM Power5 550 with 8 GB of RAM)
running RedHat Enterprise 9 and postgresql 8.3.3.  Any thoughts?

\d fa_assignment
                      Table "public.fa_assignment"
     Column      |            Type             |       Modifiers
-----------------+-----------------------------+------------------------
 scenario_id     | integer                     | not null
 prospect_id     | integer                     | not null
 assignment_id   | integer                     | not null
 valid           | boolean                     | not null default false
 modified        | boolean                     | not null default true
 modify_ts       | timestamp without time zone |
 modify_username | character varying(32)       |
Indexes:
    "pk_fa_assignment" PRIMARY KEY, btree (scenario_id, prospect_id)
    "fa_assignment_idx1" btree (assignment_id) CLUSTER
    "fa_assignment_idx2" btree (scenario_id, assignment_id)
    "fa_assignment_idx3" btree (prospect_id)
Foreign-key constraints:
    "fk_fa_prospect" FOREIGN KEY (prospect_id) REFERENCES
fa_prospect(prospect_id) DEFERRABLE
    "fk_fa_scenario" FOREIGN KEY (scenario_id) REFERENCES
fa_scenario(scenario_id) DEFERRABLE



\d fa_assignment_detail
                  Table "public.fa_assignment_detail"
     Column      |            Type             |       Modifiers
-----------------+-----------------------------+------------------------
 assignment_id   | integer                     | not null
 type            | character varying(8)        | not null
 resource_id     | integer                     |
 create_ts       | timestamp without time zone | not null
 create_username | character varying(32)       | not null
 modify_ts       | timestamp without time zone |
 modify_username | character varying(32)       |
 locked          | boolean                     | not null default false
 locked_ts       | timestamp without time zone |
 locked_username | character varying(32)       |
Indexes:
    "pk_fa_assignment_detail" PRIMARY KEY, btree (assignment_id, type)
    "fa_assignment_detail_idx1" btree (resource_id)
    "fa_assignment_detail_idx2" btree (assignment_id)
Foreign-key constraints:
    "fk_fa_resource1" FOREIGN KEY (resource_id) REFERENCES
fa_resource(resource_id) DEFERRABLE



fa_assignment has 44184945 records
fa_assignment_detail has 82196027 records



explain select * from fa_assignment fa JOIN fa_assignment_detail fad ON
(fad.assignment_id = fa.assignment_id) where fa.scenario_id = 0;

                                              QUERY
PLAN
-------------------------------------------------------------------------------------------------------
 Hash Join  (cost=581289.72..4940729.76 rows=9283104 width=91)
   Hash Cond: (fad.assignment_id = fa.assignment_id)
   ->  Seq Scan on fa_assignment_detail fad  (cost=0.00..1748663.60
rows=82151360 width=61)
   ->  Hash  (cost=484697.74..484697.74 rows=4995439 width=30)
         ->  Bitmap Heap Scan on fa_assignment fa
(cost=93483.75..484697.74 rows=4995439 width=30)
               Recheck Cond: (scenario_id = 0)
               ->  Bitmap Index Scan on fa_assignment_idx2
(cost=0.00..92234.89 rows=4995439 width=0)
                     Index Cond: (scenario_id = 0)
(8 rows)


Re: bizarre query performance question

От
Lennin Caro
Дата:


--- On Wed, 10/1/08, H. William Connors II <bconnors@rochgrp.com> wrote:

> From: H. William Connors II <bconnors@rochgrp.com>
> Subject: [PERFORM] bizarre query performance question
> To: pgsql-performance@postgresql.org
> Date: Wednesday, October 1, 2008, 8:34 PM
> I have two fairly simple tables as described below.  The
> relationship
> between them is through assignment_id.  The problem is when
> I try to
> join these two tables the planner does a sequential scan on
>
> fa_assignment_detail and the query takes forever to
> resolve.  I've run
> the usual vacuum and analyze commands with no changes.
> I'm not sure how
> long the query actually takes to resolve as its been
> running for over 30
> minutes now (FYI this is on a 8 core IBM Power5 550 with 8
> GB of RAM)
> running RedHat Enterprise 9 and postgresql 8.3.3.  Any
> thoughts?
>
> \d fa_assignment
>                       Table
> "public.fa_assignment"
>      Column      |            Type             |
> Modifiers
> -----------------+-----------------------------+------------------------
>  scenario_id     | integer                     | not null
>  prospect_id     | integer                     | not null
>  assignment_id   | integer                     | not null
>  valid           | boolean                     | not null
> default false
>  modified        | boolean                     | not null
> default true
>  modify_ts       | timestamp without time zone |
>  modify_username | character varying(32)       |
> Indexes:
>     "pk_fa_assignment" PRIMARY KEY, btree
> (scenario_id, prospect_id)
>     "fa_assignment_idx1" btree (assignment_id)
> CLUSTER
>     "fa_assignment_idx2" btree (scenario_id,
> assignment_id)
>     "fa_assignment_idx3" btree (prospect_id)
> Foreign-key constraints:
>     "fk_fa_prospect" FOREIGN KEY (prospect_id)
> REFERENCES
> fa_prospect(prospect_id) DEFERRABLE
>     "fk_fa_scenario" FOREIGN KEY (scenario_id)
> REFERENCES
> fa_scenario(scenario_id) DEFERRABLE
>
>
>
> \d fa_assignment_detail
>                   Table
> "public.fa_assignment_detail"
>      Column      |            Type             |
> Modifiers
> -----------------+-----------------------------+------------------------
>  assignment_id   | integer                     | not null
>  type            | character varying(8)        | not null
>  resource_id     | integer                     |
>  create_ts       | timestamp without time zone | not null
>  create_username | character varying(32)       | not null
>  modify_ts       | timestamp without time zone |
>  modify_username | character varying(32)       |
>  locked          | boolean                     | not null
> default false
>  locked_ts       | timestamp without time zone |
>  locked_username | character varying(32)       |
> Indexes:
>     "pk_fa_assignment_detail" PRIMARY KEY, btree
> (assignment_id, type)
>     "fa_assignment_detail_idx1" btree
> (resource_id)
>     "fa_assignment_detail_idx2" btree
> (assignment_id)
> Foreign-key constraints:
>     "fk_fa_resource1" FOREIGN KEY (resource_id)
> REFERENCES
> fa_resource(resource_id) DEFERRABLE
>
>
>
> fa_assignment has 44184945 records
> fa_assignment_detail has 82196027 records
>
>
>
> explain select * from fa_assignment fa JOIN
> fa_assignment_detail fad ON
> (fad.assignment_id = fa.assignment_id) where fa.scenario_id
> = 0;
>
>                                               QUERY
> PLAN
> -------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=581289.72..4940729.76 rows=9283104
> width=91)
>    Hash Cond: (fad.assignment_id = fa.assignment_id)
>    ->  Seq Scan on fa_assignment_detail fad
> (cost=0.00..1748663.60
> rows=82151360 width=61)
>    ->  Hash  (cost=484697.74..484697.74 rows=4995439
> width=30)
>          ->  Bitmap Heap Scan on fa_assignment fa
> (cost=93483.75..484697.74 rows=4995439 width=30)
>                Recheck Cond: (scenario_id = 0)
>                ->  Bitmap Index Scan on
> fa_assignment_idx2
> (cost=0.00..92234.89 rows=4995439 width=0)
>                      Index Cond: (scenario_id = 0)
> (8 rows)
>
>

The Fk for the table fa_assignment_detail to fa_assignment is nor relationate whit the column assignment_id





Re: bizarre query performance question

От
"H. William Connors II"
Дата:
Lennin Caro wrote:
>
> --- On Wed, 10/1/08, H. William Connors II <bconnors@rochgrp.com> wrote:
>
>
>> From: H. William Connors II <bconnors@rochgrp.com>
>> Subject: [PERFORM] bizarre query performance question
>> To: pgsql-performance@postgresql.org
>> Date: Wednesday, October 1, 2008, 8:34 PM
>> I have two fairly simple tables as described below.  The
>> relationship
>> between them is through assignment_id.  The problem is when
>> I try to
>> join these two tables the planner does a sequential scan on
>>
>> fa_assignment_detail and the query takes forever to
>> resolve.  I've run
>> the usual vacuum and analyze commands with no changes.
>> I'm not sure how
>> long the query actually takes to resolve as its been
>> running for over 30
>> minutes now (FYI this is on a 8 core IBM Power5 550 with 8
>> GB of RAM)
>> running RedHat Enterprise 9 and postgresql 8.3.3.  Any
>> thoughts?
>>
>> \d fa_assignment
>>                       Table
>> "public.fa_assignment"
>>      Column      |            Type             |
>> Modifiers
>> -----------------+-----------------------------+------------------------
>>  scenario_id     | integer                     | not null
>>  prospect_id     | integer                     | not null
>>  assignment_id   | integer                     | not null
>>  valid           | boolean                     | not null
>> default false
>>  modified        | boolean                     | not null
>> default true
>>  modify_ts       | timestamp without time zone |
>>  modify_username | character varying(32)       |
>> Indexes:
>>     "pk_fa_assignment" PRIMARY KEY, btree
>> (scenario_id, prospect_id)
>>     "fa_assignment_idx1" btree (assignment_id)
>> CLUSTER
>>     "fa_assignment_idx2" btree (scenario_id,
>> assignment_id)
>>     "fa_assignment_idx3" btree (prospect_id)
>> Foreign-key constraints:
>>     "fk_fa_prospect" FOREIGN KEY (prospect_id)
>> REFERENCES
>> fa_prospect(prospect_id) DEFERRABLE
>>     "fk_fa_scenario" FOREIGN KEY (scenario_id)
>> REFERENCES
>> fa_scenario(scenario_id) DEFERRABLE
>>
>>
>>
>> \d fa_assignment_detail
>>                   Table
>> "public.fa_assignment_detail"
>>      Column      |            Type             |
>> Modifiers
>> -----------------+-----------------------------+------------------------
>>  assignment_id   | integer                     | not null
>>  type            | character varying(8)        | not null
>>  resource_id     | integer                     |
>>  create_ts       | timestamp without time zone | not null
>>  create_username | character varying(32)       | not null
>>  modify_ts       | timestamp without time zone |
>>  modify_username | character varying(32)       |
>>  locked          | boolean                     | not null
>> default false
>>  locked_ts       | timestamp without time zone |
>>  locked_username | character varying(32)       |
>> Indexes:
>>     "pk_fa_assignment_detail" PRIMARY KEY, btree
>> (assignment_id, type)
>>     "fa_assignment_detail_idx1" btree
>> (resource_id)
>>     "fa_assignment_detail_idx2" btree
>> (assignment_id)
>> Foreign-key constraints:
>>     "fk_fa_resource1" FOREIGN KEY (resource_id)
>> REFERENCES
>> fa_resource(resource_id) DEFERRABLE
>>
>>
>>
>> fa_assignment has 44184945 records
>> fa_assignment_detail has 82196027 records
>>
>>
>>
>> explain select * from fa_assignment fa JOIN
>> fa_assignment_detail fad ON
>> (fad.assignment_id = fa.assignment_id) where fa.scenario_id
>> = 0;
>>
>>                                               QUERY
>> PLAN
>> -------------------------------------------------------------------------------------------------------
>>  Hash Join  (cost=581289.72..4940729.76 rows=9283104
>> width=91)
>>    Hash Cond: (fad.assignment_id = fa.assignment_id)
>>    ->  Seq Scan on fa_assignment_detail fad
>> (cost=0.00..1748663.60
>> rows=82151360 width=61)
>>    ->  Hash  (cost=484697.74..484697.74 rows=4995439
>> width=30)
>>          ->  Bitmap Heap Scan on fa_assignment fa
>> (cost=93483.75..484697.74 rows=4995439 width=30)
>>                Recheck Cond: (scenario_id = 0)
>>                ->  Bitmap Index Scan on
>> fa_assignment_idx2
>> (cost=0.00..92234.89 rows=4995439 width=0)
>>                      Index Cond: (scenario_id = 0)
>> (8 rows)
>>
>>
>>
>
> The Fk for the table fa_assignment_detail to fa_assignment is nor relationate whit the column assignment_id
>
>
>
>
>
>
That is because assignment_id is because there can be many records in
fa_assignment that use the same assignment_id and thus it isn't unique
there.  I can join other tables not related through a foreign key using
an index so I'm unclear why this situation is different.


Re: bizarre query performance question

От
Richard Huxton
Дата:
H. William Connors II wrote:
> fa_assignment has 44184945 records
> fa_assignment_detail has 82196027 records
>
> explain select * from fa_assignment fa JOIN fa_assignment_detail fad ON
> (fad.assignment_id = fa.assignment_id) where fa.scenario_id = 0;
>
>                                              QUERY
> PLAN
> -------------------------------------------------------------------------------------------------------
>
> Hash Join  (cost=581289.72..4940729.76 rows=9283104 width=91)

Are you really expecting 9 million rows in the result? If so, this is
probably a reasonable plan.

>   Hash Cond: (fad.assignment_id = fa.assignment_id)
>   ->  Seq Scan on fa_assignment_detail fad  (cost=0.00..1748663.60
> rows=82151360 width=61)
>   ->  Hash  (cost=484697.74..484697.74 rows=4995439 width=30)
>         ->  Bitmap Heap Scan on fa_assignment fa
> (cost=93483.75..484697.74 rows=4995439 width=30)
>               Recheck Cond: (scenario_id = 0)
>               ->  Bitmap Index Scan on fa_assignment_idx2
> (cost=0.00..92234.89 rows=4995439 width=0)
>                     Index Cond: (scenario_id = 0)

It's restricting on scenario_id, building a bitmap to identify which
disk-blocks will contain one or more matching rows and then scanning
those. If those 5 million scenario_id=0 rows are spread over 10% of the
blocks then that's a good idea.

If it was expecting only a handful of rows with scenario_id=0 then I'd
expect it to switch to a "standard" index scan.

If your work_mem is small try something like:
  set work_mem = '50MB';
before running the query - maybe even larger.

--
  Richard Huxton
  Archonet Ltd