Обсуждение: A plan returned by explain doesn't make sense to me

Поиск
Список
Период
Сортировка

A plan returned by explain doesn't make sense to me

От
"Nick Fankhauser"
Дата:
Hi-

I'm trying to squeeze better performance out of a query, but the explain
returns a plan that really puzzles me.

A little background is in order- the table that I think is central to my
problem is what I call an "assignment table" I'm working with court data, so
I have people and cases. A person can be involved in 0 or more cases and a
case can involve 1 or more persons. To represent this, I have the
actor_case_assignment table, which is essentially three fields- a primary
key, an actor_id (person), and a case_id. I have four indexes: One for each
id, and then a *unique* index on (actor_id, case_id) because although either
id on it's own may be duplicated many times in the assignment table, no
person should ever be assigned to a case more than once,

The problem is that I have a view using this table that runs way too slow.
It looks like this:

create view criminal_detail as
select case_data.case_id,
       case_data.court_id,
       actor_case_assignment.actor_id,
       charge.charge_count,
       charge.charge_number,
       charge.charge_type,
       charge.charge_version,
       charge.charge_class,
       charge.charge_text,
       charge.charge_qualifier,
       charge.charge_code,
       charge.charge_desc,
       charge.plea_code,
       charge.plea_desc,
       charge.charge_date,
       charge.statute_reference,
       charge.charge_status_desc,
       disposition_number,
       cr_disposition_date,
       cr_disposition_desc,
       cr_disposition_code,
       cr_disposition_type,
       cr_disposition_mode
  from (charge left outer join criminal_disposition on
         (charge.charge_id = criminal_disposition.charge_id)
       ),
        actor_case_assignment,
        case_data
        where actor_case_assignment.actor_case_assignment_id =
              charge.actor_case_assignment_id
          and actor_case_assignment.case_id = case_data.case_id
;

The explain looks like this:

develop=# explain SELECT * FROM CRIMINAL_DETAIL WHERE case_id =
'102SC01353';
NOTICE:  QUERY PLAN:

Hash Join  (cost=155.06..9263.84 rows=1 width=320)
  ->  Hash Join  (cost=120.53..6034.05 rows=79880 width=260)
        ->  Seq Scan on charge  (cost=0.00..2664.80 rows=79880 width=184)
        ->  Hash  (cost=109.82..109.82 rows=4282 width=76)
              ->  Seq Scan on criminal_disposition  (cost=0.00..109.82
rows=4282 width=76)
  ->  Hash  (cost=34.53..34.53 rows=4 width=60)
        ->  Nested Loop  (cost=0.00..34.53 rows=4 width=60)
              ->  Index Scan using case_data_case_id on case_data
(cost=0.00..4.01 rows=1 width=24)
              ->  Index Scan using actor_case_assignment_both on
actor_case_assignment  (cost=0.00..30.42 rows=7 width=36)

The part that looks wrong to me is that on the last line of the explain, it
is using the index "actor_case_assignment_both" (the index on
actor_id,case_id) even though actor_id is never referenced in this query and
and index on case_id alone is available.

Looking from the outside in, this query should be very fast- an index scan
on case_id reduces the number of matches in actor_case_assignment to about
5, and then then joins these 5 rows to other tables based on indexed primary
(unique) keys. The question is, how can I re-arrange my query to help the
planner come to the same conclusion, and most puzzling, why does the planner
choose an index that involves actor_id?

Many thanks to those of you who read through all of this! Any suggestions?

-Nick

--------------------------------------------------------------------------
Nick Fankhauser  nickf@ontko.com  Phone 1.765.935.4283  Fax 1.765.962.9788
Ray Ontko & Co.     Software Consulting Services     http://www.ontko.com/


Re: A plan returned by explain doesn't make sense to me

От
Tom Lane
Дата:
"Nick Fankhauser" <nickf@ontko.com> writes:
> The part that looks wrong to me is that on the last line of the explain, it
> is using the index "actor_case_assignment_both" (the index on
> actor_id,case_id) even though actor_id is never referenced in this query and
> and index on case_id alone is available.

That seems strange to me also, particularly if the index column ordering
is indeed actor_id,case_id and not the other way round --- without some
constraint on actor_id it shouldn't be *possible* to use such an index.
Could you send me a schema dump (pg_dump -s) for experimentation?
Also, which PG version are you using?

            regards, tom lane

Re: A plan returned by explain doesn't make sense to me

От
"Nick Fankhauser"
Дата:
> That seems strange to me also, particularly if the index column ordering
> is indeed actor_id,case_id and not the other way round

Tom-

Actually, it *is* the other way around- I didn't realize that could make a
difference. Here's the line that creates it:

create unique index actor_case_assignment_both on
actor_case_assignment(case_id,actor_id);

I reversed the order, and now the explain looks more like I expected:

develop=# explain SELECT * FROM CRIMINAL_DETAIL WHERE case_id = '102SC01353'
ORDER BY CHARGE_COUNT,CHARGE_NUMBER;
NOTICE:  QUERY PLAN:

Sort  (cost=9263.85..9263.85 rows=1 width=308)
  ->  Hash Join  (cost=155.06..9263.84 rows=1 width=308)
        ->  Hash Join  (cost=120.53..6034.05 rows=79880 width=260)
              ->  Seq Scan on charge  (cost=0.00..2664.80 rows=79880
width=184)
              ->  Hash  (cost=109.82..109.82 rows=4282 width=76)
                    ->  Seq Scan on criminal_disposition  (cost=0.00..109.82
rows=4282 width=76)
        ->  Hash  (cost=34.53..34.53 rows=4 width=48)
              ->  Nested Loop  (cost=0.00..34.53 rows=4 width=48)
                    ->  Index Scan using case_data_case_id on case_data
(cost=0.00..4.01 rows=1 width=24)
                    ->  Index Scan using actor_case_assignment_case_id on
actor_case_assignment  (cost=0.00..30.42 rows=7 width=24)

I think this solves my immediate problem, but it seems like even with the
reversed order, the planner shouldn't have chosen the combined index to
drive the query, so I'm happy for now, but I fear that I've added a task to
someone's list in the developer's enclave.

I'll attach a schema dump to an off-list email to to you. Although we're not
incredibly proud of it, I think the company would consider it proprietary &
not to be posted publicly.

I'm running postgresql v7.1.3 on Debian

Thanks for the help, & let me know if there is any other info I can pass
along to help figure out what is happening.

-Nick



Re: A plan returned by explain doesn't make sense to me

От
Tom Lane
Дата:
"Nick Fankhauser" <nickf@ontko.com> writes:
>> That seems strange to me also, particularly if the index column ordering
>> is indeed actor_id,case_id and not the other way round

> Actually, it *is* the other way around- I didn't realize that could make a
> difference.

Ah.  Okay, that eliminates the apparent impossibility.

> I think this solves my immediate problem, but it seems like even with the
> reversed order, the planner shouldn't have chosen the combined index to
> drive the query, so I'm happy for now, but I fear that I've added a task to
> someone's list in the developer's enclave.

The only reason the planner should choose a single-column index over
using the first column of a multi-column index is that the latter index
is likely to be physically larger and thus require more I/O to access.
So, there's no penalty in the cost calculations other than the
number-of-blocks-of-I/O estimated from the physical index size.  It
would be interesting to see the reltuples and relpages stats from
pg_class for your single- and multi-column indexes.  (However, it may be
too late now since you blew away and rebuilt the multi-column index;
we can't be entirely sure what stats the planner was seeing before.)

It's actually a standard recommendation that you not bother with an
index on a single column x if you also have one on (x,y).  Unless the
table is almost entirely read-only, the cost of updating two indexes
outweighs the I/O savings of using the smaller index for queries that
use only x.

            regards, tom lane

Re: A plan returned by explain doesn't make sense to me

От
"Nick Fankhauser"
Дата:
Tom Lane wrote:
> The only reason the planner should choose a single-column index over
> using the first column of a multi-column index is that the latter index
> is likely to be physically larger and thus require more I/O to access.
> So, there's no penalty in the cost calculations other than the
> number-of-blocks-of-I/O estimated from the physical index size.

So is a multi-column index really just two separate indexes with a
constraint added if necessary? I guess I had an idea in my head that it
would be something like an index on the concatenation of the two fields.

> It
> would be interesting to see the reltuples and relpages stats from
> pg_class for your single- and multi-column indexes.

It's easy to reverse the process. How would I get those stats?


> It's actually a standard recommendation that you not bother with an
> index on a single column x if you also have one on (x,y).

Thanks- that will make my app a bit more efficient. (But now I've got to go
back & work on tuning my query again because this apparently wasn't the
source of the poor performance.)

-Nick


Re: A plan returned by explain doesn't make sense to me

От
Tom Lane
Дата:
"Nick Fankhauser" <nickf@ontko.com> writes:
> So is a multi-column index really just two separate indexes with a
> constraint added if necessary? I guess I had an idea in my head that it
> would be something like an index on the concatenation of the two fields.

Well, it *is* an index on the concatenation of the fields --- but you
can perfectly well use that to search for values of the first field.
(You can't use it to search for values of the second field without
the first, however.)

            regards, tom lane