Обсуждение: Planner very slow on same query to slightly different tables

Поиск
Список
Период
Сортировка

Planner very slow on same query to slightly different tables

От
reina@nsi.edu (Tony Reina)
Дата:
I'm using Postgres 7.2.1 on a dual-Athlon running RedHat 7.3bigmem
with 2 Gig of RAM and a 240 Gig RAID 5 (3ware IDE RAID). I just did a
'vacuum analyze' on the database, however the same query to two
similar tables is coming up quite different. The two tables only
differ in that one ("center_out_cell") has an extra int2 field called
"target" which can take up to 8 different values.

Here are the queries:

db02=# explain select distinct area from center_out_cell where subject
= 'M' and arm = 'R' and rep = 10 and success = 1 and direction = 1;
NOTICE:  QUERY PLAN:

Unique  (cost=87795.47..87795.80 rows=13 width=5) ->  Sort  (cost=87795.47..87795.47 rows=131 width=5)       ->  Seq
Scanon center_out_cell  (cost=0.00..87790.87 rows=131
 
width=5)

EXPLAIN
db02=# explain select distinct area from circles_cell where subject =
'M' and arm = 'R' and rep = 10 and success = 1 and direction = 1;
NOTICE:  QUERY PLAN:

Unique  (cost=258.36..258.52 rows=6 width=5) ->  Sort  (cost=258.36..258.36 rows=64 width=5)       ->  Index Scan using
pk1circles_cellon circles_cell 
 
(cost=0.00..256.43 rows=64 width=5)

EXPLAIN


Here are the definitions for the 2 tables:

db02=# \d center_out_cell          Table "center_out_cell"  Column   |        Type        | Modifiers
------------+--------------------+-----------subject    | text               |arm        | character(1)       |target
 | smallint           |rep        | integer            |direction  | smallint           |success    | smallint
|hemisphere | character(1)       |area       | text               |filenumber | integer            |dsp_chan   | text
           |num_spikes | integer            |spike_data | double precision[] |
 
Unique keys: pk0center_out_cell,            pk1center_out_cell

where: 
db02=# \d pk1center_out_cell
Index "pk1center_out_cell"  Column   |     Type
------------+--------------subject    | textarm        | character(1)target     | smallintrep        |
integerhemisphere| character(1)area       | textfilenumber | integerdsp_chan   | textdirection  | smallint
 
unique btree
Index predicate: (success = 1)


and
db02=# \d pk0center_out_cell
Index "pk0center_out_cell"  Column   |     Type
------------+--------------subject    | textarm        | character(1)target     | smallintrep        |
integerhemisphere| character(1)area       | textfilenumber | integerdsp_chan   | textdirection  | smallint
 
unique btree
Index predicate: (success = 0)


db02=# \d circles_cell           Table "circles_cell"  Column   |        Type        | Modifiers
------------+--------------------+-----------subject    | text               |arm        | character(1)       |rep
 | integer            |direction  | smallint           |success    | smallint           |hemisphere | character(1)
|area       | text               |filenumber | integer            |dsp_chan   | text               |num_spikes |
integer           |spike_data | double precision[] |
 
Unique keys: pk0circles_cell,            pk1circles_cell

where:

db02=# \d pk1circles_cell Index "pk1circles_cell"  Column   |     Type
------------+--------------subject    | textarm        | character(1)rep        | integerdirection  |
smallinthemisphere| character(1)area       | textfilenumber | integerdsp_chan   | text
 
unique btree
Index predicate: (success = 1)

db02=# \d pk0circles_cell Index "pk0circles_cell"  Column   |     Type
------------+--------------subject    | textarm        | character(1)rep        | integerdirection  |
smallinthemisphere| character(1)area       | textfilenumber | integerdsp_chan   | text
 
unique btree
Index predicate: (success = 0)


Now I know that, due to the extra field "target", "center_out_cell"
can be as large as 8 times "circles_cell", but according to the cost
of the planner, the statement is 340 times more costly.  I think this
is because the planner is using the index in the circles_cell case and
not in the center_out_cell case. However, I don't pretend to
understand the intricasies of the planner to make an intelligent
guess. I've been trying random changes to postgresql.conf like
increasing the shared memory size, changing the random_page_cost size,
etc., but would like some help in trying to speed things up.

Here are some relevant settings from my postgresql.conf (made in an
attempt to max out buffers):

shared_buffers = 9000      # 2*max_connections, min 16 
wal_buffers = 32            # min 4
sort_mem = 64000         # min 32 
vacuum_mem = 16384          # min 1024
wal_files = 32
effective_cache_size = 1000  # default in 8k pages


Thanks in advance.
-Tony


Re: Planner very slow on same query to slightly different tables

От
Tom Lane
Дата:
reina@nsi.edu (Tony Reina) writes:
> db02=# explain select distinct area from center_out_cell where subject
> = 'M' and arm = 'R' and rep = 10 and success = 1 and direction = 1;
> NOTICE:  QUERY PLAN:

> Unique  (cost=87795.47..87795.80 rows=13 width=5)
>   ->  Sort  (cost=87795.47..87795.47 rows=131 width=5)
>         ->  Seq Scan on center_out_cell  (cost=0.00..87790.87 rows=131
> width=5)

> Index "pk1center_out_cell"
>    Column   |     Type
> ------------+--------------
>  subject    | text
>  arm        | character(1)
>  target     | smallint
>  rep        | integer
>  hemisphere | character(1)
>  area       | text
>  filenumber | integer
>  dsp_chan   | text
>  direction  | smallint
> unique btree
> Index predicate: (success = 1)

I imagine the problem with this index is that there's no constraint for
"target" in the query; so the planner could only use the first two index
columns (subject and arm), which probably isn't very selective.  The
index used in the other query is defined differently:

> db02=# \d pk1circles_cell
>   Index "pk1circles_cell"
>    Column   |     Type
> ------------+--------------
>  subject    | text
>  arm        | character(1)
>  rep        | integer
>  direction  | smallint
>  hemisphere | character(1)
>  area       | text
>  filenumber | integer
>  dsp_chan   | text
> unique btree
> Index predicate: (success = 1)

This allows "rep" to be used in the indexscan too (and if you were to
cast properly, viz "direction = 1::smallint", then that column could be
used as well).
        regards, tom lane


Re: Planner very slow on same query to slightly

От
Tony Reina
Дата:
If I understand correctly, I tried specifying the target and even casting 
all of the smallint's, but it still is a slow estimate. Perhaps, this is 
just due to a large amount of data, but my gut is telling me that I have 
something wrong here.


db02=# explain select distinct area from center_out_cell where subject = 
'M' and arm = 'R' and rep = 10 and success = 1::smallint and direction = 
1::smallint and target = 3::smallint;
NOTICE:  QUERY PLAN:
Unique  (cost=100105115.88..100105115.93 rows=2 width=5)  ->  Sort  (cost=100105115.88..100105115.88 rows=19 width=5)
    ->  Seq Scan on center_out_cell  (cost=100000000.00..100105115.47 
 
rows=19 width=5)
EXPLAIN
db02=# explain select distinct area from center_out_cell where subject = 
'M' and arm = 'R' and rep = 10::int and success = 1::smallint and direction 
= 1::smallint and target = 3::smallint;
NOTICE:  QUERY PLAN:
Unique  (cost=100105115.88..100105115.93 rows=2 width=5)  ->  Sort  (cost=100105115.88..100105115.88 rows=19 width=5)
    ->  Seq Scan on center_out_cell  (cost=100000000.00..100105115.47 
 
rows=19 width=5)
EXPLAIN
db02=#


-Tony






At 09:47 PM 7/17/02 -0400, Tom Lane wrote:
>reina@nsi.edu (Tony Reina) writes:
> > db02=# explain select distinct area from center_out_cell where subject
> > = 'M' and arm = 'R' and rep = 10 and success = 1 and direction = 1;
> > NOTICE:  QUERY PLAN:
>
> > Unique  (cost=87795.47..87795.80 rows=13 width=5)
> >   ->  Sort  (cost=87795.47..87795.47 rows=131 width=5)
> >         ->  Seq Scan on center_out_cell  (cost=0.00..87790.87 rows=131
> > width=5)
>
> > Index "pk1center_out_cell"
> >    Column   |     Type
> > ------------+--------------
> >  subject    | text
> >  arm        | character(1)
> >  target     | smallint
> >  rep        | integer
> >  hemisphere | character(1)
> >  area       | text
> >  filenumber | integer
> >  dsp_chan   | text
> >  direction  | smallint
> > unique btree
> > Index predicate: (success = 1)
>
>I imagine the problem with this index is that there's no constraint for
>"target" in the query; so the planner could only use the first two index
>columns (subject and arm), which probably isn't very selective.  The
>index used in the other query is defined differently:
>
> > db02=# \d pk1circles_cell
> >   Index "pk1circles_cell"
> >    Column   |     Type
> > ------------+--------------
> >  subject    | text
> >  arm        | character(1)
> >  rep        | integer
> >  direction  | smallint
> >  hemisphere | character(1)
> >  area       | text
> >  filenumber | integer
> >  dsp_chan   | text
> > unique btree
> > Index predicate: (success = 1)
>
>This allows "rep" to be used in the indexscan too (and if you were to
>cast properly, viz "direction = 1::smallint", then that column could be
>used as well).
>
>                         regards, tom lane



Inheritance a burden?

От
reina@nsi.edu (Tony Reina)
Дата:
As a followup to my slow query plans:

I've experimented with removing the inheritance schema to see how it
affects my database calls. Originally, because several fields of the
primary key (subject, arm, rep, direction, success) were common to
every table, I made those fields a separate table and had subsequent
tables inherit the fields. Just out of curiosity, I built a new
database with the same data but didn't use the inheritance (i.e. each
table had its own copy of those common fields). It looks like about a
20% increase in execution speed when I run my programs side by side.
I'm not sure if that kind of performance hit should be expected.
Anyone have an idea about this?

-Tony


reina@nsi.edu (Tony Reina) wrote in message news:<5.1.1.6.0.20020718095319.009ecec0@schubert.nsi.edu>...
> If I understand correctly, I tried specifying the target and even casting 
> all of the smallint's, but it still is a slow estimate. Perhaps, this is 
> just due to a large amount of data, but my gut is telling me that I have 
> something wrong here.
>


Re: Inheritance a burden?

От
Curt Sampson
Дата:
On 19 Jul 2002, Tony Reina wrote:

> Just out of curiosity, I built a new
> database with the same data but didn't use the inheritance (i.e. each
> table had its own copy of those common fields). It looks like about a
> 20% increase in execution speed when I run my programs side by side.

Have you tried it using the standard relational method of doing this?
(I.e., you put the common fields in one table, and the extra fields in
other tables, along with a foreign key relating the extra fields back
to the main table.) That would more accurately replacate what you were
doing with inheritance.

I have a suspicion, in fact, that inheritance may just be syntatic sugar
for doing this and adding a couple of views. :-)

Anyway, it could be that by denormalizing the data (copying it to the
other tables), you reduced the number of joins you do, and so you got a
performance increase.

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org   Don't you know, in this new Dark Age, we're
alllight.  --XTC
 



Re: Inheritance a burden?

От
Tony Reina
Дата:
At 02:17 PM 7/20/02 +0900, Curt Sampson wrote:

>Have you tried it using the standard relational method of doing this?
>(I.e., you put the common fields in one table, and the extra fields in
>other tables, along with a foreign key relating the extra fields back
>to the main table.) That would more accurately replacate what you were
>doing with inheritance.
>
>I have a suspicion, in fact, that inheritance may just be syntatic sugar
>for doing this and adding a couple of views. :-)

Yes, I thought this was the case too. I haven't specifically setup foreign 
keys, but I was under the impression that the "INHERITS" command would do this.

>Anyway, it could be that by denormalizing the data (copying it to the
>other tables), you reduced the number of joins you do, and so you got a
>performance increase.

Yes, I guess this is probably the case although it speaks against 
normalizing too much. I guess too much of a good thing is bad.

-Tony