Discussion: Planner very slow on same query to slightly different tables
I'm using Postgres 7.2.1 on a dual-Athlon running RedHat 7.3 bigmem with 2 Gig of RAM and a 240 Gig RAID 5 (3ware IDE RAID). I just did a 'vacuum analyze' on the database, yet the same query against two similar tables is coming up quite different. The two tables differ only in that one ("center_out_cell") has an extra int2 field called "target", which can take up to 8 different values.

Here are the queries:

db02=# explain select distinct area from center_out_cell where subject
= 'M' and arm = 'R' and rep = 10 and success = 1 and direction = 1;
NOTICE:  QUERY PLAN:

Unique  (cost=87795.47..87795.80 rows=13 width=5)
  ->  Sort  (cost=87795.47..87795.47 rows=131 width=5)
        ->  Seq Scan on center_out_cell  (cost=0.00..87790.87 rows=131 width=5)

EXPLAIN

db02=# explain select distinct area from circles_cell where subject =
'M' and arm = 'R' and rep = 10 and success = 1 and direction = 1;
NOTICE:  QUERY PLAN:

Unique  (cost=258.36..258.52 rows=6 width=5)
  ->  Sort  (cost=258.36..258.36 rows=64 width=5)
        ->  Index Scan using pk1circles_cell on circles_cell  (cost=0.00..256.43 rows=64 width=5)

EXPLAIN

Here are the definitions for the 2 tables:

db02=# \d center_out_cell
            Table "center_out_cell"
   Column   |        Type        | Modifiers
------------+--------------------+-----------
 subject    | text               |
 arm        | character(1)       |
 target     | smallint           |
 rep        | integer            |
 direction  | smallint           |
 success    | smallint           |
 hemisphere | character(1)       |
 area       | text               |
 filenumber | integer            |
 dsp_chan   | text               |
 num_spikes | integer            |
 spike_data | double precision[] |
Unique keys: pk0center_out_cell,
             pk1center_out_cell

where:

db02=# \d pk1center_out_cell
Index "pk1center_out_cell"
   Column   |     Type
------------+--------------
 subject    | text
 arm        | character(1)
 target     | smallint
 rep        | integer
 hemisphere | character(1)
 area       | text
 filenumber | integer
 dsp_chan   | text
 direction  | smallint
unique btree
Index predicate: (success = 1)

and

db02=# \d pk0center_out_cell
Index "pk0center_out_cell"
   Column   |     Type
------------+--------------
 subject    | text
 arm        | character(1)
 target     | smallint
 rep        | integer
 hemisphere | character(1)
 area       | text
 filenumber | integer
 dsp_chan   | text
 direction  | smallint
unique btree
Index predicate: (success = 0)

db02=# \d circles_cell
            Table "circles_cell"
   Column   |        Type        | Modifiers
------------+--------------------+-----------
 subject    | text               |
 arm        | character(1)       |
 rep        | integer            |
 direction  | smallint           |
 success    | smallint           |
 hemisphere | character(1)       |
 area       | text               |
 filenumber | integer            |
 dsp_chan   | text               |
 num_spikes | integer            |
 spike_data | double precision[] |
Unique keys: pk0circles_cell,
             pk1circles_cell

where:

db02=# \d pk1circles_cell
Index "pk1circles_cell"
   Column   |     Type
------------+--------------
 subject    | text
 arm        | character(1)
 rep        | integer
 direction  | smallint
 hemisphere | character(1)
 area       | text
 filenumber | integer
 dsp_chan   | text
unique btree
Index predicate: (success = 1)

db02=# \d pk0circles_cell
Index "pk0circles_cell"
   Column   |     Type
------------+--------------
 subject    | text
 arm        | character(1)
 rep        | integer
 direction  | smallint
 hemisphere | character(1)
 area       | text
 filenumber | integer
 dsp_chan   | text
unique btree
Index predicate: (success = 0)

Now I know that, due to the extra field "target", "center_out_cell" can be as large as 8 times "circles_cell", but according to the planner's cost estimates, the statement is 340 times more costly. I think this is because the planner is using the index in the circles_cell case and not in the center_out_cell case. However, I don't understand the intricacies of the planner well enough to make an intelligent guess. I've been trying random changes to postgresql.conf like increasing the shared memory size, changing the random_page_cost setting, etc., but would like some help in trying to speed things up.
Here are some relevant settings from my postgresql.conf (made in an attempt to max out buffers):

shared_buffers = 9000           # 2*max_connections, min 16
wal_buffers = 32                # min 4
sort_mem = 64000                # min 32
vacuum_mem = 16384              # min 1024
wal_files = 32
effective_cache_size = 1000     # default in 8k pages

Thanks in advance.
-Tony
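(As an aside for anyone reproducing this: planner settings like these can also be tried per session instead of editing postgresql.conf. A sketch only, with illustrative values; the setting names are standard Postgres run-time parameters:)

```sql
-- Sketch: per-session planner experiments; these revert on disconnect.
SET enable_seqscan = off;   -- make sequential scans look very expensive,
                            -- to see whether an index plan is possible at all
SET random_page_cost = 2;   -- illustrative value; the default is 4

EXPLAIN SELECT DISTINCT area FROM center_out_cell
 WHERE subject = 'M' AND arm = 'R' AND rep = 10
   AND success = 1 AND direction = 1;

SET enable_seqscan = on;    -- restore the default for this session
```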
reina@nsi.edu (Tony Reina) writes:
> db02=# explain select distinct area from center_out_cell where subject
> = 'M' and arm = 'R' and rep = 10 and success = 1 and direction = 1;
> NOTICE:  QUERY PLAN:

> Unique  (cost=87795.47..87795.80 rows=13 width=5)
>   ->  Sort  (cost=87795.47..87795.47 rows=131 width=5)
>         ->  Seq Scan on center_out_cell  (cost=0.00..87790.87 rows=131
> width=5)

> Index "pk1center_out_cell"
>    Column   |     Type
> ------------+--------------
>  subject    | text
>  arm        | character(1)
>  target     | smallint
>  rep        | integer
>  hemisphere | character(1)
>  area       | text
>  filenumber | integer
>  dsp_chan   | text
>  direction  | smallint
> unique btree
> Index predicate: (success = 1)

I imagine the problem with this index is that there's no constraint for
"target" in the query; so the planner could only use the first two index
columns (subject and arm), which probably isn't very selective. The
index used in the other query is defined differently:

> db02=# \d pk1circles_cell
> Index "pk1circles_cell"
>    Column   |     Type
> ------------+--------------
>  subject    | text
>  arm        | character(1)
>  rep        | integer
>  direction  | smallint
>  hemisphere | character(1)
>  area       | text
>  filenumber | integer
>  dsp_chan   | text
> unique btree
> Index predicate: (success = 1)

This allows "rep" to be used in the indexscan too (and if you were to
cast properly, viz "direction = 1::smallint", then that column could be
used as well).

			regards, tom lane
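(For illustration, the cast Tom describes would look like this against circles_cell; sketch only. In 7.2 an untyped integer literal compared to a smallint column can keep the planner from matching that column to the index:)

```sql
-- Sketch: compare smallint columns against smallint literals so the
-- planner can match them to the index columns, per Tom's suggestion.
EXPLAIN SELECT DISTINCT area FROM circles_cell
 WHERE subject = 'M' AND arm = 'R' AND rep = 10
   AND success = 1::smallint AND direction = 1::smallint;
```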
If I understand correctly, I tried specifying the target and even casting
all of the smallints, but the estimate is still slow. Perhaps this is
just due to a large amount of data, but my gut is telling me that I have
something wrong here.

db02=# explain select distinct area from center_out_cell where subject =
'M' and arm = 'R' and rep = 10 and success = 1::smallint and direction =
1::smallint and target = 3::smallint;
NOTICE:  QUERY PLAN:

Unique  (cost=100105115.88..100105115.93 rows=2 width=5)
  ->  Sort  (cost=100105115.88..100105115.88 rows=19 width=5)
        ->  Seq Scan on center_out_cell  (cost=100000000.00..100105115.47 rows=19 width=5)

EXPLAIN

db02=# explain select distinct area from center_out_cell where subject =
'M' and arm = 'R' and rep = 10::int and success = 1::smallint and
direction = 1::smallint and target = 3::smallint;
NOTICE:  QUERY PLAN:

Unique  (cost=100105115.88..100105115.93 rows=2 width=5)
  ->  Sort  (cost=100105115.88..100105115.88 rows=19 width=5)
        ->  Seq Scan on center_out_cell  (cost=100000000.00..100105115.47 rows=19 width=5)

EXPLAIN

db02=#

-Tony

At 09:47 PM 7/17/02 -0400, Tom Lane wrote:
>reina@nsi.edu (Tony Reina) writes:
>> db02=# explain select distinct area from center_out_cell where subject
>> = 'M' and arm = 'R' and rep = 10 and success = 1 and direction = 1;
>> NOTICE:  QUERY PLAN:
>
>> Unique  (cost=87795.47..87795.80 rows=13 width=5)
>>   ->  Sort  (cost=87795.47..87795.47 rows=131 width=5)
>>         ->  Seq Scan on center_out_cell  (cost=0.00..87790.87 rows=131
>> width=5)
>
>> Index "pk1center_out_cell"
>>    Column   |     Type
>> ------------+--------------
>>  subject    | text
>>  arm        | character(1)
>>  target     | smallint
>>  rep        | integer
>>  hemisphere | character(1)
>>  area       | text
>>  filenumber | integer
>>  dsp_chan   | text
>>  direction  | smallint
>> unique btree
>> Index predicate: (success = 1)
>
>I imagine the problem with this index is that there's no constraint for
>"target" in the query; so the planner could only use the first two index
>columns (subject and arm), which probably isn't very selective. The
>index used in the other query is defined differently:
>
>> db02=# \d pk1circles_cell
>> Index "pk1circles_cell"
>>    Column   |     Type
>> ------------+--------------
>>  subject    | text
>>  arm        | character(1)
>>  rep        | integer
>>  direction  | smallint
>>  hemisphere | character(1)
>>  area       | text
>>  filenumber | integer
>>  dsp_chan   | text
>> unique btree
>> Index predicate: (success = 1)
>
>This allows "rep" to be used in the indexscan too (and if you were to
>cast properly, viz "direction = 1::smallint", then that column could be
>used as well).
>
>			regards, tom lane
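(Another option, not raised in the thread, so treat it as a hypothetical sketch: a partial index whose leading columns all appear as equality constraints in this query, so no column gets skipped. The index name and column order here are assumptions:)

```sql
-- Hypothetical index; the name and column order are invented for
-- illustration. The leading columns match the query's equality
-- constraints exactly, and the predicate handles "success".
CREATE INDEX center_out_cell_query_idx
    ON center_out_cell (subject, arm, target, rep, direction)
    WHERE success = 1::smallint;
```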
As a followup to my slow query plans:

I've experimented with removing the inheritance schema to see how it
affects my database calls. Originally, because several fields of the
primary key (subject, arm, rep, direction, success) were common to every
table, I made those fields a separate table and had subsequent tables
inherit the fields. Just out of curiosity, I built a new database with
the same data but didn't use the inheritance (i.e. each table had its
own copy of those common fields). It looks like about a 20% increase in
execution speed when I run my programs side by side.

I'm not sure if that kind of performance hit should be expected. Anyone
have an idea about this?

-Tony

reina@nsi.edu (Tony Reina) wrote in message news:<5.1.1.6.0.20020718095319.009ecec0@schubert.nsi.edu>...
> If I understand correctly, I tried specifying the target and even casting
> all of the smallint's, but it still is a slow estimate. Perhaps, this is
> just due to a large amount of data, but my gut is telling me that I have
> something wrong here.
>
On 19 Jul 2002, Tony Reina wrote:

> Just out of curiosity, I built a new
> database with the same data but didn't use the inheritance (i.e. each
> table had its own copy of those common fields). It looks like about a
> 20% increase in execution speed when I run my programs side by side.

Have you tried it using the standard relational method of doing this?
(I.e., you put the common fields in one table, and the extra fields in
other tables, along with a foreign key relating the extra fields back
to the main table.) That would more accurately replicate what you were
doing with inheritance.

I have a suspicion, in fact, that inheritance may just be syntactic sugar
for doing this and adding a couple of views. :-)

Anyway, it could be that by denormalizing the data (copying it to the
other tables), you reduced the number of joins you do, and so you got a
performance increase.

cjs
--
Curt Sampson <cjs@cynic.net>  +81 90 7737 2974  http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC
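(A minimal sketch of the relational layout Curt describes; every name beyond the original column names is invented for illustration:)

```sql
-- Sketch only: common fields in one table, extras in another, linked by
-- a foreign key. "trial" and "trial_id" are hypothetical names.
CREATE TABLE trial (
    trial_id  serial PRIMARY KEY,
    subject   text,
    arm       character(1),
    rep       integer,
    direction smallint,
    success   smallint
);

CREATE TABLE center_out_extra (
    trial_id integer REFERENCES trial (trial_id),
    target   smallint
);
```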
At 02:17 PM 7/20/02 +0900, Curt Sampson wrote:
>Have you tried it using the standard relational method of doing this?
>(I.e., you put the common fields in one table, and the extra fields in
>other tables, along with a foreign key relating the extra fields back
>to the main table.) That would more accurately replicate what you were
>doing with inheritance.
>
>I have a suspicion, in fact, that inheritance may just be syntactic sugar
>for doing this and adding a couple of views. :-)

Yes, I thought this was the case too. I haven't specifically set up
foreign keys, but I was under the impression that the "INHERITS" command
would do this.

>Anyway, it could be that by denormalizing the data (copying it to the
>other tables), you reduced the number of joins you do, and so you got a
>performance increase.

Yes, I guess this is probably the case, although it speaks against
normalizing too much. I guess too much of a good thing is bad.

-Tony