Discussion: Planner estimates cost of 'like' a lot lower than '='??
I have a multiple-join select that takes ~70 seconds to execute
but if I remove one of the indexes the time drops to ~2 seconds.
In the 70 second case 'explain' estimates the cost to 17.87,
but in the 2 second case it is estimated to 3341.14.
Fiddling around revealed that the problem is that the cost of
'like' is severely underestimated, making the database use
the wrong index.
I simplified my table to a single-column (varchar(64)) 'test2'
table, and with my data select '... where value = ...' is
estimated at a cost of 756.92 but '... where value like ...'
is estimated at 2.60! (both selects return a single row).
I'm running 7.1.2 with multibyte support on FreeBSD (installed
via the ports). Vacuum analyze was run on all tables before
testing.
Details follow below.
_
Mats Lofkvist
mal@algonet.se
============================================================
The single column tests
============================================================
testdb=> \d test2
Table "test2"
Attribute | Type | Modifier
-----------+-----------------------+----------
value | character varying(64) |
Index: test2_valueindex
testdb=> \d test2_valueindex
Index "test2_valueindex"
Attribute | Type
-----------+-----------------------
value | character varying(64)
btree
testdb=> select count(*) from test2;
count
--------
118113
(1 row)
testdb=> explain select * from test2 where value = 't10k9999';
NOTICE: QUERY PLAN:
Index Scan using test2_valueindex on test2 (cost=0.00..756.92 rows=645 width=12)
EXPLAIN
testdb=> select * from test2 where value = 't10k9999';
value
----------
t10k9999
(1 row)
testdb=> explain select * from test2 where value like 't10k9999%';
NOTICE: QUERY PLAN:
Index Scan using test2_valueindex on test2 (cost=0.00..2.60 rows=1 width=12)
EXPLAIN
testdb=> select * from test2 where value like 't10k9999%';
value
----------
t10k9999
(1 row)
testdb=>
============================================================
The 'real' tests
============================================================
testdb=> \d data
Table "data"
Attribute | Type | Modifier
---------------+-----------------------+----------
key0 | character(32) | not null
key1 | character(32) | not null
key2 | character(32) | not null
value | character varying(64) | not null
longvalue | text | not null
Indices: datakey2index,
datakey1index,
dataindex,
datavalueindex
testdb=> \d dataindex
Index "dataindex"
 Attribute |     Type
-----------+---------------
key0 | character(32)
key1 | character(32)
key2 | character(32)
unique btree
testdb=> \d datakey1index
Index "datakey1index"
 Attribute |     Type
-----------+---------------
key1 | character(32)
btree
testdb=> \d datakey2index
Index "datakey2index"
 Attribute |     Type
-----------+---------------
key2 | character(32)
btree
testdb=> \d datavalueindex
Index "datavalueindex"
 Attribute |          Type
-----------+-----------------------
value | character varying(64)
btree
testdb=>
testdb=> explain select find0.key0, ret0.longValue as v0, ret1.longValue as v1 from (data find0 inner join data find1
using (key0)) left outer join data ret0 on (ret0.key0 = find0.key0 and ret0.key1 = 'uuuu' and ret0.key2 = 'number') left
outer join data ret1 on (ret1.key0 = find0.key0 and ret1.key1 = 'test' and ret1.key2 = 'test') where find0.key1 = 'uuuu'
and find0.key2 = 'llll' and find0.value like 't10k__' and find1.key1 = 'test' and find1.key2 = 'test' and find1.value
like 'test_0';
NOTICE: QUERY PLAN:
Nested Loop (cost=0.00..17.87 rows=1 width=120)
-> Nested Loop (cost=0.00..13.40 rows=1 width=72)
-> Nested Loop (cost=0.00..8.92 rows=1 width=24)
-> Index Scan using datavalueindex on data find0 (cost=0.00..4.46 rows=1 width=12)
-> Index Scan using datavalueindex on data find1 (cost=0.00..4.46 rows=1 width=12)
-> Index Scan using dataindex on data ret0 (cost=0.00..4.46 rows=1 width=48)
-> Index Scan using dataindex on data ret1 (cost=0.00..4.46 rows=1 width=48)
EXPLAIN
testdb=> select now(); select find0.key0, ret0.longValue as v0, ret1.longValue as v1 from (data find0 inner join data
find1 using (key0)) left outer join data ret0 on (ret0.key0 = find0.key0 and ret0.key1 = 'uuuu' and ret0.key2 =
'number') left outer join data ret1 on (ret1.key0 = find0.key0 and ret1.key1 = 'test' and ret1.key2 = 'test') where
find0.key1 = 'uuuu' and find0.key2 = 'llll' and find0.value like 't10k__' and find1.key1 = 'test' and find1.key2 =
'test' and find1.value like 'test_0'; select now();
now
------------------------
2001-07-22 13:58:14+02
(1 row)
key0 | v0 | v1
----------------------------------+----+--------
8a7967698cae55e66e627969270c34d8 | 3 | test10
7e2d4eb1188d0e114bff6f0ccf658f59 | 3 | test20
f7c97d1ddafacc36faba09ef3be6ac9c | 3 | test30
e59c68a66f83b1fcdd8ec8e58a854fdb | 3 | test40
077cd901c5c9b88219e5c1d14acc7c41 | 3 | test50
36f6af71d8fa1331a3640675c1dd0cf7 | 3 | test60
bc0a3e2064508f70063516eb709c7654 | 3 | test70
34c376648ef62fce58e1d80f70f1327d | 3 | test80
127869c8452da6e1438795509380b946 | 3 | test90
(9 rows)
now
------------------------
2001-07-22 13:59:25+02
(1 row)
testdb=> drop index datavalueindex ;
DROP
testdb=> explain select find0.key0, ret0.longValue as v0, ret1.longValue as v1 from (data find0 inner join data find1
using (key0)) left outer join data ret0 on (ret0.key0 = find0.key0 and ret0.key1 = 'uuuu' and ret0.key2 = 'number') left
outer join data ret1 on (ret1.key0 = find0.key0 and ret1.key1 = 'test' and ret1.key2 = 'test') where find0.key1 = 'uuuu'
and find0.key2 = 'llll' and find0.value like 't10k__' and find1.key1 = 'test' and find1.key2 = 'test' and find1.value
like 'test_0';
NOTICE: QUERY PLAN:
Nested Loop (cost=0.00..3341.14 rows=1 width=120)
-> Nested Loop (cost=0.00..3336.67 rows=1 width=72)
-> Nested Loop (cost=0.00..3332.20 rows=1 width=24)
-> Index Scan using datakey2index on data find0 (cost=0.00..3327.72 rows=1 width=12)
-> Index Scan using dataindex on data find1 (cost=0.00..4.46 rows=1 width=12)
-> Index Scan using dataindex on data ret0 (cost=0.00..4.46 rows=1 width=48)
-> Index Scan using dataindex on data ret1 (cost=0.00..4.46 rows=1 width=48)
EXPLAIN
testdb=> select now(); select find0.key0, ret0.longValue as v0, ret1.longValue as v1 from (data find0 inner join data
find1 using (key0)) left outer join data ret0 on (ret0.key0 = find0.key0 and ret0.key1 = 'uuuu' and ret0.key2 =
'number') left outer join data ret1 on (ret1.key0 = find0.key0 and ret1.key1 = 'test' and ret1.key2 = 'test') where
find0.key1 = 'uuuu' and find0.key2 = 'llll' and find0.value like 't10k__' and find1.key1 = 'test' and find1.key2 =
'test' and find1.value like 'test_0'; select now();
now
------------------------
2001-07-22 14:00:08+02
(1 row)
key0 | v0 | v1
----------------------------------+----+--------
8a7967698cae55e66e627969270c34d8 | 3 | test10
127869c8452da6e1438795509380b946 | 3 | test90
34c376648ef62fce58e1d80f70f1327d | 3 | test80
bc0a3e2064508f70063516eb709c7654 | 3 | test70
36f6af71d8fa1331a3640675c1dd0cf7 | 3 | test60
077cd901c5c9b88219e5c1d14acc7c41 | 3 | test50
e59c68a66f83b1fcdd8ec8e58a854fdb | 3 | test40
f7c97d1ddafacc36faba09ef3be6ac9c | 3 | test30
7e2d4eb1188d0e114bff6f0ccf658f59 | 3 | test20
(9 rows)
now
------------------------
2001-07-22 14:00:10+02
(1 row)
testdb=>
Mats Lofkvist wrote:
>
> I have a multiple-join select that takes ~70 seconds to execute
> but if I remove one of the indexes the time drops to ~2 seconds.
>
> _
> Mats Lofkvist
> mal@algonet.se

I ran into a similar problem on RedHat Linux v6.1 on Intel, kernel
2.2.12-20, PostgreSQL 6.5.3 when moving a system from a single processor
development box over to a dual processor production server. Dropping one
of the indexes on a lookup table with roughly 68,000 records on the
production box resulted in roughly a 3-fold increase in query execution
speed. At the time, I thought it was an SMP issue, and have since been
extremely conservative in adding indexes on SMP boxes.
haystack@email.rentzone.org (Buddy Lee Haystack) writes:
> I ran into a similar problem on RedHat Linux v6.1 on Intel, kernel
> 2.2.12-20, PostgreSQL 6.5.3 when moving a system from a single processor
> development box over to a dual processor production server. Dropping one
> of the indexes on a lookup table with roughly 68,000 records on the
> production box resulted in roughly a 3 fold increase in query execution
> speed. At the time, I thought it was an SMP issue, and have since been
> extremely conservative in adding indexes on SMP boxes.
I _am_ running it on an SMP box (FreeBSD 5.0-current from January
this year), but isn't it a bit far-fetched to assume that this is
an SMP issue? Is postgres even aware of running on an SMP box?
(If it isn't, why should the planner estimates differ depending
on whether it is running on an SMP box or not?)
Forgive me for sounding negative, but I fail to see the connection.
Am I missing something?
_
Mats Lofkvist
mal@algonet.se
Mats Lofkvist <mal@algonet.se> writes:
> Fiddling around revealed that the problem is that the cost of
> 'like' is severely underestimated, making the database use
> the wrong index.
Not cost --- selectivity. How many rows actually match the criterion
WHERE find0.key2 = 'llll'
? How about
WHERE find0.value like 't10k__'
It would appear from your timings that the latter is not very selective
at all, whereas the former is quite selective. However, given the
limitations of the planner's statistical routines, I wouldn't be too
surprised if it makes the opposite guess in 7.1 and before. Notice
the difference between the estimated row counts and reality in your
simplified test :-(. The speed differential in your join almost
certainly has nothing to do with the execution time of a single '='
or 'like' operator, and everything to do with the number of rows
coming out of the first-stage index scan. So if the planner guesses
wrong about which index is more selective for the query, it will choose
a bad plan.
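(For concreteness, a minimal sketch of that check against the tables
shown earlier in this thread: compare the rows= figure in each EXPLAIN
with the actual count.)

explain select * from data where key2 = 'llll';        -- planner's rows= guess
select count(*) from data where key2 = 'llll';         -- reality
explain select * from data where value like 't10k__';  -- planner's rows= guess
select count(*) from data where value like 't10k__';   -- reality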
How large is your dataset? Would you be willing to build a trial
installation of current sources, and see if the 7.2-to-be planner
does any better? We've done some major overhauling of the statistical
code since 7.1, and I'm curious to see its results in the field.
See our CVS server, or the nightly snapshot tarball at
http://www.ca.postgresql.org/ftpsite/dev/
Also: the overly large rows estimate for "where value = 't10k9999'"
is most likely caused by having some one extremely common value in
the column. (In 7.1 and before, the most common value is the *only*
statistic the planner has, and so a common MCV drives it to assume
that there are only a few distinct values in the column.) Often the
most common value is actually a dummy value, like an empty string.
If you have a lot of dummies, consider whether you can't replace them
with NULL. 7.1's VACUUM ANALYZE does distinguish NULLs from real
values, so this hack can help it derive somewhat less bogus stats.
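(A minimal sketch of that hack, assuming purely for illustration that
the dummy value is an empty string; substitute whatever value actually
dominates the column.)

update data set value = NULL where value = '';  -- hypothetical dummy value
vacuum analyze data;                            -- recompute stats without the dummies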
regards, tom lane
> Mats Lofkvist <mal@algonet.se> writes:
> > Fiddling around revealed that the problem is that the cost of
> > 'like' is severely underestimated, making the database use
> > the wrong index.
>
> Not cost --- selectivity. How many rows actually match the criterion
> WHERE find0.key2 = 'llll'
> ? How about
> WHERE find0.value like 't10k__'
There are 11004 rows matching key2 = 'llll' and 90 rows matching
value like 't10k__' (all 90 have key2 = 'llll').
> It would appear from your timings that the latter is not very selective
> at all, whereas the former is quite selective.
Did you mean the other way around?
> However, given the
> limitations of the planner's statistical routines, I wouldn't be too
> surprised if it makes the opposite guess in 7.1 and before. Notice
> the difference between the estimated row counts and reality in your
> simplified test :-(. The speed differential in your join almost
> certainly has nothing to do with the execution time of a single '='
> or 'like' operator, and everything to do with the number of rows
> coming out of the first-stage index scan. So if the planner guesses
> wrong about which index is more selective for the query, it will choose
> a bad plan.
>
> How large is your dataset? Would you be willing to build a trial
> installation of current sources, and see if the 7.2-to-be planner
> does any better? We've done some major overhauling of the statistical
> code since 7.1, and I'm curious to see its results in the field.
> See our CVS server, or the nightly snapshot tarball at
> http://www.ca.postgresql.org/ftpsite/dev/
The 'data' table contains 162135 rows, the 'test2' table contains
118113 rows (the latter is a subset of the data.value column).
(I'm downloading the CVS tree right now. Do I need to do dump/restore
or can I just start it on the current data?)
> Also: the overly large rows estimate for "where value = 't10k9999'"
> is most likely caused by having some one extremely common value in
> the column. (In 7.1 and before, the most common value is the *only*
> statistic the planner has, and so a common MCV drives it to assume
> that there are only a few distinct values in the column.) Often the
> most common value is actually a dummy value, like an empty string.
> If you have a lot of dummies, consider whether you can't replace them
> with NULL. 7.1's VACUUM ANALYZE does distinguish NULLs from real
> values, so this hack can help it derive somewhat less bogus stats.
Yes, there are very common values, but none can be considered dummies
(i.e. they can't be replaced by null in a production database).
data.key0 is an object id, data.key1 and data.key2 are a two-part
object member name, and data.longValue is the member value. data.value
is data.longValue truncated, to make it possible to index a prefix of
data.longValue with databases not supporting this explicitly.
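(Something along these lines would keep the truncated copy in sync; a
sketch only, with the 64-character limit taken from the declared width
of data.value.)

-- maintain data.value as an indexable 64-character prefix of data.longValue
update data set value = substr(longValue, 1, 64);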
When running the tests, data contained ~11k objects, each having about
a dozen members, some with unique values but many with common values.
Maybe I'm mistaken in assuming that the simplified test points at the
problem with the 'real' test, but aren't cost estimates comparable
between two different explains? If they are, I still don't understand
how "where value = 'xxx'" can be estimated to return 600 times more
rows than "where value like 'xxx%'" (this is what happens in my
simplified test).
> regards, tom lane
thanks for the reply,
_
Mats Lofkvist
mal@algonet.se
Mats Lofkvist <mal@algonet.se> writes:
> There are 11004 rows matching key2 = 'llll' and 90 rows matching
> value like 't10k__' (all 90 have key2 = 'llll').
Hmph. On that basis, one would think the planner made the right choice
the first time. Curious. Do you have locale support enabled? If so,
what locale are you using in the database?
> (I'm downloading the CVS tree right now. Do I need to do dump/restore
> or can I just start it on the current data?)
You'll need to dump/reload. I wouldn't advise running CVS tip on your
production database, even if it were compatible ;-). Set it up as a
playpen installation, instead. To do this, give configure a --prefix
pointing at a temporary directory, plus --with-pgport to select a port
number other than the default, and when you initdb and start the
postmaster, specify a data directory inside the temp area.
> I still don't understand how
> "where value = 'xxx'" can be estimated to return 600 times more rows
> than "where value like 'xxx%'" (this is what happens in my simplified
> test).
Because the LIKE test is estimated as a range query (where value >=
'xxx' AND value < 'xxy') which uses entirely different statistics
than the equality test does.
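(As a sketch, the prefix pattern from the simplified test is therefore
costed roughly like the range below, where ':' is simply the character
that follows '9'.)

-- what the planner effectively estimates for: where value like 't10k9999%'
select * from test2 where value >= 't10k9999' and value < 't10k999:';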
regards, tom lane
Actually, now that I look more closely, I bet that the real failure in
this example is not in estimation of the find0 scan, but in estimation
of the find1 scan. Notice that the plan switches from using
datavalueindex for find1 (ie, it's keying off "find1.value like
'test_0'", which means that the indexscan limits are 'test' to 'tesu')
to using dataindex (since this is an inner indexscan, values are
available for all three of key0, key1, key2). Since dataindex is a
unique index, that means only one row will be fetched from the index,
as opposed to however many are selected by "where find1.value >= 'test'
AND find1.value < 'tesu'".
By eyeball, it seems obvious that the unique-index lookup should be
preferred. I am not sure why the planner is selecting the other
instead, but it probably points to bogus estimation of the LIKE range
selectivity. What do you get from both EXPLAIN and actual execution
of
select count(*) from data where value like 'test_0';
select count(*) from data where value >= 'test' and value < 'tesu';
regards, tom lane
From: Tom Lane <tgl@sss.pgh.pa.us>
> Mats Lofkvist <mal@algonet.se> writes:
> > There are 11004 rows matching key2 = 'llll' and 90 rows matching
> > value like 't10k__' (all 90 have key2 = 'llll').
>
> Hmph. On that basis, one would think the planner made the right choice
> the first time. Curious. Do you have locale support enabled? If so,
> what locale are you using in the database?
The FreeBSD port sets --enable-locale --enable-multibyte=LATIN1.
> > (I'm downloading the CVS tree right now. Do I need to do dump/restore
> > or can I just start it on the current data?)
>
> You'll need to dump/reload. I wouldn't advise running CVS tip on your
> production database, even if it were compatible ;-). Set it up as a
> playpen installation, instead. To do this, give configure a --prefix
> pointing at a temporary directory, plus --with-pgport to select a port
> number other than the default, and when you initdb and start the
> postmaster, specify a data directory inside the temp area.
It's all a test database, so trashing is not a (big) problem.
> > I still don't understand how
> > "where value = 'xxx'" can be estimated to return 600 times more rows
> > than "where value like 'xxx%'" (this is what happens in my simplified
> > test).
>
> Because the LIKE test is estimated as a range query (where value >=
> 'xxx' AND value < 'xxy') which uses entirely different statistics
> than the equality test does.
Ok, guess I should have skipped the conclusions and stayed with
describing the real problem then :-)
> regards, tom lane
_
Mats Lofkvist
mal@algonet.se
From: Tom Lane <tgl@sss.pgh.pa.us>
> Actually, now that I look more closely, I bet that the real failure in
> this example is not in estimation of the find0 scan, but in estimation
> of the find1 scan. Notice that the plan switches from using
> datavalueindex for find1 (ie, it's keying off "find1.value like
> 'test_0'", which means that the indexscan limits are 'test' to 'tesu')
> to using dataindex (since this is an inner indexscan, values are
> available for all three of key0, key1, key2). Since dataindex is a
> unique index, that means only one row will be fetched from the index,
> as opposed to however many are selected by "where find1.value >= 'test'
> AND find1.value < 'tesu'".
>
> By eyeball, it seems obvious that the unique-index lookup should be
> preferred. I am not sure why the planner is selecting the other
> instead, but it probably points to bogus estimation of the LIKE range
> selectivity. What do you get from both EXPLAIN and actual execution
> of
>
> select count(*) from data where value like 'test_0';
> select count(*) from data where value >= 'test' and value < 'tesu';
>
> regards, tom lane
testdb=> select count(*) from data where value like 'test_0';
count
-------
9
(1 row)
testdb=> select count(*) from data where value >= 'test' and value < 'tesu';
count
-------
10000
(1 row)
This exact query probably isn't what I would see in a production db,
but on the other hand some values searched for will be extremely
common (e.g. 'true' and 'false'), so with this table using dataindex
(for find1) will always be superior to using datavalueindex.
_
Mats Lofkvist
mal@algonet.se
Mats Lofkvist <mal@algonet.se> writes:
> EXPLAIN
> testdb=> select count(*) from data where value >= 'test' and value < 'tesu';
> count
> -------
> 10000
> (1 row)
> testdb=> explain select count(*) from data where value >= 'test' and value < 'tesu';
> NOTICE: QUERY PLAN:
> Aggregate (cost=4.46..4.46 rows=1 width=0)
> -> Index Scan using datavalueindex on data (cost=0.00..4.45 rows=1 width=0)
Sure enough, this is the source of the speed problem: when using the
index on "value", each iteration of the find1 scan will be indexscanning
10000 tuples to find the single one that passes the LIKE and other
qualifications. But the planner mistakenly thinks that the indexscan
will find only one tuple, and so it has no reason to prefer the other
index over this one.
(Unfortunately, the planner is too stupid to realize that the other
index *guarantees* to return no more than one tuple for this query,
and hence should be preferred over a mere statistical estimate of one
selected tuple. Not sure how we could incorporate such a consideration
into what's fundamentally a cost-estimate-driven process.)
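(The guarantee comes from dataindex being unique on (key0, key1, key2):
with all three keys pinned down, at most one row can match. A sketch,
using a key0 from the result sets above purely as a stand-in for
whatever the outer scan supplies:)

select * from data
where key0 = '8a7967698cae55e66e627969270c34d8'  -- stand-in for the outer scan's value
  and key1 = 'test' and key2 = 'test';           -- at most one row, by uniqueness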
I think that current sources will probably do a lot better on the range
estimation problem. I'll be interested to see what you get from these
same tests when you have the data loaded into current...
regards, tom lane
I just did a quick test before going to bed, but 7.2devel seems
to do what we want:
testdb=> select count(*) from data;
count
--------
162135
(1 row)
testdb=> vacuum analyze data;
VACUUM
testdb=> explain select find0.key0, ret0.longValue as v0, ret1.longValue as v1 from (data find0 inner join data find1
using (key0)) left outer join data ret0 on (ret0.key0 = find0.key0 and ret0.key1 = 'uuuu' and ret0.key2 = 'number') left
outer join data ret1 on (ret1.key0 = find0.key0 and ret1.key1 = 'test' and ret1.key2 = 'test') where find0.key1 = 'uuuu'
and find0.key2 = 'llll' and find0.value like 't10k__' and find1.key1 = 'test' and find1.key2 = 'test' and find1.value
like 'test_0';
NOTICE: QUERY PLAN:
Nested Loop (cost=0.00..24.05 rows=1 width=318)
-> Nested Loop (cost=0.00..18.02 rows=1 width=195)
-> Nested Loop (cost=0.00..11.98 rows=1 width=72)
-> Index Scan using datavalueindex on data find0 (cost=0.00..5.94 rows=1 width=36)
-> Index Scan using dataindex on data find1 (cost=0.00..6.03 rows=1 width=36)
-> Index Scan using dataindex on data ret0 (cost=0.00..6.02 rows=1 width=123)
-> Index Scan using dataindex on data ret1 (cost=0.00..6.02 rows=1 width=123)
EXPLAIN
testdb=> select now(); select find0.key0, ret0.longValue as v0, ret1.longValue as v1 from (data find0 inner join data
find1 using (key0)) left outer join data ret0 on (ret0.key0 = find0.key0 and ret0.key1 = 'uuuu' and ret0.key2 =
'number') left outer join data ret1 on (ret1.key0 = find0.key0 and ret1.key1 = 'test' and ret1.key2 = 'test') where
find0.key1 = 'uuuu' and find0.key2 = 'llll' and find0.value like 't10k__' and find1.key1 = 'test' and find1.key2 =
'test' and find1.value like 'test_0'; select now();
now
------------------------
2001-07-23 00:35:13+02
(1 row)
key0 | v0 | v1
----------------------------------+----+--------
8a7967698cae55e66e627969270c34d8 | 3 | test10
7e2d4eb1188d0e114bff6f0ccf658f59 | 3 | test20
f7c97d1ddafacc36faba09ef3be6ac9c | 3 | test30
e59c68a66f83b1fcdd8ec8e58a854fdb | 3 | test40
077cd901c5c9b88219e5c1d14acc7c41 | 3 | test50
36f6af71d8fa1331a3640675c1dd0cf7 | 3 | test60
bc0a3e2064508f70063516eb709c7654 | 3 | test70
34c376648ef62fce58e1d80f70f1327d | 3 | test80
127869c8452da6e1438795509380b946 | 3 | test90
(9 rows)
now
------------------------
2001-07-23 00:35:14+02
(1 row)
testdb=>
testdb=>
testdb=>
testdb=> select count(*) from data where value like 'test_0';
count
-------
9
(1 row)
testdb=> explain select count(*) from data where value like 'test_0';
NOTICE: QUERY PLAN:
Aggregate (cost=5581.45..5581.45 rows=1 width=0)
-> Seq Scan on data (cost=0.00..5580.69 rows=306 width=0)
EXPLAIN
testdb=> select count(*) from data where value >= 'test' and value < 'tesu';
count
-------
10000
(1 row)
testdb=> explain select count(*) from data where value >= 'test' and value < 'tesu';
NOTICE: QUERY PLAN:
Aggregate (cost=6007.24..6007.24 rows=1 width=0)
-> Seq Scan on data (cost=0.00..5986.02 rows=8487 width=0)
EXPLAIN
testdb=>
_
Mats Lofkvist
mal@algonet.se
Mats Lofkvist <mal@algonet.se> writes:
> I just did a quick test before going to bed but 7.2devel seems
> do to what we want:
> testdb=> select count(*) from data where value like 'test_0';
> count
> -------
> 9
> (1 row)
> testdb=> explain select count(*) from data where value like 'test_0';
> NOTICE: QUERY PLAN:
> Aggregate (cost=5581.45..5581.45 rows=1 width=0)
> -> Seq Scan on data (cost=0.00..5580.69 rows=306 width=0)
> EXPLAIN
> testdb=> select count(*) from data where value >= 'test' and value < 'tesu';
> count
> -------
> 10000
> (1 row)
> testdb=> explain select count(*) from data where value >= 'test' and value < 'tesu';
> NOTICE: QUERY PLAN:
> Aggregate (cost=6007.24..6007.24 rows=1 width=0)
> -> Seq Scan on data (cost=0.00..5986.02 rows=8487 width=0)
> EXPLAIN
Those estimates do look a lot closer to reality, all right. And the
join plan is the right thing now. Excellent...
regards, tom lane
Precisely the point! There "shouldn't" be an issue with SMP boxes other
than OS differences that may result in one OS utilizing additional
processors more efficiently, yet there were striking differences in the
execution of my queries when I dropped the index after moving my system
from a single processor to a dual processor box using the identical
software configuration. At the time, I *thought* it may have been a
PostgreSQL issue, but it just as well may have been a Linux issue. Your
similar situation on BSD leads me to believe that it may in fact be an
issue with PostgreSQL after all. Disturbing, isn't it? ;)

Mats Lofkvist wrote:
>
> haystack@email.rentzone.org (Buddy Lee Haystack) writes:
>
> > I ran into a similar problem on RedHat Linux v6.1 on Intel, kernel
> > 2.2.12-20, PostgreSQL 6.5.3 when moving a system from a single processor
> > development box over to a dual processor production server. Dropping one
> > of the indexes on a lookup table with roughly 68,000 records on the
> > production box resulted in roughly a 3-fold increase in query execution
> > speed. At the time, I thought it was an SMP issue, and have since been
> > extremely conservative in adding indexes on SMP boxes.
>
> I _am_ running it on an SMP box (FreeBSD 5.0-current from January
> this year), but isn't it a bit far-fetched to assume that this is
> an SMP issue? Is postgres even aware of running on an SMP box?
> (If it isn't, why should the planner estimates differ depending
> on whether it is running on an SMP box or not?)
>
> Forgive me for sounding negative, but I fail to see the connection.
> Am I missing something?
>
> _
> Mats Lofkvist
> mal@algonet.se