Discussion: All things equal, we are still a lot slower than MySQL?
Using the exact same data, and the exact same queries (dbi is cool):

MySQL: 0.498u 0.150s 0:02.50 25.6% 10+1652k 0+0io 0pf+0w
PgSQL: 0.494u 0.061s 0:19.78 2.7% 10+1532k 0+0io 0pf+0w

The main query that appears to be "dog slow" is:

SELECT distinct b.indid, b.divid, b.catid, a.id, a.mid \
  FROM aecEntMain a, aecWebEntry b \
 WHERE (a.id=b.id AND a.mid=b.mid) \
   AND (a.status like 'active%' and b.status like 'active%')
   AND (a.status like '%active:ALL%' and b.status like '%active:ALL%')
   AND (a.representation like '%:ALL%')
   AND (b.indid=? and b.divid=? and b.catid=?)";

Where, unfortunately, getting rid of those LIKE comparisons will be next to
impossible in the short time...

From the 'time' numbers, MySQL is running ~17 sec faster, but uses up 23%
more CPU to do this...so where is our slowdown? Obviously it isn't a lack of
CPU...all else is equal...hardware-wise, both are running on the same machine.

If I get rid of the three lines above that deal with LIKE, the results are:

MySQL: 0.497u 0.168s 0:01.48 43.9% 9+1519k 0+0io 0pf+0w
PgSQL: 0.504u 0.052s 0:17.81 3.0% 10+1608k 0+0io 0pf+0w

So, blaming things on the LIKE conditions is totally inappropriate...

And looking at the EXPLAIN of the above, I have enough indices:

NOTICE:  QUERY PLAN:

Unique  (cost=1271.15 rows=5 width=84)
  ->  Sort  (cost=1271.15 rows=5 width=84)
        ->  Nested Loop  (cost=1271.15 rows=5 width=84)
              ->  Index Scan using aecwebentry_primary on aecwebentry b  (cost=1269.08 rows=1 width=60)
              ->  Index Scan using aecentmain_primary on aecentmain a  (cost=2.07 rows=16560 width=24)

EXPLAIN

I'm starting the server as:

#!/bin/tcsh
setenv POSTMASTER /usr/local/db/pgsql/bin/postmaster
rm /tmp/.s.P*
${POSTMASTER} -o "-F -o /usr/local/db/pgsql/errout -S 32768" \
        -i -p 5432 -D/usr/local/db/pgsql/data -B 256 &

So I think I'm dedicating *more* than enough resources to the server, no?
Again, this data is static...hasn't changed for either database since we
loaded it yesterday...a vacuum analyze has been done on the PostgreSQL
database, but we haven't done anything with the MySQL one (no vacuum, no
special run parameters).

I'm going to be working with this company towards cleaning up the table
structures over the next little while, with an eye towards moving it to
PostgreSQL, but, all things considered equal except for the DB software
itself...how is it that we are *so* much slower?

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
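[Editor's note: the distinction between the anchored and unanchored LIKE patterns in the query above can be sketched outside PostgreSQL. This uses SQLite as a stand-in (the original tables and data are not available); the table and index names are made up to echo the thread. An anchored pattern like 'active%' can be satisfied by a B-tree index range scan, while '%active:ALL%' cannot and forces a full scan.]

```python
# Stand-in demo: anchored vs. unanchored LIKE and index usability.
# SQLite only applies its LIKE-to-index-range optimization when
# case_sensitive_like is ON (for a default BINARY-collated index).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA case_sensitive_like = ON")
conn.execute("CREATE TABLE aecwebentry (id TEXT, status TEXT)")
conn.execute("CREATE INDEX aecwebentry_status ON aecwebentry (status)")
conn.executemany(
    "INSERT INTO aecwebentry VALUES (?, ?)",
    [(str(i), "active:ALL" if i % 3 else "inactive") for i in range(1000)],
)

def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Anchored prefix: eligible for an index range scan.
print(plan("SELECT * FROM aecwebentry WHERE status LIKE 'active%'"))
# Unanchored substring: no index can help; full table scan.
print(plan("SELECT * FROM aecwebentry WHERE status LIKE '%active:ALL%'"))
```

This is only an analogy for the planner behavior discussed in the thread, not a reproduction of the 6.5-era PostgreSQL optimizer.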
> Using the exact same data, and the exact same queries (dbi is cool):
> MySQL: 0.498u 0.150s 0:02.50 25.6% 10+1652k 0+0io 0pf+0w
> PgSQL: 0.494u 0.061s 0:19.78 2.7% 10+1532k 0+0io 0pf+0w
> From the 'time' numbers, MySQL is running ~17 sec faster, but uses up 23%
> more CPU to do this...so where is our slowdown?

I don't remember if you gave details on the sizes of tables, but in any case
I'm going to guess that you are spending almost all of your time in the
optimizer. Try manipulating the parameters to force the genetic optimizer
and see if it helps. Lots of quals but only two tables gives you a
non-optimal case for the default exhaustive optimizer.

- Thomas

--
Thomas Lockhart                    lockhart@alumni.caltech.edu
South Pasadena, California
On Sun, 19 Sep 1999, Thomas Lockhart wrote:

> > Using the exact same data, and the exact same queries (dbi is cool):
> > MySQL: 0.498u 0.150s 0:02.50 25.6% 10+1652k 0+0io 0pf+0w
> > PgSQL: 0.494u 0.061s 0:19.78 2.7% 10+1532k 0+0io 0pf+0w
> > From the 'time' numbers, MySQL is running ~17 sec faster, but uses up 23%
> > more CPU to do this...so where is our slowdown?
>
> I don't remember if you gave details on the sizes of tables, but in
> any case I'm going to guess that you are spending almost all of your
> time in the optimizer. Try manipulating the parameters to force the
> genetic optimizer and see if it helps. Lots of quals but only two
> tables gives you a non-optimal case for the default exhaustive
> optimizer.

With default GEQO == 11 relations: 0.506u 0.045s 0:19.51 2.7% 10+1596k 0+0io 0pf+0w
With GEQO == 2 relations:          0.522u 0.032s 0:19.47 2.8% 9+1385k 0+0io 0pf+0w

If I use that big SUBSELECT that I posted earlier,
with GEQO == 2:  0.005u 0.020s 0:07.84 0.2% 120+486k 0+0io 0pf+0w
and with GEQO == 11: 0.008u 0.016s 0:07.83 0.1% 144+556k 0+0io 0pf+0w

So, going with one large SELECT call with two SUBSELECTs in it cuts off
12 secs, but it's a web application, and we're still talking a 5-second-slower
response...and a lot less CPU being used, which is nice...

But I'm trying to compare apples->apples as much as possible, and MySQL won't
allow us to do that large SUBSELECT call...it gives errors, so I'm guessing
it's unsupported...

Other ideas, or am I stuck with accepting 7 secs?
(Realizing that as each new release comes out, that 7 secs tends to have a
habit of dropping with all the optimizations and cleanups we do to the server
itself)

If so, then I'm going to have to spend time trying to fix the tables
themselves before delving into switching over to PostgreSQL...which hurts :(

Okay, table sizes for the data are:

aecCategory == 1170

Table    = aeccategory
+----------------------------------+----------------------------------+-------+
|              Field               |              Type                | Length|
+----------------------------------+----------------------------------+-------+
| ppid                             | varchar() not null default ''    |     6 |
| pid                              | varchar() not null default ''    |     6 |
| id                               | varchar() not null default ''    |     6 |
| name                             | varchar() not null default ''    |   255 |
| description                      | varchar()                        |   255 |
| url                              | varchar()                        |   255 |
| comidsrc                         | int4                             |     4 |
| datelast                         | timestamp                        |     4 |
+----------------------------------+----------------------------------+-------+
Indices:  aeccategory_id
          aeccategory_primary

aecEntMain == 16560

Table    = aecentmain
+----------------------------------+----------------------------------+-------+
|              Field               |              Type                | Length|
+----------------------------------+----------------------------------+-------+
| id                               | varchar() not null default ''    |     6 |
| mid                              | char() not null default ''       |     2 |
| name                             | varchar() not null default ''    |   200 |
| description                      | text                             |   var |
| url                              | varchar()                        |   255 |
| street                           | varchar()                        |   255 |
| city                             | varchar()                        |   255 |
| state                            | varchar()                        |   255 |
| postal                           | varchar()                        |   255 |
| country                          | varchar()                        |   255 |
| servarea                         | varchar()                        |   255 |
| business                         | varchar()                        |   255 |
| representation                   | varchar()                        |   255 |
| status                           | varchar()                        |   255 |
| datecreate                       | varchar()                        |    14 |
| whocreate                        | varchar()                        |   255 |
| datelast                         | timestamp                        |     4 |
| wholast                          | varchar()                        |   255 |
+----------------------------------+----------------------------------+-------+
Indices:  aecentmain_entityname
          aecentmain_primary

aecWebEntry == 58316

Table    = aecwebentry
+----------------------------------+----------------------------------+-------+
|              Field               |              Type                | Length|
+----------------------------------+----------------------------------+-------+
| indid                            | varchar() not null default ''    |     6 |
| divid                            | varchar() not null default ''    |     6 |
| catid                            | varchar() not null default ''    |     6 |
| id                               | varchar() not null default ''    |     6 |
| mid                              | char() not null default ''       |     2 |
| webdetid                         | int4                             |     4 |
| status                           | varchar()                        |   255 |
| datecreate                       | varchar()                        |    14 |
| whocreate                        | varchar()                        |   255 |
| datelast                         | timestamp                        |     4 |
| wholast                          | varchar()                        |   255 |
+----------------------------------+----------------------------------+-------+
Index:    aecwebentry_primary

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
The Hermit Hacker <scrappy@hub.org> writes:
> MySQL: 0.498u 0.150s 0:02.50 25.6% 10+1652k 0+0io 0pf+0w
> PgSQL: 0.494u 0.061s 0:19.78 2.7% 10+1532k 0+0io 0pf+0w
> From the 'time' numbers, MySQL is running ~17 sec faster, but uses up 23%
> more CPU to do this...so where is our slowdown?

It's gotta be going into I/O, obviously.  (I hate profilers that can't
count disk accesses...)  My guess is that the index scans are losing
because they wind up touching too many disk pages.  You show

> NOTICE:  QUERY PLAN:
>
> Unique  (cost=1271.15 rows=5 width=84)
>   ->  Sort  (cost=1271.15 rows=5 width=84)
>         ->  Nested Loop  (cost=1271.15 rows=5 width=84)
>               ->  Index Scan using aecwebentry_primary on aecwebentry b  (cost=1269.08 rows=1 width=60)
>               ->  Index Scan using aecentmain_primary on aecentmain a  (cost=2.07 rows=16560 width=24)
>
> EXPLAIN

which means this should be a great plan if the optimizer is guessing right
about the selectivity of the index scans: it's estimating only one tuple
returned from the aecwebentry scan, hence only one iteration of the nested
scan over aecentmain, which it is estimating will yield only five output
tuples to be sorted and uniquified.

I am betting these estimates are off rather badly :-(.  The indexscans are
probably hitting way more pages than the optimizer guessed they will.

It may just be that I have optimizer on the brain from having spent too much
time looking at it, but this smells to me like bad-plan-resulting-from-
bad-selectivity-estimation syndrome.  Perhaps I can fix it for 6.6 as a part
of the optimizer cleanups I am doing.  I'd like to get as much info as I can
about the test case.

How many tuples *does* your test query produce, anyway?  If you eliminate
all the joining WHERE-clauses and just consider the restriction clauses for
each of the tables, how many tuples?  In other words, what do you get from

SELECT count(*)
FROM aecEntMain a
WHERE (a.id=??? AND a.mid=???)
  AND (a.status like 'active%')
  AND (a.status like '%active:ALL%')
  AND (a.representation like '%:ALL%');

SELECT count(*)
FROM aecWebEntry b
WHERE (b.status like 'active%')
  AND (b.status like '%active:ALL%')
  AND (b.indid=? and b.divid=? and b.catid=?);

(In the first of these, substitute a representative id/mid pair from table b
for the ???, to simulate what will happen in any one iteration of the inner
scan over table a.)  Also, how many rows in each table?

			regards, tom lane
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> Using the exact same data, and the exact same queries (dbi is cool):
>> MySQL: 0.498u 0.150s 0:02.50 25.6% 10+1652k 0+0io 0pf+0w
>> PgSQL: 0.494u 0.061s 0:19.78 2.7% 10+1532k 0+0io 0pf+0w
>> From the 'time' numbers, MySQL is running ~17 sec faster, but uses up 23%
>> more CPU to do this...so where is our slowdown?

> I don't remember if you gave details on the sizes of tables, but in
> any case I'm going to guess that you are spending almost all of your
> time in the optimizer.

No --- if he were, it'd be all CPU time, not 2.7% CPU usage.  The time's got
to be going into disk accesses.

I'm perfectly prepared to blame the optimizer, but I think it's because of a
bad plan, not too much time spent making the plan...

			regards, tom lane
>>> MySQL: 0.498u 0.150s 0:02.50 25.6% 10+1652k 0+0io 0pf+0w
>>> PgSQL: 0.494u 0.061s 0:19.78 2.7% 10+1532k 0+0io 0pf+0w

> No --- if he were, it'd be all CPU time, not 2.7% CPU usage.

Er, wait a second.  Are we measuring backend-process runtime here, or is
that the result of 'time' applied to a *client*?

			regards, tom lane
> >>> MySQL: 0.498u 0.150s 0:02.50 25.6% 10+1652k 0+0io 0pf+0w
> >>> PgSQL: 0.494u 0.061s 0:19.78 2.7% 10+1532k 0+0io 0pf+0w
> > No --- if he were, it'd be all CPU time, not 2.7% CPU usage.
> Er, wait a second.  Are we measuring backend-process runtime here,
> or is that the result of 'time' applied to a *client*?

Right. That was my point; unless he is firing up the backend using "time",
or running it standalone, it is hard to measure real CPU time...

- Thomas

--
Thomas Lockhart                    lockhart@alumni.caltech.edu
South Pasadena, California
Tom Lane wrote:
>
> The Hermit Hacker <scrappy@hub.org> writes:
> > MySQL: 0.498u 0.150s 0:02.50 25.6% 10+1652k 0+0io 0pf+0w
> > PgSQL: 0.494u 0.061s 0:19.78 2.7% 10+1532k 0+0io 0pf+0w
> > From the 'time' numbers, MySQL is running ~17 sec faster, but uses up 23%
> > more CPU to do this...so where is our slowdown?
>
> It's gotta be going into I/O, obviously.  (I hate profilers that can't
> count disk accesses...)  My guess is that the index scans are losing
> because they wind up touching too many disk pages.  You show

On that particular machine that can be verified easily, I hope (there seems
to be enough RAM). You can simply issue 10 to 100 such queries in a row.
Hopefully after the first query all the needed info will be in the disk
cache, so the remaining queries will not draw info from disk. That will be
a clean experiment.

--
Leon.
-------
He knows he'll never have to answer for any of his theories actually
being put to test. If they were, they would be contaminated by reality.
On Mon, 20 Sep 1999, Leon wrote:

> Tom Lane wrote:
> >
> > The Hermit Hacker <scrappy@hub.org> writes:
> > > MySQL: 0.498u 0.150s 0:02.50 25.6% 10+1652k 0+0io 0pf+0w
> > > PgSQL: 0.494u 0.061s 0:19.78 2.7% 10+1532k 0+0io 0pf+0w
> > > From the 'time' numbers, MySQL is running ~17 sec faster, but uses up 23%
> > > more CPU to do this...so where is our slowdown?
> >
> > It's gotta be going into I/O, obviously.  (I hate profilers that can't
> > count disk accesses...)  My guess is that the index scans are losing
> > because they wind up touching too many disk pages.  You show
>
> On that particular machine that can be verified easily, I hope (there
> seems to be enough RAM). You can simply issue 10 to 100 such queries in
> a row. Hopefully after the first query all the needed info will be in
> the disk cache, so the remaining queries will not draw info from disk.
> That will be a clean experiment.

With the server started as:

${POSTMASTER} -o "-F -o /usr/local/db/pgsql/errout -S 32768" \
        -i -p 5432 -D/usr/local/db/pgsql/data -B 256 &

And with me being the only person on that system running against the
PostgreSQL database (i.e. I don't believe the SI invalidation stuff comes
into play?), the time to run is the exact same each time:

1st run: 0.488u 0.056s 0:16.34 3.2% 10+1423k 0+0io 0pf+0w
2nd run: 0.500u 0.046s 0:16.34 3.3% 10+1517k 0+0io 0pf+0w
3rd run: 0.496u 0.049s 0:16.33 3.2% 9+1349k 0+0io 0pf+0w
4th run: 0.487u 0.056s 0:16.32 3.2% 14+1376k 0+0io 0pf+0w

Note that the results fed back are *exactly* the same each time...the data
is static, as it's purely a test database...

I believe that I have the buffers set "abnormally high", as well as having
provided more than sufficient sort buffer space...
Using the 'optimized' query that uses subselects, the runs are similar:

1st run: 0.467u 0.031s 0:08.26 5.9% 15+1345k 0+0io 0pf+0w
2nd run: 0.475u 0.023s 0:08.29 5.9% 15+1384k 0+0io 0pf+0w
3rd run: 0.468u 0.031s 0:08.28 5.9% 10+1325k 0+0io 0pf+0w
4th run: 0.461u 0.031s 0:08.31 5.8% 10+1362k 0+0io 0pf+0w

Time is cut in half, CPU usage goes up a bit...but all runs are pretty much
the same...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
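[Editor's note: Leon's "run it 10 to 100 times" suggestion, which these timings follow informally, can be sketched as a small harness. The `run_query` function below is a made-up stand-in workload; in the real setup it would be the DBI call against the test database.]

```python
# Minimal warm-cache benchmark harness: time the same operation repeatedly
# so the first (cold) run can be compared against the cached (warm) runs.
import time
import statistics

def run_query():
    # Hypothetical stand-in workload; replace with the real SELECT
    # issued through your database driver.
    return sum(i * i for i in range(50_000))

def bench(fn, runs=10):
    """Run fn `runs` times and return the list of wall-clock timings."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings

timings = bench(run_query)
print(f"cold run:    {timings[0]:.6f}s")
print(f"warm median: {statistics.median(timings[1:]):.6f}s")
```

If the warm runs stay as slow as the cold one (as Marc reports above), the bottleneck is not the OS disk cache filling up, which is what makes the repeated-run experiment informative.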
On Sun, 19 Sep 1999, Tom Lane wrote:

> How many tuples *does* your test query produce, anyway?  If you

Depends on what it is fed...could be 270 records returned, could be
5...depends on the values of catid, indid and divid...

> eliminate all the joining WHERE-clauses and just consider the
> restriction clauses for each of the tables, how many tuples?
> In other words, what do you get from
>
> SELECT count(*)
> FROM aecEntMain a
> WHERE (a.id=??? AND a.mid=???)
>   AND (a.status like 'active%')
>   AND (a.status like '%active:ALL%')
>   AND (a.representation like '%:ALL%');

Returns 1 ...

> SELECT count(*)
> FROM aecWebEntry b
> WHERE (b.status like 'active%')
>   AND (b.status like '%active:ALL%')
>   AND (b.indid=? and b.divid=? and b.catid=?);

This one I get 39 ...

> (In the first of these, substitute a representative id/mid pair from
> table b for the ???, to simulate what will happen in any one iteration
> of the inner scan over table a.)  Also, how many rows in each table?

aec=> select count(*) from aecEntMain;
count
-----
16560
(1 row)

aec=> select count(*) from aecWebEntry;
count
-----
58316
(1 row)

By doing a 'select distinct id from aecWebEntry', there are 16416 distinct
ids in aecWebEntry, and 16493 distinct ids in aecEntMain, so I'm guessing
that it's supposed to be a 1->N relationship between the two
tables...therefore, again, I'm guessing, but the first query above should
never return more than 1 record...

If I run both queries together, as:

SELECT distinct b.indid, b.divid, b.catid, a.id, a.mid
  FROM aecEntMain a, aecWebEntry b
 WHERE (a.id=b.id AND a.mid=b.mid)
   AND (a.status like 'active%' and b.status like 'active%')
   AND (a.status like '%active:ALL%' and b.status like '%active:ALL%')
   AND (a.representation like '%:ALL%')
   AND (b.indid='000001' and b.divid='100016' and b.catid='100300');

The result, in this case, is 39 records...if I change b.catid to be
'100400', it's only 35 records, etc...

Does this help?
The server isn't live, so if you want me to enable some debugging, or play
with something, it's not going to affect anything...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
Anyone get a chance to look into this?

On Sun, 19 Sep 1999, Tom Lane wrote:

> The Hermit Hacker <scrappy@hub.org> writes:
> > MySQL: 0.498u 0.150s 0:02.50 25.6% 10+1652k 0+0io 0pf+0w
> > PgSQL: 0.494u 0.061s 0:19.78 2.7% 10+1532k 0+0io 0pf+0w
> > From the 'time' numbers, MySQL is running ~17 sec faster, but uses up 23%
> > more CPU to do this...so where is our slowdown?
>
> It's gotta be going into I/O, obviously.  (I hate profilers that can't
> count disk accesses...)  My guess is that the index scans are losing
> because they wind up touching too many disk pages.  You show
>
> > NOTICE:  QUERY PLAN:
> >
> > Unique  (cost=1271.15 rows=5 width=84)
> >   ->  Sort  (cost=1271.15 rows=5 width=84)
> >         ->  Nested Loop  (cost=1271.15 rows=5 width=84)
> >               ->  Index Scan using aecwebentry_primary on aecwebentry b  (cost=1269.08 rows=1 width=60)
> >               ->  Index Scan using aecentmain_primary on aecentmain a  (cost=2.07 rows=16560 width=24)
> >
> > EXPLAIN
>
> which means this should be a great plan if the optimizer is guessing
> right about the selectivity of the index scans: it's estimating only
> one tuple returned from the aecwebentry scan, hence only one iteration
> of the nested scan over aecentmain, which it is estimating will yield
> only five output tuples to be sorted and uniquified.
>
> I am betting these estimates are off rather badly :-(.  The indexscans
> are probably hitting way more pages than the optimizer guessed they will.
>
> It may just be that I have optimizer on the brain from having spent too
> much time looking at it, but this smells to me like bad-plan-resulting-
> from-bad-selectivity-estimation syndrome.  Perhaps I can fix it for 6.6
> as a part of the optimizer cleanups I am doing.  I'd like to get as much
> info as I can about the test case.
>
> How many tuples *does* your test query produce, anyway?  If you
> eliminate all the joining WHERE-clauses and just consider the
> restriction clauses for each of the tables, how many tuples?
> In other words, what do you get from
>
> SELECT count(*)
> FROM aecEntMain a
> WHERE (a.id=??? AND a.mid=???)
>   AND (a.status like 'active%')
>   AND (a.status like '%active:ALL%')
>   AND (a.representation like '%:ALL%');
>
> SELECT count(*)
> FROM aecWebEntry b
> WHERE (b.status like 'active%')
>   AND (b.status like '%active:ALL%')
>   AND (b.indid=? and b.divid=? and b.catid=?);
>
> (In the first of these, substitute a representative id/mid pair from
> table b for the ???, to simulate what will happen in any one iteration
> of the inner scan over table a.)  Also, how many rows in each table?
>
> 			regards, tom lane

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
The Hermit Hacker <scrappy@hub.org> writes:
> Anyone get a chance to look into this?

Only just now, but I do have a couple of thoughts.

For the query

SELECT distinct b.indid, b.divid, b.catid, a.id, a.mid \
  FROM aecEntMain a, aecWebEntry b \
 WHERE (a.id=b.id AND a.mid=b.mid) \
   AND (a.status like 'active%' and b.status like 'active%')
   AND (a.status like '%active:ALL%' and b.status like '%active:ALL%')
   AND (a.representation like '%:ALL%')
   AND (b.indid=? and b.divid=? and b.catid=?)";

you're showing a plan of

Unique  (cost=1271.15 rows=5 width=84)
  ->  Sort  (cost=1271.15 rows=5 width=84)
        ->  Nested Loop  (cost=1271.15 rows=5 width=84)
              ->  Index Scan using aecwebentry_primary on aecwebentry b  (cost=1269.08 rows=1 width=60)
              ->  Index Scan using aecentmain_primary on aecentmain a  (cost=2.07 rows=16560 width=24)

which indicates that the optimizer is guessing only one match in aecwebentry
and is therefore putting it on the outside of the nested loop (so that the
inner scan over aecentmain would only have to be done once, if it's guessing
right).  But in a later message you say that the actual number of hits is
more like 39 for aecwebentry and one for aecentmain.  Which means that the
nested loop would go faster if it were done the other way round, aecentmain
on the outside.  I'm not sure of a way to force the system to try it that
way, though.

The other question is why is it using a nested loop at all, rather than
something more intelligent like merge or hash join.  Presumably the
optimizer thinks those would be more expensive, but it might be wrong.

You could try forcing selection of merge and hash joins for this query and
see (a) what kind of plan do you get, (b) how long does it really take?
To do that, start psql with the PGOPTIONS environment variable set:

PGOPTIONS="-fn -fh"	# forbid nestloop and hash, ie, force mergejoin

PGOPTIONS="-fn -fm"	# forbid nestloop and merge, ie, force hashjoin

Also, I don't think you ever mentioned exactly what the available indexes
are on these tables?
regards, tom lane
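[Editor's note: Tom's selectivity point can be put in back-of-the-envelope form. The 1-row estimate vs. 39-row reality comes from the thread; the probe and scan cost constants below are made-up illustrative units, not real PostgreSQL cost numbers.]

```python
# Toy nested-loop cost model: the outer relation's row count multiplies
# the cost of every inner index probe, so underestimating the outer row
# count (1 guessed vs. 39 actual, per the thread) understates the plan's
# true cost by roughly the same factor.

def nested_loop_cost(outer_rows, outer_scan_cost, probe_cost):
    """Cost = scan the outer relation once + one inner probe per outer row."""
    return outer_scan_cost + outer_rows * probe_cost

OUTER_SCAN = 50   # hypothetical cost of scanning the outer index once
PROBE = 100       # hypothetical cost of one inner index probe

planned = nested_loop_cost(1, OUTER_SCAN, PROBE)    # what the planner priced in
actual = nested_loop_cost(39, OUTER_SCAN, PROBE)    # what actually happens

print(f"cost the planner priced in: {planned}")   # 150
print(f"cost actually paid:         {actual}")    # 3950
```

The same arithmetic explains why swapping the join order (the relation that really yields one row on the outside) would help: the multiplier on the per-probe cost drops from 39 to 1.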
Okay, after playing around with this some more tonight, and playing with the
PGOPTIONS you've presented...I've gotten the query to be faster than with
MySQL :)

The error of my ways: not enough indices *sigh*

I created a few more on the fields that were being used in the query, and
have:

SELECT c.id, c.name, c.url
  FROM aecCategory c
 WHERE EXISTS (
       SELECT a.status
         FROM aecEntMain a, aecWebEntry b
        WHERE a.status LIKE 'active:ALL%'
          AND a.representation LIKE '%:ALL%'
          AND b.status LIKE 'active:ALL%'
          AND b.indid='000001'
          AND b.divid='100016'
          AND ((a.id,a.mid) = (b.id,b.mid))
          AND ((b.catid,b.indid,b.divid) = (c.id,c.ppid,c.pid)));

==========
Seq Scan on aeccategory c  (cost=69.61 rows=1170 width=36)
  SubPlan
    ->  Nested Loop  (cost=4.10 rows=1 width=60)
          ->  Index Scan using aecwebentry_divid on aecwebentry b  (cost=2.03 rows=1 width=24)
          ->  Index Scan using aecentmain_primary on aecentmain a  (cost=2.07 rows=480 width=36)
===========

producing the results I need in 1.26 seconds, using 1.5% of the CPU.

Now, something does bother me here, and I'm not sure if it's a problem we
need to address, or if it's expected, but if I remove the index
aecwebentry_divid, it reverts to using aecwebentry_primary and increases the
query time to 12 secs, which is:

create unique index aecWebEntry_primary on aecWebEntry (indid,divid,catid,id,mid);

Should it do that?

On Wed, 22 Sep 1999, Tom Lane wrote:

> The Hermit Hacker <scrappy@hub.org> writes:
> > Anyone get a chance to look into this?
>
> Only just now, but I do have a couple of thoughts.
>
> For the query
>
> SELECT distinct b.indid, b.divid, b.catid, a.id, a.mid \
>   FROM aecEntMain a, aecWebEntry b \
>  WHERE (a.id=b.id AND a.mid=b.mid) \
>    AND (a.status like 'active%' and b.status like 'active%')
>    AND (a.status like '%active:ALL%' and b.status like '%active:ALL%')
>    AND (a.representation like '%:ALL%')
>    AND (b.indid=? and b.divid=?
>    and b.catid=?)";
>
> you're showing a plan of
>
> Unique  (cost=1271.15 rows=5 width=84)
>   ->  Sort  (cost=1271.15 rows=5 width=84)
>         ->  Nested Loop  (cost=1271.15 rows=5 width=84)
>               ->  Index Scan using aecwebentry_primary on aecwebentry b  (cost=1269.08 rows=1 width=60)
>               ->  Index Scan using aecentmain_primary on aecentmain a  (cost=2.07 rows=16560 width=24)
>
> which indicates that the optimizer is guessing only one match in
> aecwebentry and is therefore putting it on the outside of the nested
> loop (so that the inner scan over aecentmain would only have to be
> done once, if it's guessing right).  But in a later message you
> say that the actual number of hits is more like 39 for aecwebentry
> and one for aecentmain.  Which means that the nested loop would go
> faster if it were done the other way round, aecentmain on the outside.
> I'm not sure of a way to force the system to try it that way, though.
>
> The other question is why is it using a nested loop at all, rather
> than something more intelligent like merge or hash join.  Presumably
> the optimizer thinks those would be more expensive, but it might be
> wrong.
>
> You could try forcing selection of merge and hash joins for this
> query and see (a) what kind of plan do you get, (b) how long does
> it really take?  To do that, start psql with the PGOPTIONS environment
> variable set:
>
> PGOPTIONS="-fn -fh"	# forbid nestloop and hash, ie, force mergejoin
>
> PGOPTIONS="-fn -fm"	# forbid nestloop and merge, ie, force hashjoin
>
> Also, I don't think you ever mentioned exactly what the available
> indexes are on these tables?
>
> 			regards, tom lane

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
The Hermit Hacker <scrappy@hub.org> writes:
> Now, something does bother me here, and I'm not sure if it's a problem we
> need to address, or if it's expected, but if I remove the index
> aecwebentry_divid, it reverts to using aecwebentry_primary and increases
> the query time to 12 secs, which is:
> create unique index aecWebEntry_primary on aecWebEntry (indid,divid,catid,id,mid);
> Should it do that?

Yeah, that does seem odd.  The other way is presumably visiting the
aecwebentry tuples in a different order (the one induced by the other
index), but I don't see why that should produce a 10:1 difference in
runtime.  Can you send me the EXPLAIN VERBOSE output for the query with
and without the extra index?  (Preferably the prettyprinted version from
the postmaster log file, not what comes out as a NOTICE...)

Also, I assume you found that merge or hash join wasn't any better?

			regards, tom lane