Обсуждение: Optimizing Query
Any suggestions welcome!<br /><br /> Here is my query:<br /><br /> select k.*, c.category from knowledge k, kb_categoriesc , kbwords w0 , kbwords w1 WHERE k.catid=c.catid AND ((k.kbid=w0.kbid and w0.wordid=42743) AND (k.kbid=w1.kbidand w1.wordid=85369)) ORDER BY k.kbid DESC LIMIT 25;<br /><br /> Now for the details<br /> knowledge k = 3,150records<br /> kbwords = 2-field database (kbid, wordid), 825,748 records<br /><br /> Each word in the knowledge baseis stored in a database called wordindex, which has 50,000 records or so. The system first explodes the query stringand pulls the word #s from this database, which is where we get 42743 and 85369 above, "ASIA" and "CHILDREN" respectively.)The idea is then to pull all the articles in the knowledge base which contain both of these words.<br /><br/> Here is the EXPLAIN for the query:<br /><br /> NOTICE: QUERY PLAN:<br /><br /> Nested Loop (cost=0.00..527690060.67rows=2878549 width=308)<br /> -> Nested Loop (cost=0.00..9472443.40 rows=52582 width=304)<br/> -> Nested Loop (cost=0.00..6278.63 rows=960 width=300)<br /> -> Index ScanBackward using knowledge_kbid_key on knowledge k (cost=0.00..1292.51 rows=2825 width=284)<br /> -> Seq Scan on kb_categories c (cost=0.00..1.34 rows=34 width=16)<br /> -> Seq Scan on kbwords w0 (cost=0.00..9787.02rows=5474 width=4)<br /> -> Seq Scan on kbwords w1 (cost=0.00..9787.02 rows=5474 width=4)<br /><br/> This takes quite a while to return results... prohibitively long. There are indexes on k.catid, c.catid, k.kbid,w0.kbid, w0.wordid. Any suggestions for further optimization would be very welcome. We get about 3,000 searches onour database daily...<br /><br /> Blessings,<br /> Justin Long<br /><br /><br /><p><font face="Courier New, Courier">____________________________________________________________________<br/> Justin Long Networkfor Strategic Missions<br /> justinlong@strategicnetwork.org 1732 South ParkCourt<br /><a eudora="autourl" href="http://www.strategicnetwork.org/">http://www.strategicnetwork.org</a> Chesapeake,VA 23320, USA<br /> Reality Checke-zine: reality-check-subscribe@yahoogroups.com<br /> ____________________________________________________________________<br/> Law: Never retreat. Never surrender. Never cuta deal with a dragon.<br /> Corollary: No armor? Unclean life? Then do not mess in the affairs <br /> of dragons, foryou are crunchy and taste good with ketchup.<br /><br /><br /> ____________________________________________________________________<br/> Justin Long Networkfor Strategic Missions<br /> justinlong@strategicnetwork.org 1732 South ParkCourt<br /><a eudora="autourl" href="http://www.strategicnetwork.org/">http://www.strategicnetwork.org</a> Chesapeake,VA 23320, USA<br /> Reality Checke-zine: reality-check-subscribe@yahoogroups.com<br /> ____________________________________________________________________<br/> Law: Never retreat. Never surrender. Never cuta deal with a dragon.<br /> Corollary: No armor? Unclean life? Then do not mess in the affairs <br /> of dragons, foryou are crunchy and taste good with ketchup.<br /></font>
Have you tried VACUUM ANALYZE and CLUSTER? > Any suggestions welcome! > > Here is my query: > > select k.*, c.category from knowledge k, kb_categories c , kbwords w0 , > kbwords w1 WHERE k.catid=c.catid AND ((k.kbid=w0.kbid and w0.wordid=42743) > AND (k.kbid=w1.kbid and w1.wordid=85369)) ORDER BY k.kbid DESC LIMIT 25; > > Now for the details > knowledge k = 3,150 records > kbwords = 2-field database (kbid, wordid), 825,748 records > > Each word in the knowledge base is stored in a database called wordindex, > which has 50,000 records or so. The system first explodes the query string > and pulls the word #s from this database, which is where we get 42743 and > 85369 above, "ASIA" and "CHILDREN" respectively.) The idea is then to pull > all the articles in the knowledge base which contain both of these words. > > Here is the EXPLAIN for the query: > > NOTICE: QUERY PLAN: > > Nested Loop (cost=0.00..527690060.67 rows=2878549 width=308) > -> Nested Loop (cost=0.00..9472443.40 rows=52582 width=304) > -> Nested Loop (cost=0.00..6278.63 rows=960 width=300) > -> Index Scan Backward using knowledge_kbid_key on > knowledge k (cost=0.00..1292.51 rows=2825 width=284) > -> Seq Scan on kb_categories c (cost=0.00..1.34 rows=34 > width=16) > -> Seq Scan on kbwords w0 (cost=0.00..9787.02 rows=5474 width=4) > -> Seq Scan on kbwords w1 (cost=0.00..9787.02 rows=5474 width=4) > > This takes quite a while to return results... prohibitively long. There are > indexes on k.catid, c.catid, k.kbid, w0.kbid, w0.wordid. Any suggestions > for further optimization would be very welcome. We get about 3,000 searches > on our database daily... > > Blessings, > Justin Long > > > > ____________________________________________________________________ > Justin Long Network for Strategic Missions > justinlong@strategicnetwork.org 1732 South Park Court > http://www.strategicnetwork.org Chesapeake, VA 23320, USA > Reality Check e-zine: reality-check-subscribe@yahoogroups.com > ____________________________________________________________________ > Law: Never retreat. Never surrender. Never cut a deal with a dragon. > Corollary: No armor? Unclean life? Then do not mess in the affairs > of dragons, for you are crunchy and taste good with ketchup. > > > ____________________________________________________________________ > Justin Long Network for Strategic Missions > justinlong@strategicnetwork.org 1732 South Park Court > http://www.strategicnetwork.org Chesapeake, VA 23320, USA > Reality Check e-zine: reality-check-subscribe@yahoogroups.com > ____________________________________________________________________ > Law: Never retreat. Never surrender. Never cut a deal with a dragon. > Corollary: No armor? Unclean life? Then do not mess in the affairs > of dragons, for you are crunchy and taste good with ketchup. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Mon, Mar 05, 2001 at 04:07:57PM -0500, Bruce Momjian allegedly wrote: > Have you tried VACUUM ANALYZE and CLUSTER? I assume CLUSTER still drops all indexes except the one you're clustering on? Mathijs -- It's not that perl programmers are idiots, it's that the language rewards idiotic behavior in a way that no other language or tool has ever done. Erik Naggum
Yes. > On Mon, Mar 05, 2001 at 04:07:57PM -0500, Bruce Momjian allegedly wrote: > > Have you tried VACUUM ANALYZE and CLUSTER? > > I assume CLUSTER still drops all indexes except the one you're clustering > on? > > Mathijs > -- > It's not that perl programmers are idiots, it's that the language > rewards idiotic behavior in a way that no other language or tool has > ever done. > Erik Naggum > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Does that mean that if you have 3 indexes on a table and you cluster one, it deletes the other 2?<br /><br /> At 04:45 PM3/5/2001 -0500, you wrote:<br /><blockquote cite="cite" class="cite" type="cite">Yes.<br /><br /> > On Mon, Mar 05,2001 at 04:07:57PM -0500, Bruce Momjian allegedly wrote:<br /> > > Have you tried VACUUM ANALYZE and CLUSTER?<br/> > <br /> > I assume CLUSTER still drops all indexes except the one you're clustering<br /> > on?<br/> > <br /> > Mathijs<br /> > -- <br /> > It's not that perl programmers are idiots, it's that the language<br/> > rewards idiotic behavior in a way that no other language or tool has<br /> > ever done.<br /> > Erik Naggum<br /> > <br /><br /><br /> -- <br /> Bruce Momjian | <a eudora="autourl" href="http://candle.pha.pa.us/">http://candle.pha.pa.us</a><br /> pgman@candle.pha.pa.us | (610) 853-3000<br /> + If your life is a hard drive, | 830 Blythe Avenue<br/> + Christ can be your backup. | Drexel Hill, Pennsylvania 19026</blockquote><p><font face="CourierNew, Courier"><br /> ____________________________________________________________________<br /> Justin Long Networkfor Strategic Missions<br /> justinlong@strategicnetwork.org 1732 South ParkCourt<br /><a eudora="autourl" href="http://www.strategicnetwork.org/">http://www.strategicnetwork.org</a> Chesapeake,VA 23320, USA<br /> Reality Checke-zine: reality-check-subscribe@yahoogroups.com<br /> ____________________________________________________________________<br/> Law: Never retreat. Never surrender. Never cuta deal with a dragon.<br /> Corollary: No armor? Unclean life? Then do not mess in the affairs <br /> of dragons, foryou are crunchy and taste good with ketchup.<br /></font>
Yes, it drops indexes, much to my chagrin, as I just realized ... including SERIALs...<br /><br /> Justin<br /><br /> At04:45 PM 3/5/2001 -0500, you wrote:<br /><blockquote cite="cite" class="cite" type="cite">Yes.<br /><br /> > On Mon,Mar 05, 2001 at 04:07:57PM -0500, Bruce Momjian allegedly wrote:<br /> > > Have you tried VACUUM ANALYZE and CLUSTER?<br/> > <br /> > I assume CLUSTER still drops all indexes except the one you're clustering<br /> > on?<br/> > <br /> > Mathijs<br /> > -- <br /> > It's not that perl programmers are idiots, it's that the language<br/> > rewards idiotic behavior in a way that no other language or tool has<br /> > ever done.<br /> > Erik Naggum<br /> > <br /><br /><br /> -- <br /> Bruce Momjian | <a eudora="autourl" href="http://candle.pha.pa.us/">http://candle.pha.pa.us</a><br /> pgman@candle.pha.pa.us | (610) 853-3000<br /> + If your life is a hard drive, | 830 Blythe Avenue<br/> + Christ can be your backup. | Drexel Hill, Pennsylvania 19026</blockquote><p><font face="CourierNew, Courier"><br /> ____________________________________________________________________<br /> Justin Long Networkfor Strategic Missions<br /> justinlong@strategicnetwork.org 1732 South ParkCourt<br /><a eudora="autourl" href="http://www.strategicnetwork.org/">http://www.strategicnetwork.org</a> Chesapeake,VA 23320, USA<br /> Reality Checke-zine: reality-check-subscribe@yahoogroups.com<br /> ____________________________________________________________________<br/> Law: Never retreat. Never surrender. Never cuta deal with a dragon.<br /> Corollary: No armor? Unclean life? Then do not mess in the affairs <br /> of dragons, foryou are crunchy and taste good with ketchup.<br /></font>
Ok, now I have another question... it doesn't seem to be accessing the index.<br /><br /> explain select k.kbid,k.titlefrom knowledge k , kbwords w0 , kbwords w1 WHERE ((k.kbid=w0.kbid and w0.wordid=42743) AND (k.kbid=w1.kbidand w1.wordid=85369))<br /><br /> NOTICE: QUERY PLAN:<br /><br /> Merge Join (cost=32339.30..35496.97 rows=19262538width=24)<br /> -> Merge Join (cost=16530.24..16668.77 rows=233274 width=20)<br /> -> Sort (cost=15809.06..15809.06 rows=8257 width=4)<br /> -> Seq Scan on kbwords w1 (cost=0.00..15271.85rows=8257 width=4)<br /> -> Sort (cost=721.18..721.18 rows=2825 width=16)<br /> -> Seq Scan on knowledge k (cost=0.00..559.25 rows=2825 width=16)<br /> -> Sort (cost=15809.06..15809.06rows=8257 width=4)<br /> -> Seq Scan on kbwords w0 (cost=0.00..15271.85 rows=8257 width=4)<br/><br /> Note the sequential scans... there is a wordindex where w0.wordid=42743... why isn't it doing an indexscan?wouldn't that be more efficient?<br /><br /> Justin<br /><br /><br /> At 04:45 PM 3/5/2001 -0500, you wrote:<br/><blockquote cite="cite" class="cite" type="cite">Yes.<br /><br /> > On Mon, Mar 05, 2001 at 04:07:57PM -0500,Bruce Momjian allegedly wrote:<br /> > > Have you tried VACUUM ANALYZE and CLUSTER?<br /> > <br /> > Iassume CLUSTER still drops all indexes except the one you're clustering<br /> > on?<br /> > <br /> > Mathijs<br/> > -- <br /> > It's not that perl programmers are idiots, it's that the language<br /> > rewards idioticbehavior in a way that no other language or tool has<br /> > ever done.<br /> > Erik Naggum<br /> > <br /><br /><br /> -- <br /> Bruce Momjian | <a eudora="autourl" href="http://candle.pha.pa.us/">http://candle.pha.pa.us</a><br /> pgman@candle.pha.pa.us | (610) 853-3000<br /> + If your life is a hard drive, | 830 Blythe Avenue<br/> + Christ can be your backup. | Drexel Hill, Pennsylvania 19026</blockquote><p><font face="CourierNew, Courier"><br /> ____________________________________________________________________<br /> Justin Long Networkfor Strategic Missions<br /> justinlong@strategicnetwork.org 1732 South ParkCourt<br /><a eudora="autourl" href="http://www.strategicnetwork.org/">http://www.strategicnetwork.org</a> Chesapeake,VA 23320, USA<br /> Reality Checke-zine: reality-check-subscribe@yahoogroups.com<br /> ____________________________________________________________________<br/> Law: Never retreat. Never surrender. Never cuta deal with a dragon.<br /> Corollary: No armor? Unclean life? Then do not mess in the affairs <br /> of dragons, foryou are crunchy and taste good with ketchup.<br /></font>
On Mon, Mar 05, 2001 at 04:45:47PM -0500, Bruce Momjian allegedly wrote: > Yes. > > > On Mon, Mar 05, 2001 at 04:07:57PM -0500, Bruce Momjian allegedly wrote: > > > Have you tried VACUUM ANALYZE and CLUSTER? > > > > I assume CLUSTER still drops all indexes except the one you're clustering > > on? > > > > Mathijs So running cluster on the words table (which is indexed with a seperate index on each of the two fields) would probably decrease performance by getting rid of one index without warning (the one NOT specified in the cluster command)? How useful is clustering with pgsql when in a lot of situations systems have enough memory to have the filesystem subsystem cache most or all index data? Seek times shouldn't be an issue in such a situation, since the index doesn't have to be read. Just wondering, Mathijs -- It's not that perl programmers are idiots, it's that the language rewards idiotic behavior in a way that no other language or tool has ever done. Erik Naggum
Did you run VACUUM ANALYZE after running CLUSTER? Michael Fork - CCNA - MCP - A+ Network Support - Toledo Internet Access - Toledo Ohio On Mon, 5 Mar 2001, Justin Long wrote: > Ok, now I have another question... it doesn't seem to be accessing the index. > > explain select k.kbid,k.title from knowledge k , kbwords w0 , kbwords w1 > WHERE ((k.kbid=w0.kbid and w0.wordid=42743) AND (k.kbid=w1.kbid and > w1.wordid=85369)) > > NOTICE: QUERY PLAN: > > Merge Join (cost=32339.30..35496.97 rows=19262538 width=24) > -> Merge Join (cost=16530.24..16668.77 rows=233274 width=20) > -> Sort (cost=15809.06..15809.06 rows=8257 width=4) > -> Seq Scan on kbwords w1 (cost=0.00..15271.85 rows=8257 > width=4) > -> Sort (cost=721.18..721.18 rows=2825 width=16) > -> Seq Scan on knowledge k (cost=0.00..559.25 rows=2825 > width=16) > -> Sort (cost=15809.06..15809.06 rows=8257 width=4) > -> Seq Scan on kbwords w0 (cost=0.00..15271.85 rows=8257 width=4) > > Note the sequential scans... there is a wordindex where w0.wordid=42743... > why isn't it doing an indexscan? wouldn't that be more efficient? > > Justin > > > At 04:45 PM 3/5/2001 -0500, you wrote: > >Yes. > > > > > On Mon, Mar 05, 2001 at 04:07:57PM -0500, Bruce Momjian allegedly wrote: > > > > Have you tried VACUUM ANALYZE and CLUSTER? > > > > > > I assume CLUSTER still drops all indexes except the one you're clustering > > > on? > > > > > > Mathijs > > > -- > > > It's not that perl programmers are idiots, it's that the language > > > rewards idiotic behavior in a way that no other language or tool has > > > ever done. > > > Erik Naggum > > > > > > > > >-- > > Bruce Momjian | http://candle.pha.pa.us > > pgman@candle.pha.pa.us | (610) 853-3000 > > + If your life is a hard drive, | 830 Blythe Avenue > > + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 > > > ____________________________________________________________________ > Justin Long Network for Strategic Missions > justinlong@strategicnetwork.org 1732 South Park Court > http://www.strategicnetwork.org Chesapeake, VA 23320, USA > Reality Check e-zine: reality-check-subscribe@yahoogroups.com > ____________________________________________________________________ > Law: Never retreat. Never surrender. Never cut a deal with a dragon. > Corollary: No armor? Unclean life? Then do not mess in the affairs > of dragons, for you are crunchy and taste good with ketchup. >
On Mon, Mar 05, 2001 at 04:59:47PM -0500, Justin Long allegedly wrote: > Ok, now I have another question... it doesn't seem to be accessing the index. > > explain select k.kbid,k.title from knowledge k , kbwords w0 , kbwords w1 > WHERE ((k.kbid=w0.kbid and w0.wordid=42743) AND (k.kbid=w1.kbid and > w1.wordid=85369)) > > NOTICE: QUERY PLAN: > > Merge Join (cost=32339.30..35496.97 rows=19262538 width=24) > -> Merge Join (cost=16530.24..16668.77 rows=233274 width=20) > -> Sort (cost=15809.06..15809.06 rows=8257 width=4) > -> Seq Scan on kbwords w1 (cost=0.00..15271.85 rows=8257 > width=4) > -> Sort (cost=721.18..721.18 rows=2825 width=16) > -> Seq Scan on knowledge k (cost=0.00..559.25 rows=2825 > width=16) > -> Sort (cost=15809.06..15809.06 rows=8257 width=4) > -> Seq Scan on kbwords w0 (cost=0.00..15271.85 rows=8257 width=4) > > Note the sequential scans... there is a wordindex where w0.wordid=42743... > why isn't it doing an indexscan? wouldn't that be more efficient? > > Justin Did you run the 'vacuum analyze' command on the tables concerned (or even better, the whole database)? Without the data this analysis provides psql cannot come up with a good execution plan and falls back to full table scans. Do a 'vacuum analyze' one a week to keep performance levels up. Cheers, Mathijs -- It's not that perl programmers are idiots, it's that the language rewards idiotic behavior in a way that no other language or tool has ever done. Erik Naggum
Justin Long <justinlong@strategicnetwork.org> writes: > Ok, now I have another question... it doesn't seem to be accessing the index. > explain select k.kbid,k.title from knowledge k , kbwords w0 , kbwords w1 > WHERE ((k.kbid=w0.kbid and w0.wordid=42743) AND (k.kbid=w1.kbid and > w1.wordid=85369)) > NOTICE: QUERY PLAN: > Merge Join (cost=32339.30..35496.97 rows=19262538 width=24) > -> Merge Join (cost=16530.24..16668.77 rows=233274 width=20) > -> Sort (cost=15809.06..15809.06 rows=8257 width=4) > -> Seq Scan on kbwords w1 (cost=0.00..15271.85 rows=8257 > width=4) > -> Sort (cost=721.18..721.18 rows=2825 width=16) > -> Seq Scan on knowledge k (cost=0.00..559.25 rows=2825 > width=16) > -> Sort (cost=15809.06..15809.06 rows=8257 width=4) > -> Seq Scan on kbwords w0 (cost=0.00..15271.85 rows=8257 width=4) > Note the sequential scans... there is a wordindex where w0.wordid=42743... > why isn't it doing an indexscan? wouldn't that be more efficient? It probably thinks not, because the estimated number of hits (8257) is so high. That estimate is currently driven by the frequency of the most common value in the column (mainly because that's the only stat we have :-(). I am guessing that you have a few very common words, which are skewing the stats for kbwords and causing it not to pick an indexscan. Does your setup have a notion of "stop words" that shouldn't be indexed, like "a", "an", "the", etc? Perhaps you need to add such a feature, or throw in a few more stopwords if you already have 'em. regards, tom lane
Wow. I can't believe the difference. It didn't take too long. I'll set up a script in my etc/cron.weekly to run it... wouldthere be any benefit to doing a vacuum analyze nightly?<br /><br /> Justin Long<br /><br /> At 11:10 PM 3/5/2001 -0500,you wrote:<br /><blockquote cite="cite" class="cite" type="cite">Justin Long <justinlong@strategicnetwork.org>writes:<br /> > Ok, now I have another question... it doesn't seem to be accessingthe index.<br /><br /> > explain select k.kbid,k.title from knowledge k , kbwords w0 , kbwords w1 <br /> >WHERE ((k.kbid=w0.kbid and w0.wordid=42743) AND (k.kbid=w1.kbid and <br /> > w1.wordid=85369))<br /><br /> > NOTICE: QUERY PLAN:<br /><br /> > Merge Join (cost=32339.30..35496.97 rows=19262538 width=24)<br /> > -> MergeJoin (cost=16530.24..16668.77 rows=233274 width=20)<br /> > -> Sort (cost=15809.06..15809.06 rows=8257width=4)<br /> > -> Seq Scan on kbwords w1 (cost=0.00..15271.85 rows=8257 <br /> > width=4)<br/> > -> Sort (cost=721.18..721.18 rows=2825 width=16)<br /> > -> Seq Scanon knowledge k (cost=0.00..559.25 rows=2825 <br /> > width=16)<br /> > -> Sort (cost=15809.06..15809.06rows=8257 width=4)<br /> > -> Seq Scan on kbwords w0 (cost=0.00..15271.85 rows=8257width=4)<br /><br /> > Note the sequential scans... there is a wordindex where w0.wordid=42743... <br /> >why isn't it doing an indexscan? wouldn't that be more efficient?<br /><br /> It probably thinks not, because the estimatednumber of hits (8257) is<br /> so high. That estimate is currently driven by the frequency of the most<br /> commonvalue in the column (mainly because that's the only stat we have<br /> :-(). I am guessing that you have a few verycommon words, which are<br /> skewing the stats for kbwords and causing it not to pick an indexscan.<br /><br /> Doesyour setup have a notion of "stop words" that shouldn't be indexed,<br /> like "a", "an", "the", etc? Perhaps you needto add such a feature, or<br /> throw in a few more stopwords if you already have 'em.<br /><br /> regards,tom lane<br /><br /> ---------------------------(end of broadcast)---------------------------<br/> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org</blockquote><p><fontface="Courier New, Courier"><br /> ____________________________________________________________________<br/> Justin Long Networkfor Strategic Missions<br /> justinlong@strategicnetwork.org 1732 South ParkCourt<br /> http://www.strategicnetwork.org Chesapeake, VA 23320, USA<br /> Reality Check e-zine: reality-check-subscribe@yahoogroups.com<br/> ____________________________________________________________________<br /> Law:Never retreat. Never surrender. Never cut a deal with a dragon.<br /> Corollary: No armor? Unclean life? Then do notmess in the affairs <br /> of dragons, for you are crunchy and taste good with ketchup.<br /></font>
Justin Long <justinlong@strategicnetwork.org> writes: > Wow. I can't believe the difference. It didn't take too long. I'll set up a > script in my etc/cron.weekly to run it... would there be any benefit to > doing a vacuum analyze nightly? Depends. A nightly vacuum is probably good practice, but you could skip the analyze part if your data statistics (such as column min and max values) don't change much. regards, tom lane