Обсуждение: CPU usage goes to 100%, query seems to ran forever
I have small database running in 8.1.3 in W2K server. The following query causes Postgres process to use 100% CPU and seems to run forever. If I change '1EEKPANT' to less frequently used item code, it runs fast. How to speed it up ? set search_path to public,firma2; select rid.toode FROM dok JOIN rid USING (dokumnr) JOIN toode USING (toode) LEFT JOIN artliik ON toode.grupp=artliik.grupp and toode.liik=artliik.liik WHERE (NOT '0' or dok.kinnitatud) AND dok.kuupaev BETWEEN '2006-04-08' AND '2006-04-27' AND rid.toode='1EEKPANT' AND (NOT dok.eimuuda or '0' ) and dok.laonr='1'::float8 and POSITION( dok.doktyyp IN 'OSIDVGYKIF')!=0 AND ( ( ('1' OR (POSITION(dok.doktyyp IN 'TUNH')=0 and (rid.kogus<0 or ('1' and rid.kogus=0)))) and POSITION(dok.doktyyp IN 'VGYKITDNHMEBARCFJ' )!=0 AND CASE WHEN NOT dok.objrealt OR dok.doktyyp='I' THEN dok.yksus ELSE rid.kuluobjekt END LIKE 'LADU%' ESCAPE '!' ) OR (POSITION(dok.doktyyp IN 'OSIUDP' )!=0 AND CASE WHEN dok.objrealt THEN rid.kuluobjekt ELSE dok.sihtyksus END LIKE 'LADU%' ESCAPE '!' ) ) AND dok.kuupaev||dok.kellaaeg BETWEEN '2006-04-08' AND '2006-04-2723 59' AND ('0' or ( length(trim(rid.toode))>2 AND rid.toode is NOT NULL)) AND ( LENGTH('' )=0 OR rid.partii='' OR (dok.doktyyp='I' AND rid.kulupartii='' ) ) AND (NOT dok.inventuur or rid.kogus!=0) AND dok.dokumnr!= 0 AND ( artliik.arttyyp NOT IN ('Teenus', 'Komplekt' ) OR artliik.arttyyp IS NULL) explain returns: "Nested Loop Left Join (cost=0.00..1828.18 rows=1 width=24)" " Filter: ((("inner".arttyyp <> 'Teenus'::bpchar) AND ("inner".arttyyp <> 'Komplekt'::bpchar)) OR ("inner".arttyyp IS NULL))" " -> Nested Loop (cost=0.00..1822.51 rows=1 width=43)" " -> Nested Loop (cost=0.00..1816.56 rows=1 width=24)" " Join Filter: (("outer".dokumnr = "inner".dokumnr) AND ((("position"('VGYKITDNHMEBARCFJ'::text, ("outer".doktyyp)::text) <> 0) AND (CASE WHEN ((NOT ("outer".objrealt)::boolean) OR ("outer".doktyyp = 'I'::bpchar)) THEN "outer".yksus ELSE "inner (..)" " -> Seq Scan on dok (cost=0.00..787.80 rows=1 width=39)" " Filter: ((kuupaev >= '2006-04-08'::date) AND (kuupaev <= '2006-04-27'::date) AND (NOT (eimuuda)::boolean) AND ((laonr)::double precision = 1::double precision) AND ("position"('OSIDVGYKIF'::text, (doktyyp)::text) <> 0) AND (((kuupaev):: (..)" " -> Seq Scan on rid (cost=0.00..1019.42 rows=249 width=51)" " Filter: ((toode = '1EEKPANT'::bpchar) AND (length(btrim((toode)::text)) > 2) AND (toode IS NOT NULL))" " -> Index Scan using toode_pkey on toode (cost=0.00..5.94 rows=1 width=43)" " Index Cond: ('1EEKPANT'::bpchar = toode)" " -> Index Scan using artliik_pkey on artliik (cost=0.00..5.65 rows=1 width=88)" " Index Cond: (("outer".grupp = artliik.grupp) AND ("outer".liik = artliik.liik))" Andrus.
"Andrus" <eetasoft@online.ee> writes: > I have small database running in 8.1.3 in W2K server. > The following query causes Postgres process to use 100% CPU and seems to run > forever. > If I change '1EEKPANT' to less frequently used item code, it runs fast. You have ANALYZEd all these tables recently, I hope? The planner certainly doesn't think this query will take very long. To find out what's wrong, you're going to have to be patient enough to let an EXPLAIN ANALYZE run to completion. Plain EXPLAIN won't tell. regards, tom lane
> You have ANALYZEd all these tables recently, I hope? The planner > certainly doesn't think this query will take very long. I have autovacuum running so I expect it takes care of ANALYZE, isn't it ? I ran also analyze command before running explain analyze. > To find out what's wrong, you're going to have to be patient enough to > let an EXPLAIN ANALYZE run to completion. Plain EXPLAIN won't tell. Here it is running in my local computer. I'm expecting run time no more 1 second "Nested Loop Left Join (cost=0.00..1829.95 rows=1 width=24) (actual time=492064.990..492064.990 rows=0 loops=1)" " Filter: ((("inner".arttyyp <> 'Teenus'::bpchar) AND ("inner".arttyyp <> 'Komplekt'::bpchar)) OR ("inner".arttyyp IS NULL))" " -> Nested Loop (cost=0.00..1825.01 rows=1 width=43) (actual time=492064.983..492064.983 rows=0 loops=1)" " -> Nested Loop (cost=0.00..1819.04 rows=1 width=24) (actual time=492064.978..492064.978 rows=0 loops=1)" " Join Filter: (("outer".dokumnr = "inner".dokumnr) AND ((("position"('VGYKITDNHMEBARCFJ'::text, ("outer".doktyyp)::text) <> 0) AND (CASE WHEN ((NOT ("outer".objrealt)::boolean) OR ("outer".doktyyp = 'I'::bpchar)) THEN "outer".yksus ELSE "inner (..)" " -> Seq Scan on dok (cost=0.00..787.80 rows=1 width=39) (actual time=0.152..878.198 rows=7670 loops=1)" " Filter: ((kuupaev >= '2006-04-08'::date) AND (kuupaev <= '2006-04-27'::date) AND (NOT (eimuuda)::boolean) AND ((laonr)::double precision = 1::double precision) AND ("position"('OSIDVGYKIF'::text, (doktyyp)::text) <> 0) AND (((kuupaev):: (..)" " -> Seq Scan on rid (cost=0.00..1019.42 rows=315 width=51) (actual time=22.003..62.216 rows=839 loops=7670)" " Filter: ((toode = '1EEKPANT'::bpchar) AND (length(btrim((toode)::text)) > 2) AND (toode IS NOT NULL))" " -> Index Scan using toode_pkey on toode (cost=0.00..5.96 rows=1 width=43) (never executed)" " Index Cond: ('1EEKPANT'::bpchar = toode)" " -> Index Scan using artliik_pkey on artliik (cost=0.00..4.92 rows=1 width=31) (never executed)" " Index Cond: (("outer".grupp = artliik.grupp) AND ("outer".liik = artliik.liik))" "Total runtime: 492065.840 ms" Andrus.
"Andrus" <eetasoft@online.ee> writes: > Here it is running in my local computer. I'm expecting run time no more 1 > second Something seems to have truncated your EXPLAIN output, but anyway we can see where the problem is: > " -> Seq Scan on dok (cost=0.00..787.80 rows=1 width=39) > (actual time=0.152..878.198 rows=7670 loops=1)" > " Filter: ((kuupaev >= '2006-04-08'::date) AND (kuupaev > <= '2006-04-27'::date) AND (NOT (eimuuda)::boolean) AND ((laonr)::double > precision = 1::double precision) AND ("position"('OSIDVGYKIF'::text, > (doktyyp)::text) <> 0) AND (((kuupaev):: (..)" The planner is expecting to get one row from "dok" passing the filter condition, and hence chooses a plan that is suitable for a small number of rows ... but in reality there are 7670 rows matching the filter condition, and that's what blows the runtime out of the water. (Most of the runtime is actually going into 7670 repeated scans of "rid", which wouldn't have happened with another plan type.) So you need to see about getting that estimate to be more accurate. First thing is to make sure that "dok" has been ANALYZEd --- just do it by hand. If that doesn't change the EXPLAIN plan then you need to work harder. I can see at least three things you are doing that are unnecessarily destroying the planner's ability to estimate the number of matching rows: dok.laonr='1'::float8 and Since laonr apparently isn't float8, this forces a runtime type conversion as well as interfering with statistics use. (The planner will have ANALYZE stats about dok.laonr, but the connection to dok.laonr::float8 escapes it.) Just write the constant with quotes and no type coercion. POSITION( dok.doktyyp IN 'OSIDVGYKIF')!=0 AND This is completely unestimatable given the available statistics, and it doesn't look to me like it is all that great a semantic representation either. Perhaps the query that's really meant here is "dok.doktypp IN ('O','S','I', ...)"? If so, you should say what you mean, not play games with converting the query into some strange string operation. AND dok.kuupaev||dok.kellaaeg BETWEEN '2006-04-08' AND '2006-04-2723 59' This is another case where the planner is not going to have any ability to make a useful estimate, and it's because you are using a crummy representation of your data. You should merge those two columns into one timestamp column and just do a simple BETWEEN test. By and large, unnatural representations of data that you use in WHERE clauses are going to cost you big-time in SQL queries. It's worth taking time up front to design a clean table schema, and taking time to revise it when requirements change. regards, tom lane
> Something seems to have truncated your EXPLAIN output, but anyway we > can see where the problem is: I copied it from pgAdmin in 640x480 screen resolution in XP Maybe pgAdmin bug ? > The planner is expecting to get one row from "dok" passing the filter > condition, and hence chooses a plan that is suitable for a small number > of rows ... but in reality there are 7670 rows matching the filter > condition, and that's what blows the runtime out of the water. (Most of > the runtime is actually going into 7670 repeated scans of "rid", which > wouldn't have happened with another plan type.) I added index CREATE INDEX rid_toode_idx ON firma2.rid USING btree (toode); and query start working fast ! > So you need to see about getting that estimate to be more accurate. > First thing is to make sure that "dok" has been ANALYZEd --- just do it > by hand. As I wrote I have autovacuum running. Is'nt this sufficient ? > I can see at least three things you are doing that are > unnecessarily destroying the planner's ability to estimate the number of > matching rows: > > dok.laonr='1'::float8 and > Since laonr apparently isn't float8, this forces a runtime type > conversion as well as interfering with statistics use. (The planner > will have ANALYZE stats about dok.laonr, but the connection to > dok.laonr::float8 escapes it.) Just write the constant with quotes > and no type coercion. I re-wrote it as dok.laonr=1 this query is automatically generated by VFP and ODBC parameter substitution which adds those type conversions. VFP has only float8 type and it probably forces ODBC driver convert numbers to float8 > POSITION( dok.doktyyp IN 'OSIDVGYKIF')!=0 AND > > This is completely unestimatable given the available statistics, and it > doesn't look to me like it is all that great a semantic representation > either. Perhaps the query that's really meant here is "dok.doktypp IN > ('O','S','I', ...)"? If so, you should say what you mean, not play > games with converting the query into some strange string operation. 'OSID ...' is a string parameter substituted to SELECT template. changing this to IN ( 'O', 'S', .. requires re-writing parts of code and I'm not sure it makes code faster. > AND dok.kuupaev||dok.kellaaeg BETWEEN '2006-04-08' AND '2006-04-2723 59' > This is another case where the planner is not going to have any ability > to make a useful estimate, and it's because you are using a crummy > representation of your data. You should merge those two columns into > one timestamp column and just do a simple BETWEEN test. > By and large, unnatural representations of data that you use in WHERE > clauses are going to cost you big-time in SQL queries. It's worth > taking time up front to design a clean table schema, and taking time > to revise it when requirements change. date range test in other part of where clause dok.kuupaev BETWEEN .... is optimizable. AND dok.kuupaev||dok.kellaaeg adds time range test to date range. There are less that some thousands documents per day. Wasting time to re-engineer database and deployed application seems not reasonable in this case. Andrus.