Обсуждение: Very slow queries - please help.
Hi Folks, I'm new to Postgresql. I'm having great difficulties getting the performance I had hoped for from Postgresql 8.0. The typical query below takes ~20 minutes !! I hope an expert out there will tell me what I'm doing wrong - I hope *I* am doing something wrong. Hardware -------- Single processor, Intel Xeon 3.06 GHz machine running Red Hat Ent. 4. with 1.5 GB of RAM. The machine is dedicated to running Postgresql 8.0 and Apache/mod_perl etc. The database is being accessed for report generation via a web form. The web server talks to Pg over TCP/IP (I know, that I don't need to do this if they are all on the same machine, but I have good reasons for this and don't suspect that this is where my problems are - I have the same poor performance when running from psql on the server.) Database -------- Very simple, not fully normalized set of two tables. The first table, very small (2000 lines of 4 cols with very few chars and integers in in col). The other quite a bit larger (500000 lines with 15 cols. with the largest fields ~ 256 chars) Typical query ------------ SELECT n.name FROM node n WHERE n.name LIKE '56x%' AND n.type='H' AND n.usage='TEST' AND n.node_id NOT IN (select n.node_id FROM job_log j INNER JOIN node n ON j.node_id = n.node_id WHERE n.name LIKE '56x%' AND n.type='H' AND n.usage='TEST' AND j.job_name = 'COPY FILES' AND j.job_start >= '2005-11-14 00:00:00' AND (j.job_stop <= '2005-11-22 09:31:10' OR j.job_stop IS NULL)) ORDER BY n.name The node table is the small table and the job_log table is the large table. I've tried all the basic things that I found in the documentation like VACUUM ANALYZE, EXPLAIN etc., but I suspect there is something terribly wrong with what I'm doing and these measures will not shave off 19 min and 50 seconds off the query time. Any help and comments would be very much appreciated. Bealach
> Typical query > ------------ > > SELECT n.name > FROM node n > WHERE n.name > LIKE '56x%' > AND n.type='H' > AND n.usage='TEST' > AND n.node_id > NOT IN > (select n.node_id > FROM job_log j > INNER JOIN node n > ON j.node_id = n.node_id > WHERE n.name > LIKE '56x%' > AND n.type='H' > AND n.usage='TEST' > AND j.job_name = 'COPY FILES' > AND j.job_start >= '2005-11-14 00:00:00' > AND (j.job_stop <= '2005-11-22 09:31:10' OR j.job_stop IS NULL)) > ORDER BY n.name Do you have any indexes? regards Claus
Hi,
Thanks for your comments. I've explicitly made any indexes, but the
default ones are:
user@10.0.0.2.dbdev=> \di
List of relations
Schema | Name | Type | Owner | Table
---------+-----------------+-------+---------+---------
user | job_log_id_pkey | index | user | job_log
user | node_id_pkey | index | user | node
user | node_name_key | index | user | node
(3 rows)
I'm also sending the EXPLAIN outputs.
explain SELECT n.name,n.type,
n.usage, j.status,
j.job_start,j.job_stop,
j.nfiles_in_job,j.job_name
FROM job_log j
INNER JOIN node n
ON j.node_id = n.node_id
WHERE n.name
LIKE '56x%'
AND n.type = 'K'
AND n.usage = 'LIVE'
AND j.job_name = 'COPY FILES'
AND j.job_start >= '2005-11-14 00:00:00'
AND (j.job_stop <= '2005-11-14 05:00:00' OR j.job_stop IS NULL)
ORDER BY n.name;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..75753.31 rows=1 width=461)
Join Filter: ("inner".node_id = "outer".node_id)
-> Index Scan using node_name_key on node n (cost=0.00..307.75 rows=1
width=181)
Filter: ((name ~~ '56x%'::text) AND ("type" = 'K'::bpchar) AND
("usage" = 'LIVE'::bpchar))
-> Seq Scan on job_log j (cost=0.00..75445.54 rows=1 width=288)
Filter: ((job_name = 'COPY FILES'::bpchar) AND (job_start >=
'2005-11-14 00:00:00'::timestamp without time zone) AND ((job_stop <=
'2005-11-14 05:00:00'::timestamp without time zone) OR (job_stop IS NULL)))
(6 rows)
explain SELECT n.name, n.type, n.usage
FROM node n
WHERE n.name
LIKE '56x%'
AND n.type = 'K'
AND n.usage = 'LIVE'
AND n.node_id
NOT IN
(SELECT n.node_id
FROM job_log j
INNER JOIN node n
ON j.node_id = n.node_id
WHERE n.name
LIKE '56x%'
AND n.type = 'K'
AND n.usage = 'LIVE'
AND j.job_name = 'COPY FILES'
AND j.job_start >= '2005-11-14 00:00:00'
AND (j.job_stop <= '2005-11-14 05:00:00' OR j.job_stop IS NULL))
ORDER BY n.name;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using node_name_key on node n (cost=75451.55..75764.94 rows=1
width=177)
Filter: ((name ~~ '56x%'::text) AND ("type" = 'K'::bpchar) AND ("usage" =
'LIVE'::bpchar) AND (NOT (hashed subplan)))
SubPlan
-> Nested Loop (cost=0.00..75451.54 rows=1 width=4)
-> Seq Scan on job_log j (cost=0.00..75445.54 rows=1 width=4)
Filter: ((job_name = 'COPY FILES'::bpchar) AND (job_start
>= '2005-11-14 00:00:00'::timestamp without time zone) AND ((job_stop <=
'2005-11-14 05:00:00'::timestamp without time zone) OR (job_stop IS NULL)))
-> Index Scan using node_id_pkey on node n (cost=0.00..5.99
rows=1 width=4)
Index Cond: ("outer".node_id = n.node_id)
Filter: ((name ~~ '56x%'::text) AND ("type" = 'K'::bpchar)
AND ("usage" = 'LIVE'::bpchar))
Yours,
Bealach
>From: Claus Guttesen <kometen@gmail.com>
>To: Bealach-na Bo <bealach_na_bo@hotmail.com>
>CC: pgsql-performance@postgresql.org
>Subject: Re: [PERFORM] Very slow queries - please help.
>Date: Thu, 24 Nov 2005 14:23:38 +0100
>
> > Typical query
> > ------------
> >
> > SELECT n.name
> > FROM node n
> > WHERE n.name
> > LIKE '56x%'
> > AND n.type='H'
> > AND n.usage='TEST'
> > AND n.node_id
> > NOT IN
> > (select n.node_id
> > FROM job_log j
> > INNER JOIN node n
> > ON j.node_id = n.node_id
> > WHERE n.name
> > LIKE '56x%'
> > AND n.type='H'
> > AND n.usage='TEST'
> > AND j.job_name = 'COPY FILES'
> > AND j.job_start >= '2005-11-14 00:00:00'
> > AND (j.job_stop <= '2005-11-22 09:31:10' OR j.job_stop IS NULL))
> > ORDER BY n.name
>
>Do you have any indexes?
>
>regards
>Claus
Hi, > I'm also sending the EXPLAIN outputs. Please provide EXPLAIN ANALYZE outputs instead of EXPLAIN. You will have more information. Indexes on your tables are obviously missing. You should try to add: CREATE INDEX idx_node_filter ON node(name, type, usage); CREATE INDEX idx_job_log_filter ON job_log(job_name, job_start, job_stop); I'm not so sure it's a good idea to add job_stop in this index as you have an IS NULL in your query so I'm not sure it can be used. You should try it anyway and remove it if not needed. I added all your search fields in the indexes but it depends a lot on the selectivity of your conditions. I don't know your data but I think you understand the idea. HTH -- Guillaume
"Bealach-na Bo" <bealach_na_bo@hotmail.com> writes:
> I'm having great difficulties getting the performance I had hoped for
> from Postgresql 8.0. The typical query below takes ~20 minutes !!
You need to show us the table definition (including indexes) and the
EXPLAIN ANALYZE results for the query.
It seems likely that the NOT IN is the source of your problems,
but it's hard to be sure without EXPLAIN results.
regards, tom lane