Discussion: Performance woes relating to DISTINCT (I think)
Hello.
I'm not sure if this is a question suited for here, the -sql list, or
the -performance list, so if I'm mis-posting, please direct me to the
right list.
I was unable to come up with anything relevant from the archives
(though...I am not too sure where to start so I may just have been
looking for the wrong things).
I am trying to convert a bunch of code from MySQL to PostgreSQL. In
MySQL I was running this particular query with more complexity (with
joins and such) and it takes far less than a second. The stripped-down
PostgreSQL version takes over 45 seconds (it was taking over 80
seconds with the joins).
The table-in-question (tasks_applied) contains 12 columns:
cid, modcode, yearcode, seid, tid, actid, pkgid, optional, corrected,
labor, lmid, parts_price
I have the following indexes:
CREATE INDEX actid ON tasks_applied USING btree (actid)
CREATE INDEX pkgid ON tasks_applied USING btree (pkgid)
CREATE INDEX strafe_group ON tasks_applied USING btree (modcode,
yearcode, seid, tid) WHERE cid = 0
CREATE UNIQUE INDEX tasks_applied_pkey ON tasks_applied USING btree
(cid, modcode, yearcode, seid, tid, actid, optional)
Here is my Pg query:
-----------------
SELECT DISTINCT
modcode,
yearcode,
seid,
COUNT(DISTINCT(tid)) AS task_count
FROM
tasks_applied
WHERE
cid=0 AND
seid=100001
GROUP BY
modcode,
yearcode,
seid
HAVING
COUNT(tid)>=0
ORDER BY
modcode ASC,
yearcode ASC,
seid ASC
---------------
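(As an aside on the query above: the DISTINCT and the HAVING clause do no work here. GROUP BY modcode, yearcode, seid already yields one row per group, and COUNT() can never be negative, so HAVING COUNT(tid)>=0 filters nothing. An equivalent, simpler form -- a sketch only, not tested against this schema -- would be:)

```sql
SELECT modcode,
       yearcode,
       seid,
       COUNT(DISTINCT tid) AS task_count
FROM tasks_applied
WHERE cid = 0
  AND seid = 100001
GROUP BY modcode, yearcode, seid
ORDER BY modcode, yearcode, seid;
```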
Here's my EXPLAIN ANALYZE output:
-----------------
QUERY PLAN
Unique (cost=85168.84..85168.98 rows=11 width=22) (actual
time=45602.908..45607.399 rows=515 loops=1)
-> Sort (cost=85168.84..85168.87 rows=11 width=22) (actual
time=45602.897..45604.286 rows=515 loops=1)
Sort Key: modcode, yearcode, seid, count(DISTINCT tid)
-> GroupAggregate (cost=0.00..85168.65 rows=11 width=22)
(actual time=3149.916..45578.292 rows=515 loops=1)
Filter: (count(tid) >= 0)
-> Index Scan using strafe_group on tasks_applied
(cost=0.00..85167.23 rows=107 width=22) (actual
time=3144.908..45366.147 rows=29893 loops=1)
Filter: ((cid = 0) AND (seid = 100001))
Total runtime: 45609.207 ms
-------------
Finally, here's my MySQL query:
SELECT
tasks_applied.modcode AS modcode,
vin_models.shortname AS shortname,
vin_years.year AS year,
vin_years.yearcode AS yearcode,
service_events.details AS se,
service_events.intl_details AS i_se,
service_events.seid AS seid,
COUNT(DISTINCT(tid)) AS task_count
FROM
tasks_applied,
service_events,
vin_models,
vin_years
WHERE
cid=0
AND tasks_applied.yearcode=vin_years.yearcode
AND tasks_applied.modcode=vin_models.modcode
AND tasks_applied.seid=service_events.seid
AND tasks_applied.seid=100001
GROUP BY
se, modcode, year
HAVING
COUNT(tid)>=0
ORDER BY
tasks_applied.modcode ASC,
vin_years.year ASC,
service_events.seid ASC
---------
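(For what it's worth, the comma-join syntax above can be rewritten with explicit ANSI JOINs, which both MySQL and PostgreSQL accept. Note also that the original GROUP BY relies on MySQL's lax handling of ungrouped SELECT columns; PostgreSQL requires every non-aggregated column to appear in the GROUP BY. A sketch, with table aliases of my own choosing:)

```sql
SELECT ta.modcode,
       vm.shortname,
       vy.year,
       vy.yearcode,
       se.details       AS se,
       se.intl_details  AS i_se,
       se.seid,
       COUNT(DISTINCT ta.tid) AS task_count
FROM tasks_applied ta
JOIN vin_years      vy ON ta.yearcode = vy.yearcode
JOIN vin_models     vm ON ta.modcode  = vm.modcode
JOIN service_events se ON ta.seid     = se.seid
WHERE ta.cid = 0
  AND ta.seid = 100001
GROUP BY ta.modcode, vm.shortname, vy.year, vy.yearcode,
         se.details, se.intl_details, se.seid
ORDER BY ta.modcode, vy.year, se.seid;
```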
Any help would be greatly appreciated (even if it's just "RTFM on xxx").
Thanks
--jeff
On 9/26/05, boinger <boinger@gmail.com> wrote:
> QUERY PLAN
> -> GroupAggregate (cost=0.00..85168.65 rows=11 width=22)
> (actual time=3149.916..45578.292 rows=515 loops=1)

Hmm, the planner expected 11 rows, got 515.

> (cost=0.00..85167.23 rows=107 width=22) (actual
> time=3144.908..45366.147 rows=29893 loops=1)

The planner expected 107 rows, got 29893...

I guess the problem here is that the planner has a wrong idea of what
your data looks like. Try doing two things:

VACUUM ANALYZE;
(of the tables in question, or of the whole database)

If that doesn't help, increase the statistics target. By default
PostgreSQL keeps 10 samples per column, but you might want to increase
that to 50 or even 100, and then rerun VACUUM ANALYZE.

If that still doesn't help -- please repost the new query plan.

Regards,
Dawid
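(For reference, the per-column statistics target can be raised per table column; a sketch, picking the columns from the WHERE clause above and using 100 as the upper figure Dawid suggests:)

```sql
-- Raise the sample size for the columns the planner is mis-estimating,
-- then re-gather statistics for the table.
ALTER TABLE tasks_applied ALTER COLUMN seid SET STATISTICS 100;
ALTER TABLE tasks_applied ALTER COLUMN modcode SET STATISTICS 100;
VACUUM ANALYZE tasks_applied;
```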
On 9/27/05, Dawid Kuroczko <qnex42@gmail.com> wrote:
> > QUERY PLAN
> > -> GroupAggregate (cost=0.00..85168.65 rows=11 width=22)
> > (actual time=3149.916..45578.292 rows=515 loops=1)
>
> Hmm, planner expected 11 rows, got 515
>
> > (cost=0.00..85167.23 rows=107 width=22) (actual
> > time=3144.908..45366.147 rows=29893 loops=1)
>
> planner expected 107 rows, got 29893...
> I guess the problem here is that planner has wrong idea how your
> data looks. Try doing two things:
>
> VACUUM ANALYZE;
> (of tables in question or whole database)
>
> If that doesn't help, do increase the statistics target. By default
> PostgreSQL keeps 10 samples, but you might want to increase it to 50
> or even 100. And then rerun VACUUM ANALYZE.
>
> If it doesn't help -- please repost the new query plan once again.
I actually kind of inadvertently "fixed" it.
I threw my hands up and thought to myself "FINE! If it's going to take
that long, at least it can do all the joins and whatnot instead of
having to loop back and do separate queries"
So, I piled in everything I needed it to do, and now it's inexplicably
(to me) fast (!?).
I'm still running a full VACUUM ANALYZE on your recommendation...maybe
it will shave a few more ms off.
Here's what I have, now (pre-vacuum):
SQL:
SELECT
tasks_applied.modcode AS modcode,
tasks_applied.seid AS seid,
tasks_applied.yearcode AS yearcode,
vin_years.year AS year,
COUNT(DISTINCT(tid)) AS task_count
FROM
"SS_valid_modelyears",
tasks_applied,
vin_years
WHERE
cid=0
AND tasks_applied.seid='500001'
AND "SS_valid_modelyears".modcode=tasks_applied.modcode
AND "SS_valid_modelyears".year=vin_years.year
AND tasks_applied.yearcode=vin_years.yearcode
AND "SS_valid_modelyears".valid=1
GROUP BY
tasks_applied.seid,
vin_years.year,
tasks_applied.modcode,
"SS_valid_modelyears".shortname,
tasks_applied.yearcode
ORDER BY
tasks_applied.seid ASC,
vin_years.year ASC
QUERY PLAN:
GroupAggregate (cost=201.39..201.42 rows=1 width=69) (actual
time=80.383..80.386 rows=1 loops=1)
-> Sort (cost=201.39..201.40 rows=1 width=69) (actual
time=79.737..79.898 rows=59 loops=1)
Sort Key: tasks_applied.seid, vin_years."year",
tasks_applied.modcode, "SS_valid_modelyears".shortname,
tasks_applied.yearcode
-> Nested Loop (cost=1.38..201.38 rows=1 width=69) (actual
time=72.599..78.765 rows=59 loops=1)
-> Hash Join (cost=1.38..165.15 rows=6 width=61)
(actual time=0.530..18.881 rows=1188 loops=1)
Hash Cond: ("outer"."year" = "inner"."year")
-> Seq Scan on "SS_valid_modelyears"
(cost=0.00..163.54 rows=36 width=56) (actual time=0.183..9.202
rows=1188 loops=1)
Filter: ("valid" = 1)
-> Hash (cost=1.30..1.30 rows=30 width=9)
(actual time=0.230..0.230 rows=0 loops=1)
-> Seq Scan on vin_years (cost=0.00..1.30
rows=30 width=9) (actual time=0.019..0.116 rows=30 loops=1)
-> Index Scan using strafe_group on tasks_applied
(cost=0.00..6.02 rows=1 width=22) (actual time=0.042..0.043 rows=0
loops=1188)
Index Cond: ((("outer".modcode)::text =
(tasks_applied.modcode)::text) AND (tasks_applied.yearcode =
"outer".yearcode) AND (tasks_applied.seid = 500001))
Filter: (cid = 0)
Total runtime: 80.764 ms
On Tue, Sep 27, 2005 at 09:07:41AM -0500, boinger wrote:
> -> Index Scan using strafe_group on tasks_applied
> (cost=0.00..6.02 rows=1 width=22) (actual time=0.042..0.043 rows=0
> loops=1188)
>
> Index Cond: ((("outer".modcode)::text =
> (tasks_applied.modcode)::text) AND (tasks_applied.yearcode =
> "outer".yearcode) AND (tasks_applied.seid = 500001))
>
> Filter: (cid = 0)
>
> Total runtime: 80.764 ms
Compare that to the index scan it had to do before. Now that you gave
the database the exact info it needs to answer your real question, it
can use a much, much more selective index scan.
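(Concretely: strafe_group is a partial index on (modcode, yearcode, seid, tid) WHERE cid = 0, and the stripped-down query constrained only seid, the third key column, so nothing landed in the Index Cond and the whole index had to be walked and filtered. The joined query supplies modcode and yearcode for each outer row, so all three columns become index conditions. If the stand-alone query is still needed, an index led by seid would serve it directly; a sketch, with an index name of my own choosing:)

```sql
-- Hypothetical index for the stripped-down query: seid becomes the
-- leading (and only) key column; cid = 0 stays in the partial predicate.
CREATE INDEX tasks_applied_seid ON tasks_applied USING btree (seid)
    WHERE cid = 0;
```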
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461