Обсуждение: Slow Query
Hi, I am new to optimizing queries and i'm getting a slow running time (~1.5secs) with the following SQL: SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle", "Vacancy"."DateCreated", "Vacancy"."CustomAccess" , "Department"."Name" as "Department", list("Occupation"."Name") as "Occupation", "Vacancy"."PositionNo" , "Vacancy"."Template" from "Vacancy" LEFT JOIN "CategoryOption_TableRow" as "c_22" ON ("c_22"."RowID" = "Vacancy"."ID" and "c_22"."Category_TableID" = 22) LEFT JOIN "CategoryOption" as "Occupation" ON ("Occupation"."ID" = "c_22"."CategoryOptionID") LEFT JOIN "TableRow_TableRow" as "t_33" ON ("t_33"."Table1RowID" = "Vacancy"."ID" and "t_33"."Table_TableID" = 33 ) LEFT JOIN "Department" ON ("Department"."ID" = "t_33"."Table2RowID" and "Department"."Active" = 't' and "Department" ."ClientID" = 263) JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50 and "c_50"."RowID" = "Vacancy" ."ID" and "c_50"."CategoryOptionID"=19205) WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT DISTINCT("Vacancy"."ID") FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on ("ct126"."Category_TableID" = 126 and "RowID" = "Vacancy"."ID") left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID" and "Workflow"."Level" = 1) left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" = "Workflow"."ID" and "c30"."Category_TableID" = 30 and "c30"."CategoryOptionID" = 21923) WHERE "Template" AND "ct126"."CategoryOptionID" IN(34024,35254,35255,35256)) and "Vacancy" ."Template" = 't' GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated", "Vacancy"."CustomAccess", "Department" ."Name", "Vacancy"."PositionNo", "Vacancy"."Template" UNION SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle", "Vacancy"."DateCreated", "Vacancy"."CustomAccess" , "Department"."Name" as "Department", list("Occupation"."Name") as "Occupation", "Vacancy"."PositionNo" , "Vacancy"."Template" from "Vacancy" LEFT JOIN "CategoryOption_TableRow" as "c_22" ON ("c_22"."RowID" = "Vacancy"."ID" and "c_22"."Category_TableID" = 22) LEFT JOIN "CategoryOption" as "Occupation" ON ("Occupation"."ID" = "c_22"."CategoryOptionID") LEFT JOIN "TableRow_TableRow" as "t_33" ON ("t_33"."Table1RowID" = "Vacancy"."ID" and "t_33"."Table_TableID" = 33 ) LEFT JOIN "Department" ON ("Department"."ID" = "t_33"."Table2RowID" and "Department"."Active" = 't' and "Department" ."ClientID" = 263) JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50 and "c_50"."RowID" = "Vacancy" ."ID" and "c_50"."CategoryOptionID"=19205) WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT DISTINCT("Vacancy"."ID") FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on ("ct126"."Category_TableID" = 126 and "RowID" = "Vacancy"."ID") left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID" and "Workflow"."Level" = 1) left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" = "Workflow"."ID" and "c30"."Category_TableID" = 30 and "c30"."CategoryOptionID" = 21923) WHERE "Template" AND "ct126"."CategoryOptionID" IN(34024,35254,35255,35256)) and "Vacancy" ."Template" <> 't' AND "Vacancy"."Level" = 1 GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated", "Vacancy"."CustomAccess", "Department" ."Name", "Vacancy"."PositionNo", "Vacancy"."Template" ORDER BY "JobTitle" Running explain analyze gives me the following information: http://explain.depesz.com/s/pdC <http://explain.depesz.com/s/pdC> For a total runtime: 2877.157 ms If i remove the left joins on Department and TableRow_TableRow this reduces the run time by about a third. Additionally removing CategoryOption and CategoryOption_TableRow joins further reduces by a about a third. Given that i need both these joins for the information retrieved by them, what would be the best way to re-factor this query so it runs faster? Looking at the output of explain analyze the hash aggregates and sort seem to be the primary issue. Thanks in advance -- View this message in context: http://postgresql.nabble.com/Slow-Query-tp5861835.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
On Wed, Aug 12, 2015 at 12:34 PM, robbyc <robcampbell73@gmail.com> wrote:
Hi,
I am new to optimizing queries and i'm getting a slow running time
(~1.5secs) with the following SQL:
SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
"Vacancy"."DateCreated", "Vacancy"."CustomAccess"
, "Department"."Name" as "Department", list("Occupation"."Name") as
"Occupation", "Vacancy"."PositionNo"
, "Vacancy"."Template" from
"Vacancy"
LEFT JOIN "CategoryOption_TableRow" as "c_22"
ON ("c_22"."RowID" = "Vacancy"."ID"
and "c_22"."Category_TableID" = 22)
LEFT JOIN "CategoryOption" as "Occupation"
ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
LEFT JOIN "TableRow_TableRow" as "t_33"
ON ("t_33"."Table1RowID" = "Vacancy"."ID"
and "t_33"."Table_TableID" = 33 )
LEFT JOIN "Department"
ON ("Department"."ID" = "t_33"."Table2RowID" and
"Department"."Active" = 't' and "Department"
."ClientID" = 263)
JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50
and "c_50"."RowID" = "Vacancy"
."ID" and "c_50"."CategoryOptionID"=19205)
WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
DISTINCT("Vacancy"."ID")
FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
("ct126"."Category_TableID" = 126
and "RowID" = "Vacancy"."ID")
left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID"
and "Workflow"."Level"
= 1)
left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
"Workflow"."ID" and "c30"."Category_TableID"
= 30 and "c30"."CategoryOptionID" = 21923)
WHERE "Template" AND "ct126"."CategoryOptionID"
IN(34024,35254,35255,35256)) and "Vacancy"
."Template" = 't'
GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
"Vacancy"."CustomAccess", "Department"
."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
UNION
SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
"Vacancy"."DateCreated", "Vacancy"."CustomAccess"
, "Department"."Name" as "Department", list("Occupation"."Name") as
"Occupation", "Vacancy"."PositionNo"
, "Vacancy"."Template" from
"Vacancy"
LEFT JOIN "CategoryOption_TableRow" as "c_22"
ON ("c_22"."RowID" = "Vacancy"."ID"
and "c_22"."Category_TableID" = 22)
LEFT JOIN "CategoryOption" as "Occupation"
ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
LEFT JOIN "TableRow_TableRow" as "t_33"
ON ("t_33"."Table1RowID" = "Vacancy"."ID"
and "t_33"."Table_TableID" = 33 )
LEFT JOIN "Department"
ON ("Department"."ID" = "t_33"."Table2RowID" and
"Department"."Active" = 't' and "Department"
."ClientID" = 263)
JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50
and "c_50"."RowID" = "Vacancy"
."ID" and "c_50"."CategoryOptionID"=19205)
WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
DISTINCT("Vacancy"."ID")
FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
("ct126"."Category_TableID" = 126
and "RowID" = "Vacancy"."ID")
left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID"
and "Workflow"."Level"
= 1)
left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
"Workflow"."ID" and "c30"."Category_TableID"
= 30 and "c30"."CategoryOptionID" = 21923)
WHERE "Template" AND "ct126"."CategoryOptionID"
IN(34024,35254,35255,35256)) and "Vacancy"
."Template" <> 't' AND "Vacancy"."Level" = 1
GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
"Vacancy"."CustomAccess", "Department"
."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
ORDER BY "JobTitle"
Running explain analyze gives me the following information:
http://explain.depesz.com/s/pdC <http://explain.depesz.com/s/pdC>
For a total runtime: 2877.157 ms
If i remove the left joins on Department and TableRow_TableRow this reduces
the run time by about a third.
Additionally removing CategoryOption and CategoryOption_TableRow joins
further reduces by a about a third.
Given that i need both these joins for the information retrieved by them,
what would be the best way to re-factor this query so it runs faster?
Looking at the output of explain analyze the hash aggregates and sort seem
to be the primary issue.
The query has got a distinct and group-by/order-by clauses which seems to be taking time. Without looking at much details of the query code and Table size etc, did you try increasing the work_mem and then execute the query and see if that helps ? This will reduce the on-disk IO for sorting. Also, Vacancy.JobTitle seems to be a non-index column.
Regards,
Venkata Balaji
Fujitsu Australia
Hi Venkata,
work_mem was set to 72MB, increased to 144MB, no change.On Wed, Aug 12, 2015 at 2:09 PM, Venkata Balaji N [via PostgreSQL] <[hidden email]> wrote:
On Wed, Aug 12, 2015 at 12:34 PM, robbyc <[hidden email]> wrote:Hi,
I am new to optimizing queries and i'm getting a slow running time
(~1.5secs) with the following SQL:
SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
"Vacancy"."DateCreated", "Vacancy"."CustomAccess"
, "Department"."Name" as "Department", list("Occupation"."Name") as
"Occupation", "Vacancy"."PositionNo"
, "Vacancy"."Template" from
"Vacancy"
LEFT JOIN "CategoryOption_TableRow" as "c_22"
ON ("c_22"."RowID" = "Vacancy"."ID"
and "c_22"."Category_TableID" = 22)
LEFT JOIN "CategoryOption" as "Occupation"
ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
LEFT JOIN "TableRow_TableRow" as "t_33"
ON ("t_33"."Table1RowID" = "Vacancy"."ID"
and "t_33"."Table_TableID" = 33 )
LEFT JOIN "Department"
ON ("Department"."ID" = "t_33"."Table2RowID" and
"Department"."Active" = 't' and "Department"
."ClientID" = 263)
JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50
and "c_50"."RowID" = "Vacancy"
."ID" and "c_50"."CategoryOptionID"=19205)
WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
DISTINCT("Vacancy"."ID")
FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
("ct126"."Category_TableID" = 126
and "RowID" = "Vacancy"."ID")
left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID"
and "Workflow"."Level"
= 1)
left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
"Workflow"."ID" and "c30"."Category_TableID"
= 30 and "c30"."CategoryOptionID" = 21923)
WHERE "Template" AND "ct126"."CategoryOptionID"
IN(34024,35254,35255,35256)) and "Vacancy"
."Template" = 't'
GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
"Vacancy"."CustomAccess", "Department"
."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
UNION
SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
"Vacancy"."DateCreated", "Vacancy"."CustomAccess"
, "Department"."Name" as "Department", list("Occupation"."Name") as
"Occupation", "Vacancy"."PositionNo"
, "Vacancy"."Template" from
"Vacancy"
LEFT JOIN "CategoryOption_TableRow" as "c_22"
ON ("c_22"."RowID" = "Vacancy"."ID"
and "c_22"."Category_TableID" = 22)
LEFT JOIN "CategoryOption" as "Occupation"
ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
LEFT JOIN "TableRow_TableRow" as "t_33"
ON ("t_33"."Table1RowID" = "Vacancy"."ID"
and "t_33"."Table_TableID" = 33 )
LEFT JOIN "Department"
ON ("Department"."ID" = "t_33"."Table2RowID" and
"Department"."Active" = 't' and "Department"
."ClientID" = 263)
JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50
and "c_50"."RowID" = "Vacancy"
."ID" and "c_50"."CategoryOptionID"=19205)
WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
DISTINCT("Vacancy"."ID")
FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
("ct126"."Category_TableID" = 126
and "RowID" = "Vacancy"."ID")
left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID"
and "Workflow"."Level"
= 1)
left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
"Workflow"."ID" and "c30"."Category_TableID"
= 30 and "c30"."CategoryOptionID" = 21923)
WHERE "Template" AND "ct126"."CategoryOptionID"
IN(34024,35254,35255,35256)) and "Vacancy"
."Template" <> 't' AND "Vacancy"."Level" = 1
GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
"Vacancy"."CustomAccess", "Department"
."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
ORDER BY "JobTitle"
Running explain analyze gives me the following information:
http://explain.depesz.com/s/pdC <http://explain.depesz.com/s/pdC>
For a total runtime: 2877.157 ms
If i remove the left joins on Department and TableRow_TableRow this reduces
the run time by about a third.
Additionally removing CategoryOption and CategoryOption_TableRow joins
further reduces by a about a third.
Given that i need both these joins for the information retrieved by them,
what would be the best way to re-factor this query so it runs faster?
Looking at the output of explain analyze the hash aggregates and sort seem
to be the primary issue.The query has got a distinct and group-by/order-by clauses which seems to be taking time. Without looking at much details of the query code and Table size etc, did you try increasing the work_mem and then execute the query and see if that helps ? This will reduce the on-disk IO for sorting. Also, Vacancy.JobTitle seems to be a non-index column.Regards,Venkata BalajiFujitsu AustraliaIf you reply to this email, your message will be added to the discussion below:http://postgresql.nabble.com/Slow-Query-tp5861835p5861839.html
--
View this message in context: Re: Slow Query
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
On Wed, Aug 12, 2015 at 3:29 PM, robbyc <robcampbell73@gmail.com> wrote:
Hi Venkata,work_mem was set to 72MB, increased to 144MB, no change.
Increasing work_mem depends on various other factors like Table size (amount of data being sorted), available memory etc.
Added an index of type varchar_pattern_ops to Vacancy.JobTitle, this did not help either.
Sorry, I did not mean to say that an Index must be added straight away. The column must be eligible to have an Index. Meaning, Index will be beneficial if created on a column with high number of distinct values.
If either of the above does not help, then options to rewrite the query must be explored.
Thanks,
Venkata Balaji N
Fujitsu Australia
On Wed, Aug 12, 2015 at 2:09 PM, Venkata Balaji N [via PostgreSQL] <[hidden email]> wrote:On Wed, Aug 12, 2015 at 12:34 PM, robbyc <[hidden email]> wrote:Hi,
I am new to optimizing queries and i'm getting a slow running time
(~1.5secs) with the following SQL:
SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
"Vacancy"."DateCreated", "Vacancy"."CustomAccess"
, "Department"."Name" as "Department", list("Occupation"."Name") as
"Occupation", "Vacancy"."PositionNo"
, "Vacancy"."Template" from
"Vacancy"
LEFT JOIN "CategoryOption_TableRow" as "c_22"
ON ("c_22"."RowID" = "Vacancy"."ID"
and "c_22"."Category_TableID" = 22)
LEFT JOIN "CategoryOption" as "Occupation"
ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
LEFT JOIN "TableRow_TableRow" as "t_33"
ON ("t_33"."Table1RowID" = "Vacancy"."ID"
and "t_33"."Table_TableID" = 33 )
LEFT JOIN "Department"
ON ("Department"."ID" = "t_33"."Table2RowID" and
"Department"."Active" = 't' and "Department"
."ClientID" = 263)
JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50
and "c_50"."RowID" = "Vacancy"
."ID" and "c_50"."CategoryOptionID"=19205)
WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
DISTINCT("Vacancy"."ID")
FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
("ct126"."Category_TableID" = 126
and "RowID" = "Vacancy"."ID")
left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID"
and "Workflow"."Level"
= 1)
left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
"Workflow"."ID" and "c30"."Category_TableID"
= 30 and "c30"."CategoryOptionID" = 21923)
WHERE "Template" AND "ct126"."CategoryOptionID"
IN(34024,35254,35255,35256)) and "Vacancy"
."Template" = 't'
GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
"Vacancy"."CustomAccess", "Department"
."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
UNION
SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
"Vacancy"."DateCreated", "Vacancy"."CustomAccess"
, "Department"."Name" as "Department", list("Occupation"."Name") as
"Occupation", "Vacancy"."PositionNo"
, "Vacancy"."Template" from
"Vacancy"
LEFT JOIN "CategoryOption_TableRow" as "c_22"
ON ("c_22"."RowID" = "Vacancy"."ID"
and "c_22"."Category_TableID" = 22)
LEFT JOIN "CategoryOption" as "Occupation"
ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
LEFT JOIN "TableRow_TableRow" as "t_33"
ON ("t_33"."Table1RowID" = "Vacancy"."ID"
and "t_33"."Table_TableID" = 33 )
LEFT JOIN "Department"
ON ("Department"."ID" = "t_33"."Table2RowID" and
"Department"."Active" = 't' and "Department"
."ClientID" = 263)
JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50
and "c_50"."RowID" = "Vacancy"
."ID" and "c_50"."CategoryOptionID"=19205)
WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
DISTINCT("Vacancy"."ID")
FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
("ct126"."Category_TableID" = 126
and "RowID" = "Vacancy"."ID")
left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID"
and "Workflow"."Level"
= 1)
left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
"Workflow"."ID" and "c30"."Category_TableID"
= 30 and "c30"."CategoryOptionID" = 21923)
WHERE "Template" AND "ct126"."CategoryOptionID"
IN(34024,35254,35255,35256)) and "Vacancy"
."Template" <> 't' AND "Vacancy"."Level" = 1
GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
"Vacancy"."CustomAccess", "Department"
."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
ORDER BY "JobTitle"
Running explain analyze gives me the following information:
http://explain.depesz.com/s/pdC <http://explain.depesz.com/s/pdC>
For a total runtime: 2877.157 ms
If i remove the left joins on Department and TableRow_TableRow this reduces
the run time by about a third.
Additionally removing CategoryOption and CategoryOption_TableRow joins
further reduces by a about a third.
Given that i need both these joins for the information retrieved by them,
what would be the best way to re-factor this query so it runs faster?
Looking at the output of explain analyze the hash aggregates and sort seem
to be the primary issue.The query has got a distinct and group-by/order-by clauses which seems to be taking time. Without looking at much details of the query code and Table size etc, did you try increasing the work_mem and then execute the query and see if that helps ? This will reduce the on-disk IO for sorting. Also, Vacancy.JobTitle seems to be a non-index column.Regards,Venkata BalajiFujitsu AustraliaIf you reply to this email, your message will be added to the discussion below:http://postgresql.nabble.com/Slow-Query-tp5861835p5861839.html
--
View this message in context: Re: Slow Query
On 08/12/2015 04:34 AM, robbyc wrote: > Hi, > > I am new to optimizing queries and i'm getting a slow running time > (~1.5secs) with the following SQL: Before mucking about with work_mem and indexes, the first thing to do is rewrite this query correctly. Here are just some of the things wrong with the query as written: * You're doing a DISTINCT on the same set of columns also in a GROUP BY. This is redundant and causes needless deduplication. * You're joining two GROUPed BY then DISTINCTed queries using the UNION operator which will do yet another pass for deduplication. * You've got the entire query repeated for just a simple difference in the global WHERE clause. These can be merged. * You've kept LEFT JOINs in the subquery but you don't use any values from them. These can be safely removed altogether. * You're using a NOT IN clause which is almost never what you want. Use NOT EXISTS instead. What is this list() function? How is it defined? Can it be replaced with string_agg()? You're not doing yourself any favors at all with all this quoting and mixed case stuff. Here is a rewritten version, please let me know how it performs: SELECT "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated", "Vacancy"."CustomAccess", "Department"."Name" as "Department", list("Occupation"."Name") as "Occupation", "Vacancy"."PositionNo", "Vacancy"."Template" FROM "Vacancy" JOIN "CategoryOption_TableRow" as "c_50" ON ( "c_50"."Category_TableID"= 50 AND "c_50"."RowID" = "Vacancy"."ID" AND "c_50"."CategoryOptionID"=19205) LEFT JOIN "CategoryOption_TableRow" as "c_22" ON ( "c_22"."RowID" = "Vacancy"."ID" AND "c_22"."Category_TableID" = 22) LEFT JOIN "CategoryOption" as "Occupation" ON ( "Occupation"."ID" = "c_22"."CategoryOptionID") LEFT JOIN "TableRow_TableRow" as "t_33" ON ( "t_33"."Table1RowID" = "Vacancy"."ID" AND "t_33"."Table_TableID" = 33) LEFT JOIN "Department" ON ( "Department"."ID" = "t_33"."Table2RowID" AND "Department"."Active" = 't' AND "Department"."ClientID" = 263) WHERE "Vacancy"."ClientID" = 263 AND NOT EXISTS ( SELECT 1 FROM "Vacancy" as _Vacancy JOIN "CategoryOption_TableRow" "ct126" on ( "ct126"."Category_TableID" = 126 AND "RowID" = _Vacancy."ID") WHERE _Vacancy."Template" AND "ct126"."CategoryOptionID" IN (34024,35254,35255,35256) AND _Vacancy."ID" = "Vacancy"."ID") AND ("Vacancy"."Template" = 't' OR "Vacancy"."Level" = 1) GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated", "Vacancy"."CustomAccess", "Department"."Name", "Vacancy"."PositionNo", "Vacancy"."Template" > SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle", > "Vacancy"."DateCreated", "Vacancy"."CustomAccess" > , "Department"."Name" as "Department", list("Occupation"."Name") as > "Occupation", "Vacancy"."PositionNo" > , "Vacancy"."Template" from > "Vacancy" > LEFT JOIN "CategoryOption_TableRow" as "c_22" > ON ("c_22"."RowID" = "Vacancy"."ID" > and "c_22"."Category_TableID" = 22) > LEFT JOIN "CategoryOption" as "Occupation" > ON ("Occupation"."ID" = "c_22"."CategoryOptionID") > LEFT JOIN "TableRow_TableRow" as "t_33" > ON ("t_33"."Table1RowID" = "Vacancy"."ID" > and "t_33"."Table_TableID" = 33 ) > LEFT JOIN "Department" > ON ("Department"."ID" = "t_33"."Table2RowID" and > "Department"."Active" = 't' and "Department" > ."ClientID" = 263) > JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50 > and "c_50"."RowID" = "Vacancy" > ."ID" and "c_50"."CategoryOptionID"=19205) > WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT > DISTINCT("Vacancy"."ID") > FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on > ("ct126"."Category_TableID" = 126 > and "RowID" = "Vacancy"."ID") > left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID" > and "Workflow"."Level" > = 1) > left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" = > "Workflow"."ID" and "c30"."Category_TableID" > = 30 and "c30"."CategoryOptionID" = 21923) > WHERE "Template" AND "ct126"."CategoryOptionID" > IN(34024,35254,35255,35256)) and "Vacancy" > ."Template" = 't' > GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated", > "Vacancy"."CustomAccess", "Department" > ."Name", "Vacancy"."PositionNo", "Vacancy"."Template" > UNION > SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle", > "Vacancy"."DateCreated", "Vacancy"."CustomAccess" > , "Department"."Name" as "Department", list("Occupation"."Name") as > "Occupation", "Vacancy"."PositionNo" > , "Vacancy"."Template" from > "Vacancy" > LEFT JOIN "CategoryOption_TableRow" as "c_22" > ON ("c_22"."RowID" = "Vacancy"."ID" > and "c_22"."Category_TableID" = 22) > LEFT JOIN "CategoryOption" as "Occupation" > ON ("Occupation"."ID" = "c_22"."CategoryOptionID") > LEFT JOIN "TableRow_TableRow" as "t_33" > ON ("t_33"."Table1RowID" = "Vacancy"."ID" > and "t_33"."Table_TableID" = 33 ) > LEFT JOIN "Department" > ON ("Department"."ID" = "t_33"."Table2RowID" and > "Department"."Active" = 't' and "Department" > ."ClientID" = 263) > JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50 > and "c_50"."RowID" = "Vacancy" > ."ID" and "c_50"."CategoryOptionID"=19205) > WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT > DISTINCT("Vacancy"."ID") > FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on > ("ct126"."Category_TableID" = 126 > and "RowID" = "Vacancy"."ID") > left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID" > and "Workflow"."Level" > = 1) > left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" = > "Workflow"."ID" and "c30"."Category_TableID" > = 30 and "c30"."CategoryOptionID" = 21923) > WHERE "Template" AND "ct126"."CategoryOptionID" > IN(34024,35254,35255,35256)) and "Vacancy" > ."Template" <> 't' AND "Vacancy"."Level" = 1 > GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated", > "Vacancy"."CustomAccess", "Department" > ."Name", "Vacancy"."PositionNo", "Vacancy"."Template" > ORDER BY "JobTitle" > > Running explain analyze gives me the following information: > http://explain.depesz.com/s/pdC <http://explain.depesz.com/s/pdC> > > For a total runtime: 2877.157 ms > > If i remove the left joins on Department and TableRow_TableRow this reduces > the run time by about a third. > Additionally removing CategoryOption and CategoryOption_TableRow joins > further reduces by a about a third. > > Given that i need both these joins for the information retrieved by them, > what would be the best way to re-factor this query so it runs faster? > > Looking at the output of explain analyze the hash aggregates and sort seem > to be the primary issue. > > Thanks in advance -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Hi Vik,
Thanks for your feedback, very helpful. I modified your query slightly, this will return all vacancy templates and all level 1 vacancies which arent templates, and does so in about ~800-900ms less, an great improvement on the original query.
SELECT "Vacancy"."ID",
"Vacancy"."JobTitle",
"Vacancy"."DateCreated",
"Vacancy"."CustomAccess",
"Department"."Name" as "Department",
list("Occupation"."Name") as "Occupation",
"Vacancy"."PositionNo",
"Vacancy"."Template"
FROM "Vacancy"
JOIN "CategoryOption_TableRow" as "c_50" ON (
"c_50"."Category_TableID"= 50
AND "c_50"."RowID" = "Vacancy"."ID"
AND "c_50"."CategoryOptionID"=19205)
LEFT JOIN "CategoryOption_TableRow" as "c_22" ON (
"c_22"."RowID" = "Vacancy"."ID"
AND "c_22"."Category_TableID" = 22)
LEFT JOIN "CategoryOption" as "Occupation" ON (
"Occupation"."ID" = "c_22"."CategoryOptionID")
LEFT JOIN "TableRow_TableRow" as "t_33" ON (
"t_33"."Table1RowID" = "Vacancy"."ID"
AND "t_33"."Table_TableID" = 33)
LEFT JOIN "Department" ON (
"Department"."ID" = "t_33"."Table2RowID"
AND "Department"."Active" = 't'
AND "Department"."ClientID" = 263)
WHERE "Vacancy"."ClientID" = 263
AND NOT EXISTS (
SELECT 1
FROM "Vacancy" as "v"
JOIN "CategoryOption_TableRow" "ct126" on (
"ct126"."Category_TableID" = 126
AND "RowID" = "v"."ID")
WHERE "v"."Template"
AND "ct126"."CategoryOptionID" IN (34024,35254,35255,35256)
AND "v"."ID" = "Vacancy"."ID")
AND ("Vacancy"."Template" OR ("Vacancy"."Template" = 'f' AND "Vacancy"."Level" = 1))
GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
"Vacancy"."CustomAccess", "Department"."Name",
"Vacancy"."PositionNo", "Vacancy"."Template"
On Wed, Aug 12, 2015 at 9:35 PM, Vik Fearing-3 [via PostgreSQL] <[hidden email]> wrote:
On 08/12/2015 04:34 AM, robbyc wrote:
> Hi,
>
> I am new to optimizing queries and i'm getting a slow running time
> (~1.5secs) with the following SQL:
Before mucking about with work_mem and indexes, the first thing to do is
rewrite this query correctly. Here are just some of the things wrong
with the query as written:
* You're doing a DISTINCT on the same set of columns also in a GROUP BY.
This is redundant and causes needless deduplication.
* You're joining two GROUPed BY then DISTINCTed queries using the UNION
operator which will do yet another pass for deduplication.
* You've got the entire query repeated for just a simple difference in
the global WHERE clause. These can be merged.
* You've kept LEFT JOINs in the subquery but you don't use any values
from them. These can be safely removed altogether.
* You're using a NOT IN clause which is almost never what you want. Use
NOT EXISTS instead.
What is this list() function? How is it defined? Can it be replaced
with string_agg()?
You're not doing yourself any favors at all with all this quoting and
mixed case stuff.
Here is a rewritten version, please let me know how it performs:
SELECT "Vacancy"."ID",
"Vacancy"."JobTitle",
"Vacancy"."DateCreated",
"Vacancy"."CustomAccess",
"Department"."Name" as "Department",
list("Occupation"."Name") as "Occupation",
"Vacancy"."PositionNo",
"Vacancy"."Template"
FROM "Vacancy"
JOIN "CategoryOption_TableRow" as "c_50" ON (
"c_50"."Category_TableID"= 50
AND "c_50"."RowID" = "Vacancy"."ID"
AND "c_50"."CategoryOptionID"=19205)
LEFT JOIN "CategoryOption_TableRow" as "c_22" ON (
"c_22"."RowID" = "Vacancy"."ID"
AND "c_22"."Category_TableID" = 22)
LEFT JOIN "CategoryOption" as "Occupation" ON (
"Occupation"."ID" = "c_22"."CategoryOptionID")
LEFT JOIN "TableRow_TableRow" as "t_33" ON (
"t_33"."Table1RowID" = "Vacancy"."ID"
AND "t_33"."Table_TableID" = 33)
LEFT JOIN "Department" ON (
"Department"."ID" = "t_33"."Table2RowID"
AND "Department"."Active" = 't'
AND "Department"."ClientID" = 263)
WHERE "Vacancy"."ClientID" = 263
AND NOT EXISTS (
SELECT 1
FROM "Vacancy" as _Vacancy
JOIN "CategoryOption_TableRow" "ct126" on (
"ct126"."Category_TableID" = 126
AND "RowID" = _Vacancy."ID")
WHERE _Vacancy."Template"
AND "ct126"."CategoryOptionID" IN (34024,35254,35255,35256)
AND _Vacancy."ID" = "Vacancy"."ID")
AND ("Vacancy"."Template" = 't' OR "Vacancy"."Level" = 1)Vik Fearing <a href="tel:%2B33%206%2046%2075%2015%2036" value="+33646751536" target="_blank">+33 6 46 75 15 36GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
"Vacancy"."CustomAccess", "Department"."Name",
"Vacancy"."PositionNo", "Vacancy"."Template"
> SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
> "Vacancy"."DateCreated", "Vacancy"."CustomAccess"
> , "Department"."Name" as "Department", list("Occupation"."Name") as
> "Occupation", "Vacancy"."PositionNo"
> , "Vacancy"."Template" from
> "Vacancy"
> LEFT JOIN "CategoryOption_TableRow" as "c_22"
> ON ("c_22"."RowID" = "Vacancy"."ID"
> and "c_22"."Category_TableID" = 22)
> LEFT JOIN "CategoryOption" as "Occupation"
> ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
> LEFT JOIN "TableRow_TableRow" as "t_33"
> ON ("t_33"."Table1RowID" = "Vacancy"."ID"
> and "t_33"."Table_TableID" = 33 )
> LEFT JOIN "Department"
> ON ("Department"."ID" = "t_33"."Table2RowID" and
> "Department"."Active" = 't' and "Department"
> ."ClientID" = 263)
> JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50
> and "c_50"."RowID" = "Vacancy"
> ."ID" and "c_50"."CategoryOptionID"=19205)
> WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
> DISTINCT("Vacancy"."ID")
> FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
> ("ct126"."Category_TableID" = 126
> and "RowID" = "Vacancy"."ID")
> left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID"
> and "Workflow"."Level"
> = 1)
> left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
> "Workflow"."ID" and "c30"."Category_TableID"
> = 30 and "c30"."CategoryOptionID" = 21923)
> WHERE "Template" AND "ct126"."CategoryOptionID"
> IN(34024,35254,35255,35256)) and "Vacancy"
> ."Template" = 't'
> GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
> "Vacancy"."CustomAccess", "Department"
> ."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
> UNION
> SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
> "Vacancy"."DateCreated", "Vacancy"."CustomAccess"
> , "Department"."Name" as "Department", list("Occupation"."Name") as
> "Occupation", "Vacancy"."PositionNo"
> , "Vacancy"."Template" from
> "Vacancy"
> LEFT JOIN "CategoryOption_TableRow" as "c_22"
> ON ("c_22"."RowID" = "Vacancy"."ID"
> and "c_22"."Category_TableID" = 22)
> LEFT JOIN "CategoryOption" as "Occupation"
> ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
> LEFT JOIN "TableRow_TableRow" as "t_33"
> ON ("t_33"."Table1RowID" = "Vacancy"."ID"
> and "t_33"."Table_TableID" = 33 )
> LEFT JOIN "Department"
> ON ("Department"."ID" = "t_33"."Table2RowID" and
> "Department"."Active" = 't' and "Department"
> ."ClientID" = 263)
> JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50
> and "c_50"."RowID" = "Vacancy"
> ."ID" and "c_50"."CategoryOptionID"=19205)
> WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
> DISTINCT("Vacancy"."ID")
> FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
> ("ct126"."Category_TableID" = 126
> and "RowID" = "Vacancy"."ID")
> left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID"
> and "Workflow"."Level"
> = 1)
> left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
> "Workflow"."ID" and "c30"."Category_TableID"
> = 30 and "c30"."CategoryOptionID" = 21923)
> WHERE "Template" AND "ct126"."CategoryOptionID"
> IN(34024,35254,35255,35256)) and "Vacancy"
> ."Template" <> 't' AND "Vacancy"."Level" = 1
> GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
> "Vacancy"."CustomAccess", "Department"
> ."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
> ORDER BY "JobTitle"
>
> Running explain analyze gives me the following information:
> http://explain.depesz.com/s/pdC <http://explain.depesz.com/s/pdC>
>
> For a total runtime: 2877.157 ms
>
> If i remove the left joins on Department and TableRow_TableRow this reduces
> the run time by about a third.
> Additionally removing CategoryOption and CategoryOption_TableRow joins
> further reduces by a about a third.
>
> Given that i need both these joins for the information retrieved by them,
> what would be the best way to re-factor this query so it runs faster?
>
> Looking at the output of explain analyze the hash aggregates and sort seem
> to be the primary issue.
>
> Thanks in advance
--
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performanceIf you reply to this email, your message will be added to the discussion below:http://postgresql.nabble.com/Slow-Query-tp5861835p5861873.html
--
View this message in context: Re: Slow Query
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
In the 'not exists' cluster, you do not have to search table "Vacancy as v" again.
I think it would be faster to use the outer Vacancy table as below.
In your case, that do the same work.
NOT EXISTS (
SELECT 1
FROM "CategoryOption_TableRow" "ct126"
WHERE "Vacancy"."Template"
AND "ct126"."CategoryOptionID" IN (34024,35254,35255,35256)
AND "ct126"."Category_TableID" = 126
AND "ct126"."RowID" = "Vacancy"."ID"
)
Hi,
Doing this returns 0 recordsOn Thu, Aug 13, 2015 at 7:22 PM, 林士博 [via PostgreSQL] <[hidden email]> wrote:
In the 'not exists' cluster, you do not have to search table "Vacancy as v" again.I think it would be faster to use the outer Vacancy table as below.In your case, that do the same work.NOT EXISTS (SELECT 1FROM "CategoryOption_TableRow" "ct126"WHERE "Vacancy"."Template"AND "ct126"."CategoryOptionID" IN (34024,35254,35255,35256)AND "ct126"."Category_TableID" = 126AND "ct126"."RowID" = "Vacancy"."ID")If you reply to this email, your message will be added to the discussion below:http://postgresql.nabble.com/Slow-Query-tp5861835p5862008.html
--
View this message in context: Re: Slow Query
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Is the "Vacancy"."ID" a primary key?
Or is unique in Vacancy table?
OK.
If you benchmark that correctly,
then it seems that adding some redundant search can get the query faster in some special cases.
And without further info, I can not see any reason.
2015-08-14 14:35 GMT+09:00 Robert Campbell <robcampbell73@gmail.com>:
Vacancy ID is a primary key.It does return records but not quite as fast as original query, about 200ms slowerHi,My mistake, didnt apply the sub query properly the first time.On Fri, Aug 14, 2015 at 3:10 PM, 林士博 <lin@repica.co.jp> wrote:Is the "Vacancy"."ID" a primary key?Or is unique in Vacancy table?
--