Обсуждение: Slow Query

Поиск

Список

Период

Сортировка

Slow Query

От

robbyc

Дата:

12 августа 2015 г., 05:34:26

Hi,

I am new to optimizing queries and i'm getting a slow running time
(~1.5secs) with the following SQL:

SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
"Vacancy"."DateCreated", "Vacancy"."CustomAccess"
, "Department"."Name" as "Department", list("Occupation"."Name") as
"Occupation", "Vacancy"."PositionNo"
, "Vacancy"."Template" from
   "Vacancy"
LEFT JOIN "CategoryOption_TableRow" as "c_22"
         ON ("c_22"."RowID" = "Vacancy"."ID"
         and "c_22"."Category_TableID" = 22)
LEFT JOIN "CategoryOption" as "Occupation"
         ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
LEFT JOIN "TableRow_TableRow" as "t_33"
      ON ("t_33"."Table1RowID" = "Vacancy"."ID"
      and "t_33"."Table_TableID" = 33 )
LEFT JOIN "Department"
         ON ("Department"."ID" = "t_33"."Table2RowID" and
"Department"."Active" = 't' and "Department"
."ClientID" = 263)
JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50
and "c_50"."RowID" = "Vacancy"
."ID" and "c_50"."CategoryOptionID"=19205)
WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
DISTINCT("Vacancy"."ID")
            FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
("ct126"."Category_TableID" = 126
 and "RowID" = "Vacancy"."ID")
            left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID"
and "Workflow"."Level"
= 1)
            left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
"Workflow"."ID" and "c30"."Category_TableID"
 = 30 and "c30"."CategoryOptionID" = 21923)
            WHERE "Template" AND "ct126"."CategoryOptionID"
IN(34024,35254,35255,35256)) and "Vacancy"
."Template" = 't'
GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
"Vacancy"."CustomAccess", "Department"
."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
UNION
SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
"Vacancy"."DateCreated", "Vacancy"."CustomAccess"
, "Department"."Name" as "Department", list("Occupation"."Name") as
"Occupation", "Vacancy"."PositionNo"
, "Vacancy"."Template" from
   "Vacancy"
LEFT JOIN "CategoryOption_TableRow" as "c_22"
         ON ("c_22"."RowID" = "Vacancy"."ID"
         and "c_22"."Category_TableID" = 22)
LEFT JOIN "CategoryOption" as "Occupation"
         ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
LEFT JOIN "TableRow_TableRow" as "t_33"
      ON ("t_33"."Table1RowID" = "Vacancy"."ID"
      and "t_33"."Table_TableID" = 33 )
LEFT JOIN "Department"
         ON ("Department"."ID" = "t_33"."Table2RowID" and
"Department"."Active" = 't' and "Department"
."ClientID" = 263)
JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50
and "c_50"."RowID" = "Vacancy"
."ID" and "c_50"."CategoryOptionID"=19205)
WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
DISTINCT("Vacancy"."ID")
            FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
("ct126"."Category_TableID" = 126
 and "RowID" = "Vacancy"."ID")
            left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID"
and "Workflow"."Level"
= 1)
            left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
"Workflow"."ID" and "c30"."Category_TableID"
 = 30 and "c30"."CategoryOptionID" = 21923)
            WHERE "Template" AND "ct126"."CategoryOptionID"
IN(34024,35254,35255,35256))  and "Vacancy"
."Template" <> 't' AND "Vacancy"."Level" = 1
GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
"Vacancy"."CustomAccess", "Department"
."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
 ORDER BY "JobTitle"

Running explain analyze gives me the following information:
http://explain.depesz.com/s/pdC <http://explain.depesz.com/s/pdC>

For a total runtime: 2877.157 ms

If i remove the left joins on Department and TableRow_TableRow this reduces
the run time by about a third.
Additionally removing CategoryOption and CategoryOption_TableRow joins
further reduces by a about a third.

Given that i need both these joins for the information retrieved by them,
what would be the best way to re-factor this query so it runs faster?

Looking at the output of explain analyze the hash aggregates and sort seem
to be the primary issue.

Thanks in advance










--
View this message in context: http://postgresql.nabble.com/Slow-Query-tp5861835.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Re: Slow Query

От

Venkata Balaji N

Дата:

12 августа 2015 г., 07:08:51

On Wed, Aug 12, 2015 at 12:34 PM, robbyc <robcampbell73@gmail.com> wrote:

Hi,

I am new to optimizing queries and i'm getting a slow running time
(~1.5secs) with the following SQL:

SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
"Vacancy"."DateCreated", "Vacancy"."CustomAccess"
, "Department"."Name" as "Department", list("Occupation"."Name") as
"Occupation", "Vacancy"."PositionNo"
, "Vacancy"."Template" from
"Vacancy"
LEFT JOIN "CategoryOption_TableRow" as "c_22"
ON ("c_22"."RowID" = "Vacancy"."ID"
and "c_22"."Category_TableID" = 22)
LEFT JOIN "CategoryOption" as "Occupation"
ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
LEFT JOIN "TableRow_TableRow" as "t_33"
ON ("t_33"."Table1RowID" = "Vacancy"."ID"
and "t_33"."Table_TableID" = 33 )
LEFT JOIN "Department"
ON ("Department"."ID" = "t_33"."Table2RowID" and
"Department"."Active" = 't' and "Department"
."ClientID" = 263)
JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50
and "c_50"."RowID" = "Vacancy"
."ID" and "c_50"."CategoryOptionID"=19205)
WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
DISTINCT("Vacancy"."ID")
FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
("ct126"."Category_TableID" = 126
and "RowID" = "Vacancy"."ID")
left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID"
and "Workflow"."Level"
= 1)
left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
"Workflow"."ID" and "c30"."Category_TableID"
= 30 and "c30"."CategoryOptionID" = 21923)
WHERE "Template" AND "ct126"."CategoryOptionID"
IN(34024,35254,35255,35256)) and "Vacancy"
."Template" = 't'
GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
"Vacancy"."CustomAccess", "Department"
."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
UNION
SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
"Vacancy"."DateCreated", "Vacancy"."CustomAccess"
, "Department"."Name" as "Department", list("Occupation"."Name") as
"Occupation", "Vacancy"."PositionNo"
, "Vacancy"."Template" from
"Vacancy"
LEFT JOIN "CategoryOption_TableRow" as "c_22"
ON ("c_22"."RowID" = "Vacancy"."ID"
and "c_22"."Category_TableID" = 22)
LEFT JOIN "CategoryOption" as "Occupation"
ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
LEFT JOIN "TableRow_TableRow" as "t_33"
ON ("t_33"."Table1RowID" = "Vacancy"."ID"
and "t_33"."Table_TableID" = 33 )
LEFT JOIN "Department"
ON ("Department"."ID" = "t_33"."Table2RowID" and
"Department"."Active" = 't' and "Department"
."ClientID" = 263)
JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50
and "c_50"."RowID" = "Vacancy"
."ID" and "c_50"."CategoryOptionID"=19205)
WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
DISTINCT("Vacancy"."ID")
FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
("ct126"."Category_TableID" = 126
and "RowID" = "Vacancy"."ID")
left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID"
and "Workflow"."Level"
= 1)
left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
"Workflow"."ID" and "c30"."Category_TableID"
= 30 and "c30"."CategoryOptionID" = 21923)
WHERE "Template" AND "ct126"."CategoryOptionID"
IN(34024,35254,35255,35256)) and "Vacancy"
."Template" <> 't' AND "Vacancy"."Level" = 1
GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
"Vacancy"."CustomAccess", "Department"
."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
ORDER BY "JobTitle"

Running explain analyze gives me the following information:
http://explain.depesz.com/s/pdC <http://explain.depesz.com/s/pdC>

For a total runtime: 2877.157 ms

If i remove the left joins on Department and TableRow_TableRow this reduces
the run time by about a third.
Additionally removing CategoryOption and CategoryOption_TableRow joins
further reduces by a about a third.

Given that i need both these joins for the information retrieved by them,
what would be the best way to re-factor this query so it runs faster?

Looking at the output of explain analyze the hash aggregates and sort seem
to be the primary issue.

The query has got a distinct and group-by/order-by clauses which seems to be taking time. Without looking at much details of the query code and Table size etc, did you try increasing the work_mem and then execute the query and see if that helps ? This will reduce the on-disk IO for sorting. Also, Vacancy.JobTitle seems to be a non-index column.

Regards,

Venkata Balaji

Fujitsu Australia

Re: Slow Query

От

robbyc

Дата:

12 августа 2015 г., 08:29:55

Hi Venkata,

work_mem was set to 72MB, increased to 144MB, no change.

Added an index of type varchar_pattern_ops to Vacancy.JobTitle, this did not help either.

On Wed, Aug 12, 2015 at 2:09 PM, Venkata Balaji N [via PostgreSQL] <[hidden email]> wrote:

On Wed, Aug 12, 2015 at 12:34 PM, robbyc <[hidden email]> wrote:
Hi,

I am new to optimizing queries and i'm getting a slow running time
(~1.5secs) with the following SQL:

SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
"Vacancy"."DateCreated", "Vacancy"."CustomAccess"
, "Department"."Name" as "Department", list("Occupation"."Name") as
"Occupation", "Vacancy"."PositionNo"
, "Vacancy"."Template" from
"Vacancy"
LEFT JOIN "CategoryOption_TableRow" as "c_22"
ON ("c_22"."RowID" = "Vacancy"."ID"
and "c_22"."Category_TableID" = 22)
LEFT JOIN "CategoryOption" as "Occupation"
ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
LEFT JOIN "TableRow_TableRow" as "t_33"
ON ("t_33"."Table1RowID" = "Vacancy"."ID"
and "t_33"."Table_TableID" = 33 )
LEFT JOIN "Department"
ON ("Department"."ID" = "t_33"."Table2RowID" and
"Department"."Active" = 't' and "Department"
."ClientID" = 263)
JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50
and "c_50"."RowID" = "Vacancy"
."ID" and "c_50"."CategoryOptionID"=19205)
WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
DISTINCT("Vacancy"."ID")
FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
("ct126"."Category_TableID" = 126
and "RowID" = "Vacancy"."ID")
left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID"
and "Workflow"."Level"
= 1)
left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
"Workflow"."ID" and "c30"."Category_TableID"
= 30 and "c30"."CategoryOptionID" = 21923)
WHERE "Template" AND "ct126"."CategoryOptionID"
IN(34024,35254,35255,35256)) and "Vacancy"
."Template" = 't'
GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
"Vacancy"."CustomAccess", "Department"
."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
UNION
SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
"Vacancy"."DateCreated", "Vacancy"."CustomAccess"
, "Department"."Name" as "Department", list("Occupation"."Name") as
"Occupation", "Vacancy"."PositionNo"
, "Vacancy"."Template" from
"Vacancy"
LEFT JOIN "CategoryOption_TableRow" as "c_22"
ON ("c_22"."RowID" = "Vacancy"."ID"
and "c_22"."Category_TableID" = 22)
LEFT JOIN "CategoryOption" as "Occupation"
ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
LEFT JOIN "TableRow_TableRow" as "t_33"
ON ("t_33"."Table1RowID" = "Vacancy"."ID"
and "t_33"."Table_TableID" = 33 )
LEFT JOIN "Department"
ON ("Department"."ID" = "t_33"."Table2RowID" and
"Department"."Active" = 't' and "Department"
."ClientID" = 263)
JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50
and "c_50"."RowID" = "Vacancy"
."ID" and "c_50"."CategoryOptionID"=19205)
WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
DISTINCT("Vacancy"."ID")
FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
("ct126"."Category_TableID" = 126
and "RowID" = "Vacancy"."ID")
left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID"
and "Workflow"."Level"
= 1)
left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
"Workflow"."ID" and "c30"."Category_TableID"
= 30 and "c30"."CategoryOptionID" = 21923)
WHERE "Template" AND "ct126"."CategoryOptionID"
IN(34024,35254,35255,35256)) and "Vacancy"
."Template" <> 't' AND "Vacancy"."Level" = 1
GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
"Vacancy"."CustomAccess", "Department"
."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
ORDER BY "JobTitle"

Running explain analyze gives me the following information:
http://explain.depesz.com/s/pdC <http://explain.depesz.com/s/pdC>

For a total runtime: 2877.157 ms

If i remove the left joins on Department and TableRow_TableRow this reduces
the run time by about a third.
Additionally removing CategoryOption and CategoryOption_TableRow joins
further reduces by a about a third.

Given that i need both these joins for the information retrieved by them,
what would be the best way to re-factor this query so it runs faster?

Looking at the output of explain analyze the hash aggregates and sort seem
to be the primary issue.

The query has got a distinct and group-by/order-by clauses which seems to be taking time. Without looking at much details of the query code and Table size etc, did you try increasing the work_mem and then execute the query and see if that helps ? This will reduce the on-disk IO for sorting. Also, Vacancy.JobTitle seems to be a non-index column.

Regards,
Venkata Balaji

Fujitsu Australia

If you reply to this email, your message will be added to the discussion below:
http://postgresql.nabble.com/Slow-Query-tp5861835p5861839.html
To unsubscribe from Slow Query, click here.
NAML

Regards

Robert Campbell
+61412062971
[hidden email]

View this message in context: Re: Slow Query
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Re: Slow Query

От

Venkata Balaji N

Дата:

12 августа 2015 г., 09:08:58

On Wed, Aug 12, 2015 at 3:29 PM, robbyc <robcampbell73@gmail.com> wrote:

Hi Venkata,

work_mem was set to 72MB, increased to 144MB, no change.

Increasing work_mem depends on various other factors like Table size (amount of data being sorted), available memory etc.

Added an index of type varchar_pattern_ops to Vacancy.JobTitle, this did not help either.

Sorry, I did not mean to say that an Index must be added straight away. The column must be eligible to have an Index. Meaning, Index will be beneficial if created on a column with high number of distinct values.

If either of the above does not help, then options to rewrite the query must be explored.

Thanks,

Venkata Balaji N

Fujitsu Australia

On Wed, Aug 12, 2015 at 2:09 PM, Venkata Balaji N [via PostgreSQL] <[hidden email]> wrote:
On Wed, Aug 12, 2015 at 12:34 PM, robbyc <[hidden email]> wrote:
Hi,

I am new to optimizing queries and i'm getting a slow running time
(~1.5secs) with the following SQL:

SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
"Vacancy"."DateCreated", "Vacancy"."CustomAccess"
, "Department"."Name" as "Department", list("Occupation"."Name") as
"Occupation", "Vacancy"."PositionNo"
, "Vacancy"."Template" from
"Vacancy"
LEFT JOIN "CategoryOption_TableRow" as "c_22"
ON ("c_22"."RowID" = "Vacancy"."ID"
and "c_22"."Category_TableID" = 22)
LEFT JOIN "CategoryOption" as "Occupation"
ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
LEFT JOIN "TableRow_TableRow" as "t_33"
ON ("t_33"."Table1RowID" = "Vacancy"."ID"
and "t_33"."Table_TableID" = 33 )
LEFT JOIN "Department"
ON ("Department"."ID" = "t_33"."Table2RowID" and
"Department"."Active" = 't' and "Department"
."ClientID" = 263)
JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50
and "c_50"."RowID" = "Vacancy"
."ID" and "c_50"."CategoryOptionID"=19205)
WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
DISTINCT("Vacancy"."ID")
FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
("ct126"."Category_TableID" = 126
and "RowID" = "Vacancy"."ID")
left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID"
and "Workflow"."Level"
= 1)
left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
"Workflow"."ID" and "c30"."Category_TableID"
= 30 and "c30"."CategoryOptionID" = 21923)
WHERE "Template" AND "ct126"."CategoryOptionID"
IN(34024,35254,35255,35256)) and "Vacancy"
."Template" = 't'
GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
"Vacancy"."CustomAccess", "Department"
."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
UNION
SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
"Vacancy"."DateCreated", "Vacancy"."CustomAccess"
, "Department"."Name" as "Department", list("Occupation"."Name") as
"Occupation", "Vacancy"."PositionNo"
, "Vacancy"."Template" from
"Vacancy"
LEFT JOIN "CategoryOption_TableRow" as "c_22"
ON ("c_22"."RowID" = "Vacancy"."ID"
and "c_22"."Category_TableID" = 22)
LEFT JOIN "CategoryOption" as "Occupation"
ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
LEFT JOIN "TableRow_TableRow" as "t_33"
ON ("t_33"."Table1RowID" = "Vacancy"."ID"
and "t_33"."Table_TableID" = 33 )
LEFT JOIN "Department"
ON ("Department"."ID" = "t_33"."Table2RowID" and
"Department"."Active" = 't' and "Department"
."ClientID" = 263)
JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50
and "c_50"."RowID" = "Vacancy"
."ID" and "c_50"."CategoryOptionID"=19205)
WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
DISTINCT("Vacancy"."ID")
FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
("ct126"."Category_TableID" = 126
and "RowID" = "Vacancy"."ID")
left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID"
and "Workflow"."Level"
= 1)
left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
"Workflow"."ID" and "c30"."Category_TableID"
= 30 and "c30"."CategoryOptionID" = 21923)
WHERE "Template" AND "ct126"."CategoryOptionID"
IN(34024,35254,35255,35256)) and "Vacancy"
."Template" <> 't' AND "Vacancy"."Level" = 1
GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
"Vacancy"."CustomAccess", "Department"
."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
ORDER BY "JobTitle"

Running explain analyze gives me the following information:
http://explain.depesz.com/s/pdC <http://explain.depesz.com/s/pdC>

For a total runtime: 2877.157 ms

If i remove the left joins on Department and TableRow_TableRow this reduces
the run time by about a third.
Additionally removing CategoryOption and CategoryOption_TableRow joins
further reduces by a about a third.

Given that i need both these joins for the information retrieved by them,
what would be the best way to re-factor this query so it runs faster?

Looking at the output of explain analyze the hash aggregates and sort seem
to be the primary issue.

The query has got a distinct and group-by/order-by clauses which seems to be taking time. Without looking at much details of the query code and Table size etc, did you try increasing the work_mem and then execute the query and see if that helps ? This will reduce the on-disk IO for sorting. Also, Vacancy.JobTitle seems to be a non-index column.

Regards,
Venkata Balaji

Fujitsu Australia

If you reply to this email, your message will be added to the discussion below:
http://postgresql.nabble.com/Slow-Query-tp5861835p5861839.html
To unsubscribe from Slow Query, click here.
NAML

--
Regards

Robert Campbell
+61412062971
[hidden email]

View this message in context: Re: Slow Query

Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Re: Slow Query

От

Vik Fearing

Дата:

12 августа 2015 г., 14:34:55

On 08/12/2015 04:34 AM, robbyc wrote:
> Hi,
>
> I am new to optimizing queries and i'm getting a slow running time
> (~1.5secs) with the following SQL:

Before mucking about with work_mem and indexes, the first thing to do is
rewrite this query correctly.  Here are just some of the things wrong
with the query as written:

* You're doing a DISTINCT on the same set of columns also in a GROUP BY.
  This is redundant and causes needless deduplication.

* You're joining two GROUPed BY then DISTINCTed queries using the UNION
  operator which will do yet another pass for deduplication.

* You've got the entire query repeated for just a simple difference in
  the global WHERE clause.  These can be merged.

* You've kept LEFT JOINs in the subquery but you don't use any values
  from them.  These can be safely removed altogether.

* You're using a NOT IN clause which is almost never what you want.  Use
  NOT EXISTS instead.

What is this list() function?  How is it defined?  Can it be replaced
with string_agg()?

You're not doing yourself any favors at all with all this quoting and
mixed case stuff.

Here is a rewritten version, please let me know how it performs:

SELECT "Vacancy"."ID",
       "Vacancy"."JobTitle",
       "Vacancy"."DateCreated",
       "Vacancy"."CustomAccess",
       "Department"."Name" as "Department",
       list("Occupation"."Name") as "Occupation",
       "Vacancy"."PositionNo",
       "Vacancy"."Template"
FROM "Vacancy"
JOIN "CategoryOption_TableRow" as "c_50" ON (
        "c_50"."Category_TableID"= 50
    AND "c_50"."RowID" = "Vacancy"."ID"
    AND "c_50"."CategoryOptionID"=19205)
LEFT JOIN "CategoryOption_TableRow" as "c_22" ON (
        "c_22"."RowID" = "Vacancy"."ID"
    AND "c_22"."Category_TableID" = 22)
LEFT JOIN "CategoryOption" as "Occupation" ON (
        "Occupation"."ID" = "c_22"."CategoryOptionID")
LEFT JOIN "TableRow_TableRow" as "t_33" ON (
        "t_33"."Table1RowID" = "Vacancy"."ID"
    AND "t_33"."Table_TableID" = 33)
LEFT JOIN "Department" ON (
        "Department"."ID" = "t_33"."Table2RowID"
    AND "Department"."Active" = 't'
    AND "Department"."ClientID" = 263)
WHERE "Vacancy"."ClientID" = 263
  AND NOT EXISTS (
        SELECT 1
        FROM "Vacancy" as _Vacancy
        JOIN "CategoryOption_TableRow" "ct126" on (
                "ct126"."Category_TableID" = 126
            AND "RowID" = _Vacancy."ID")
        WHERE _Vacancy."Template"
          AND "ct126"."CategoryOptionID" IN (34024,35254,35255,35256)
          AND _Vacancy."ID" = "Vacancy"."ID")
  AND ("Vacancy"."Template" = 't' OR "Vacancy"."Level" = 1)
GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
         "Vacancy"."CustomAccess", "Department"."Name",
         "Vacancy"."PositionNo", "Vacancy"."Template"


> SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
> "Vacancy"."DateCreated", "Vacancy"."CustomAccess"
> , "Department"."Name" as "Department", list("Occupation"."Name") as
> "Occupation", "Vacancy"."PositionNo"
> , "Vacancy"."Template" from
>    "Vacancy"
> LEFT JOIN "CategoryOption_TableRow" as "c_22"
>          ON ("c_22"."RowID" = "Vacancy"."ID"
>          and "c_22"."Category_TableID" = 22)
> LEFT JOIN "CategoryOption" as "Occupation"
>          ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
> LEFT JOIN "TableRow_TableRow" as "t_33"
>       ON ("t_33"."Table1RowID" = "Vacancy"."ID"
>       and "t_33"."Table_TableID" = 33 )
> LEFT JOIN "Department"
>          ON ("Department"."ID" = "t_33"."Table2RowID" and
> "Department"."Active" = 't' and "Department"
> ."ClientID" = 263)
> JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50
> and "c_50"."RowID" = "Vacancy"
> ."ID" and "c_50"."CategoryOptionID"=19205)
> WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
> DISTINCT("Vacancy"."ID")
>             FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
> ("ct126"."Category_TableID" = 126
>  and "RowID" = "Vacancy"."ID")
>             left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID"
> and "Workflow"."Level"
> = 1)
>             left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
> "Workflow"."ID" and "c30"."Category_TableID"
>  = 30 and "c30"."CategoryOptionID" = 21923)
>             WHERE "Template" AND "ct126"."CategoryOptionID"
> IN(34024,35254,35255,35256)) and "Vacancy"
> ."Template" = 't'
> GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
> "Vacancy"."CustomAccess", "Department"
> ."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
> UNION
> SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
> "Vacancy"."DateCreated", "Vacancy"."CustomAccess"
> , "Department"."Name" as "Department", list("Occupation"."Name") as
> "Occupation", "Vacancy"."PositionNo"
> , "Vacancy"."Template" from
>    "Vacancy"
> LEFT JOIN "CategoryOption_TableRow" as "c_22"
>          ON ("c_22"."RowID" = "Vacancy"."ID"
>          and "c_22"."Category_TableID" = 22)
> LEFT JOIN "CategoryOption" as "Occupation"
>          ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
> LEFT JOIN "TableRow_TableRow" as "t_33"
>       ON ("t_33"."Table1RowID" = "Vacancy"."ID"
>       and "t_33"."Table_TableID" = 33 )
> LEFT JOIN "Department"
>          ON ("Department"."ID" = "t_33"."Table2RowID" and
> "Department"."Active" = 't' and "Department"
> ."ClientID" = 263)
> JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50
> and "c_50"."RowID" = "Vacancy"
> ."ID" and "c_50"."CategoryOptionID"=19205)
> WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
> DISTINCT("Vacancy"."ID")
>             FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
> ("ct126"."Category_TableID" = 126
>  and "RowID" = "Vacancy"."ID")
>             left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID"
> and "Workflow"."Level"
> = 1)
>             left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
> "Workflow"."ID" and "c30"."Category_TableID"
>  = 30 and "c30"."CategoryOptionID" = 21923)
>             WHERE "Template" AND "ct126"."CategoryOptionID"
> IN(34024,35254,35255,35256))  and "Vacancy"
> ."Template" <> 't' AND "Vacancy"."Level" = 1
> GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
> "Vacancy"."CustomAccess", "Department"
> ."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
>  ORDER BY "JobTitle"
>
> Running explain analyze gives me the following information:
> http://explain.depesz.com/s/pdC <http://explain.depesz.com/s/pdC>
>
> For a total runtime: 2877.157 ms
>
> If i remove the left joins on Department and TableRow_TableRow this reduces
> the run time by about a third.
> Additionally removing CategoryOption and CategoryOption_TableRow joins
> further reduces by a about a third.
>
> Given that i need both these joins for the information retrieved by them,
> what would be the best way to re-factor this query so it runs faster?
>
> Looking at the output of explain analyze the hash aggregates and sort seem
> to be the primary issue.
>
> Thanks in advance


--
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: Slow Query

От

robbyc

Дата:

13 августа 2015 г., 03:50:53

Hi Vik,

Thanks for your feedback, very helpful.

I modified your query slightly, this will return all vacancy templates and all level 1 vacancies which arent templates, and does so in about ~800-900ms less, an great improvement on the original query.

SELECT "Vacancy"."ID",
       "Vacancy"."JobTitle",
       "Vacancy"."DateCreated",
       "Vacancy"."CustomAccess",
       "Department"."Name" as "Department",
       list("Occupation"."Name") as "Occupation",
       "Vacancy"."PositionNo",
       "Vacancy"."Template"
FROM "Vacancy"
JOIN "CategoryOption_TableRow" as "c_50" ON (
        "c_50"."Category_TableID"= 50
    AND "c_50"."RowID" = "Vacancy"."ID"
    AND "c_50"."CategoryOptionID"=19205)
LEFT JOIN "CategoryOption_TableRow" as "c_22" ON (
        "c_22"."RowID" = "Vacancy"."ID"
    AND "c_22"."Category_TableID" = 22)
LEFT JOIN "CategoryOption" as "Occupation" ON (
        "Occupation"."ID" = "c_22"."CategoryOptionID")
LEFT JOIN "TableRow_TableRow" as "t_33" ON (
        "t_33"."Table1RowID" = "Vacancy"."ID"
    AND "t_33"."Table_TableID" = 33)
LEFT JOIN "Department" ON (
        "Department"."ID" = "t_33"."Table2RowID"
    AND "Department"."Active" = 't'
    AND "Department"."ClientID" = 263)
WHERE "Vacancy"."ClientID" = 263
AND NOT EXISTS (
        SELECT 1
        FROM "Vacancy" as "v"
        JOIN "CategoryOption_TableRow" "ct126" on (
                "ct126"."Category_TableID" = 126
            AND "RowID" = "v"."ID")
        WHERE "v"."Template"
          AND "ct126"."CategoryOptionID" IN (34024,35254,35255,35256)
          AND "v"."ID" = "Vacancy"."ID")
AND ("Vacancy"."Template" OR ("Vacancy"."Template" = 'f' AND "Vacancy"."Level" = 1))
GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
         "Vacancy"."CustomAccess", "Department"."Name",
         "Vacancy"."PositionNo", "Vacancy"."Template"

On Wed, Aug 12, 2015 at 9:35 PM, Vik Fearing-3 [via PostgreSQL] <[hidden email]> wrote:

On 08/12/2015 04:34 AM, robbyc wrote:
> Hi,
>
> I am new to optimizing queries and i'm getting a slow running time
> (~1.5secs) with the following SQL:

Before mucking about with work_mem and indexes, the first thing to do is
rewrite this query correctly. Here are just some of the things wrong
with the query as written:

* You're doing a DISTINCT on the same set of columns also in a GROUP BY.
This is redundant and causes needless deduplication.

* You're joining two GROUPed BY then DISTINCTed queries using the UNION
operator which will do yet another pass for deduplication.

* You've got the entire query repeated for just a simple difference in
the global WHERE clause. These can be merged.

* You've kept LEFT JOINs in the subquery but you don't use any values
from them. These can be safely removed altogether.

* You're using a NOT IN clause which is almost never what you want. Use
NOT EXISTS instead.

What is this list() function? How is it defined? Can it be replaced
with string_agg()?

You're not doing yourself any favors at all with all this quoting and
mixed case stuff.

Here is a rewritten version, please let me know how it performs:

SELECT "Vacancy"."ID",
"Vacancy"."JobTitle",
"Vacancy"."DateCreated",
"Vacancy"."CustomAccess",
"Department"."Name" as "Department",
list("Occupation"."Name") as "Occupation",
"Vacancy"."PositionNo",
"Vacancy"."Template"
FROM "Vacancy"
JOIN "CategoryOption_TableRow" as "c_50" ON (
"c_50"."Category_TableID"= 50
AND "c_50"."RowID" = "Vacancy"."ID"
AND "c_50"."CategoryOptionID"=19205)
LEFT JOIN "CategoryOption_TableRow" as "c_22" ON (
"c_22"."RowID" = "Vacancy"."ID"
AND "c_22"."Category_TableID" = 22)
LEFT JOIN "CategoryOption" as "Occupation" ON (
"Occupation"."ID" = "c_22"."CategoryOptionID")
LEFT JOIN "TableRow_TableRow" as "t_33" ON (
"t_33"."Table1RowID" = "Vacancy"."ID"
AND "t_33"."Table_TableID" = 33)
LEFT JOIN "Department" ON (
"Department"."ID" = "t_33"."Table2RowID"
AND "Department"."Active" = 't'
AND "Department"."ClientID" = 263)
WHERE "Vacancy"."ClientID" = 263
AND NOT EXISTS (
SELECT 1
FROM "Vacancy" as _Vacancy
JOIN "CategoryOption_TableRow" "ct126" on (
"ct126"."Category_TableID" = 126
AND "RowID" = _Vacancy."ID")
WHERE _Vacancy."Template"
AND "ct126"."CategoryOptionID" IN (34024,35254,35255,35256)
AND _Vacancy."ID" = "Vacancy"."ID")
AND ("Vacancy"."Template" = 't' OR "Vacancy"."Level" = 1)
GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
"Vacancy"."CustomAccess", "Department"."Name",
"Vacancy"."PositionNo", "Vacancy"."Template"

> SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
> "Vacancy"."DateCreated", "Vacancy"."CustomAccess"
> , "Department"."Name" as "Department", list("Occupation"."Name") as
> "Occupation", "Vacancy"."PositionNo"
> , "Vacancy"."Template" from
> "Vacancy"
> LEFT JOIN "CategoryOption_TableRow" as "c_22"
> ON ("c_22"."RowID" = "Vacancy"."ID"
> and "c_22"."Category_TableID" = 22)
> LEFT JOIN "CategoryOption" as "Occupation"
> ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
> LEFT JOIN "TableRow_TableRow" as "t_33"
> ON ("t_33"."Table1RowID" = "Vacancy"."ID"
> and "t_33"."Table_TableID" = 33 )
> LEFT JOIN "Department"
> ON ("Department"."ID" = "t_33"."Table2RowID" and
> "Department"."Active" = 't' and "Department"
> ."ClientID" = 263)
> JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50
> and "c_50"."RowID" = "Vacancy"
> ."ID" and "c_50"."CategoryOptionID"=19205)
> WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
> DISTINCT("Vacancy"."ID")
> FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
> ("ct126"."Category_TableID" = 126
> and "RowID" = "Vacancy"."ID")
> left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID"
> and "Workflow"."Level"
> = 1)
> left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
> "Workflow"."ID" and "c30"."Category_TableID"
> = 30 and "c30"."CategoryOptionID" = 21923)
> WHERE "Template" AND "ct126"."CategoryOptionID"
> IN(34024,35254,35255,35256)) and "Vacancy"
> ."Template" = 't'
> GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
> "Vacancy"."CustomAccess", "Department"
> ."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
> UNION
> SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle",
> "Vacancy"."DateCreated", "Vacancy"."CustomAccess"
> , "Department"."Name" as "Department", list("Occupation"."Name") as
> "Occupation", "Vacancy"."PositionNo"
> , "Vacancy"."Template" from
> "Vacancy"
> LEFT JOIN "CategoryOption_TableRow" as "c_22"
> ON ("c_22"."RowID" = "Vacancy"."ID"
> and "c_22"."Category_TableID" = 22)
> LEFT JOIN "CategoryOption" as "Occupation"
> ON ("Occupation"."ID" = "c_22"."CategoryOptionID")
> LEFT JOIN "TableRow_TableRow" as "t_33"
> ON ("t_33"."Table1RowID" = "Vacancy"."ID"
> and "t_33"."Table_TableID" = 33 )
> LEFT JOIN "Department"
> ON ("Department"."ID" = "t_33"."Table2RowID" and
> "Department"."Active" = 't' and "Department"
> ."ClientID" = 263)
> JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50
> and "c_50"."RowID" = "Vacancy"
> ."ID" and "c_50"."CategoryOptionID"=19205)
> WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT
> DISTINCT("Vacancy"."ID")
> FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on
> ("ct126"."Category_TableID" = 126
> and "RowID" = "Vacancy"."ID")
> left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID"
> and "Workflow"."Level"
> = 1)
> left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" =
> "Workflow"."ID" and "c30"."Category_TableID"
> = 30 and "c30"."CategoryOptionID" = 21923)
> WHERE "Template" AND "ct126"."CategoryOptionID"
> IN(34024,35254,35255,35256)) and "Vacancy"
> ."Template" <> 't' AND "Vacancy"."Level" = 1
> GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated",
> "Vacancy"."CustomAccess", "Department"
> ."Name", "Vacancy"."PositionNo", "Vacancy"."Template"
> ORDER BY "JobTitle"
>
> Running explain analyze gives me the following information:
> http://explain.depesz.com/s/pdC <http://explain.depesz.com/s/pdC>
>
> For a total runtime: 2877.157 ms
>
> If i remove the left joins on Department and TableRow_TableRow this reduces
> the run time by about a third.
> Additionally removing CategoryOption and CategoryOption_TableRow joins
> further reduces by a about a third.
>
> Given that i need both these joins for the information retrieved by them,
> what would be the best way to re-factor this query so it runs faster?
>
> Looking at the output of explain analyze the hash aggregates and sort seem
> to be the primary issue.
>
> Thanks in advance

--
Vik Fearing <a href="tel:%2B33%206%2046%2075%2015%2036" value="+33646751536" target="_blank">+33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

If you reply to this email, your message will be added to the discussion below:
http://postgresql.nabble.com/Slow-Query-tp5861835p5861873.html
To unsubscribe from Slow Query, click here.
NAML

Regards

Robert Campbell
+61412062971
[hidden email]

View this message in context: Re: Slow Query
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Re: Slow Query

От

林士博

Дата:

13 августа 2015 г., 12:21:50

In the 'not exists' cluster, you do not have to search table "Vacancy as v" again.

I think it would be faster to use the outer Vacancy table as below.

In your case, that do the same work.

NOT EXISTS (

SELECT 1

FROM "CategoryOption_TableRow" "ct126"

WHERE "Vacancy"."Template"

AND "ct126"."CategoryOptionID" IN (34024,35254,35255,35256)

AND "ct126"."Category_TableID" = 126

AND "ct126"."RowID" = "Vacancy"."ID"

)

Re: Slow Query

От

robbyc

Дата:

14 августа 2015 г., 06:15:09

Hi,

Doing this returns 0 records

On Thu, Aug 13, 2015 at 7:22 PM, 林士博 [via PostgreSQL] <[hidden email]> wrote:

In the 'not exists' cluster, you do not have to search table "Vacancy as v" again.
I think it would be faster to use the outer Vacancy table as below.
In your case, that do the same work.

NOT EXISTS (
SELECT 1
FROM "CategoryOption_TableRow" "ct126"
WHERE "Vacancy"."Template"
AND "ct126"."CategoryOptionID" IN (34024,35254,35255,35256)
AND "ct126"."Category_TableID" = 126
AND "ct126"."RowID" = "Vacancy"."ID"
)

If you reply to this email, your message will be added to the discussion below:
http://postgresql.nabble.com/Slow-Query-tp5861835p5862008.html
To unsubscribe from Slow Query, click here.
NAML

Regards

Robert Campbell
+61412062971
[hidden email]

View this message in context: Re: Slow Query
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Re: Slow Query

От

林士博

Дата:

14 августа 2015 г., 08:10:42

Is the "Vacancy"."ID" a primary key?

Or is unique in Vacancy table?

Re: Slow Query

От

林士博

Дата:

14 августа 2015 г., 09:10:00

OK.

If you benchmark that correctly,

then it seems that adding some redundant search can get the query faster in some special cases.

And without further info, I can not see any reason.

2015-08-14 14:35 GMT+09:00 Robert Campbell <robcampbell73@gmail.com>:

Hi,

My mistake, didnt apply the sub query properly the first time.

It does return records but not quite as fast as original query, about 200ms slower

Vacancy ID is a primary key.

On Fri, Aug 14, 2015 at 3:10 PM, 林士博 <lin@repica.co.jp> wrote:
Is the "Vacancy"."ID" a primary key?
Or is unique in Vacancy table?

--
Regards

Robert Campbell
+61412062971
robcampbell73@gmail.com

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Slow Query