Обсуждение: query taking much longer since Postgres 8.4 upgrade

Поиск
Список
Период
Сортировка

query taking much longer since Postgres 8.4 upgrade

От
"Davenport, Julie"
Дата:

When I run the following query in Postgres 8.0, it runs in 61,509.372 ms

 

When I run it in Postgres 8.4, it runs in 397,857.472 ms

 

Here is the query:

 

select

course_id AS EXTERNAL_COURSE_KEY,

user_id AS EXTERNAL_PERSON_KEY,

'Student' AS ROLE,

'Y' AS AVAILABLE_IND

from course_user_link

where instructor = false

and course_id in

  (

  select course_id

  from course_control

  where to_char(course_begin_date,'YYYYMMDD') IN ( '20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307' )

  and course_delivery LIKE 'O%'

  and course_cross_section IS NULL

  )

and user_id not in (select user_id from instr_as_stutemp)

 

(table instr_as_stutemp has just one column and only 4 rows)

 

What new feature of Postgres 8.4 would be making the query run so much more slowly?  Is there a better way to rewrite the query for 8.4 to make it run faster?

 

Many thanks,

Julie

Re: query taking much longer since Postgres 8.4 upgrade

От
Andrew Sullivan
Дата:
Output of explain (and as likely, explain analyze) for this would be
helpful.

A

On Wed, Mar 16, 2011 at 10:49:24AM -0500, Davenport, Julie wrote:
> When I run the following query in Postgres 8.0, it runs in 61,509.372 ms
>
> When I run it in Postgres 8.4, it runs in 397,857.472 ms
>
> Here is the query:
>
> select
> course_id AS EXTERNAL_COURSE_KEY,
> user_id AS EXTERNAL_PERSON_KEY,
> 'Student' AS ROLE,
> 'Y' AS AVAILABLE_IND
> from course_user_link
> where instructor = false
> and course_id in
>   (
>   select course_id
>   from course_control
>   where to_char(course_begin_date,'YYYYMMDD') IN (
'20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307'
)
>   and course_delivery LIKE 'O%'
>   and course_cross_section IS NULL
>   )
> and user_id not in (select user_id from instr_as_stutemp)
>
> (table instr_as_stutemp has just one column and only 4 rows)
>
> What new feature of Postgres 8.4 would be making the query run so much more slowly?  Is there a better way to rewrite
thequery for 8.4 to make it run faster? 
>
> Many thanks,
> Julie

--
Andrew Sullivan
ajs@crankycanuck.ca

Re: query taking much longer since Postgres 8.4 upgrade

От
tv@fuzzy.cz
Дата:
> When I run the following query in Postgres 8.0, it runs in 61,509.372 ms
>
> When I run it in Postgres 8.4, it runs in 397,857.472 ms

As Andrew already pointed out, we need to se EXPLAIN ANALYZE output from
both machines to see why this happens. Are you running both queries on the
same data, or is there much more data in 8.4? Have you analyzed the tables
recently (or is autovacuum running)?

BTW I see two possible issues with this query:

1) There's not a suitable index on course_begin_date, i.e. there's no
index at all or the index is not on the expression used in the query.

2) There's not a suitable index on course_delivery (it needs to use the
proper operator class).

regards
Tomas



Re: query taking much longer since Postgres 8.4 upgrade

От
Merlin Moncure
Дата:
On Wed, Mar 16, 2011 at 10:49 AM, Davenport, Julie <JDavenport@ctcd.edu> wrote:
> When I run the following query in Postgres 8.0, it runs in 61,509.372 ms
>
>
>
> When I run it in Postgres 8.4, it runs in 397,857.472 ms
>
>
>
> Here is the query:
>
>
>
> select
>
> course_id AS EXTERNAL_COURSE_KEY,
>
> user_id AS EXTERNAL_PERSON_KEY,
>
> 'Student' AS ROLE,
>
> 'Y' AS AVAILABLE_IND
>
> from course_user_link
>
> where instructor = false
>
> and course_id in
>
>   (
>
>   select course_id
>
>   from course_control
>
>   where to_char(course_begin_date,'YYYYMMDD') IN (
>
'20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307'
> )
>
>   and course_delivery LIKE 'O%'
>
>   and course_cross_section IS NULL
>
>   )
>
> and user_id not in (select user_id from instr_as_stutemp)
>
>
>
> (table instr_as_stutemp has just one column and only 4 rows)
>
>
>
> What new feature of Postgres 8.4 would be making the query run so much more
> slowly?  Is there a better way to rewrite the query for 8.4 to make it run
> faster?

another common problem following upgrades are locale issues -- what is
your setting for lc_collate?

merlin

Re: query taking much longer since Postgres 8.4 upgrade

От
Merlin Moncure
Дата:
On Wed, Mar 16, 2011 at 2:14 PM, Davenport, Julie <JDavenport@ctcd.edu> wrote:
> Hello Merlin,
> Thank you very much for your reply.
> I don't see any setting for lc_collate.  I assume it would be in postgresql.conf file if it were there?  These are
theonly lc_... settings I see in postgresql.conf: 
>
> lc_messages = 'en_US.UTF-8'     # locale for system error message
> lc_monetary = 'en_US.UTF-8'     # locale for monetary formatting
> lc_numeric = 'en_US.UTF-8'     # locale for number formatting
> lc_time = 'en_US.UTF-8'        # locale for time formatting
>
> Am I looking in the wrong place?  Thanks much,
> Julie
>
>
> Julie A. Davenport
> julie.davenport@ctcd.edu
>
>
>
>
> -----Original Message-----
> From: Merlin Moncure [mailto:mmoncure@gmail.com]
> Sent: Wednesday, March 16, 2011 1:37 PM
> To: Davenport, Julie
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
>
> On Wed, Mar 16, 2011 at 10:49 AM, Davenport, Julie <JDavenport@ctcd.edu> wrote:
>> When I run the following query in Postgres 8.0, it runs in 61,509.372 ms
>>
>>
>>
>> When I run it in Postgres 8.4, it runs in 397,857.472 ms
>>
>>
>>
>> Here is the query:
>>
>>
>>
>> select
>>
>> course_id AS EXTERNAL_COURSE_KEY,
>>
>> user_id AS EXTERNAL_PERSON_KEY,
>>
>> 'Student' AS ROLE,
>>
>> 'Y' AS AVAILABLE_IND
>>
>> from course_user_link
>>
>> where instructor = false
>>
>> and course_id in
>>
>>   (
>>
>>   select course_id
>>
>>   from course_control
>>
>>   where to_char(course_begin_date,'YYYYMMDD') IN (
>>
'20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307'
>> )
>>
>>   and course_delivery LIKE 'O%'
>>
>>   and course_cross_section IS NULL
>>
>>   )
>>
>> and user_id not in (select user_id from instr_as_stutemp)
>>
>>
>>
>> (table instr_as_stutemp has just one column and only 4 rows)
>>
>>
>>
>> What new feature of Postgres 8.4 would be making the query run so much more
>> slowly?  Is there a better way to rewrite the query for 8.4 to make it run
>> faster?
>
> another common problem following upgrades are locale issues -- what is
> your setting for lc_collate?


from psql, do:
show lc_collate;
more than likely, your lc_collate is set to UTF8, that means that
where a like 'foo%' will not use index, which is starting to sound
like your problem.

unfortunately, database collation is only settable when database is
created (or more typically with initdb).

merlin

Re: query taking much longer since Postgres 8.4 upgrade

От
Tomas Vondra
Дата:
Dne 16.3.2011 20:32, Merlin Moncure napsal(a):
> On Wed, Mar 16, 2011 at 2:14 PM, Davenport, Julie <JDavenport@ctcd.edu> wrote:
>> Hello Merlin,
>> Thank you very much for your reply.
>> I don't see any setting for lc_collate.  I assume it would be in postgresql.conf file if it were there?  These are
theonly lc_... settings I see in postgresql.conf: 
>>
>> lc_messages = 'en_US.UTF-8'     # locale for system error message
>> lc_monetary = 'en_US.UTF-8'     # locale for monetary formatting
>> lc_numeric = 'en_US.UTF-8'     # locale for number formatting
>> lc_time = 'en_US.UTF-8'        # locale for time formatting
>>
>> Am I looking in the wrong place?  Thanks much,
>> Julie
>>
>>
>> Julie A. Davenport
>> julie.davenport@ctcd.edu
>>
>>
>>
>>
>> -----Original Message-----
>> From: Merlin Moncure [mailto:mmoncure@gmail.com]
>> Sent: Wednesday, March 16, 2011 1:37 PM
>> To: Davenport, Julie
>> Cc: pgsql-general@postgresql.org
>> Subject: Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
>>
>> On Wed, Mar 16, 2011 at 10:49 AM, Davenport, Julie <JDavenport@ctcd.edu> wrote:
>>> When I run the following query in Postgres 8.0, it runs in 61,509.372 ms
>>>
>>>
>>>
>>> When I run it in Postgres 8.4, it runs in 397,857.472 ms
>>>
>>>
>>>
>>> Here is the query:
>>>
>>>
>>>
>>> select
>>>
>>> course_id AS EXTERNAL_COURSE_KEY,
>>>
>>> user_id AS EXTERNAL_PERSON_KEY,
>>>
>>> 'Student' AS ROLE,
>>>
>>> 'Y' AS AVAILABLE_IND
>>>
>>> from course_user_link
>>>
>>> where instructor = false
>>>
>>> and course_id in
>>>
>>>   (
>>>
>>>   select course_id
>>>
>>>   from course_control
>>>
>>>   where to_char(course_begin_date,'YYYYMMDD') IN (
>>>
'20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307'
>>> )
>>>
>>>   and course_delivery LIKE 'O%'
>>>
>>>   and course_cross_section IS NULL
>>>
>>>   )
>>>
>>> and user_id not in (select user_id from instr_as_stutemp)
>>>
>>>
>>>
>>> (table instr_as_stutemp has just one column and only 4 rows)
>>>
>>>
>>>
>>> What new feature of Postgres 8.4 would be making the query run so much more
>>> slowly?  Is there a better way to rewrite the query for 8.4 to make it run
>>> faster?
>>
>> another common problem following upgrades are locale issues -- what is
>> your setting for lc_collate?
>
>
> from psql, do:
> show lc_collate;
> more than likely, your lc_collate is set to UTF8, that means that
> where a like 'foo%' will not use index, which is starting to sound
> like your problem.

That probably is not the root cause here, according to a description
I've received off the list. There are no indexes at all (which is an
issue on it's own), but it means this kind of issue is not possible.

Or maybe there's a different lc_collate with much more complex rules?
That might result in very CPU-expensive operations, and as there's a lot
of sequential scans etc. (i.e. a lot of rows to process).

regards
Tomas

Re: query taking much longer since Postgres 8.4 upgrade

От
Tomas Vondra
Дата:
Dne 16.3.2011 21:38, Davenport, Julie napsal(a):
> OK, I did the explain analyze on both sides (using a file for output instead) and used the tool you suggested.
>
> 8.0 - http://explain.depesz.com/s/Wam
> 8.4 - http://explain.depesz.com/s/asJ

Great, that's exactly what I asked for. I'll repost that to the mailing
list so that the others can check it too.

> When I run the queries I get 59,881 rows on the 8.0 side and 59,880 on the 8.4 side, which is what I expect because
8.4side was updated a couple hours later and some minor changes make sense. 

Hm, obviously both versions got the row estimates wrong, but the 8.4
difference (200x) is much bigger that the 8.0 (10x). This might be one
of the reasons why a different plan is chosen.

Anyway both versions underestimate the course_control subquery, as they
believe there will be 1 row only, but in reality there's 2882 of them :-(


> After your first email I did a vacuum full analyze on the 8.4 side on each of the tables in the schema that the views
cover,then ran the query again, and it took even longer - up from 397,857 ms to 412,862 ms.  Another query that was
taking597248 ms before the vacuum/analyze took 617526 ms after. I don't understand why, but this is generally the
experiencewe've had with vacuum/analyze on these particular tables. We do large numbers of deletes and inserts to them
everyday, so I would think they would benefit from it. 

OK, so the tables were in a quite good shape - not bloated etc. The
slight increase is negligible I guess, the vacuum probably removed the
data from shared buffers or something like that.

> I did do a vacuum full analyze on instr_as_stutemp before the explain analyze, as you suggested.

OK, now the row estimate is correct

Seq Scan on instr_as_stutemp (cost=0.00..1.04 rows=4 width=9) (actual
time=0.052..0.098 rows=4 loops=1)

> I will consider the indexes and do some benchmark testing (I have considered the 'drop-load-reindex' trick in the
past).I'm sure increasing maintenance_work_mem will help. 

OK. But the question why the plan changed this way still remains
unanswered (or maybe it does and I don't see it).

One thing I've noticed is this difference in estimates:

8.0:
====================================================================
Subquery Scan course_control (cost=9462700.13..9492043.53 rows=1
width=32) (actual time=43368.204..45795.239 rows=2882 loops=1)
    * Filter: (((to_char(course_begin_date, 'YYYYMMDD'::text) =
'20100412'::text) OR (to_char(course_begin_date, 'YYYYMMDD'::text) =
...
'20110307'::text)) AND ((course_delivery)::text ~~ 'O%'::text) AND
(course_cross_section IS NULL))

8.4:
====================================================================
Subquery Scan course_control (cost=18710.12..548966.51 rows=1 width=32)
(actual time=1632.403..4438.949 rows=2882 loops=1)
    * Filter: ((course_control.course_cross_section IS NULL) AND
((course_control.course_delivery)::text ~~ 'O%'::text) AND
(to_char(course_control.course_begin_date, 'YYYYMMDD'::text) = ANY
('{20100412,20100510,...,20110110,20110207,20110307}'::text[])))

I.e. both verions use seqscan, both estimate the same number of rows
(incorrectly), yet the estimated cost is very different (9492043 vs.
548966).

Maybe the cost estimation really changed between 8.0 and 8.4, but just
for sure - what are the cost values? I mean what is set for those config
values:

cpu_index_tuple_cost
cpu_operator_cost
cpu_tuple_cost
random_page_cost
seq_page_cost
work_mem

(use 'show' to see the actual value).

regards
Tomas

Re: query taking much longer since Postgres 8.4 upgrade

От
Pavel Stehule
Дата:
2011/3/16 Tomas Vondra <tv@fuzzy.cz>:
> Dne 16.3.2011 21:38, Davenport, Julie napsal(a):
>> OK, I did the explain analyze on both sides (using a file for output instead) and used the tool you suggested.
>>
>> 8.0 - http://explain.depesz.com/s/Wam
>> 8.4 - http://explain.depesz.com/s/asJ
>
> Great, that's exactly what I asked for. I'll repost that to the mailing
> list so that the others can check it too.
>
>> When I run the queries I get 59,881 rows on the 8.0 side and 59,880 on the 8.4 side, which is what I expect because
8.4side was updated a couple hours later and some minor changes make sense. 
>
> Hm, obviously both versions got the row estimates wrong, but the 8.4
> difference (200x) is much bigger that the 8.0 (10x). This might be one
> of the reasons why a different plan is chosen.

the expression

to_char(course_begin_date, 'YYYYMMDD'::text) = '20101025'::text

should be a problem

much better is test on equality in date domain like:

course_begin_date = to_date('20101025', 'YYYYMMDD')

this is faster and probably better estimated

Regards

Pavel Stehule

Re: query taking much longer since Postgres 8.4 upgrade

От
Tomas Vondra
Дата:
Dne 16.3.2011 22:31, Pavel Stehule napsal(a):
> 2011/3/16 Tomas Vondra <tv@fuzzy.cz>:
>> Dne 16.3.2011 21:38, Davenport, Julie napsal(a):
>>> OK, I did the explain analyze on both sides (using a file for output instead) and used the tool you suggested.
>>>
>>> 8.0 - http://explain.depesz.com/s/Wam
>>> 8.4 - http://explain.depesz.com/s/asJ
>>
>> Great, that's exactly what I asked for. I'll repost that to the mailing
>> list so that the others can check it too.
>>
>>> When I run the queries I get 59,881 rows on the 8.0 side and 59,880 on the 8.4 side, which is what I expect because
8.4side was updated a couple hours later and some minor changes make sense. 
>>
>> Hm, obviously both versions got the row estimates wrong, but the 8.4
>> difference (200x) is much bigger that the 8.0 (10x). This might be one
>> of the reasons why a different plan is chosen.
>
> the expression
>
> to_char(course_begin_date, 'YYYYMMDD'::text) = '20101025'::text
>
> should be a problem
>
> much better is test on equality in date domain like:
>
> course_begin_date = to_date('20101025', 'YYYYMMDD')
>
> this is faster and probably better estimated

Which is not going to work if the course_begin_date column is a
timestamp, because of the time part.

But yes, there are several ways to improve this query, yet it does not
explain why the 8.4 is so much slower.

Tomas

Re: query taking much longer since Postgres 8.4 upgrade

От
Tomas Vondra
Дата:
OK, so the cost constants are equal in both versions (the only
difference is due to change of the default value).

Just out of curiosity, have you tried to throw a bit more work_mem at
the query? Try something like 8MB or 16MB so - just do this

db=$ set work_mem=8192

and then run the query (the change is valid in that session only, the
other sessions will still use 1MB).

Most of the sorts was performed on-disk insted of in memory, and it
might result in better plan.

regards
Tomas

Dne 16.3.2011 22:40, Davenport, Julie napsal(a):
> Tomas,
> Here are the settings on the 8.0 side:
>
> srn_mst=# show cpu_index_tuple_cost;
>  cpu_index_tuple_cost
> ----------------------
>  0.001
> (1 row)
>
> srn_mst=# show cpu_operator_cost;
>  cpu_operator_cost
> -------------------
>  0.0025
> (1 row)
>
> srn_mst=# show cpu_tuple_cost;
>  cpu_tuple_cost
> ----------------
>  0.01
> (1 row)
>
> srn_mst=# show random_page_cost;
>  random_page_cost
> ------------------
>  4
> (1 row)
>
> srn_mst=# show seq_page_cost;
> ERROR:  unrecognized configuration parameter "seq_page_cost"
>
> srn_mst=# show work_mem;
>  work_mem
> ----------
>  1024
> (1 row)
>
> Here are the settings on the 8.4 side:
>
> srn_mst=# show cpu_index_tuple_cost;
>  cpu_index_tuple_cost
> ----------------------
>  0.005
> (1 row)
>
> srn_mst=# show cpu_operator_cost;
>  cpu_operator_cost
> -------------------
>  0.0025
> (1 row)
>
> srn_mst=# show cpu_tuple_cost;
>  cpu_tuple_cost
> ----------------
>  0.01
> (1 row)
>
> srn_mst=# show random_page_cost;
>  random_page_cost
> ------------------
>  4
> (1 row)
>
> srn_mst=# show seq_page_cost;
>  seq_page_cost
> ---------------
>  1
> (1 row)
>
> srn_mst=# show work_mem;
>  work_mem
> ----------
>  1MB
> (1 row)
>
> Thanks,
> Julie
>
>
>
> -----Original Message-----
> From: Tomas Vondra [mailto:tv@fuzzy.cz]
> Sent: Wednesday, March 16, 2011 4:23 PM
> To: pgsql-general@postgresql.org
> Cc: Davenport, Julie
> Subject: Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
>
> Dne 16.3.2011 21:38, Davenport, Julie napsal(a):
>> OK, I did the explain analyze on both sides (using a file for output instead) and used the tool you suggested.
>>
>> 8.0 - http://explain.depesz.com/s/Wam
>> 8.4 - http://explain.depesz.com/s/asJ
>
> Great, that's exactly what I asked for. I'll repost that to the mailing
> list so that the others can check it too.
>
>> When I run the queries I get 59,881 rows on the 8.0 side and 59,880 on the 8.4 side, which is what I expect because
8.4side was updated a couple hours later and some minor changes make sense. 
>
> Hm, obviously both versions got the row estimates wrong, but the 8.4
> difference (200x) is much bigger that the 8.0 (10x). This might be one
> of the reasons why a different plan is chosen.
>
> Anyway both versions underestimate the course_control subquery, as they
> believe there will be 1 row only, but in reality there's 2882 of them :-(
>
>
>> After your first email I did a vacuum full analyze on the 8.4 side on each of the tables in the schema that the
viewscover, then ran the query again, and it took even longer - up from 397,857 ms to 412,862 ms.  Another query that
wastaking 597248 ms before the vacuum/analyze took 617526 ms after. I don't understand why, but this is generally the
experiencewe've had with vacuum/analyze on these particular tables. We do large numbers of deletes and inserts to them
everyday, so I would think they would benefit from it. 
>
> OK, so the tables were in a quite good shape - not bloated etc. The
> slight increase is negligible I guess, the vacuum probably removed the
> data from shared buffers or something like that.
>
>> I did do a vacuum full analyze on instr_as_stutemp before the explain analyze, as you suggested.
>
> OK, now the row estimate is correct
>
> Seq Scan on instr_as_stutemp (cost=0.00..1.04 rows=4 width=9) (actual
> time=0.052..0.098 rows=4 loops=1)
>
>> I will consider the indexes and do some benchmark testing (I have considered the 'drop-load-reindex' trick in the
past).I'm sure increasing maintenance_work_mem will help. 
>
> OK. But the question why the plan changed this way still remains
> unanswered (or maybe it does and I don't see it).
>
> One thing I've noticed is this difference in estimates:
>
> 8.0:
> ====================================================================
> Subquery Scan course_control (cost=9462700.13..9492043.53 rows=1
> width=32) (actual time=43368.204..45795.239 rows=2882 loops=1)
>     * Filter: (((to_char(course_begin_date, 'YYYYMMDD'::text) =
> '20100412'::text) OR (to_char(course_begin_date, 'YYYYMMDD'::text) =
> ...
> '20110307'::text)) AND ((course_delivery)::text ~~ 'O%'::text) AND
> (course_cross_section IS NULL))
>
> 8.4:
> ====================================================================
> Subquery Scan course_control (cost=18710.12..548966.51 rows=1 width=32)
> (actual time=1632.403..4438.949 rows=2882 loops=1)
>     * Filter: ((course_control.course_cross_section IS NULL) AND
> ((course_control.course_delivery)::text ~~ 'O%'::text) AND
> (to_char(course_control.course_begin_date, 'YYYYMMDD'::text) = ANY
> ('{20100412,20100510,...,20110110,20110207,20110307}'::text[])))
>
> I.e. both verions use seqscan, both estimate the same number of rows
> (incorrectly), yet the estimated cost is very different (9492043 vs.
> 548966).
>
> Maybe the cost estimation really changed between 8.0 and 8.4, but just
> for sure - what are the cost values? I mean what is set for those config
> values:
>
> cpu_index_tuple_cost
> cpu_operator_cost
> cpu_tuple_cost
> random_page_cost
> seq_page_cost
> work_mem
>
> (use 'show' to see the actual value).
>
> regards
> Tomas


Re: query taking much longer since Postgres 8.4 upgrade

От
Pavel Stehule
Дата:
2011/3/16 Davenport, Julie <JDavenport@ctcd.edu>:
> Yes, the column course_begin_date is a timestamp, so that would not work in this instance, but I will keep that in
mindfor future use elsewhere.  I agree, there are ways to rewrite this query, just wondering which is best to take
advantageof 8.4. 
> Thanks much.
>
>

ok, sorry, do column_course_begin::date = ...

:)

Pavel

>
>
> -----Original Message-----
> From: Tomas Vondra [mailto:tv@fuzzy.cz]
> Sent: Wednesday, March 16, 2011 4:40 PM
> To: Pavel Stehule
> Cc: pgsql-general@postgresql.org; Davenport, Julie
> Subject: Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
>
> Dne 16.3.2011 22:31, Pavel Stehule napsal(a):
>> 2011/3/16 Tomas Vondra <tv@fuzzy.cz>:
>>> Dne 16.3.2011 21:38, Davenport, Julie napsal(a):
>>>> OK, I did the explain analyze on both sides (using a file for output instead) and used the tool you suggested.
>>>>
>>>> 8.0 - http://explain.depesz.com/s/Wam
>>>> 8.4 - http://explain.depesz.com/s/asJ
>>>
>>> Great, that's exactly what I asked for. I'll repost that to the mailing
>>> list so that the others can check it too.
>>>
>>>> When I run the queries I get 59,881 rows on the 8.0 side and 59,880 on the 8.4 side, which is what I expect
because8.4 side was updated a couple hours later and some minor changes make sense. 
>>>
>>> Hm, obviously both versions got the row estimates wrong, but the 8.4
>>> difference (200x) is much bigger that the 8.0 (10x). This might be one
>>> of the reasons why a different plan is chosen.
>>
>> the expression
>>
>> to_char(course_begin_date, 'YYYYMMDD'::text) = '20101025'::text
>>
>> should be a problem
>>
>> much better is test on equality in date domain like:
>>
>> course_begin_date = to_date('20101025', 'YYYYMMDD')
>>
>> this is faster and probably better estimated
>
> Which is not going to work if the course_begin_date column is a
> timestamp, because of the time part.
>
> But yes, there are several ways to improve this query, yet it does not
> explain why the 8.4 is so much slower.
>
> Tomas
>

Re: query taking much longer since Postgres 8.4 upgrade

От
"Davenport, Julie"
Дата:
FYI, I implemented Pavel's suggestion to use:

 course_begin_date::date IN (
'20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307'
)

instead of 

to_char(course_begin_date,'YYYYMMDD') IN (
'20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307'
)

and it did help significantly.  The overall script (where there are several queries like this one) was taking 7.5 mins
onPostgres 8.0 and initially took 20 mins on 8.4; but now after this change (::date) it only takes 14.9 mins.
Progress! I have not yet had time to try Tomas' suggestion of bumping up the work_mem first (trying to figure out how
todo that from within a coldfusion script).  Many thanks for all your help guys!
 

Julie


-----Original Message-----
From: Pavel Stehule [mailto:pavel.stehule@gmail.com] 
Sent: Thursday, March 17, 2011 12:13 AM
To: Davenport, Julie
Cc: Tomas Vondra; pgsql-general@postgresql.org
Subject: Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade

2011/3/16 Davenport, Julie <JDavenport@ctcd.edu>:
> Yes, the column course_begin_date is a timestamp, so that would not work in this instance, but I will keep that in
mindfor future use elsewhere.  I agree, there are ways to rewrite this query, just wondering which is best to take
advantageof 8.4.
 
> Thanks much.
>
>

ok, sorry, do column_course_begin::date = ...

:)

Pavel

>
>
> -----Original Message-----
> From: Tomas Vondra [mailto:tv@fuzzy.cz]
> Sent: Wednesday, March 16, 2011 4:40 PM
> To: Pavel Stehule
> Cc: pgsql-general@postgresql.org; Davenport, Julie
> Subject: Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade
>
> Dne 16.3.2011 22:31, Pavel Stehule napsal(a):
>> 2011/3/16 Tomas Vondra <tv@fuzzy.cz>:
>>> Dne 16.3.2011 21:38, Davenport, Julie napsal(a):
>>>> OK, I did the explain analyze on both sides (using a file for output instead) and used the tool you suggested.
>>>>
>>>> 8.0 - http://explain.depesz.com/s/Wam
>>>> 8.4 - http://explain.depesz.com/s/asJ
>>>
>>> Great, that's exactly what I asked for. I'll repost that to the mailing
>>> list so that the others can check it too.
>>>
>>>> When I run the queries I get 59,881 rows on the 8.0 side and 59,880 on the 8.4 side, which is what I expect
because8.4 side was updated a couple hours later and some minor changes make sense.
 
>>>
>>> Hm, obviously both versions got the row estimates wrong, but the 8.4
>>> difference (200x) is much bigger that the 8.0 (10x). This might be one
>>> of the reasons why a different plan is chosen.
>>
>> the expression
>>
>> to_char(course_begin_date, 'YYYYMMDD'::text) = '20101025'::text
>>
>> should be a problem
>>
>> much better is test on equality in date domain like:
>>
>> course_begin_date = to_date('20101025', 'YYYYMMDD')
>>
>> this is faster and probably better estimated
>
> Which is not going to work if the course_begin_date column is a
> timestamp, because of the time part.
>
> But yes, there are several ways to improve this query, yet it does not
> explain why the 8.4 is so much slower.
>
> Tomas
>

Re: query taking much longer since Postgres 8.4 upgrade

От
Tomas Vondra
Дата:
Dne 17.3.2011 19:29, Davenport, Julie napsal(a):
> I have not yet had time to try Tomas' suggestion of bumping up the work_mem first (trying to figure out how to do
thatfrom within a coldfusion script).  Many thanks for all your help guys! 

Well, just execute this 'SQL query' just like the other ones

set work_mem='8MB'

and it will increase the amount of memory for that connection.

Tomas


Re: query taking much longer since Postgres 8.4 upgrade

От
"Davenport, Julie"
Дата:
This helped, is now down from 14.9 min to 10.9 min to run the entire script.  Thanks.


>>>>Dne 17.3.2011 19:29, Davenport, Julie napsal(a):
>>>> I have not yet had time to try Tomas' suggestion of bumping up the work_mem first (trying to >>>>figure out how to
dothat from within a coldfusion script).  Many thanks for all your help guys!
 

>>Well, just execute this 'SQL query' just like the other ones

>>set work_mem='8MB'

>>and it will increase the amount of memory for that connection.

>>Tomas


Re: query taking much longer since Postgres 8.4 upgrade

От
Tomas Vondra
Дата:
Dne 18.3.2011 16:42, Davenport, Julie napsal(a):
> This helped, is now down from 14.9 min to 10.9 min to run the entire script.  Thanks.

Still, it's way slower than the 8.0 :-(

regards
Tomas

Re: query taking much longer since Postgres 8.4 upgrade

От
Merlin Moncure
Дата:
On Fri, Mar 18, 2011 at 10:42 AM, Davenport, Julie <JDavenport@ctcd.edu> wrote:
> This helped, is now down from 14.9 min to 10.9 min to run the entire script.  Thanks.

can you try disabling nestloop and see what happens?   In the session,
before running the query, isssue:
set enable_nestloop = false;

merlin

Re: query taking much longer since Postgres 8.4 upgrade

От
Tomas Vondra
Дата:
Dne 18.3.2011 19:18, Merlin Moncure napsal(a):
> On Fri, Mar 18, 2011 at 10:42 AM, Davenport, Julie <JDavenport@ctcd.edu> wrote:
>> This helped, is now down from 14.9 min to 10.9 min to run the entire script.  Thanks.
>
> can you try disabling nestloop and see what happens?   In the session,
> before running the query, isssue:
> set enable_nestloop = false;

Yes, that'd be interesting. And provide 'exaplain analyze' as before
(using explain.depesz.com), please.

And a bit unrelated recommendation - based on the settings (cost
constants, work_mem etc.) it seems guess you have a default untuned
postgresql.conf. Is that right, Julie? In this case you can
significantly improve the load performance by several settings:

1) increase checkpoint_segments (default is 3, use 12 or something like
that - depends on the if there are warnings about checkpoint segments in
the log)

2) increase wal_buffers (just set it to 16MB and forget it)

The effect depends on the amount of data loaded and other things, but
it's worth a try.

regards
Tomas

Re: query taking much longer since Postgres 8.4 upgrade

От
tv@fuzzy.cz
Дата:
> Incredible!  Setting enable_nestloop off temporarily for the run of this
> script made it run in less than a minute (had been running in 10 or 11
> minutes).  I think you have found a solution for many of my slow running
> scripts that use these same type of joins.  Thanks again.
> Julie

Nice. Can you post EXPLAIN ANALYZE again, so that we can see why this plan
was evaluated as as more expensive before disabling nested loops?

regards
Tomas


Re: query taking much longer since Postgres 8.4 upgrade

От
Merlin Moncure
Дата:
On Mon, Mar 21, 2011 at 11:32 AM,  <tv@fuzzy.cz> wrote:
>> Incredible!  Setting enable_nestloop off temporarily for the run of this
>> script made it run in less than a minute (had been running in 10 or 11
>> minutes).  I think you have found a solution for many of my slow running
>> scripts that use these same type of joins.  Thanks again.
>> Julie
>
> Nice. Can you post EXPLAIN ANALYZE again, so that we can see why this plan
> was evaluated as as more expensive before disabling nested loops?

well the problem is obvious -- the planner is estimating ~ 250 loops,
when it in fact has to do ~ 60k.  That's a two orders of magnitude
miss.

merlin

Re: query taking much longer since Postgres 8.4 upgrade

От
tv@fuzzy.cz
Дата:
> On Mon, Mar 21, 2011 at 11:32 AM,  <tv@fuzzy.cz> wrote:
>>> Incredible!  Setting enable_nestloop off temporarily for the run of
>>> this
>>> script made it run in less than a minute (had been running in 10 or 11
>>> minutes).  I think you have found a solution for many of my slow
>>> running
>>> scripts that use these same type of joins.  Thanks again.
>>> Julie
>>
>> Nice. Can you post EXPLAIN ANALYZE again, so that we can see why this
>> plan
>> was evaluated as as more expensive before disabling nested loops?
>
> well the problem is obvious -- the planner is estimating ~ 250 loops,
> when it in fact has to do ~ 60k.  That's a two orders of magnitude
> miss.

Yeah, you're right, although I think the estimate is 1 loop vs. 2882 loops
in reality. The 250 vs. 60k is related to the result set.

Tomas


Re: query taking much longer since Postgres 8.4 upgrade

От
"Davenport, Julie"
Дата:
Incredible!  Setting enable_nestloop off temporarily for the run of this script made it run in less than a minute (had
beenrunning in 10 or 11 minutes).  I think you have found a solution for many of my slow running scripts that use these
sametype of joins.  Thanks again. 
Julie


>>-----Original Message-----
>>From: Merlin Moncure [mailto:mmoncure@gmail.com]
>>Sent: Friday, March 18, 2011 1:18 PM
>>To: Davenport, Julie
>>Cc: Tomas Vondra; pgsql-general@postgresql.org
>>Subject: Re: [GENERAL] query taking much longer since Postgres 8.4 upgrade

>>>>On Fri, Mar 18, 2011 at 10:42 AM, Davenport, Julie <JDavenport@ctcd.edu> wrote:
>>>> This helped, is now down from 14.9 min to 10.9 min to run the entire script.  Thanks.

>>can you try disabling nestloop and see what happens?   In the session,
>>before running the query, isssue:
>>set enable_nestloop = false;

>>merlin

Re: query taking much longer since Postgres 8.4 upgrade

От
"F. BROUARD / SQLpro"
Дата:
Try this :

1) rewrite your query as is :

select course_id AS EXTERNAL_COURSE_KEY,
        user_id AS EXTERNAL_PERSON_KEY,
        'Student' AS ROLE,
        'Y' AS AVAILABLE_IND
from   course_user_link AS CUL
        INNER JOIN course_control AS CC
              ON CUL.course_id = CC.course_id
where  CUL.instructor = false
   AND  CC.course_begin_date::date IN (

'20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307'

)
   and  CC.course_delivery LIKE 'O%'
   and  CC.course_cross_section IS NULL
   and  NOT EXISTS(SELECT *
                   FROM   instr_as_stutemp AS IAS
                   WHERE  C.user_id = IAS.user_id)

2) prefix all tables by your SQL schema (public by default)

3) create theses indexes (if not) :

CREATE INDEX X_CUL_INS_CRS_UID
     ON course_user_link (instructor,
                          course_id,
                          user_id);
CREATE INDEX X_CC_CDV_CCS_CBD_CID
     ON course_control (course_delivery,
                        course_cross_section,
                        course_begin_date,
                        course_id);
CREATE INDEX X_IAS ON IAS_UID
     ON instr_as_stutemp (user_id);

4) beware of using reserved words for the name of a database object like
ROLE !

A +

Le 16/03/2011 16:49, Davenport, Julie a écrit :
> select
>
> course_id AS EXTERNAL_COURSE_KEY,
>
> user_id AS EXTERNAL_PERSON_KEY,
>
> 'Student' AS ROLE,
>
> 'Y' AS AVAILABLE_IND
>
> from course_user_link
>
> where instructor = false
>
> and course_id in
>
>    (
>
>    select course_id
>
>    from course_control
>
>    where to_char(course_begin_date,'YYYYMMDD') IN (
>
'20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307'
> )
>
>    and course_delivery LIKE 'O%'
>
>    and course_cross_section IS NULL
>
>    )
>
> and user_id not in (select user_id from instr_as_stutemp)
>


--
Frédéric BROUARD - expert SGBDR et SQL - MVP SQL Server - 06 11 86 40 66
Le site sur le langage SQL et les SGBDR  :  http://sqlpro.developpez.com
Enseignant Arts & Métiers PACA, ISEN Toulon et CESI/EXIA Aix en Provence
Audit, conseil, expertise, formation, modélisation, tuning, optimisation
*********************** http://www.sqlspot.com *************************


Re: query taking much longer since Postgres 8.4 upgrade

От
Alban Hertroys
Дата:
On 22 Mar 2011, at 24:20, F. BROUARD / SQLpro wrote:

> Try this :
>
> 1) rewrite your query as is :

Indeed, a join is probably more efficient than a big IN-list. Good point ;)

>
> select course_id AS EXTERNAL_COURSE_KEY,
>       user_id AS EXTERNAL_PERSON_KEY,
>       'Student' AS ROLE,
>       'Y' AS AVAILABLE_IND
> from   course_user_link AS CUL
>       INNER JOIN course_control AS CC
>             ON CUL.course_id = CC.course_id
> where  CUL.instructor = false
>  AND  CC.course_begin_date::date IN (
'20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307'
)
>  and  CC.course_delivery LIKE 'O%'
>  and  CC.course_cross_section IS NULL

>  and  NOT EXISTS(SELECT *
>                  FROM   instr_as_stutemp AS IAS
>                  WHERE  C.user_id = IAS.user_id)

You could write this last condition as a LEFT OUTER JOIN even, where valid records match IAS.user_id IS NULL.

> 2) prefix all tables by your SQL schema (public by default)

I don't think that will matter much, it might shave off a tiny bit of planner execution time if tables aren't in the
firstschema in the search_path, but otherwise not worth the hassle. 

> 3) create theses indexes (if not) :

I think the usual convention is to suffix with _idx instead of prefixing with x_. It's what automatically created
indexesdo anyway. That's a matter of personal preference though. 

> CREATE INDEX X_CUL_INS_CRS_UID
>    ON course_user_link (instructor,
>                         course_id,
>                         user_id);
> CREATE INDEX X_CC_CDV_CCS_CBD_CID
>    ON course_control (course_delivery,
>                       course_cross_section,
>                       course_begin_date,
>                       course_id);

If queries where cross_section IS NULL (especially in combination with the other fields in this index) are very common,
whilethe opposite is quite rare, you may want to add a WHERE-clause with that condition to this index. 

> CREATE INDEX X_IAS ON IAS_UID
>    ON instr_as_stutemp (user_id);

> 4) beware of using reserved words for the name of a database object like ROLE !

Good advise, but not really needed in the case of aliases I think. There's also the possibility to quote those fields
asidentifiers (which also makes them case-sensitive, so beware!) - in this case that would be "ROLE". 

> Le 16/03/2011 16:49, Davenport, Julie a écrit :
>> select
>>
>> course_id AS EXTERNAL_COURSE_KEY,
>>
>> user_id AS EXTERNAL_PERSON_KEY,
>>
>> 'Student' AS ROLE,
>>
>> 'Y' AS AVAILABLE_IND
>>
>> from course_user_link
>>
>> where instructor = false
>>
>> and course_id in
>>
>>   (
>>
>>   select course_id
>>
>>   from course_control
>>
>>   where to_char(course_begin_date,'YYYYMMDD') IN (
>>
'20100412','20100510','20100607','20100712','20100830','20100927','20101025','20101122','20101213','20110110','20110207','20110307'
>> )
>>
>>   and course_delivery LIKE 'O%'
>>
>>   and course_cross_section IS NULL
>>
>>   )
>>
>> and user_id not in (select user_id from instr_as_stutemp)
>>
>
>
> --
> Frédéric BROUARD - expert SGBDR et SQL - MVP SQL Server - 06 11 86 40 66
> Le site sur le langage SQL et les SGBDR  :  http://sqlpro.developpez.com
> Enseignant Arts & Métiers PACA, ISEN Toulon et CESI/EXIA Aix en Provence
> Audit, conseil, expertise, formation, modélisation, tuning, optimisation
> *********************** http://www.sqlspot.com *************************
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


!DSPAM:737,4d8852ad651346607679948!



Re: query taking much longer since Postgres 8.4 upgrade

От
"Davenport, Julie"
Дата:
Here is the explain plan of the new query (same as original but with changes to the Date subquery to use ::date instead
ofto_char to truncate the time portion), when it is run after doing these 2 sets first:
 

set work_mem='8MB';

set enable_nestloop = false;

explain plan (8.4):  http://explain.depesz.com/s/tw8

thanks again for the help.
Julie




-----Original Message-----
From: tv@fuzzy.cz [mailto:tv@fuzzy.cz] 
Sent: Monday, March 21, 2011 11:33 AM
To: Davenport, Julie
Cc: Merlin Moncure; Tomas Vondra; pgsql-general@postgresql.org
Subject: RE: [GENERAL] query taking much longer since Postgres 8.4 upgrade

> Incredible!  Setting enable_nestloop off temporarily for the run of this
> script made it run in less than a minute (had been running in 10 or 11
> minutes).  I think you have found a solution for many of my slow running
> scripts that use these same type of joins.  Thanks again.
> Julie

Nice. Can you post EXPLAIN ANALYZE again, so that we can see why this plan
was evaluated as as more expensive before disabling nested loops?

regards
Tomas