Обсуждение: [GENERAL] Why is posgres picking a suboptimal plan for this query?

Поиск
Список
Период
Сортировка

[GENERAL] Why is posgres picking a suboptimal plan for this query?

От
Sam Saffron
Дата:
I have this query that is not picking the right index unless I hard code dates:


SELECT "topics".* FROM "topics"
WHERE topics.last_unread_at >= '2017-05-11 20:56:24'

"Index Scan using index_topics_on_last_unread_at on topics
(cost=0.41..8.43 rows=1 width=725) (actual time=0.005..0.065 rows=5
loops=1)"
" Index Cond: (last_unread_at >= '2017-05-11 20:56:24'::timestamp
without time zone)"
"Planning time: 0.136 ms"
"Execution time: 0.087 ms"


SELECT "topics".* FROM "topics"
WHERE topics.last_unread_at >= (select first_topic_unread_at from
user_stats us where us.user_id = 1)

"Seq Scan on topics (cost=8.30..24030.12 rows=20555 width=725) (actual
time=3.186..59.636 rows=5 loops=1)"
" Filter: (last_unread_at >= $0)"
" Rows Removed by Filter: 61660"
" InitPlan 1 (returns $0)"
" -> Index Scan using user_stats_pkey on user_stats us
(cost=0.29..8.30 rows=1 width=8) (actual time=0.008..0.009 rows=1
loops=1)"
" Index Cond: (user_id = 1)"
"Planning time: 0.147 ms"
"Execution time: 59.671 ms"

select first_topic_unread_at from user_stats us where us.user_id = 1
"2017-05-11 20:56:24.842356"

The results here simply do not make sense to me, should I be piping
dates in here to avoid this issue and running 2 queries instead of 1?


Re: [GENERAL] Why is posgres picking a suboptimal plan for this query?

От
Jeff Janes
Дата:
On Wed, May 24, 2017 at 1:42 PM, Sam Saffron <sam.saffron@gmail.com> wrote:
I have this query that is not picking the right index unless I hard code dates:


SELECT "topics".* FROM "topics"
WHERE topics.last_unread_at >= '2017-05-11 20:56:24'

"Index Scan using index_topics_on_last_unread_at on topics
(cost=0.41..8.43 rows=1 width=725) (actual time=0.005..0.065 rows=5
loops=1)"
" Index Cond: (last_unread_at >= '2017-05-11 20:56:24'::timestamp
without time zone)"
"Planning time: 0.136 ms"
"Execution time: 0.087 ms"


PostgreSQL knows that few entries come after 2017-05-11 (it thinks 1, actually 5) and comes up with a plan which works well for that situation.

SELECT "topics".* FROM "topics"
WHERE topics.last_unread_at >= (select first_topic_unread_at from
user_stats us where us.user_id = 1)

"Seq Scan on topics (cost=8.30..24030.12 rows=20555 width=725) (actual
time=3.186..59.636 rows=5 loops=1)"
" Filter: (last_unread_at >= $0)"
" Rows Removed by Filter: 61660"
" InitPlan 1 (returns $0)"
" -> Index Scan using user_stats_pkey on user_stats us
(cost=0.29..8.30 rows=1 width=8) (actual time=0.008..0.009 rows=1
loops=1)"
" Index Cond: (user_id = 1)"
"Planning time: 0.147 ms"
"Execution time: 59.671 ms"


At the time PostgreSQL plans this query, it doesn't know what the answer to the subquery is going to be.  Not having the true answer at its fingertips, it guesses that one third of the table is going to fall after the results of that subquery.

Maybe it should first execute the subquery and then re-plan the rest of the query based on the results.  But there is no provision for it to do that, and no concrete plans (that I know of) to implement such a thing.
 

The results here simply do not make sense to me, should I be piping
dates in here to avoid this issue and running 2 queries instead of 1?

That is the most pragmatic approach.  It isn't very nice, but the alternatives are worse.

Cheers,

Jeff

Re: [GENERAL] Why is posgres picking a suboptimal plan for this query?

От
Tom Lane
Дата:
Jeff Janes <jeff.janes@gmail.com> writes:
> On Wed, May 24, 2017 at 1:42 PM, Sam Saffron <sam.saffron@gmail.com> wrote:
>> I have this query that is not picking the right index unless I hard code
>> dates:
>> ...

> Maybe it should first execute the subquery and then re-plan the rest of the
> query based on the results.  But there is no provision for it to do that,
> and no concrete plans (that I know of) to implement such a thing.

I don't know of any plans for that, either.

>> The results here simply do not make sense to me, should I be piping
>> dates in here to avoid this issue and running 2 queries instead of 1?

> That is the most pragmatic approach.  It isn't very nice, but the
> alternatives are worse.

You could probably get the behavior you want by replacing the subquery
with a "stable" function:

create function first_topic_unread_for(userid int) returns timestamp as
'select first_topic_unread_at from user_stats us where us.user_id = $1'
language sql stable;

SELECT "topics".* FROM "topics"
WHERE topics.last_unread_at >= first_topic_unread_for(1);

This should convince the planner to pre-run the function to get an
estimated result at plan time.

            regards, tom lane


Re: [GENERAL] Why is posgres picking a suboptimal plan for this query?

От
Sam Saffron
Дата:
Awesome, thanks! I will give that a shot

On Wed, 24 May 2017 at 6:14 pm, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeff Janes <jeff.janes@gmail.com> writes:
> On Wed, May 24, 2017 at 1:42 PM, Sam Saffron <sam.saffron@gmail.com> wrote:
>> I have this query that is not picking the right index unless I hard code
>> dates:
>> ...

> Maybe it should first execute the subquery and then re-plan the rest of the
> query based on the results.  But there is no provision for it to do that,
> and no concrete plans (that I know of) to implement such a thing.

I don't know of any plans for that, either.

>> The results here simply do not make sense to me, should I be piping
>> dates in here to avoid this issue and running 2 queries instead of 1?

> That is the most pragmatic approach.  It isn't very nice, but the
> alternatives are worse.

You could probably get the behavior you want by replacing the subquery
with a "stable" function:

create function first_topic_unread_for(userid int) returns timestamp as
'select first_topic_unread_at from user_stats us where us.user_id = $1'
language sql stable;

SELECT "topics".* FROM "topics"
WHERE topics.last_unread_at >= first_topic_unread_for(1);

This should convince the planner to pre-run the function to get an
estimated result at plan time.

                        regards, tom lane

Re: [GENERAL] Why is posgres picking a suboptimal plan for this query?

От
Sam Saffron
Дата:
OK, I committed a fix to Discourse, the suggested pattern by Tom works
like a charm, in my particular user case it cuts a query down from
200-500ms to 8ms.

Thank you heaps

https://github.com/discourse/discourse/commit/29fac1ac18acdc1f0d2c1650d33d2d4a1aab0a0b

On Wed, May 24, 2017 at 6:33 PM, Sam Saffron <sam.saffron@gmail.com> wrote:
> Awesome, thanks! I will give that a shot
>
> On Wed, 24 May 2017 at 6:14 pm, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Jeff Janes <jeff.janes@gmail.com> writes:
>> > On Wed, May 24, 2017 at 1:42 PM, Sam Saffron <sam.saffron@gmail.com>
>> > wrote:
>> >> I have this query that is not picking the right index unless I hard
>> >> code
>> >> dates:
>> >> ...
>>
>> > Maybe it should first execute the subquery and then re-plan the rest of
>> > the
>> > query based on the results.  But there is no provision for it to do
>> > that,
>> > and no concrete plans (that I know of) to implement such a thing.
>>
>> I don't know of any plans for that, either.
>>
>> >> The results here simply do not make sense to me, should I be piping
>> >> dates in here to avoid this issue and running 2 queries instead of 1?
>>
>> > That is the most pragmatic approach.  It isn't very nice, but the
>> > alternatives are worse.
>>
>> You could probably get the behavior you want by replacing the subquery
>> with a "stable" function:
>>
>> create function first_topic_unread_for(userid int) returns timestamp as
>> 'select first_topic_unread_at from user_stats us where us.user_id = $1'
>> language sql stable;
>>
>> SELECT "topics".* FROM "topics"
>> WHERE topics.last_unread_at >= first_topic_unread_for(1);
>>
>> This should convince the planner to pre-run the function to get an
>> estimated result at plan time.
>>
>>                         regards, tom lane