Re: : PostgreSQL Index behavior

Поиск
Список
Период
Сортировка
От Venkat Balaji
Тема Re: : PostgreSQL Index behavior
Дата
Msg-id CAFrxt0iehqe4m3jympYeCMc1xrjHUan=kcEBXpXqZKe0_UzJZw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: : PostgreSQL Index behavior  (Jeff Janes <jeff.janes@gmail.com>)
Ответы Re: : PostgreSQL Index behavior
Список pgsql-performance
Thank you Jeff !

My comments are inline.

> explain  SELECT tv.short_code, tv.chn as pkg_subscription_chn,
>                tv.vert as pkg_vert, ubs.campaign_id as campaign,
> 'none'::varchar as referer,
>                CAST('CAMPAIGNWISE_SUBSCRIBER_BASE' AS VARCHAR) as vn,
> count(tv.msisdn) as n_count, '0'::numeric AS tot_revenue
>                FROM campaign_base ubs
>                JOIN tab_current_day_v2 tv
>                ON ubs.ubs_seq_id = tv.ubs_seq_id
>              AND tv.dt = CAST('2012-09-08' AS DATE)
>              GROUP BY tv.short_code, tv.vert, tv.chn, ubs.campaign_id, vn;
...
>
> The above plan shows "seq scan" on tab_current_day_v2 table, though there is
> an index on "ubs_seq_id" column which is an unique column.
>
> Can anyone please help us understand, why PostgreSQL optimizer is not
> prioritizing the unique column and hitting ubs_seq_id_idx Index here ?

The query where clause does not specify a constant value for
ubs_seq_id.  So it is likely that the only way to use that index would
be to reverse the order of the nested loop and seq scan the other
table.  Is there any reason to think that doing that would be faster?

I believe, I missed an important point here. Yes, since the constant value is not provided for ubs_seq_id, Index scan is not a prime preference. Makes sense. Further analysis is explained below.
 
> Later -
>
> We have created composite Index on "dt" (one distinct value) and
> "ubs_seq_id" (no duplicate values) and the index has been picked up.

Postgres seems to think that "dt" has no duplicate values, the
opposite of having one distinct value.
That is based on the estimates given in the explain plan, that teh seq
scan will return only one row after the filter on Filter: "(dt =
'2012-09-08'::date)".   This does seem to conflict with what you
report from pg_stats, but I'm not familiar with that view, and you
haven't told us what version of pgsql you are using.

We are using PostgreSQL-9.0.1. 

Yes, "dt" has one distinct value all the time is generated on daily basis.
"2012-09-08" is an non-existent value, so, Postgres seems to think there are no duplicates.

If i pass on the value which is existing in the table "2012-09-11", PostgreSQL optimizer is picking up "Seq Scan" (what ever Index is existing).

In our scenario, we cannot expect an Index scan to happen, because I believe, following are the reasons -

ubs_seq_id column in campaign_base table has 1.2 m rows -- all distinct
ubs_seq_id column in tab_current_day_v2 table has 1.9 m rows -- all distinct
dt has only 1 distinct value.

All being used with AND operator, extracted rows will be minimum 1.2 m. So, I believe, "seq scan" is the best choice PG is opting for.

I got the point. Thanks !

Regards,
Venkat

DISCLAIMER:

Please note that this message and any attachments may contain confidential and proprietary material and information and are intended only for the use of the intended recipient(s). If you are not the intended recipient, you are hereby notified that any review, use, disclosure, dissemination, distribution or copying of this message and any attachments is strictly prohibited. If you have received this email in error, please immediately notify the sender and delete this e-mail , whether electronic or printed. Please also note that any views, opinions, conclusions or commitments expressed in this message are those of the individual sender and do not necessarily reflect the views of Ver sé Innovation Pvt Ltd.

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Planner selects different execution plans depending on limit
Следующее
От: Scott Marlowe
Дата:
Сообщение: Re: : PostgreSQL Index behavior