Re: Query plan choice issue

Поиск

Список

Период

Сортировка

От	Yaroslav Tykhiy
Тема	Re: Query plan choice issue
Дата	14 сентября 2010 г. 00:37:41
Msg-id	8E73D259-718E-4BFF-B212-1FE775920E2B@barnet.com.au обсуждение исходный текст
Ответ на	Re: Query plan choice issue (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: Query plan choice issue
Список	pgsql-general

Дерево обсуждения

Hi Tom,

On 14/09/2010, at 12:41 AM, Tom Lane wrote:

> Yaroslav Tykhiy <yar@barnet.com.au> writes:
>>                      ->  Bitmap Heap Scan on dbmail_headervalue v
>> (cost=1409.82..221813.70 rows=2805 width=16) (actual
>> time=28543.411..28623.623 rows=1 loops=1)
>>                            Recheck Cond: (v.headername_id = n.id)
>>                            Filter: ("substring"(v.headervalue, 0,
>> 255) ~~* '%<...@mail.gmail.com>%'::text)
>>                            ->  Bitmap Index Scan on
>> dbmail_headervalue_testing  (cost=0.00..1409.82 rows=75940 width=0)
>> (actual time=17555.572..17555.572 rows=1877009 loops=1)
>>                                  Index Cond: (v.headername_id = n.id)
>
> I think the major problem you're having is that the planner is
> completely clueless about the selectivity of the condition
>     "substring"(v.headervalue, 0,  255) ~~* '%<...@mail.gmail.com>%'
> If it knew that that would match only one row, instead of several
> thousand, it would likely pick a different plan.
>
> In recent versions of PG you could probably make a noticeable
> improvement in this if you just dropped the substring() restriction
> ... do you actually need that?  Alternatively, if you don't want to
> change the query logic at all, I'd try making an index on
> substring(v.headervalue, 0, 255).  I'm not expecting the query
> to actually *use* the index, mind you.  But its existence will prompt
> ANALYZE to collect stats on the expression's value, and that will
> help the planner with estimating the ~~* condition.

Well, that substring() and ILIKE combo looked suspicious to me, too.
However, there already was an index on substring(v.headervalue, 0,
255) but the fast query plan didn't seem to use it, it used a
different index instead:

mail=# \d dbmail_headervalue
                               Table "public.dbmail_headervalue"
      Column     |  Type  |                             Modifiers
----------------+--------
+-------------------------------------------------------------------
  headername_id  | bigint | not null
  physmessage_id | bigint | not null
  id             | bigint | not null default
nextval('dbmail_headervalue_idnr_seq'::regclass)
  headervalue    | text   | not null default ''::text
Indexes:
     "dbmail_headervalue_pkey" PRIMARY KEY, btree (id)
     "dbmail_headervalue_1" UNIQUE, btree (physmessage_id, id)
     "dbmail_headervalue_2" btree (physmessage_id)
     "dbmail_headervalue_3" btree ("substring"(headervalue, 0, 255))
...
EXPLAIN ANALYZE...
                     -> Index Scan using dbmail_headervalue_2 on
dbmail_headervalue v (cost=0.00..1489.96 rows=1 width=16) (actual
time=0.028..0.029 rows=0 loops=358)
                           Index Cond: (v.physmessage_id =
m.physmessage_id)
                           Filter: ("substring"(v.headervalue, 0, 255)
~~* '%<...@mail.gmail.com>%'::text)
...

Meanwhile, a mate of mine lurking on this list pointed out that
reducing random_page_cost might help here and it did: random_page_cost
of 2 made the fast query favourable.

Can it mean that the default planner configuration slightly
overfavours seq scans?

Thank you all guys!

Yar

В списке pgsql-general по дате отправления:

Предыдущее

От: Yaroslav Tykhiy
Дата: 14 сентября 2010 г., 00:22:19
Сообщение: Re: Schema search path

Следующее

От: Craig Ringer
Дата: 14 сентября 2010 г., 01:59:50
Сообщение: Re: Post Install / Secure PostgreSQL

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Query plan choice issue

Предыдущее

Следующее