Re: Bad optimizer data for xml (WAS: xml data type implications of no =)

Поиск
Список
Период
Сортировка
От Mark Kirkwood
Тема Re: Bad optimizer data for xml (WAS: xml data type implications of no =)
Дата
Msg-id 4C0837E0.7060706@catalyst.net.nz
обсуждение исходный текст
Ответ на Re: xml data type implications of no =  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
Ответы Re: Bad optimizer data for xml (WAS: xml data type implications of no =)
Список pgsql-bugs
On 27/05/10 13:37, Mark Kirkwood wrote:
> On 25/05/10 16:43, Mark Kirkwood wrote:
>> Today I ran into some interesting consequences of the xml data type
>> being without an "=" operator. One I thought I'd post here because it
>> has a *possible* planner impact. I'm not sure it is actually a bug as
>> such, but this seemed the best forum to post in initially:
>>
>> test=# \d bug
>>       Table "public.bug"
>>  Column |  Type   | Modifiers
>> --------+---------+-----------
>>  id     | integer |
>>  val    | xml     |
>>
>> test=# explain select val::text from bug;
>>                           QUERY PLAN
>> --------------------------------------------------------------
>>  Seq Scan on bug  (cost=0.00..58127.78 rows=1000278 width=32)
>>
>>
>> Note the width estimate. However a more realistic estimate for width is:
>>
>> test=# select 8192/(reltuples/relpages) as width from pg_class where
>> relname='bug';
>>       width
>> ------------------
>>  394.130431739976
>>
>> So we are going to massively underestimate the "size" of such a
>> dataset. Now this appears to be a consequence of no "=" operator
>> (std_typanalyze in analyze.c bails if there isn't one), so the
>> planner has no idea about how wide 'val' actually is. I'm wondering
>> if it is worth having at least an "=" operator to enable some minimal
>> stats to be available for xml columns.
>>
>
> Adding a minimal = op (see attached) and an analyze results in:
>
> test=# explain select val::text from bug;
>                           QUERY PLAN
> ---------------------------------------------------------------
>  Seq Scan on bug  (cost=0.00..62632.08 rows=1000008 width=385)
>
> which gives a much better indication of dataset size.
>
>
>

Maybe I gave this guy a bad title - is it a concern that the 'width'
estimate is so far off for xml datatypes (because of no = op)? It seemed
to me that this could result in some bad plan choices (e.g in subqueries
etc).

regards

Mark

В списке pgsql-bugs по дате отправления:

Предыдущее
От: "Alexander"
Дата:
Сообщение: BUG #5489: SELECT ... RETURNING INTO ... in ecpg
Следующее
От: Joshua Tolley
Дата:
Сообщение: Re: BUG #5488: pg_dump does not quote column names -> pg_restore may fail when upgrading