Re: Regarding EXPLAIN and width calculations

From: Jon Nelson
Subject: Re: Regarding EXPLAIN and width calculations
Date:
Msg-id: AANLkTik+WkguxTGSLkLAHc=vXKyHJLU+Aa9+-+8L-p8Y@mail.gmail.com
In reply to: Re: Regarding EXPLAIN and width calculations  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-general
On Fri, Nov 19, 2010 at 1:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jon Nelson <jnelson+pgsql@jamponi.net> writes:
>> On Fri, Nov 19, 2010 at 12:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Hard to comment about this with such an incomplete view of the situation
>>> --- in particular, data types would be a critical factor, and I also
>>> wonder if you're admitting to all the columns involved.
>
>> Here is an example that, while super ugly, does show the problem:
>
> Hm.  In the UNION case, we have a measured average width for the inet
> column from ANALYZE for each table, and we just use that.  In the
> sub-select case, it seems to be falling back to a default estimate for
> the datatype, which surprises me a bit --- at least in HEAD it seems
> like it should be smarter.  I'll go look at that.
>
> As for the rowcount estimates, these aren't the same query so there's no
> reason for them to be the same.  In the UNION case, it's basically
> taking the pessimistic assumption that there are no duplicate rows;
> given the lack of cross-column stats there's no way to be much smarter.
> In the GROUP BY case, the question is how many distinct values of 'a'
> you suppose there are.  This is also pretty hard to be rigorous about
> --- we have an idea of that for each table, but no idea how many
> cross-table duplications there are.  The 200 is just a default estimate
> when it has no idea.  We could probably come up with some better though
> not rigorous estimate, but nobody's worked on it.

I've run into this '200' issue a *lot* (see the "HashAggregate
consumes all memory before crashing" issue I asked about earlier). If
I may, a couple of alternatives seem more reasonable to me:

1. Calculate the value as a percentage of the total rows (say, 15%).
2. Make the value a variable, so that it could be set globally or even
per-query. It would be really nice to be able to *tell* the planner
"there is an expected 30% overlap between these tables".

Could you point me to the place in the source where the 200 value
comes from? With tables of hundreds of millions or billions of rows I
see the value 40,000. Both 200 and 40,000 seem like arbitrary values -
perhaps they are calculated similarly?
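For readers following along, the fallback behaviour Tom describes can be sketched as follows (a toy illustration, not PostgreSQL source; the function name is hypothetical). It also mirrors the documented pg_stats convention that a negative n_distinct means a fraction of the row count:

```python
# Toy model of the fallback described above: use a measured per-column
# distinct count when ANALYZE has produced one, otherwise fall back to a
# fixed default. Negative statistics are treated as a fraction of the row
# count, following the pg_stats.n_distinct convention.

DEFAULT_NUM_DISTINCT = 200  # the arbitrary default discussed in this thread


def groups_estimate(n_distinct_stat, total_rows):
    """Return an estimated group count for a GROUP BY column.

    n_distinct_stat is the ANALYZE-style statistic, or None when the
    planner has no idea (e.g. an expression with no statistics).
    """
    if n_distinct_stat is None:
        return DEFAULT_NUM_DISTINCT  # no stats: fall back to the default
    if n_distinct_stat < 0:
        # Negative values scale with table size (fraction of rows).
        return int(-n_distinct_stat * total_rows)
    return n_distinct_stat
```

So a column with no statistics is estimated at 200 groups regardless of table size, which is exactly the surprise discussed in this thread.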

--
Jon
