Re: Abbreviated keys for Numeric

Поиск

Список

Период

Сортировка

От	Robert Haas
Тема	Re: Abbreviated keys for Numeric
Дата	23 марта 2015 г. 22:04:51
Msg-id	CA+TgmoYwShrH=8Szk52V3eGY+PQQtLsDvwdLSUiWq9XeTGjz+Q@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Abbreviated keys for Numeric (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Ответы	Re: Abbreviated keys for Numeric (Peter Geoghegan <pg@heroku.com>)
Список	pgsql-hackers

Дерево обсуждения

On Sat, Mar 21, 2015 at 2:58 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
>  Peter> I don't really buy it, either way. In what sense is a NULL value
>  Peter> ever abbreviated? It isn't. Whatever about the cost model,
>  Peter> that's the truth of the matter. There is always going to be a
>  Peter> sort of tension in any cost model, between whether or not it's
>  Peter> worth making it more sophisticated, and the extent to which
>  Peter> tweaking the model is chasing diminishing returns.
>
> Comparisons between nulls and nulls, or between nulls and non-nulls, are
> cheap; only comparisons between non-nulls and non-nulls can be
> expensive.
>
> The purpose of abbreviation is to replace expensive comparisons by cheap
> ones where possible, and therefore the cost model used for abbreviation
> should ignore nulls entirely; all that matters is the number of non-null
> values and the probability of saving time by abbreviating them.
>
> So if you're sorting a million rows of which 900,000 are null and
> 100,000 contain 50 different non-null values, then the absolute time
> saved (not the proportion) by doing abbreviation should be on the same
> order as the absolute time saved by abbreviation when sorting just the
> 100,000 non-null rows.
>
> But if the cost model does 1,000,000/50 and gets 20,000, and decides
> "that's worse than my 1 in 10,000 target, I'll abort abbreviations",
> then you have sacrificed the time gain for no reason at all.  This is
> what I mean by "spurious".  This is why the cost model must compute the
> fraction as 100,000/50, ignoring the null inputs, if it's going to
> perform anything like optimally in the presence of nulls.

I think Andrew is right.

>  Peter> I also think that your explanation of the encoding schemes was
>  Peter> perfunctory.
>
> I'm interested in other opinions on that, because I find your
> replacement for it both confusingly organized and a bit misleading (for
> example saying the top bit is "wasted" is wrong, it's reserved because
> we need it free for the sign).
>
> (It is true that mine assumes that the reader knows what "excess-N"
> means, or can find out.)
>
> Here's mine, which is given as a single block comment:
>
> [ long explanatory comment ]
>
> Peter's, inline with the code (omitted here):
>
> [ long explanatory comment ]

In my opinion, Andrew's version is far clearer.  Peter's version is
full of jargon that I can't understand.  I could probably figure it
out with a few hours and a search engine, but that really shouldn't be
necessary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Fabien COELHO
Дата: 23 марта 2015 г., 22:00:16
Сообщение: Re: PATCH: pgbench - merging transaction logs

Следующее

От: Andrew Gierth
Дата: 23 марта 2015 г., 22:06:58
Сообщение: Re: Exposing PG_VERSION_NUM in pg_config

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Abbreviated keys for Numeric

Предыдущее

Следующее