Re: Abbreviated keys for Numeric

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: Abbreviated keys for Numeric
Дата
Msg-id CA+TgmoYwShrH=8Szk52V3eGY+PQQtLsDvwdLSUiWq9XeTGjz+Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Abbreviated keys for Numeric  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Ответы Re: Abbreviated keys for Numeric  (Peter Geoghegan <pg@heroku.com>)
Список pgsql-hackers
On Sat, Mar 21, 2015 at 2:58 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
>  Peter> I don't really buy it, either way. In what sense is a NULL value
>  Peter> ever abbreviated? It isn't. Whatever about the cost model,
>  Peter> that's the truth of the matter. There is always going to be a
>  Peter> sort of tension in any cost model, between whether or not it's
>  Peter> worth making it more sophisticated, and the extent to which
>  Peter> tweaking the model is chasing diminishing returns.
>
> Comparisons between nulls and nulls, or between nulls and non-nulls, are
> cheap; only comparisons between non-nulls and non-nulls can be
> expensive.
>
> The purpose of abbreviation is to replace expensive comparisons by cheap
> ones where possible, and therefore the cost model used for abbreviation
> should ignore nulls entirely; all that matters is the number of non-null
> values and the probability of saving time by abbreviating them.
>
> So if you're sorting a million rows of which 900,000 are null and
> 100,000 contain 50 different non-null values, then the absolute time
> saved (not the proportion) by doing abbreviation should be on the same
> order as the absolute time saved by abbreviation when sorting just the
> 100,000 non-null rows.
>
> But if the cost model does 1,000,000/50 and gets 20,000, and decides
> "that's worse than my 1 in 10,000 target, I'll abort abbreviations",
> then you have sacrificed the time gain for no reason at all.  This is
> what I mean by "spurious".  This is why the cost model must compute the
> fraction as 100,000/50, ignoring the null inputs, if it's going to
> perform anything like optimally in the presence of nulls.

I think Andrew is right.

>  Peter> I also think that your explanation of the encoding schemes was
>  Peter> perfunctory.
>
> I'm interested in other opinions on that, because I find your
> replacement for it both confusingly organized and a bit misleading (for
> example saying the top bit is "wasted" is wrong, it's reserved because
> we need it free for the sign).
>
> (It is true that mine assumes that the reader knows what "excess-N"
> means, or can find out.)
>
> Here's mine, which is given as a single block comment:
>
> [ long explanatory comment ]
>
> Peter's, inline with the code (omitted here):
>
> [ long explanatory comment ]

In my opinion, Andrew's version is far clearer.  Peter's version is
full of jargon that I can't understand.  I could probably figure it
out with a few hours and a search engine, but that really shouldn't be
necessary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fabien COELHO
Дата:
Сообщение: Re: PATCH: pgbench - merging transaction logs
Следующее
От: Andrew Gierth
Дата:
Сообщение: Re: Exposing PG_VERSION_NUM in pg_config