Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
Дата
Msg-id CAM3SWZQ0aNLgLofskQ9E62A8UFhfNHpt1p5KQEOKtph=HGcDWg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates  (Peter Geoghegan <pg@heroku.com>)
Список pgsql-hackers
On Tue, Dec 22, 2015 at 9:31 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> It has some nice properties, because unsigned integers are so simple
>> and flexible. For example, we can do it with int4 sometimes, and then
>> compare two attributes at once on 64-bit platforms. Maybe the second
>> attribute (the second set of 4 bytes) isn't an int4, though -- it's
>> any other type's abbreviated key (which can be assumed to have a
>> compatible representation). That's one more advanced possibility.
>
> Yikes.

I'm not suggesting we'd want to do that immediately, or even at all.
My point is -- stuff like this becomes possible. My intuition is that
it might come up in a bunch of places.

>> Another issue is that abbreviated keys in indexes are probably not
>> going to take 8 bytes, because they'll go in the ItemId array. It's
>> likely to be very useful to be able to take the first two bytes, and
>> know that a uint16 comparison is all that is needed, and have a basis
>> to perform an interpolation search rather than a binary search
>> (although that needs to be adaptive to different distributions, I
>> think it could be an effective technique -- there are many cache
>> misses for binary searches on internal B-Tree pages).
>
> Hmm, that seems iffy to me.  There are plenty of data sets where 8
> bytes is enough to get some meaningful information about what part of
> the keyspace you're in, but 2 bytes isn't.

Your experience with abbreviated keys for sorting is unlikely to
perfectly carry over here. That's because the 2 bytes are only put in
the internal pages (typically less than 1% of the total). These
internal pages typically have relatively low density anyway (we don't
use the The B* tree technique), so I think that the level of bloat
would be tiny. At the same time, these are the range of values that
naturally have very wide fan-out. For example, the entire range of
values in the index is represented at the root page.

All of that having been said, yes, it could still be useless. But
then, the overhead will still tend to be nill, due to the fact that
abbreviated key generation can be so effectively amortized (it happens
only once, per page split), and because branch prediction will tend to
remove any penalty.

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: Possible marginally-incompatible change to array subscripting
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Getting sorted data from foreign server for merge join