Re: Thoughts on nbtree with logical/varwidth table identifiers, v12on-disk representation

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Thoughts on nbtree with logical/varwidth table identifiers, v12on-disk representation
Дата
Msg-id CAH2-Wz=acjRtkwzGSY_-+uSD=nOHxAsjMrOWVx1dVDQ2Lb4rfw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Thoughts on nbtree with logical/varwidth table identifiers, v12on-disk representation  (Stephen Frost <sfrost@snowman.net>)
Ответы Re: Thoughts on nbtree with logical/varwidth table identifiers, v12on-disk representation  (Stephen Frost <sfrost@snowman.net>)
Re: Thoughts on nbtree with logical/varwidth table identifiers, v12on-disk representation  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Mon, Apr 22, 2019 at 8:36 AM Stephen Frost <sfrost@snowman.net> wrote:
> This seems like it would be helpful for global indexes as well, wouldn't
> it?

Yes, though that should probably work by reusing what we already do
with heap TID (use standard IndexTuple fields on the leaf level for
heap TID), plus an additional identifier for the partition number that
is located at the physical end of the tuple. IOW, I think that this
might benefit from a design that is half way between what we already
do with heap TIDs and what we would be required to do to make varwidth
logical row identifiers in tables work -- the partition number is
varwidth, though often only a single byte.

> I agree with trying to avoid having padding 'in the wrong place' and if
> it makes some indexes smaller, great, even if they're unlikely to be
> interesting in the vast majority of cases, they may still exist out
> there.  Of course, this is provided that it doesn't overly complicate
> the code, but it sounds like it wouldn't be too bad in this case.

Here is what it took:

* Removed the "conservative" MAXALIGN() within index_form_tuple(),
bringing it in line with heap_form_tuple(), which only MAXALIGN()s so
that the first attribute in tuple's data area can safely be accessed
on alignment-picky platforms, but doesn't do the same with data_len.

* Removed most of the MAXALIGN()s from nbtinsert.c, except one that
considers if a page split is required.

* Didn't change the nbtsplitloc.c code, because we need to assume
MAXALIGN()'d space quantities there. We continue to not trust the
reported tuple length to be MAXALIGN()'d, which is now essentially
rather than just defensive.

* Removed MAXALIGN()s within _bt_truncate(), and SHORTALIGN()'d the
whole tuple size in the case where new pivot tuple requires a heap TID
representation. We access TIDs as 3 2 byte integers, so this is
necessary for alignment-picky platforms.

I will pursue this as a project for PostgreSQL 13. It doesn't affect
on-disk compatibility, because BTreeTupleGetHeapTID() works just as
well with either the existing scheme, or this new one. Having the
"real" tuple length available will make it easier to implement "true"
suffix truncation, where we truncate *within* a text attribute (i.e.
generate a new, shorter value using new opclass infrastructure).

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: finding changed blocks using WAL scanning
Следующее
От: Tom Lane
Дата:
Сообщение: Re: clean up docs for v12