Re: Thoughts on nbtree with logical/varwidth table identifiers, v12on-disk representation

Поиск

Список

Период

Сортировка

От	Andres Freund
Тема	Re: Thoughts on nbtree with logical/varwidth table identifiers, v12on-disk representation
Дата	30 октября 2019 г. 23:25:14
Msg-id	20191030202514.cfqx2he7q6ooj5kl@alap3.anarazel.de обсуждение исходный текст
Ответ на	Re: Thoughts on nbtree with logical/varwidth table identifiers, v12on-disk representation (Peter Geoghegan <pg@bowt.ie>)
Ответы	Re: Thoughts on nbtree with logical/varwidth table identifiers, v12on-disk representation
Список	pgsql-hackers

Дерево обсуждения

Hi,

On 2019-10-30 12:37:50 -0700, Peter Geoghegan wrote:
> On Wed, Oct 30, 2019 at 12:03 PM Andres Freund <andres@anarazel.de> wrote:
> > I'd much rather not entrench this further, even leaving global indexes
> > aside. The 4 byte block number is a significant limitation for heap
> > tables too, and we should lift that at some point not too far away.
> > Then there's also other AMs that could really use a wider tid space.
> 
> I agree that that limitation is a problem that should be fixed before
> too long. But the solution probably shouldn't be a radical departure
> from what we have today. The vast majority of tables are not affected
> by the TID space limitation. Those tables that are can tolerate
> supporting fixed width "long" TIDs (perhaps 8 bytes long?) that are
> used for the higher portion of the heap TID space alone.
> 
> The idea here is that TID is varwidth, but actually uses the existing
> heap TID format most of the time. For larger tables it uses a wider
> fixed width struct that largely works the same as the old 6 byte
> struct.

I assume you mean that the index would dynamically recognize when it
needs the wider tids ("for the higher portion")? If so, yea, that makes
sense to me. Would that need to encode the 6/8byteness of a tid on a
per-element basis? Or are you thinking of being able to upconvert in a
general way?

> > > Though I suppose a posting list almost has to have fixed width TIDs to
> > > perform acceptably.
> >
> > Hm. It's not clear to me why that is?
> 
> Random access matters for things like determining the correct offset
> to split a posting list at. This is needed in the event of an
> overlapping insertion of a new duplicate tuple whose heap TID falls
> within the range of the posting list. Also, I want to be able to scan
> posting lists backwards for backwards scans. In general, fixed-width
> TIDs make the page space accounting fairly simple, which matters a lot
> in nbtree.

If we had variable width tids varying by more than 2 bytes, it might be
reasonable for cases like this to store all tids padded to the width of
the widest tid. I think that'd still be pretty OK, because you'd only
pay the price if you actually have long tids, and only on pages where
such tids are referenced. Obviously that means that such a posting list
could grow more than by the size of the inserted tid upon insertion, but
that might still be OK?  That'd require storing the width of the posting
list elements somewhere, unfortunately - not sure if that's a problem?

ISTM if we had varint style tids, we'd probably still save space on
average for today's heap that way. How important is it for you to be
able to split out the "block offset" and "page offset" bits?

I'm somewhat inclined to think that tids should just be a varint (or
maybe two, if we want to make it slightly simpler to keep compat to how
they look today), and that the AM internally makes sense of that.

> I can support varwidth TIDs in the future pretty well if the TID
> doesn't have to be *arbitrarily* wide.

I think it'd be perfectly reasonable to have a relatively low upper
limit for tid width. Not 8 bytes, but also not 200 bytes.

> Individual posting lists can themselves either use 6 byte or 8 byte
> TIDs, preserving the ability to access a posting list entry at random
> using simple pointer arithmetic.  This makes converting over index AMs
> a lot less painful -- it'll be pretty easy to avoid mixing together
> the 6 byte and 8 byte structs.

With varint style tids as I suggested, that ought to be fairly simple?

> I don't think "indirect" indexes are a realistic goal for
> Postgres. VACUUM is just too messy there (as is any other garbage
> collection mechanism).  Zedstore and Zheap don't change this.

Hm, I don't think there's a fundamental problem, but let's leave that
aside, there's enough other reasons to improve this.

Greetings,

Andres Freund

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Tom Lane
Дата: 30 октября 2019 г., 22:58:34
Сообщение: pgstat.c has brittle response to transient problems

Следующее

От: David Rowley
Дата: 31 октября 2019 г., 00:47:11
Сообщение: Re: v12.0: ERROR: could not find pathkey item to sort

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Thoughts on nbtree with logical/varwidth table identifiers, v12on-disk representation

Предыдущее

Следующее