Re: Refactoring of heapam code.

From	Pavan Deolasee
Subject	Re: Refactoring of heapam code.
Date	12 September 2016, 10:12:43
Msg-id	CABOikdM8yXfyC82Vt3ZvQsrm0MHeDoquGe56utGkrWrMgTTDqg@mail.gmail.com
In reply to	Re: Refactoring of heapam code. (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
Responses	Re: Refactoring of heapam code.
List	pgsql-hackers

On Tue, Sep 6, 2016 at 8:39 PM, Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote:

06.09.2016 07:44, Pavan Deolasee:
2. I don't understand the difference between PageGetItemHeapHeaderOnly() and PageGetItemHeap(). They seem to do exactly the same thing and can be used interchangeably.

The only difference between these two macros is that
PageGetItemHeapHeaderOnly() doesn't touch any key fields of a tuple;
it only checks header fields (see HeapTupleHeaderData). I split it into
two separate functions while I was working on page compression, because
I wanted to avoid unnecessary decompression calls. These names are
just 'markers' of a sort, to make the code easier to read and improve.

Ok. I still don't get it, but that's probably because I haven't seen a real use case of that. Right now, both macros look exactly the same.

3. While I like the idea of using separate interfaces to get a heap/index tuple from a page, maybe they can internally use a common interface instead of duplicating what PageGetItem() already does.

I'm not sure I get it right. Do you suggest keeping PageGetItem and writing
PageGetItemHeap() and PageGetItemIndex() as wrappers around it?
That sounds reasonable as long as heap and index pages have a similar layout.
In any case, it'll be easy to separate them when necessary.

Yes, that's what I was thinking.
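
To make the wrapper idea concrete, here is a minimal sketch on a toy page model. Everything in it (ToyPage, ToyItemId, the toy_page_* functions) is hypothetical and is not PostgreSQL's real page layout or the patch's code; it only shows heap, index, and header-only accessors sharing one common lookup, so that the layout knowledge lives in a single place.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-ins for illustration only, NOT PostgreSQL's real definitions. */
#define TOY_PAGE_SIZE 256
#define TOY_MAX_ITEMS 8

typedef struct ToyItemId {
    uint16_t off;   /* byte offset of the item inside data[] */
    uint16_t len;   /* item length in bytes */
} ToyItemId;

typedef struct ToyPage {
    uint16_t      nitems;                 /* number of items stored */
    uint16_t      used;                   /* bytes of data[] already occupied */
    ToyItemId     items[TOY_MAX_ITEMS];   /* line-pointer-like array */
    unsigned char data[TOY_PAGE_SIZE];
} ToyPage;

/* Append an item; returns its index, or -1 if the page is full. */
static int toy_page_add_item(ToyPage *page, const void *src, uint16_t len)
{
    if (page->nitems >= TOY_MAX_ITEMS || page->used + len > TOY_PAGE_SIZE)
        return -1;
    page->items[page->nitems].off = page->used;
    page->items[page->nitems].len = len;
    memcpy(page->data + page->used, src, len);
    page->used = (uint16_t) (page->used + len);
    return page->nitems++;
}

/* Common accessor: the only function that knows how items are laid out. */
static void *toy_page_get_item(ToyPage *page, uint16_t idx)
{
    assert(idx < page->nitems);
    return page->data + page->items[idx].off;
}

/* Thin wrappers. Today they are identical; they exist as named hooks. */
static void *toy_page_get_item_heap(ToyPage *page, uint16_t idx)
{
    return toy_page_get_item(page, idx);
}

static void *toy_page_get_item_index(ToyPage *page, uint16_t idx)
{
    return toy_page_get_item(page, idx);
}

/* A compressed page format could skip decompressing the payload here. */
static void *toy_page_get_item_heap_header_only(ToyPage *page, uint16_t idx)
{
    return toy_page_get_item(page, idx);
}
```

With this shape, a change to the heap layout (for example, decompressing the payload on access) would touch only the heap wrappers and leave index callers alone.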

4. Should we rename PageGetItemHeap() to PageGetHeapTuple() or something like that?

I don't feel like doing that, because HeapTuple is a different structure.
What we do get from page is a HeapTupleHeaderData structure
followed by user's data.

Ok, makes sense.
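
For readers following along, the distinction can be sketched as follows. The types here (ToyTupleHeader, ToyTuple) are simplified, hypothetical stand-ins: the real HeapTupleHeaderData and HeapTupleData carry more fields. The point is only that the page holds a header followed by user data, while the tuple structure is a separate in-memory descriptor that points at it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy on-page header (hypothetical); user data follows it at offset hoff. */
typedef struct ToyTupleHeader {
    uint32_t xmin;   /* inserting transaction */
    uint32_t xmax;   /* deleting/locking transaction, 0 if none */
    uint8_t  hoff;   /* offset from the start of the header to user data */
} ToyTupleHeader;

/* Toy in-memory descriptor (hypothetical); it is not stored on the page. */
typedef struct ToyTuple {
    uint32_t        len;    /* header plus user data, in bytes */
    ToyTupleHeader *data;   /* points at the on-page item */
} ToyTuple;

/* Wrap an on-page item in a descriptor without copying it. */
static ToyTuple toy_make_tuple(void *item, uint32_t item_len)
{
    ToyTuple tup;

    assert(item != NULL);
    tup.len = item_len;
    tup.data = (ToyTupleHeader *) item;
    return tup;
}

/* User data starts hoff bytes after the beginning of the header. */
static const char *toy_tuple_user_data(const ToyTuple *tup)
{
    return (const char *) tup->data + tup->data->hoff;
}
```

So what a page accessor hands back corresponds to ToyTupleHeader plus payload, which is why a name containing "HeapTuple" would suggest the wrong structure.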

I also looked at the refactoring design doc. Looks like a fine approach to me, but the API will need much more elaborate discussions. I am not sure if the interfaces as presented are enough to support everything that even heapam can do today.

What features of heapam do you think could be unsupportable in this API?
Maybe I've just missed them.

I was thinking about locking, bulk reading (like the page-mode API), etc. While you've added an API for vacuuming, we would probably also need APIs for collecting dead tuples, pruning, etc. (not sure if those can be combined with vacuum). Of course, these are probably very specific to the current implementation of heap/MVCC, and not all storages will need them.

I suggest a refactoring that will allow us to develop new heap-like access methods.
For the first version, they must have methods to "heapify" a tuple, i.e. turn the internal
representation of the data into a regular HeapTuple, for example by putting some stubs into
the MVCC fields. Of course this approach has its disadvantages, such as performance issues.
It definitely won't be enough to write column storage or to implement other great
data structures. But it allows us not to depend on the Executor's code.

Ok, understood.
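
As I read it, "heapify" could look roughly like the sketch below. The structures and the toy_heapify() function are hypothetical and heavily simplified; the stub values are toy constants chosen to mean "no transaction" and "visible to everyone", and are an assumption about what such stubs might be, not something the design doc specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stub values (assumptions for this sketch). */
#define TOY_INVALID_XID 0u   /* "no transaction" */
#define TOY_FROZEN_XID  2u   /* "visible to every snapshot" */

/* Internal representation of some hypothetical heap-like access method:
 * no per-row MVCC information at all. */
typedef struct ToyCompactRow {
    int32_t key;
    int32_t value;
} ToyCompactRow;

/* Heap-style tuple that upper layers expect, MVCC fields included. */
typedef struct ToyHeapTuple {
    uint32_t xmin;
    uint32_t xmax;
    int32_t  key;
    int32_t  value;
} ToyHeapTuple;

/* Convert the internal row into the heap-style form, stubbing MVCC fields. */
static ToyHeapTuple toy_heapify(const ToyCompactRow *row)
{
    ToyHeapTuple tup;

    assert(row != NULL);
    tup.xmin = TOY_FROZEN_XID;    /* stub: treat the row as always visible */
    tup.xmax = TOY_INVALID_XID;   /* stub: nobody has deleted or locked it */
    tup.key = row->key;
    tup.value = row->value;
    return tup;
}
```

The copy on every fetch is where the performance cost mentioned above would come from.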

- There are many upper modules that need access to system attributes (xmin, xmax for starters). How do you plan to handle that? Do you think we can provide enough abstraction so that the callers don't need to know the tuple structures of individual PAMs?

To be honest, I didn't think about it.
Do you mean external modules or upper levels of abstraction in Postgres?

I meant upper levels of abstraction like the executor. For example, while inserting a new tuple, the executor (the index AM's insert routine to be precise) may need to wait for another transaction to finish. Today, it can easily get that information by looking at the xmax of the conflicting tuple. How would we handle that with abstraction since not every PAM will have a notion of xmax?
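
To illustrate the question, here is one possible shape for such an abstraction, purely as a sketch and not a proposal from the design doc: the caller asks the access method which transaction, if any, it has to wait for, instead of reading xmax itself. All names (ToyAmRoutine, get_conflicting_xid, toy_needs_wait) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INVALID_XID 0u   /* toy value meaning "no transaction" */

/* Hypothetical per-access-method callback table. */
typedef struct ToyAmRoutine {
    /* Returns the transaction the caller must wait for, or TOY_INVALID_XID. */
    uint32_t (*get_conflicting_xid)(const void *tuple);
} ToyAmRoutine;

/* Heap-like access method: the answer happens to live in xmax. */
typedef struct ToyHeapLikeTuple {
    uint32_t xmin;
    uint32_t xmax;
} ToyHeapLikeTuple;

static uint32_t toy_heap_conflicting_xid(const void *tuple)
{
    return ((const ToyHeapLikeTuple *) tuple)->xmax;
}

/* Access method with no notion of xmax: never reports a conflict here. */
typedef struct ToyAppendOnlyTuple {
    int32_t payload;
} ToyAppendOnlyTuple;

static uint32_t toy_append_only_conflicting_xid(const void *tuple)
{
    (void) tuple;   /* nothing to inspect in this toy format */
    return TOY_INVALID_XID;
}

/* Caller-side check that never looks inside the tuple structure. */
static bool toy_needs_wait(const ToyAmRoutine *am, const void *tuple,
                           uint32_t *xid_out)
{
    uint32_t xid;

    assert(am != NULL && am->get_conflicting_xid != NULL);
    xid = am->get_conflicting_xid(tuple);
    if (xid_out != NULL)
        *xid_out = xid;
    return xid != TOY_INVALID_XID;
}
```

Whether a single "who do I wait for" callback is enough for every caller of xmin/xmax is exactly the open question; the sketch only shows the direction.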

Thanks,

Pavan

Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
