Re: Refactoring of heapam code.

From Pavan Deolasee
Subject Re: Refactoring of heapam code.
Msg-id CABOikdM8yXfyC82Vt3ZvQsrm0MHeDoquGe56utGkrWrMgTTDqg@mail.gmail.com
In reply to Re: Refactoring of heapam code.  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
Responses Re: Refactoring of heapam code.  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers


On Tue, Sep 6, 2016 at 8:39 PM, Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote:

06.09.2016 07:44, Pavan Deolasee:

2. I don't understand the difference between PageGetItemHeapHeaderOnly() and PageGetItemHeap(). They seem to do exactly the same thing and can be used interchangeably.

The only difference between these two macros is that
PageGetItemHeapHeaderOnly() doesn't touch any key fields of a tuple,
it only checks header fields (see HeapTupleHeaderData). I split it into
two separate functions while I was working on page compression, because
I wanted to avoid unnecessary decompression calls. These names are
just 'markers' that make the code easier to read and improve.


Ok. I still don't get it, but that's probably because I haven't seen a real use case of that. Right now, both macros look exactly the same.
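
To make the intended distinction concrete, here is a minimal sketch of how a header-only accessor could differ from a full one once page compression exists. It is not taken from the patch: ItemSketch, decompress_payload() and both accessor names are hypothetical, and the "decompression" is a plain copy that only counts how often it is called.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAYLOAD_MAX 32

/* Hypothetical on-page item: a plain header followed by a payload that
 * we imagine to be stored compressed. */
typedef struct ItemSketch
{
    uint32_t xmin;                  /* header field, readable as-is */
    uint32_t xmax;                  /* header field, readable as-is */
    uint8_t  payload_len;           /* number of valid payload bytes */
    uint8_t  payload[PAYLOAD_MAX];  /* attribute data (assumed compressed) */
} ItemSketch;

/* Counts how many times the expensive path was taken. */
static int decompress_calls = 0;

/* Placeholder for real decompression: here it only copies the bytes. */
static void
decompress_payload(const ItemSketch *item, uint8_t *out)
{
    assert(item->payload_len <= PAYLOAD_MAX);
    decompress_calls++;
    memcpy(out, item->payload, item->payload_len);
}

/* Header-only access: never reads the payload, so nothing is decompressed. */
static const ItemSketch *
GetItemHeaderOnlySketch(const ItemSketch *item)
{
    return item;
}

/* Full access: the caller wants attribute data, so decompression is paid for.
 * Returns the number of bytes written to out. */
static uint8_t
GetItemFullSketch(const ItemSketch *item, uint8_t *out)
{
    decompress_payload(item, out);
    return item->payload_len;
}
```

Under these assumptions, a visibility check that only needs xmin/xmax would use the header-only accessor and never increment decompress_calls, which is the saving described above.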
 
3. While I like the idea of using separate interfaces to get a heap/index tuple from a page, maybe they can internally use a common interface instead of duplicating what PageGetItem() already does.

I'm not sure I understand correctly. Do you suggest keeping PageGetItem() and writing
PageGetItemHeap() and PageGetItemIndex() as wrappers around it?
That sounds reasonable as long as heap and index pages have a similar layout.
In any case, it'll be easy to separate them when necessary.


Yes, that's what I was thinking.
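
A minimal sketch of that wrapper idea, using simplified stand-ins rather than the real bufpage.h definitions. Every name with a Sketch suffix is hypothetical, and the line pointer is reduced to an offset/length pair for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the page and item types. */
typedef char *Page;
typedef char *Item;

/* Reduced line pointer: where the item starts and how long it is. */
typedef struct ItemIdSketch
{
    uint16_t off;   /* byte offset of the item within the page */
    uint16_t len;   /* item length in bytes */
} ItemIdSketch;

/* Opaque placeholders for the two tuple formats. */
typedef struct HeapTupleHeaderSketch HeapTupleHeaderSketch;
typedef struct IndexTupleSketch IndexTupleSketch;

/* Common accessor, playing the role of PageGetItem(). */
static inline Item
PageGetItemSketch(Page page, const ItemIdSketch *itemId)
{
    assert(page != NULL && itemId != NULL);
    assert(itemId->len > 0);
    return page + itemId->off;
}

/* Heap-specific wrapper: same lookup, heap-typed result. */
static inline HeapTupleHeaderSketch *
PageGetItemHeapSketch(Page page, const ItemIdSketch *itemId)
{
    return (HeapTupleHeaderSketch *) PageGetItemSketch(page, itemId);
}

/* Index-specific wrapper: same lookup, index-typed result. */
static inline IndexTupleSketch *
PageGetItemIndexSketch(Page page, const ItemIdSketch *itemId)
{
    return (IndexTupleSketch *) PageGetItemSketch(page, itemId);
}
```

Callers get type-specific entry points, while the offset arithmetic lives in one place; if heap and index page layouts ever diverge, only the wrapper bodies would need to change.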
 
4. Should we rename PageGetItemHeap() to PageGetHeapTuple() or something like that?

I don't feel like doing that, because HeapTuple is a different structure.
What we actually get from the page is a HeapTupleHeaderData structure
followed by the user's data.

Ok, makes sense.
 


I also looked at the refactoring design doc. Looks like a fine approach to me, but the API will need much more elaborate discussions. I am not sure if the interfaces as presented are enough to support everything that even heapam can do today.

What features of heapam do you think could be unsupportable in this API?
Maybe I've just missed them.

I was thinking about locking, bulk reading (like the page-mode API), etc. While you've added an API for vacuuming, we would probably also need an API for collecting dead tuples, pruning, etc. (not sure if that can be combined with vacuum). Of course, these are probably very specific to the current implementation of heap/MVCC, and not all storages will need them.
 

I suggest a refactoring that will allow us to develop new heap-like access methods.
For the first version, they must have methods to "heapify" a tuple, i.e. turn the internal
representation of the data into a regular HeapTuple, for example by putting some stubs into
the MVCC fields. Of course, this approach has its disadvantages, such as performance issues.
It definitely won't be enough to write column storage or to implement other great
data structures. But it allows us not to depend on the executor's code.


Ok, understood.
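
For readers following along, a rough sketch of what such a "heapify" step might look like. All types here are simplified, hypothetical stand-ins (the real HeapTupleData and HeapTupleHeaderData carry many more fields), and the choice of stub values is an assumption: xmin is set to a frozen value so the row looks visible to everyone, and xmax is left invalid so it looks undeleted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t TransactionId;
#define InvalidTransactionIdSketch ((TransactionId) 0)  /* "no transaction" */
#define FrozenTransactionIdSketch  ((TransactionId) 2)  /* "visible to all" */

/* Reduced tuple header: only the MVCC fields and the attribute count. */
typedef struct HeapHeaderSketch
{
    TransactionId xmin;
    TransactionId xmax;
    uint16_t      natts;
} HeapHeaderSketch;

/* Reduced in-memory tuple: length plus pointer to header + user data. */
typedef struct HeapTupleSketch
{
    uint32_t          len;   /* header size + user data size */
    HeapHeaderSketch *data;  /* header, immediately followed by user data */
} HeapTupleSketch;

/* Hypothetical row of a custom storage that keeps no MVCC information. */
typedef struct CustomRowSketch
{
    uint16_t natts;
    int32_t  values[4];
} CustomRowSketch;

/* Build a heap-style tuple from a custom row, stubbing the MVCC fields.
 * The result is one malloc'd chunk; release it with free(). */
static HeapTupleSketch *
heapify_row(const CustomRowSketch *row)
{
    assert(row->natts <= 4);

    size_t data_len = (size_t) row->natts * sizeof(int32_t);
    size_t tup_len = sizeof(HeapHeaderSketch) + data_len;
    HeapTupleSketch *tup = malloc(sizeof(HeapTupleSketch) + tup_len);

    if (tup == NULL)
        return NULL;

    tup->len = (uint32_t) tup_len;
    tup->data = (HeapHeaderSketch *) (tup + 1);
    tup->data->xmin = FrozenTransactionIdSketch;   /* stub: always visible */
    tup->data->xmax = InvalidTransactionIdSketch;  /* stub: never deleted */
    tup->data->natts = row->natts;
    memcpy(tup->data + 1, row->values, data_len);  /* user data after header */
    return tup;
}
```

The extra allocation and copy per row are one concrete form of the performance cost mentioned above.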
 

- There are many upper modules that need access to system attributes (xmin, xmax for starters). How do you plan to handle that? Do you think we can provide enough abstraction so that the callers don't need to know the tuple structures of individual PAMs?

To be honest, I haven't thought about it.
Do you mean external modules or upper levels of abstraction in Postgres?

I meant upper levels of abstraction like the executor. For example, while inserting a new tuple, the executor (the index AM's insert routine to be precise) may need to wait for another transaction to finish. Today, it can easily get that information by looking at the xmax of the conflicting tuple. How would we handle that with abstraction since not every PAM will have a notion of xmax?
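
One possible shape for such an abstraction, purely as a sketch to frame the question: instead of reading xmax directly, the caller asks the access method which transaction, if any, it has to wait for. PamRoutineSketch, the callback name, and both example storages are hypothetical, and the heap-like case is deliberately reduced to returning xmax.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidXidSketch ((TransactionId) 0)

/* Hypothetical per-access-method callback table. */
typedef struct PamRoutineSketch
{
    /* Returns the xid the caller must wait for before deciding about a
     * conflict, or InvalidXidSketch if there is nothing to wait for. */
    TransactionId (*tuple_conflicting_xid) (const void *tuple);
} PamRoutineSketch;

/* Heap-like storage: the answer comes straight from xmax (simplified). */
typedef struct HeapLikeTuple
{
    TransactionId xmin;
    TransactionId xmax;
} HeapLikeTuple;

static TransactionId
heaplike_conflicting_xid(const void *tuple)
{
    const HeapLikeTuple *t = tuple;

    return t->xmax;
}

/* Storage with no notion of xmax: it only knows who wrote the row and
 * whether that writer has committed. */
typedef struct AppendOnlyRow
{
    TransactionId writer;
    bool          committed;
} AppendOnlyRow;

static TransactionId
appendonly_conflicting_xid(const void *tuple)
{
    const AppendOnlyRow *r = tuple;

    return r->committed ? InvalidXidSketch : r->writer;
}

static const PamRoutineSketch heaplike_am = {heaplike_conflicting_xid};
static const PamRoutineSketch appendonly_am = {appendonly_conflicting_xid};

/* Caller side: no knowledge of the tuple layout is needed. */
static bool
must_wait(const PamRoutineSketch *am, const void *tuple, TransactionId *xwait)
{
    *xwait = am->tuple_conflicting_xid(tuple);
    return *xwait != InvalidXidSketch;
}
```

Whether a single callback like this is enough for every caller that inspects system attributes today is exactly the open question; the sketch only shows that the xmax lookup itself can be hidden behind the access method.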
 
Thanks,
Pavan

 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
