Re: [HACKERS] Pluggable storage

Поиск
Список
Период
Сортировка
От Haribabu Kommi
Тема Re: [HACKERS] Pluggable storage
Дата
Msg-id CAJrrPGcUNVweTM27_yaAHQTvi_1e3nXtc9wQs34GL2=-BE6ukg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Pluggable storage  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: [HACKERS] Pluggable storage  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers


On Sun, Aug 13, 2017 at 5:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Aug 12, 2017 at 10:31 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>>
>> Why do we need to store handler function in TupleDesc?  As of now, the
>> above patch series has it available in RelationData and
>> TupleTableSlot, I am not sure if instead of that keeping it in
>> TupleDesc is a good idea.  Which all kind of places require TupleDesc
>> to contain handler?  If those are few places, can we think of passing
>> it as a parameter?
>
>
> Till now I am to able to proceed without adding any storage handler
> functions to
> TupleDesc structure. Sure, I will try the way of passing as a parameter when
> there is a need of it.
>

Okay, I think it is better if you discuss such locations before
directly modifying those.

Sure. I will check with community before making any such changes.

 
> During the progress of the patch, I am facing problems in designing the
> storage API
> regarding the Buffer. For example To replace all the HeapTupleSatisfiesMVCC
> and
> related functions with function pointers, In HeapTuple format, the tuple may
> belongs
> to one buffer, so the buffer is passed to the HeapTupleSatifisifes***
> functions along
> with buffer, But in case of other storage formats, the single buffer may not
> contains
> the actual data.
>

Also, it is quite possible that some of the storage Am's don't even
want to return bool as a parameter from HeapTupleSatisfies* API's.  I
guess what we need here is to provide a way so that different storage
am's can register their function pointer for an equivalent to
satisfies function.  So, we need to change
SnapshotData.SnapshotSatisfiesFunc in some way so that different
handlers can register their function instead of using that directly.
I think that should address the problem you are planning to solve by
omitting buffer parameter.

Thanks for your suggestion. Yes, it is better to go in the direction of 
SnapshotSatisfiesFunc.

I verified the above idea of implementing the Tuple visibility functions
and assign them into the snapshotData structure based on the snapshot.

The Tuple visibility functions that are specific to the relation are available
with the RelationData structure and this structure may not be available,
so I changed the SnapShotData structure to hold an enum to represent
what type of snapshot it is, instead of storing the pointer to the tuple
visibility function. Whenever there is a need to check for the tuple visibilty
the storageam handler pointer corresponding to the snapshot type is
called and result is obtained as earlier.
 

> This buffer is used to set the Hint bits and mark the
> buffer as dirty.
> In case if the buffer is not available, the performance may affect for the
> following
> queries if the hint bits are not set.
>

I don't think it is advisable to change that for the current heap.

I didn't change the prototype of existing functions. Currently tuple visibility
functions assumes that Buffer is always proper, but that may not be correct
based on the storage.
 

> And also the Buffer is used to get from heap_fetch, heap_lock_tuple and
> related
> functions to check the Tuple visibility, but currently returning a buffer
> from the above
> heap_** function is not possible for other formats.
>

Why not?  I mean if we consider that all the formats we are worried at
this stage have TID (block number, tuple location), then we can get
the buffer.  We might want to consider passing TID as a parameter to
these API's if required to make that possible.  You also agreed above
[1] that we can first design the API considering storage formats
having TID.

The current approach is to support the storages that support TID bits.
But what I mean here, in some storage methods (for example column
storage), the tuple is not present in one buffer, the tuple data may be
calculated from many buffers and return the slot/storageTuple (until
unless we change everywhere to slot).

If any of the following code after the storage methods is expecting
a Buffer that should be valid may need some changes to check it first
whether it is a valid or not and perform the operations based on that.
 
> And also for the
> HeapTuple data,
> the tuple data is copied into palloced buffer instead of pointing directly
> to the page.
> So, returning a Buffer is a valid or not here?
>

Yeah, but I think for the sake of compatibility and not changing too
much in the current API's signature, we should try to avoid it.

Currently I am trying to avoid changing the current API's signatures.
Most of the signature changes are something like HeapTuple -> StorageTuple
and etc.

 
> Currently I am proceeding to remove the Buffer as parameter in the API and
> proceed
> further, In case if it affects the performance, we need to find out a
> different appraoch
> in handling the hint bits.
>

Leaving aside the performance concern, I am not convinced that it is a
good idea to remove Buffer as a parameter from the API's you have
mentioned above.  Would you mind thinking once again keeping the
suggestions provided above in this email to see if we can avoid
removing Buffer as a parameter?

Thanks for your suggestions.
Yes. I am able to proceed without removing Buffer parameter.


Regards,
Hari Babu
Fujitsu Australia

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Ashutosh Bapat
Дата:
Сообщение: [HACKERS] advanced partition matching algorithm for partition-wise join
Следующее
От: Amit Langote
Дата:
Сообщение: Re: [HACKERS] expanding inheritance in partition bound order