Re: [HACKERS] Pluggable storage

Поиск
Список
Период
Сортировка
От Haribabu Kommi
Тема Re: [HACKERS] Pluggable storage
Дата
Msg-id CAJrrPGfCLTaD8DOzYqm44TbPYcOicp9eyGA4nDW=R6EN=mrNig@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Pluggable storage  (Andres Freund <andres@anarazel.de>)
Ответы Re: [HACKERS] Pluggable storage  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Список pgsql-hackers


On Tue, Aug 15, 2017 at 4:53 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,


On 2017-06-13 11:50:27 +1000, Haribabu Kommi wrote:
> Here I attached WIP patches to support pluggable storage. The patch series
> are may not work individually. Still so many things are under development.
> These patches are just to share the approach of the current development.

Making a pass through the patchset to get a feel where this at, and
where this is headed.  I previously skimmed the thread to get a rough
sense on what's discused, but not in a very detailed manner.

Thanks for the review.
 

General:

- I think one important discussion we need to have is what kind of
  performance impact we're going to accept introducing this. It seems
  very likely that this'll cause some slowdown.  We can kind of
  alleviate that by doing some optimizations at the same time, but
  nevertheless, this abstraction is going to cost.
 
OK. May be to take some decision, we may need some performance
figures, I will measure the performance once the API's stabilized.

- I don't think we should introduce this without a user besides
  heapam. The likelihood that API will be usable by anything else
  without a testcase seems fairly remote.  I think some src/test/modules
  type implementation of a per-session, in-memory storage - relatively
  easy to implement - or such is necessary.

Sure, I will add a test module once the API's are stabilized.

 
- I think, and detailed some of that, we're going to need some cleanups
  that go in before this, to decrease the size / increase the quality of
  the new APIs.  It's going to get more painful to change APIs
  subsequently.

- We'll have to document clearly that these APIs are going to change for
  a while, even after the release introducing them.

Yes, that's correct, because this is the first time we are developing the 
storage API's to support pluggable storage, so it may needs some refinements
based on the usage to support different storage methods. 



StorageAm - Scan stuff:

- I think API isn't quite right. There's a lot of granular callback
  functionality like scan_begin_function / scan_begin_catalog /
  scan_begin_bm - these largely are convenience wrappers around the same
  function, and I don't think that would, or rather should, change in
  any abstracted storage layer.  So I think there needs to be some
  unification first (pretty close w/ beginscan_internal already, but
  perhaps we should get rid of a few of these wrappers).

OK. I will change the API to add a function to beginscan_internal and replace
the rest of the functions usage with beginscan_internal. And also there are
many bool flags that are passed to the beginscan_internal, I will try to optimize
them also.

 
- Some of the exposed functionality, e.g. scan_getpage,
  scan_update_snapshot, scan_rescan_set_params looks like it should just
  be excised, i.e. there's no reason for it to exist.

Currently these API's are used only in Bitmap and Sample scan's.
These scan methods are fully depends on the heap format. I will
check how to remove these API's.

- Minor: don't think the _function suffix for Storis necessary, just
  makes things long, and every member has it. Besides that, it's also
  easy to misunderstand - for a second I understood
  scan_getnext_function to be about getting the next function...

OK. How about adding _hook?

 
- Scans are still represented as HeapScanDesc - I don't think that's
  going to fly. Either this needs to be an opaque type (i.e. a struct
  that's not defined, just forward declared), or it needs to be a base
  struct that individual AMs embed in their own structs.  Individual AMs
  definitely are going to need different pieces of data.

Currently the internal members of the HeapScanDesc are directly used
in many places especially in Bitmap and Sample scan's. I am yet to write
the code in a best way to handle these scan methods and then removing
its usage will be easy.
 
Storage AM - tuple stuff:

- tuple_get_{xmin, updated_xid, cmin, itempointer, ctid, heaponly} are
  each individual functions, that seems pretty painful to maintain, and
  v/ likely to just grow and grow. Not sure what the solution is, but
  this seems like a hard sell.
 
OK. How about adding a one API and takes some flags to represent
what type of data that is needed from the tuple and returned the corresponding
data as void *. The caller must typecast the data to their corresponding
type before use it.

- The three *speculative* functions don't quite seem right to me, nor do
  I understand:
+        *
+        * Setting a tuple's speculative token is a slot-only operation, so no need
+        * for a storage AM method, but after inserting a tuple containing a
+        * speculative token, the insertion must be completed by these routines:
+        */
  I don't see anything related to slots, here?

The tuple_set_speculative_token API is not required. Just update the slot 
member directly with speculative token is fine and this value is used in
the tuple_insert API to form the tuple with speculative token. Later with
the other two API's the tuple is either finished or aborted.

 
Storage AM - slot stuff:


- I think there's a number of wrapper functions (slot_getattr,
  slot_getallattrs, getsomeattrs, attisnull) around the same
  functionality - that bloats the API and causes slowdowns. Instead we
  need something like slot_virtualize_tuple(int16 upto), and the rest
  should just be wrappers.


OK. I will change accordingly.
 
- I think it's wrong to have the slot functionality defined on the
  StorageAm level.  That'll cause even more indirect function calls (=>
  slowness), and besides that the TupleTableSlot struct will need
  members for each and every Am.

  I think the solution is to instead have an Am level "create slot"
  function, and the returned slot is allocated by the Am, with a base
  member of TupleTableSlot with basically just tts_nvalid, tts_values,
  tts_isnull as members.  Those are the only members that can be
  accessed without functions.

  Access to the individual functions (say store_tuple) would then be
  direct members of the TupleTableSlot interface. While that costs a bit
  of memory, it removes one indirection from an already performance
  critical path.

OK. This will change the structure to hold the minimal members that are
accessed irrespective of storage AM, and rest of data will be a void pointer
that can be accessed by only the storage AM.


- MinimalTuples should be one type of slot for the above, except it's
  not created by an StorageAm but by a function returning a
  TupleTableSlot.

  This should remove the need for the slot_copy_min_tuple,
  slot_is_physical_tuple functions.

OK.
 
- Right now TupleTableSlots are an executor datastructure, but
  these patches (rightly!) make it much more widely used. So I think it
  needs to be moved outside of executor/, and probably renamed to
  something like TupleHolder or something.

OK.
 
- The oid integration seems wrong - without an accessor oids won't be
  queryable with this unless you break through the API.  But from a
  higher level view I do wonder if it's not time to remove "special" oid
  columns and replace them with a normal column.  We should be hesitant
  enshrining crusty old concepts in new APIs.

OK.
 
Executor integration:

- I'm quite fearful that this'll cause slowdowns in a few tight paths.
  The most likely cases here seem to be a) bitmap indexscans b)
  indexscans c) highly selective sequential scans.   I do wonder if
  that can be partially addressed by switching out the individual
  executor routines in the relevant scan nodes by something using or
  similar to the infrastructure in cc9f08b6b8


Sorry, I didn't understand this point clearly. Can you provide some more
details.

Regards,
Hari Babu
Fujitsu Australia

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dilip Kumar
Дата:
Сообщение: Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages
Следующее
От: Haribabu Kommi
Дата:
Сообщение: Re: [HACKERS] Pluggable storage