Re: Improve eviction algorithm in ReorderBuffer

Поиск
Список
Период
Сортировка
От Jeff Davis
Тема Re: Improve eviction algorithm in ReorderBuffer
Дата
Msg-id 00de1318108c36f5f2349d1d52a19fddef884fb4.camel@j-davis.com
обсуждение исходный текст
Ответ на Re: Improve eviction algorithm in ReorderBuffer  (Masahiko Sawada <sawada.mshk@gmail.com>)
Ответы Re: Improve eviction algorithm in ReorderBuffer  (Masahiko Sawada <sawada.mshk@gmail.com>)
Список pgsql-hackers
On Fri, 2024-04-05 at 16:58 +0900, Masahiko Sawada wrote:
> IIUC for example in ReorderBuffer, the heap key is transaction size
> and the hash key is xid.

Yes.

>
> I see your point. It assumes that the bh_node_type is a pointer or at
> least unique. So it cannot work with Datum being a value.

Right. One option might just be to add some comments explaining the API
and limitations, but in general I feel it's confusing to hash a pointer
without a good reason.

>
> It sounds like a data structure that mixes the hash table and the
> binary heap and we use it as the main storage (e.g. for
> ReorderBufferTXN) instead of using the binary heap as the secondary
> data structure. IIUC with your idea, the indexed binary heap has a
> hash table to store elements each of which has its index within the
> heap node array. I guess it's better to create it as a new data
> structure rather than extending the existing binaryheap, since APIs
> could be very different. I might be missing something, though.

You are right that this approach starts to feel like a new data
structure and is not v17 material.

I am interested in this for v18 though -- we could make the API more
like simplehash to be more flexible when using values (rather than
pointers) and to be able to inline the comparator.

> > * remove the hash table from binaryheap.c
> >
> > * supply a new callback to the binary heap with type like:
> >
> >   typedef void (*binaryheap_update_index)(
> >     bh_node_type node,
> >     int new_element_index);
> >
> > * make the remove, update_up, and update_down methods take the
> > element
> > index rather than the pointer

...

> This shows that the current indexed binaryheap is much slower than
> the
> other two implementations, and the xx_binaryheap has a good number in
> spite of also being indexed.

xx_binaryheap isn't indexed though, right?

> When it comes to
> implementing the above idea (i.e. changing binaryheap to
> xx_binaryheap), it was simple since we just replace the code where we
> update the hash table with the code where we call the callback, if we
> get the consensus on API change.

That seems reasonable to me.

> The fact that we use simplehash for the internal hash table might
> make
> this idea complex. If I understand your suggestion correctly, the
> caller needs to tell the hash table the hash function when creating a
> binaryheap but the hash function needs to be specified at a compile
> time. We can use a dynahash instead but it would make the binaryheap
> slow further.

simplehash.h supports private_data, which makes it easier to track a
callback.

In binaryheap.c, that would look something like:

  static inline uint32
  binaryheap_hash(bh_nodeidx_hash *tab, uint32 key)
  {
     binaryheap_hashfunc hashfunc = tab->private_data;
     return hashfunc(key);
  }

  ...
  #define SH_HASH_KEY(tb, key) binaryheap_hash(tb, key)
  ...

  binaryheap_allocate(int num_nodes, binaryheap_comparator compare,
                      void *arg, binaryheap_hashfunc hashfunc)
  {
    ...
    if (hashfunc != NULL)
    {
       /* could have a new structure, but we only need to
        * store one pointer, so don't bother with palloc/pfree */
       void *private_data = (void *)hashfunc;
       heap->bh_nodeidx = bh_nodeidx_create(..., private_data);
       ...


And in reorderbuffer.c, define the callback like:

  static uint32
  reorderbuffer_xid_hash(TransactionId xid)
  {
    /* fasthash32 is 'static inline' so may
     * be faster than hash_bytes()? */
    return fasthash32(&xid, sizeof(TransactionId), 0);
  }



In summary, there are two viable approaches for addressing the concerns
in v17:

1. Provide callback to update ReorderBufferTXN->heap_element_index, and
use that index (rather than the pointer) for updating the heap key
(transaction size) or removing elements from the heap.

2. Provide callback for hashing, so that binaryheap.c can hash the xid
value rather than the pointer.

I don't have a strong opinion about which one to use. I prefer
something closer to #1 for v18, but for v17 I suggest whichever one
comes out simpler.

Regards,
    Jeff Davis




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Fixing backslash dot for COPY FROM...CSV
Следующее
От: Jeff Davis
Дата:
Сообщение: fasthash32() returning uint64?