Обсуждение: Comment simplehash/dynahash trade-offs

Поиск
Список
Период
Сортировка

Comment simplehash/dynahash trade-offs

От
James Coleman
Дата:
In another thread [1] I'd mused that "there might be some value in a
README or comments
addition that would be a guide to what the various hash
implementations are useful for...so that we have something to make the
code base a bit more discoverable."

I'd solicited feedback from Andres (as the author of the simplehash
implementation) and gotten further explanation from Tomas (both cc'd
here) and have tried to condense that into the comment changes in this
patch series.

v1-0001-Summarize-trade-offs-between-simplehash-and-dynah.patch
Contains the summaries mentioned above.

v1-0002-Improve-simplehash-usage-notes.patch
I'd noticed while adding a simplehash implementation in that other
thread that the facts that:
- The element type requires a status member.
- Why storing a hash in the element type is useful (i.e., when to
define SH_STORE_HASH/SH_GET_HASH).
- The availability of  private_data member for metadata access from callbacks.
are either undocumented or hard to discover, and so I've added the
information directly to the usage notes section.

v1-0003-Show-sample-simplehash-method-signatures.patch
I find it hard to read the macro code "templating" particularly for
seeing what the available API is and so added sample method signatures
in comments to the macro generated method signature defines.

James

[1]: https://www.postgresql.org/message-id/CAAaqYe956a-zbm1qR8pqz%3DiLbF8LW5vBrQGrzXVHXdLk3at5_Q%40mail.gmail.com

Вложения

Re: Comment simplehash/dynahash trade-offs

От
Thomas Munro
Дата:
On Fri, May 1, 2020 at 1:53 PM James Coleman <jtc331@gmail.com> wrote:
> In another thread [1] I'd mused that "there might be some value in a
> README or comments
> addition that would be a guide to what the various hash
> implementations are useful for...so that we have something to make the
> code base a bit more discoverable."

+1

> I'd solicited feedback from Andres (as the author of the simplehash
> implementation) and gotten further explanation from Tomas (both cc'd
> here) and have tried to condense that into the comment changes in this
> patch series.
>
> v1-0001-Summarize-trade-offs-between-simplehash-and-dynah.patch
> Contains the summaries mentioned above.

+ * - It supports partitioning, which is useful for shared memory access using

I wonder if we should say a bit more about the shared memory mode.
Shared memory dynahash tables are allocated in a fixed size area at
startup, and are discoverable by name in other other processes that
need to get access to them, while simplehash assumes that it can get
memory from a MemoryContext or an allocator with a malloc/free-style
interface, which isn't very well suited for use in shared memory.
(I'm sure you can convince it to work in shared memory with some
work.)

> v1-0002-Improve-simplehash-usage-notes.patch

+ *    For convenience the hash table create functions accept a void pointer
+ *    will be stored in the hash table type's member private_data.

*that* will be stored?

> v1-0003-Show-sample-simplehash-method-signatures.patch
> I find it hard to read the macro code "templating" particularly for
> seeing what the available API is and so added sample method signatures
> in comments to the macro generated method signature defines.

I didn't double-check all the expansions of the macros but +1 for this
idea, it's very useful.



Re: Comment simplehash/dynahash trade-offs

От
James Coleman
Дата:
On Mon, Jul 20, 2020 at 1:29 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Fri, May 1, 2020 at 1:53 PM James Coleman <jtc331@gmail.com> wrote:
> > In another thread [1] I'd mused that "there might be some value in a
> > README or comments
> > addition that would be a guide to what the various hash
> > implementations are useful for...so that we have something to make the
> > code base a bit more discoverable."
>
> +1
>
> > I'd solicited feedback from Andres (as the author of the simplehash
> > implementation) and gotten further explanation from Tomas (both cc'd
> > here) and have tried to condense that into the comment changes in this
> > patch series.
> >
> > v1-0001-Summarize-trade-offs-between-simplehash-and-dynah.patch
> > Contains the summaries mentioned above.
>
> + * - It supports partitioning, which is useful for shared memory access using
>
> I wonder if we should say a bit more about the shared memory mode.
> Shared memory dynahash tables are allocated in a fixed size area at
> startup, and are discoverable by name in other other processes that
> need to get access to them, while simplehash assumes that it can get
> memory from a MemoryContext or an allocator with a malloc/free-style
> interface, which isn't very well suited for use in shared memory.
> (I'm sure you can convince it to work in shared memory with some
> work.)

Added.

> > v1-0002-Improve-simplehash-usage-notes.patch
>
> + *    For convenience the hash table create functions accept a void pointer
> + *    will be stored in the hash table type's member private_data.
>
> *that* will be stored?

Fixed.

> > v1-0003-Show-sample-simplehash-method-signatures.patch
> > I find it hard to read the macro code "templating" particularly for
> > seeing what the available API is and so added sample method signatures
> > in comments to the macro generated method signature defines.
>
> I didn't double-check all the expansions of the macros but +1 for this
> idea, it's very useful.

James

Вложения

Re: Comment simplehash/dynahash trade-offs

От
Thomas Munro
Дата:
On Sat, Aug 1, 2020 at 7:22 AM James Coleman <jtc331@gmail.com> wrote:
> [v2 patch set]

I ran it through pgindent which insisted on adding some newlines, I
manually replaced some spaces with tabs to match nearby lines, I added
some asterisks in your example function prototypes where <element> is
returned because they seemed to be missing, and I pushed this.
Thanks!



Re: Comment simplehash/dynahash trade-offs

От
James Coleman
Дата:
On Fri, Jul 31, 2020 at 8:17 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Sat, Aug 1, 2020 at 7:22 AM James Coleman <jtc331@gmail.com> wrote:
> > [v2 patch set]
>
> I ran it through pgindent which insisted on adding some newlines, I
> manually replaced some spaces with tabs to match nearby lines, I added
> some asterisks in your example function prototypes where <element> is
> returned because they seemed to be missing, and I pushed this.
> Thanks!


Thanks for reviewing and committing!

James



Re: Comment simplehash/dynahash trade-offs

От
David Rowley
Дата:
On Sat, 1 Aug 2020 at 12:17, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Sat, Aug 1, 2020 at 7:22 AM James Coleman <jtc331@gmail.com> wrote:
> > [v2 patch set]
>
> I ran it through pgindent which insisted on adding some newlines, I
> manually replaced some spaces with tabs to match nearby lines, I added
> some asterisks in your example function prototypes where <element> is
> returned because they seemed to be missing, and I pushed this.
> Thanks!

I was just reading over this and wondered about the following:

+ *   The element type is required to contain a "uint32 status" member.

I see that PagetableEntry does not follow this and I also didn't
follow it when writing the Result Cache patch in [1]. I managed to
shrink the struct I was using for the hash table by 4 bytes by using a
char instead of an int. That sounds like a small amount of memory, but
it did result in much better cache hit ratios in the patch

Maybe it would be better just to get rid of the enum and just #define
the values. It seems unlikely that we're every going to need many more
states than what are there already, let along more than, say 127 of
them. It does look like manifest_file could be shrunk down a bit too
by making the status field a char.

[1] https://www.postgresql.org/message-id/flat/CAApHDvrPcQyQdWERGYWx8J+2DLUNgXu+fOSbQ1UscxrunyXyrQ@mail.gmail.com



Re: Comment simplehash/dynahash trade-offs

От
David Rowley
Дата:
On Sun, 2 Aug 2020 at 11:16, David Rowley <dgrowleyml@gmail.com> wrote:
> Maybe it would be better just to get rid of the enum and just #define
> the values. It seems unlikely that we're every going to need many more
> states than what are there already, let along more than, say 127 of
> them. It does look like manifest_file could be shrunk down a bit too
> by making the status field a char.

This didn't turn out quite as pretty as I had imagined.  I needed to
leave the two statuses defined in simplehash.h so that callers could
make use of them. (Result Cache will do this).

The change here would be callers would need to use SH_STATUS_IN_USE
rather than <prefix>_SH_IN_USE.

I'm not really that sold on doing things this way. I really just don't
want to have to make my status field a uint32 in Result Cache per what
the new comment states we must do. If there's a nicer way, then
perhaps that's worth considering.

David

Вложения

Re: Comment simplehash/dynahash trade-offs

От
Thomas Munro
Дата:
On Sun, Aug 2, 2020 at 1:54 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Sun, 2 Aug 2020 at 11:16, David Rowley <dgrowleyml@gmail.com> wrote:
> > Maybe it would be better just to get rid of the enum and just #define
> > the values. It seems unlikely that we're every going to need many more
> > states than what are there already, let along more than, say 127 of
> > them. It does look like manifest_file could be shrunk down a bit too
> > by making the status field a char.
>
> This didn't turn out quite as pretty as I had imagined.  I needed to
> leave the two statuses defined in simplehash.h so that callers could
> make use of them. (Result Cache will do this).
>
> The change here would be callers would need to use SH_STATUS_IN_USE
> rather than <prefix>_SH_IN_USE.
>
> I'm not really that sold on doing things this way. I really just don't
> want to have to make my status field a uint32 in Result Cache per what
> the new comment states we must do. If there's a nicer way, then
> perhaps that's worth considering.

Agreed that the new comment is wrong and should be changed.

I think you can probably go further, though, and make it require no
storage at all by making it optionally "intrusive", by using a special
value in an existing member, and supplying an expression to set and
test for that value.  For example, maybe some users have a pointer but
never want to use NULL, and maybe some users already have a field
holding various flags that are one bit wide and can spare a bit.



Re: Comment simplehash/dynahash trade-offs

От
David Rowley
Дата:
On Mon, 3 Aug 2020 at 11:10, Thomas Munro <thomas.munro@gmail.com> wrote:
> I think you can probably go further, though, and make it require no
> storage at all by making it optionally "intrusive", by using a special
> value in an existing member, and supplying an expression to set and
> test for that value.  For example, maybe some users have a pointer but
> never want to use NULL, and maybe some users already have a field
> holding various flags that are one bit wide and can spare a bit.

I agree that it would be good to allow users of simplehash.h that
additional flexibility. It may allow additional memory savings.
However, it would mean we'd need to do some additional work when we
create and grow the hash table to ensure that we mark new buckets as
empty. For now, we get that for free with the zeroing of the newly
allocated memory, but we couldn't rely on that if we allowed users to
define their own macro.

It looks like none of the current callers could gain from this

1. TupleHashEntryData does not have any reusable fields. The status
should fit in the padding on a 64-bit machine anyway.
2. PagetableEntry already has a status that fits into the padding.
3. manifest_file could have its status moved to the end of the struct
and made into a char and the struct would be the same size as if the
field did not exist.

So, with the current users, we'd stand to lose more than we'd gain
from doing it that way.

David



Re: Comment simplehash/dynahash trade-offs

От
David Rowley
Дата:
On Mon, 3 Aug 2020 at 11:36, David Rowley <dgrowleyml@gmail.com> wrote:
> So, with the current users, we'd stand to lose more than we'd gain
> from doing it that way.

FWIW, I'd be ok with just:

- *       The element type is required to contain a "uint32 status" member.
+ *       The element type is required to contain an integer-based
"status" member
+ *       which can store the range of values defined in the SH_STATUS enum.

David



Re: Comment simplehash/dynahash trade-offs

От
Thomas Munro
Дата:
On Mon, Aug 3, 2020 at 11:42 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On Mon, 3 Aug 2020 at 11:36, David Rowley <dgrowleyml@gmail.com> wrote:
> > So, with the current users, we'd stand to lose more than we'd gain
> > from doing it that way.
>
> FWIW, I'd be ok with just:
>
> - *       The element type is required to contain a "uint32 status" member.
> + *       The element type is required to contain an integer-based
> "status" member
> + *       which can store the range of values defined in the SH_STATUS enum.

Thanks for the correction.  Pushed.