Thread: Document hashtext() and Friends?

Document hashtext() and Friends?

From
"David E. Wheeler"
Date:
Hackers,

Is there a reason that hashtext() and friends are not documented? Given that they’re likely to be used more and more
for partitioning and sharding, I think it would be useful to do so, starting with something like this. Comments?

*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 1557,1562 ****
--- 1557,1577 ----
        <row>
         <entry>
          <indexterm>
+          <primary>hashtext</primary>
+         </indexterm>
+         <literal><function>hashtext(<parameter>string</parameter>)</function></literal>
+        </entry>
+        <entry><type>int</type></entry>
+        <entry>
+         Generate a hash value for string.
+        </entry>
+        <entry><literal>hashtext('greetings, human')</literal></entry>
+        <entry><literal>-1132466231</literal></entry>
+       </row>
+
+       <row>
+        <entry>
          <indexterm>
           <primary>left</primary>
          </indexterm>
<literal><function>left(<parameter>str</parameter><type>text</type>, 

Best

David



Re: Document hashtext() and Friends?

From
Tom Lane
Date:
"David E. Wheeler" <david@justatheory.com> writes:
> Is there a reason that hashtext() and friends are not documented?

Yes.  They are internal functions that exist for the convenience of the
system, not for users.  We've discussed this before, and decided that
we don't want people to rely on them continuing to have exactly the
current behavior.  One example of a possible future change is to widen
the results from 4 bytes to 8.
        regards, tom lane


Re: Document hashtext() and Friends?

From
Michael Glaesemann
Date:
On Feb 21, 2012, at 15:01, Tom Lane wrote:

> "David E. Wheeler" <david@justatheory.com> writes:
>> Is there a reason that hashtext() and friends are not documented?
>
> Yes.  They are internal functions that exist for the convenience of the
> system, not for users.  We've discussed this before, and decided that
> we don't want people to rely on them continuing to have exactly the
> current behavior.  One example of a possible future change is to widen
> the results from 4 bytes to 8.

And hashtext *has* changed across versions, which is why Peter Eisentraut published a version-independent hash function
library: https://github.com/petere/pgvihash

Michael Glaesemann
grzm seespotcode net
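
To make the version-independence point concrete, here is a minimal sketch of shard selection that depends only on the bytes of the key and not on any server-internal hash. It does not reproduce pgvihash or hashtext(); the names stable_hash32 and shard_for are hypothetical, and the choice of SHA-256 truncated to 4 bytes is an assumption made purely for illustration.

```python
import hashlib


def stable_hash32(key: str) -> int:
    """Return a signed 32-bit hash derived only from the UTF-8 bytes of key.

    Hypothetical helper: SHA-256 is an assumed choice, not what
    PostgreSQL or pgvihash uses.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Take the first 4 bytes and interpret them as a signed integer,
    # so the value fits a PostgreSQL int4 column.
    return int.from_bytes(digest[:4], byteorder="big", signed=True)


def shard_for(key: str, n_shards: int) -> int:
    """Map key to a shard number in the range [0, n_shards)."""
    if n_shards <= 0:
        raise ValueError("n_shards must be positive")
    # Python's % returns a non-negative result for a positive modulus,
    # even when the hash itself is negative.
    return stable_hash32(key) % n_shards


if __name__ == "__main__":
    print(shard_for("greetings, human", 16))
```

Because the result is computed from the key bytes alone, the same key maps to the same shard no matter which server version stores the data, which is the property an internal function cannot promise.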





Re: Document hashtext() and Friends?

From
Peter Geoghegan
Date:
On 21 February 2012 20:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "David E. Wheeler" <david@justatheory.com> writes:
>> Is there a reason that hashtext() and friends are not documented?
>
> Yes.  They are internal functions that exist for the convenience of the
> system, not for users.  We've discussed this before, and decided that
> we don't want people to rely on them continuing to have exactly the
> current behavior.  One example of a possible future change is to widen
> the results from 4 bytes to 8.

My pg_stat_statements normalisation patch actually extends the
underlying hash_any() function to support 8 byte results, exactly as
currently anticipated by comments above that function, while supplying
a compatibility macro that is used by existing hash_any() clients.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Document hashtext() and Friends?

From
"David E. Wheeler"
Date:
On Feb 21, 2012, at 12:11 PM, Michael Glaesemann wrote:

> And hashtext *has* changed across versions, which is why Peter Eisentraut published a version-independent hash
> function library: https://github.com/petere/pgvihash

Yes, Marko wrote one, too:
 https://github.com/markokr/pghashlib

But as I’m about to build a system that is going to have many billions of nodes, I could use a variant that returns a
bigint. Anyone got a pointer to something like that?

Thanks,

David
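
As a rough illustration of what a bigint-returning variant could look like outside the server, the sketch below derives a signed 64-bit value from a key. It is not taken from pghashlib or pgvihash; the name stable_hash64 is hypothetical, and BLAKE2b with an 8-byte digest is an assumed choice.

```python
import hashlib


def stable_hash64(key: str) -> int:
    """Return a signed 64-bit hash of key, suitable for a bigint column.

    Hypothetical helper: BLAKE2b is an assumption for this sketch,
    not the algorithm behind hashtext().
    """
    # digest_size=8 asks BLAKE2b for exactly 8 bytes of output.
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    # signed=True maps the 8 bytes onto the range of a PostgreSQL bigint.
    return int.from_bytes(digest, byteorder="big", signed=True)


if __name__ == "__main__":
    print(stable_hash64("greetings, human"))
```

With 64 bits, the chance of two keys colliding among billions of nodes is far smaller than with a 4-byte result, though it is still not zero.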



Re: Document hashtext() and Friends?

From
"David E. Wheeler"
Date:
On Feb 21, 2012, at 12:14 PM, David E. Wheeler wrote:

>> And hashtext *has* changed across versions, which is why Peter Eisentraut published a version-independent hash
>> function library: https://github.com/petere/pgvihash
>
> Yes, Marko wrote one, too:
>
>  https://github.com/markokr/pghashlib

Oh, and these are great extensions for PGXN. Any chance of seeing them submitted soon, Peter and Marko?

Thanks,

David



Re: Document hashtext() and Friends?

From
Tom Lane
Date:
Peter Geoghegan <peter@2ndquadrant.com> writes:
> My pg_stat_statements normalisation patch actually extends the
> underlying hash_any() function to support 8 byte results,

... er, what?  That seems rather out of scope for that patch,
not to mention unnecessary.
        regards, tom lane


Re: Document hashtext() and Friends?

From
"ktm@rice.edu"
Date:
On Tue, Feb 21, 2012 at 12:14:03PM -0800, David E. Wheeler wrote:
> On Feb 21, 2012, at 12:11 PM, Michael Glaesemann wrote:
>
> > And hashtext *has* changed across versions, which is why Peter Eisentraut published a version-independent hash
> > function library: https://github.com/petere/pgvihash
>
> Yes, Marko wrote one, too:
>
>   https://github.com/markokr/pghashlib
>
> But as I’m about to build a system that is going to have many billions of nodes, I could use a variant that returns a
> bigint. Anyone got a pointer to something like that?
>
> Thanks,
>
> David
>

Hi David,

The existing hash_any() function could return a 64-bit hash instead of the current
32-bit hash by returning both the b and c values rather than only the c value, as
it does now; see the comment at the start of the function. It sounded like
Peter had already done this in his pg_stat_statements normalization patch, but I
could not find it.

Regards,
Ken
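
The idea of returning both words can be sketched as plain bit arithmetic. This does not reproduce hash_any() itself: b and c below are just placeholder 32-bit integers, the helper names are hypothetical, and putting b in the high half is an assumption about ordering.

```python
MASK32 = 0xFFFFFFFF  # keeps only the low 32 bits of a value


def combine_words(b: int, c: int) -> int:
    """Pack two 32-bit words into one unsigned 64-bit value.

    Assumed layout: b occupies the high 32 bits, c the low 32 bits.
    """
    return ((b & MASK32) << 32) | (c & MASK32)


def low_word(h: int) -> int:
    """Recover the low 32 bits, i.e. what a 32-bit-only caller would see."""
    return h & MASK32


if __name__ == "__main__":
    h64 = combine_words(0x00000001, 0x00000002)
    print(hex(h64))            # 0x100000002
    print(hex(low_word(h64)))  # 0x2
```

With this layout, masking the combined value back down to 32 bits yields c again, which is roughly how a compatibility wrapper could keep existing 32-bit callers unchanged.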


Re: Document hashtext() and Friends?

From
Peter Geoghegan
Date:
On 21 February 2012 20:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <peter@2ndquadrant.com> writes:
>> My pg_stat_statements normalisation patch actually extends the
>> underlying hash_any() function to support 8 byte results,
>
> ... er, what?  That seems rather out of scope for that patch,
> not to mention unnecessary.

Well, assuming that you deem a uint64 query_id to be necessary, and
based on your earlier comments I take it that you do, that seemed like
the most natural way of going about getting such a value, particularly
since this change is anticipated by the comments above the function.

Of course, any further input you can give on that patch would be most
appreciated. I'm particularly eager to resolve the problems with core
infrastructure (such as that apparent bug with some Const locations),
so that we can at the very least be sure that the community won't have
to wait for the release of 9.3 at the earliest before having a
normalisation capability with stat_statements.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services