Re: Do we want a hashset type?

From: Andrew Dunstan
Subject: Re: Do we want a hashset type?
Msg-id: fae7d987-564d-7739-8a59-bc7f5186ac78@dunslane.net
In reply to: Re: Do we want a hashset type? (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Replies: Re: Do we want a hashset type? (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Re: Do we want a hashset type? (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers


On 2023-06-19 Mo 05:21, Tomas Vondra wrote:

On 6/18/23 18:45, Andrew Dunstan wrote:
On 2023-06-16 Fr 20:38, Joel Jacobson wrote:
New patch is attached, which will henceforth always be a complete patch,
to avoid the hassle of having to assemble incremental patches.

Cool, thanks.

It might still be convenient to keep it split into smaller, easier to
review, parts. A patch that introduces basic functionality and then
patches adding various "advanced" features.

A couple of random thoughts:


. It might be worth sending a version number with the send function
(cf. jsonb_send / jsonb_recv). That way we would not be tied forever
to some wire representation.
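To make the versioning idea concrete, here is a minimal sketch in plain C (deliberately not using PostgreSQL's pq_* routines) of a binary format whose first byte is a version number, in the spirit of jsonb_send writing a version byte that jsonb_recv checks. The names hashset_send_sketch, hashset_recv_sketch and HASHSET_WIRE_VERSION, and the layout (version byte, element count, big-endian int4 elements), are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire version; bump it whenever the layout below changes. */
#define HASHSET_WIRE_VERSION 1

/* Write a 32-bit value in network (big-endian) byte order. */
static size_t put_u32(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t) (v >> 24);
    out[1] = (uint8_t) (v >> 16);
    out[2] = (uint8_t) (v >> 8);
    out[3] = (uint8_t) v;
    return 4;
}

/* Read a big-endian 32-bit value. */
static uint32_t get_u32(const uint8_t *in)
{
    return ((uint32_t) in[0] << 24) | ((uint32_t) in[1] << 16) |
           ((uint32_t) in[2] << 8) | (uint32_t) in[3];
}

/*
 * Layout: [version: 1 byte][count: 4 bytes][count int4 elements].
 * 'out' must hold at least 5 + 4 * n bytes. Returns the bytes written.
 */
size_t hashset_send_sketch(const int32_t *elems, uint32_t n, uint8_t *out)
{
    size_t pos = 0;

    assert(elems != NULL || n == 0);
    out[pos++] = HASHSET_WIRE_VERSION;
    pos += put_u32(out + pos, n);
    for (uint32_t i = 0; i < n; i++)
        pos += put_u32(out + pos, (uint32_t) elems[i]);
    return pos;
}

/*
 * Returns the number of elements decoded into 'elems', or -1 if the version
 * byte is unknown, the buffer is truncated, or 'elems' is too small.
 * The uint32 -> int32 cast assumes two's-complement conversion.
 */
long hashset_recv_sketch(const uint8_t *in, size_t len,
                         int32_t *elems, uint32_t max_elems)
{
    if (len < 5 || in[0] != HASHSET_WIRE_VERSION)
        return -1;

    uint32_t n = get_u32(in + 1);
    if (n > max_elems || len < 5 + (size_t) n * 4)
        return -1;

    for (uint32_t i = 0; i < n; i++)
        elems[i] = (int32_t) get_u32(in + 5 + (size_t) i * 4);
    return (long) n;
}
```

A reader that sees an unknown version byte can reject the data, or later dispatch to a parser for an older layout, instead of silently misreading it.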

. I think there are some important set operations missing: most notably
intersection, slightly less importantly asymmetric and symmetric
difference. I have no idea how easy these would be to add, but even for
your stated use I should have thought set intersection would be useful
("Who is a member of both this set of friends and that set of friends?").
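The missing operations reduce to membership tests. The sketch below assumes a toy fixed-capacity int4 hash set with linear probing (IntSet, hs_add, hs_contains and the other names are hypothetical, not taken from the patch): intersection keeps the members of a that are found in b, asymmetric difference keeps those that are not, and symmetric difference is the union of both asymmetric differences.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy open-addressing int4 set; capacity is fixed and a power of two. */
#define HS_CAPACITY 64

typedef struct
{
    int32_t slots[HS_CAPACITY];
    bool    used[HS_CAPACITY];
    int     count;
} IntSet;

void hs_init(IntSet *s)
{
    memset(s, 0, sizeof(*s));
}

/* Multiplicative hash with a final xor-shift to mix high bits into low ones. */
static uint32_t hs_hash(int32_t v)
{
    uint32_t h = (uint32_t) v * 2654435761u;
    return h ^ (h >> 16);
}

/* Linear probing: the slot holding v, or the empty slot where v would go. */
static uint32_t hs_find(const IntSet *s, int32_t v)
{
    uint32_t i = hs_hash(v) & (HS_CAPACITY - 1);

    while (s->used[i] && s->slots[i] != v)
        i = (i + 1) & (HS_CAPACITY - 1);
    return i;
}

bool hs_contains(const IntSet *s, int32_t v)
{
    return s->used[hs_find(s, v)];
}

void hs_add(IntSet *s, int32_t v)
{
    uint32_t i = hs_find(s, v);

    if (s->used[i])
        return;                 /* already a member */
    /* Keep one slot free so probing always terminates (no resizing here). */
    assert(s->count < HS_CAPACITY - 1);
    s->used[i] = true;
    s->slots[i] = v;
    s->count++;
}

/* out = a INTERSECT b. 'out' must not alias 'a' or 'b'. */
void hs_intersect(const IntSet *a, const IntSet *b, IntSet *out)
{
    hs_init(out);
    for (int i = 0; i < HS_CAPACITY; i++)
        if (a->used[i] && hs_contains(b, a->slots[i]))
            hs_add(out, a->slots[i]);
}

/* out = a EXCEPT b (asymmetric difference). */
void hs_except(const IntSet *a, const IntSet *b, IntSet *out)
{
    hs_init(out);
    for (int i = 0; i < HS_CAPACITY; i++)
        if (a->used[i] && !hs_contains(b, a->slots[i]))
            hs_add(out, a->slots[i]);
}

/* out = (a EXCEPT b) UNION (b EXCEPT a) (symmetric difference). */
void hs_symdiff(const IntSet *a, const IntSet *b, IntSet *out)
{
    hs_except(a, b, out);
    for (int i = 0; i < HS_CAPACITY; i++)
        if (b->used[i] && !hs_contains(a, b->slots[i]))
            hs_add(out, b->slots[i]);
}
```

If a holds one user's friends and b another's, hs_intersect(&a, &b, &r) leaves the mutual friends in r. Each operation is a single pass over one set's slots with an expected O(1) lookup in the other.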

. While supporting int4 only is OK for now, I think we would at least
want to support int8, and probably UUID since a number of systems I know
of use that as an object identifier.

I agree we should aim to support a wider range of data types. Could we
have a polymorphic type, similar to what we do for arrays and ranges? In
fact, CREATE TYPE allows specifying ELEMENT, so wouldn't it be possible
to implement this as a special variant of an array? That would be better than
having a set of functions for every supported data type.

(Note: It might still be possible to have a special implementation for
selected fixed-length data types, as it allows optimization at compile
time. But that could be done later.)
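One way to avoid a separate set of functions per type, assuming fixed-length element types whose equality coincides with byte equality (true for int4, int8 and uuid, but not in general; float8 treats -0 and +0 as equal although their bytes differ), is to store elements as opaque byte strings of the type's length, hash them with FNV-1a and compare them with memcmp. A real polymorphic implementation would presumably call the element type's own hash and equality functions instead. ByteSet and the bs_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type-agnostic set: elements are opaque fixed-length byte strings. */
typedef struct
{
    size_t   elem_len;   /* e.g. 4 for int4, 8 for int8, 16 for uuid */
    size_t   capacity;   /* power of two; no resizing in this sketch */
    size_t   count;
    bool    *used;
    uint8_t *data;       /* capacity * elem_len bytes */
} ByteSet;

/* FNV-1a over the element's bytes. */
static uint64_t bs_hash(const void *elem, size_t len)
{
    const uint8_t *p = elem;
    uint64_t h = 14695981039346656037ULL;

    for (size_t i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Returns false on allocation failure; bs_free() is safe to call either way. */
bool bs_init(ByteSet *s, size_t elem_len, size_t capacity)
{
    assert(capacity > 1 && (capacity & (capacity - 1)) == 0);
    s->elem_len = elem_len;
    s->capacity = capacity;
    s->count = 0;
    s->used = calloc(capacity, sizeof(bool));
    s->data = calloc(capacity, elem_len);
    return s->used != NULL && s->data != NULL;
}

void bs_free(ByteSet *s)
{
    free(s->used);
    free(s->data);
}

/* Linear probing: the slot holding elem, or the empty slot where it would go. */
static size_t bs_find(const ByteSet *s, const void *elem)
{
    size_t i = (size_t) bs_hash(elem, s->elem_len) & (s->capacity - 1);

    while (s->used[i] &&
           memcmp(s->data + i * s->elem_len, elem, s->elem_len) != 0)
        i = (i + 1) & (s->capacity - 1);
    return i;
}

bool bs_contains(const ByteSet *s, const void *elem)
{
    return s->used[bs_find(s, elem)];
}

/* Returns true if elem was newly added, false if it was already present. */
bool bs_add(ByteSet *s, const void *elem)
{
    size_t i = bs_find(s, elem);

    if (s->used[i])
        return false;
    assert(s->count < s->capacity - 1);  /* keep a free slot for probing */
    s->used[i] = true;
    memcpy(s->data + i * s->elem_len, elem, s->elem_len);
    s->count++;
    return true;
}
```

The same code path then serves int4 (elem_len 4), int8 (8) and uuid (16); specialized variants for selected types could still be added later, as suggested above.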


Interesting idea. There's also the keyword SETOF that we could possibly make use of.




The other thing I've been thinking about is the SQL syntax and what
the SQL standard says about this.

AFAICS the standard only defines arrays and multisets. Arrays are pretty
much the thing we have, including the ARRAY[] constructor etc. Multisets
are similar to the hashset discussed here, except that they track the number
of occurrences of each value (which would be trivial in hashset).
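To illustrate how little a multiset adds on top of a hash set, here is a sketch of an int4 multiset that stores a multiplicity per distinct value, assuming the usual bag semantics (which we believe the standard's MULTISET UNION ALL / INTERSECT ALL / EXCEPT ALL follow): union adds multiplicities, intersection takes the minimum, and difference subtracts, floored at zero. IntMultiset and the ms_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy int4 multiset: fixed-capacity hash table of value -> multiplicity. */
#define MS_CAPACITY 64

typedef struct
{
    int32_t  values[MS_CAPACITY];
    uint32_t counts[MS_CAPACITY];   /* 0 marks an empty slot */
    int      distinct;              /* number of occupied slots */
} IntMultiset;

void ms_init(IntMultiset *m)
{
    memset(m, 0, sizeof(*m));
}

/* Linear probing; returns v's slot or the empty slot where v would go. */
static uint32_t ms_slot(const IntMultiset *m, int32_t v)
{
    uint32_t h = (uint32_t) v * 2654435761u;
    uint32_t i = (h ^ (h >> 16)) & (MS_CAPACITY - 1);

    while (m->counts[i] != 0 && m->values[i] != v)
        i = (i + 1) & (MS_CAPACITY - 1);
    return i;
}

/* Add 'times' copies of v (count overflow is ignored in this sketch). */
void ms_add(IntMultiset *m, int32_t v, uint32_t times)
{
    if (times == 0)
        return;

    uint32_t i = ms_slot(m, v);
    if (m->counts[i] == 0)
    {
        assert(m->distinct < MS_CAPACITY - 1);  /* keep a free slot; no resizing */
        m->values[i] = v;
        m->distinct++;
    }
    m->counts[i] += times;
}

uint32_t ms_count(const IntMultiset *m, int32_t v)
{
    return m->counts[ms_slot(m, v)];
}

/* UNION ALL: multiplicities add up. 'out' must not alias the inputs. */
void ms_union_all(const IntMultiset *a, const IntMultiset *b, IntMultiset *out)
{
    ms_init(out);
    for (int i = 0; i < MS_CAPACITY; i++)
    {
        ms_add(out, a->values[i], a->counts[i]);   /* empty slots add 0 */
        ms_add(out, b->values[i], b->counts[i]);
    }
}

/* INTERSECT ALL: the smaller of the two multiplicities. */
void ms_intersect_all(const IntMultiset *a, const IntMultiset *b, IntMultiset *out)
{
    ms_init(out);
    for (int i = 0; i < MS_CAPACITY; i++)
    {
        uint32_t cb = ms_count(b, a->values[i]);
        ms_add(out, a->values[i], a->counts[i] < cb ? a->counts[i] : cb);
    }
}

/* EXCEPT ALL: a's multiplicity minus b's, floored at zero. */
void ms_except_all(const IntMultiset *a, const IntMultiset *b, IntMultiset *out)
{
    ms_init(out);
    for (int i = 0; i < MS_CAPACITY; i++)
    {
        uint32_t cb = ms_count(b, a->values[i]);
        if (a->counts[i] > cb)
            ms_add(out, a->values[i], a->counts[i] - cb);
    }
}
```

Compared with a plain hashset, the only structural change is a counter per slot; a set is the special case where every count is 1.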

So if we want to make this a built-in feature, maybe we should aim to do
the multiset thing, with the standard SQL syntax? Extending the grammar
should not be hard, I think. I'm not sure how much of the underlying code
(ArrayType, ARRAY_SUBLINK stuff, etc.) we could reuse, or whether we'd need
a lot of separate code for that.



Yes, Multisets (a.k.a. bags and a large number of other names) would be interesting. But I wouldn't like to abandon pure sets either. Maybe a typmod indicating the allowed multiplicity of the type?
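A minimal sketch of the typmod idea, assuming the modifier is simply an upper bound on each value's multiplicity: a limit of 1 gives a plain set (duplicates are absorbed), a larger limit caps the count, and a hypothetical value of 0 means unlimited. It uses a linear scan for brevity since hashing is beside the point here. CappedBag, bag_add and MULTIPLICITY_UNLIMITED are hypothetical, and clamping at the limit (rather than raising an error) is a design assumption.

```c
#include <assert.h>
#include <stdint.h>

#define BAG_MAX_DISTINCT 32
#define MULTIPLICITY_UNLIMITED 0   /* hypothetical typmod value: no cap */

typedef struct
{
    uint32_t max_mult;                   /* hypothetical typmod; 1 = plain set */
    int      n;                          /* distinct values stored */
    int32_t  values[BAG_MAX_DISTINCT];
    uint32_t counts[BAG_MAX_DISTINCT];
} CappedBag;

void bag_init(CappedBag *b, uint32_t max_mult)
{
    b->max_mult = max_mult;
    b->n = 0;
}

/* Apply the multiplicity limit, clamping rather than raising an error. */
static uint32_t bag_clamp(const CappedBag *b, uint64_t c)
{
    if (b->max_mult != MULTIPLICITY_UNLIMITED && c > b->max_mult)
        return b->max_mult;
    return c > UINT32_MAX ? UINT32_MAX : (uint32_t) c;
}

/* Linear scan for brevity; a real type would use a hash table. */
void bag_add(CappedBag *b, int32_t v)
{
    for (int i = 0; i < b->n; i++)
    {
        if (b->values[i] == v)
        {
            b->counts[i] = bag_clamp(b, (uint64_t) b->counts[i] + 1);
            return;
        }
    }
    assert(b->n < BAG_MAX_DISTINCT);
    b->values[b->n] = v;
    b->counts[b->n] = 1;      /* every limit admits at least one copy */
    b->n++;
}

uint32_t bag_count(const CappedBag *b, int32_t v)
{
    for (int i = 0; i < b->n; i++)
        if (b->values[i] == v)
            return b->counts[i];
    return 0;
}
```

Adding the value 5 three times leaves one copy with a limit of 1, two copies with a limit of 2, and three copies when unlimited, so a single type could cover both pure sets and multisets.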



cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com
