Re: Do we want a hashset type?

From: Tomas Vondra
Subject: Re: Do we want a hashset type?
Msg-id c8cb89a0-3cb8-7b70-c01d-4548128d9be3@enterprisedb.com
In reply to: Re: Do we want a hashset type?  (Andrew Dunstan <andrew@dunslane.net>)
List: pgsql-hackers

On 6/23/23 13:47, Andrew Dunstan wrote:
> 
> On 2023-06-23 Fr 04:23, Joel Jacobson wrote:
>> On Fri, Jun 23, 2023, at 08:40, jian he wrote:
>>> I played around with array_func.c;
>>> much of the code can be used for a multiset data type.
>>> Now I imagine a multiset as something like a one-dimensional array (nested
>>> is somehow beyond my imagination...).
>> Are you suggesting it might be a better idea to start over completely
>> and work on a new code base that is based on arrayfuncs.c,
>> and aim for MULTISET/SET or anyhashset from the start, that would not
>> only support int4/int8/uuid but any type?
>>
> 
> Before we run too far down this rabbit hole, let's discuss the storage
> implications of using multisets. ISTM that for small base datums like
> integers it will be a substantial increase in size, since you'll need an
> additional int for the item count, unless some very clever tricks are played.
> 

I honestly don't quite understand what exactly is meant by the proposal
to "reuse array_func.c for multisets". We're implementing sets, not
multisets (those were mentioned only to illustrate behavior). And the
whole point is that sets are not arrays - no duplicates, ordering does
not matter (so no index).

I mentioned that maybe we can model sets based on arrays (say, gram.y
would do similar stuff for SET[] and ARRAY[], polymorphism), not that we
should store sets as arrays. Would that be possible? Maybe, if we extended
arrays to also maintain a hash table. But I'd bet that would just
make arrays more complex, and make sets slower.

Or maybe I just don't understand the proposal. Perhaps it'd be best if
jian wrote a patch illustrating the idea, and showing how it performs
compared to the current approach.

As for the storage size, I don't think an extra "count" field would make
any measurable difference. If we're storing a hash table, we're bound to
have some wasted space anyway due to the load factor (likely between
0.75 and 0.9, i.e. roughly 10-25% of the slots sit empty).

> As for this older discussion referred to upthread, if the SQL Standards
> Committee hasn't acted on it by now it seems reasonable to think they are
> unlikely to.
> 

AFAIK multisets are included in SQL 2023, pretty much matching the draft
we discussed earlier. Yeah, it's unlikely to change in the future.

> Just for reference, here's some description of Oracle's support for
> multisets from
> <https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/Oracle-Support-for-Optional-Features-of-SQLFoundation2011.html#GUID-3BA98AEC-FAAD-4F21-A6AD-F696B5D36D56>:
> 

good to know


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
