Re: Do we want a hashset type?

Поиск
Список
Период
Сортировка
От jian he
Тема Re: Do we want a hashset type?
Дата
Msg-id CACJufxFXSpL67dL+bgvB4DiJo7ovp8bxvtW00L1hV7qDzUb_6Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Do we want a hashset type?  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы Re: Do we want a hashset type?  ("Joel Jacobson" <joel@compiler.org>)
Список pgsql-hackers

I played around array_func.c
many of the code can be used for multiset data type.
now I imagine multiset as something like one dimension array. (nested is somehow beyond the imagination...).

* A standard varlena array has the following internal structure:
 *   <vl_len_> - standard varlena header word
 *   <ndim> - number of dimensions of the array
 *   <dataoffset> - offset to stored data, or 0 if no nulls bitmap
 *   <elemtype> - element type OID
 *   <dimensions> - length of each array axis (C array of int)
 *   <lower bnds> - lower boundary of each dimension (C array of int)
 *   <null bitmap> - bitmap showing locations of nulls (OPTIONAL)
 *   <actual data> - whatever is the stored data

in set/multiset, we don't need {ndim,lower bnds}, since we are only one dimension, also we don't need subscript.
So for set we can have following
*  int32 vl_len_; /* varlena header (do not touch directly!) */
int32       capacity; /* # of capacity */
*  int32 dataoffset; /* offset to data, or 0 if no bitmap */
int32 nelements; /* number of items added to the hashset */
Oid elemtype; /* element type OID */
 *  <null bitmap> - bitmap showing locations of nulls (OPTIONAL)
 *  <bitmap> - bitmap showing this slot is empty or not  ( I am not sure this part)
 *  <actual data> - whatever is the stored data

many of the code in array_func.c can be reused.
array_isspace     ==> set_isspace
ArrayMetaState  ==> SetMetastate
ArrayCount         ==> SetCount (similar to ArrayCount return the dimension of set, should be zero (empty set) or one)
ArrayParseState ==> SetParseState
ReadArrayStr     ==>  ReadSetStr

attached is a demo shows that use array_func.c to parse cstring. have similar effect of  array_in.
for multiset_in set type input function. if no duplicate required then multiset_in would just like array, so more code can be copied from array_func.c
but if unique required then we need first palloc0(capacity * datums (size per type)) then put valid value into to a specific slot? 




On Fri, Jun 23, 2023 at 6:27 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
On 6/22/23 19:52, Joel Jacobson wrote:
> On Tue, Jun 20, 2023, at 14:10, Tomas Vondra wrote:
>> This is also what the SQL standard does for multisets - there's SQL:20nn
>> draft at http://www.wiscorp.com/SQLStandards.html, and the <member
>> predicate> section (p. 475) explains how this should work with NULL.
>
> I've looked again at the paper you mentioned and found something intriguing
> in section 2.6 (b). I'm a bit puzzled about this: why would we want to return
> null when we're certain it's not null but just doesn't have any elements?
>
> In the same vein, it says, "If it has more than one element, an exception is
> raised." Makes sense to me, but what about when there are no elements at all?
> Why not raise an exception in that case too?
>
> The ELEMENT function is designed to do one simple thing: return the element of
> a multiset if the multiset has only 1 element. This seems very similar to how
> our INTO STRICT operates, right?
>

I agree this looks a bit weird, but that's what I mentioned - this is an
initial a proposal, outlining the idea. Inevitably some of the stuff
will get reworked or just left out of the final version. It's useful
mostly to explain the motivation / goal.

I believe that's the case here - I don't think the ELEMENT got into the
standard at all, and the NULL rules for the MEMBER OF clause seem not to
have these strange bits.

> The SQL:20nn seems to still be in draft form, and I can't help but wonder if we
> should propose a bit of an improvement here:
>
> "If it doesn't have exactly one element, an exception is raised."
>
> Meaning, it would raise an exception both if there are more elements,
> or zero elements (no elements).
>
> I think this would make the semantics more intuitive and less surprising.
>

Well, the simple truth is the draft is freely available, but you'd need
to buy the final version. It doesn't mean it's still being worked on or
that no SQL standard was released since then. In fact, SQL 2023 was
released a couple weeks ago [1].

It'd be interesting to know the version that actually got into the SQL
standard (if at all), but I don't have access to the standard yet.

regards


[1] https://www.iso.org/standard/76584.html

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
 I recommend David Deutsch's <<The Beginning of Infinity>>

  Jian


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alexander Pyhalov
Дата:
Сообщение: Re: memory leak in trigger handling (since PG12)
Следующее
От: Ashutosh Bapat
Дата:
Сообщение: Re: Infinite Interval