Re: Do we want a hashset type?

Поиск
Список
Период
Сортировка
От jian he
Тема Re: Do we want a hashset type?
Дата
Msg-id CACJufxE=QbDzYYay4Z6MJ=BqdkYiHi_P34YOKduyQ0yyXmMhOw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Do we want a hashset type?  ("Joel Jacobson" <joel@compiler.org>)
Ответы Re: Do we want a hashset type?  ("Joel Jacobson" <joel@compiler.org>)
Список pgsql-hackers


On Fri, Jun 9, 2023 at 6:58 PM Joel Jacobson <joel@compiler.org> wrote:
On Thu, Jun 8, 2023, at 12:19, Tomas Vondra wrote:
> Would you be interested in helping with / working on some of that? I
> don't have immediate need for this stuff, so it's not very high on my
> TODO list.

Sure, I'm willing to help!

I've attached a patch that works on some of the items on your list,
including some additions to the README.md.

There were a bunch of places where `maxelements / 8` caused bugs,
that had to be changed to do proper integer ceiling division:

-       values = (int32 *) (set->data + set->maxelements / 8);
+       values = (int32 *) (set->data + (set->maxelements + 7) / 8);

Side note: I wonder if it would be good to add CEIL_DIV and FLOOR_DIV macros
to the PostgreSQL source code in general, since it's easy to make this mistake,
and quite verbose/error-prone to write it out manually everywhere.
Such macros could simplify code in e.g. numeric.c.

> There's a bunch of stuff that needs to be improved to make this properly
> usable, like:
>
> 1) better hash table implementation
TODO

> 2) input/output functions
I've attempted to implement these.
I thought comma separated values wrapped around curly braces felt as the most natural format,
example:
SELECT '{1,2,3}'::hashset;

> 3) support for other types (now it only works with int32)
TODO

> 4) I wonder if this might be done as an array-like polymorphic type.
That would be nice!
I guess the work-around would be to store the actual value of non-int type
in a lookup table, and then hash the int-based primary key in such table.

Do you think later implementing polymorphic type support would
mean a more or less complete rewrite, or can we carry on with int32-support
and add it later on?

> 5) more efficient storage format, with versioning etc.
TODO

> 6) regression tests
I've added some regression tests.

> Right. IMHO the query language is a separate thing, you still need to
> evaluate the query somehow - which is where hashset applies.

Good point, I fully agree.

/Joel



Hi, I am quite new about C.....
The following function I have 3 questions.
1. 7691,4201, I assume they are just random prime ints?
2. I don't get the last return set, even the return type should be bool.
3. I don't understand 13 in hash = (hash + 13) % set->maxelements;

static bool
hashset_contains_element(hashset_t *set, int32 value)
{	int		byte;	int		bit;	uint32	hash;	char   *bitmap;	int32  *values;
	hash = ((uint32) value * 7691 + 4201) % set->maxelements;
	bitmap = set->data;	values = (int32 *) (set->data + set->maxelements / 8);
	while (true)	{		byte = (hash / 8);		bit = (hash % 8);
		/* found an empty slot, value is not there */		if ((bitmap[byte] & (0x01 << bit)) == 0)			return false;
		/* is it the same value? */		if (values[hash] == value)			return true;
		/* move to the next element */		hash = (hash + 13) % set->maxelements;	}
	return set;
}




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: Re: Views no longer in rangeTabls?
Следующее
От: "Fujii.Yuki@df.MitsubishiElectric.co.jp"
Дата:
Сообщение: RE: Partial aggregates pushdown