Re: Do we want a hashset type?

Поиск
Список
Период
Сортировка
От Joel Jacobson
Тема Re: Do we want a hashset type?
Дата
Msg-id 9f370c3e-9a1b-4db0-ad93-06f14e7dcc37@app.fastmail.com
обсуждение исходный текст
Ответ на Re: Do we want a hashset type?  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы Re: Do we want a hashset type?  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Список pgsql-hackers
On Mon, Jun 5, 2023, at 01:44, Tomas Vondra wrote:
> Anyway, I hacked together a trivial set backed by an open addressing
> hash table:
>
>   https://github.com/tvondra/hashset
>
> It's super-basic, providing just some bare minimum of functions, but
> hopefully good enough for experiments.
>
> - hashset data type - hash table in varlena
> - hashset_init - create new hashset
> - hashset_add / hashset_contains - add value / check membership
> - hashset_merge - merge two hashsets
> - hashset_count - count elements
> - hashset_to_array - return
> - hashset(int) aggregate
>
> This allows rewriting the recursive query like this:
>
>     WITH RECURSIVE friends_of_friends AS (
...

Nice! I get similar results, 737 ms (hashset) vs 1809 ms (array_agg).

I think if you just add one more hashset function, it will be a win against roaringbitmap, which is 400 ms.

The missing function is an agg that takes hashset as input and returns hashset, similar to roaringbitmap's
rb_or_agg().

With such a function, we could add an adjacent nodes hashset column to the `nodes` table, which would eliminate the
needto scan the edges table for graph traversal:
 

We could then benchmark roaringbitmap against hashset querying the same table:

CREATE TABLE users AS
SELECT
    from_node AS id,
    rb_build_agg(to_node) AS friends_roaringbitmap,
    hashset(to_node) AS friends_hashset
FROM edges
GROUP BY 1;

/Joel



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: Cleaning up nbtree after logical decoding on standby work
Следующее
От: Joe Conway
Дата:
Сообщение: Re: Let's make PostgreSQL multi-threaded