Re: Do we want a hashset type?

Поиск

Список

Период

Сортировка

От	Joel Jacobson
Тема	Re: Do we want a hashset type?
Дата	8 июня 2023 г. 12:41:35
Msg-id	0e98dbc4-4116-4e15-bf72-3a7d4ab0d27d@app.fastmail.com обсуждение исходный текст
Ответ на	Re: Do we want a hashset type? (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы	Re: Do we want a hashset type?
Список	pgsql-hackers

Дерево обсуждения

On Wed, Jun 7, 2023, at 19:37, Tomas Vondra wrote:
> Interesting, considering how dumb the the hash table implementation is.

That's promising.

>> I tested Neo4j and the results are surprising; it appears to be significantly *slower*.
>> However, I've probably misunderstood something, maybe I need to add some index or something.
>> Even so, it's interesting it's apparently not fast "by default".
>> 
>
> No idea how to fix that, but it's rather suspicious.

I've had a graph-db expert review my benchmark, and he suggested adding an index:

CREATE INDEX FOR (n:User) ON (n.id)

This did improve the execution time for Neo4j a bit, down from 819 ms to 528 ms, but PostgreSQL 299 ms is still a win.

Benchmark here: https://github.com/joelonsql/graph-query-benchmarks
Note, in this benchmark, I only test the naive RECURSIVE CTE approach using array_agg(DISTINCT ...).
And I couldn't even test the most connected user with Neo4j, the query never finish for some reason,
so I had to test with a less connected user.

The graph expert also said that other more realistic graph use-cases might be "multi-relational",
and pointed me to a link: https://github.com/totogo/awesome-knowledge-graph#knowledge-graph-dataset
No idea how such multi-relational datasets would affect the benchmarks.

I think we have already strong indicators that PostgreSQL with a hashset type will from a relative
performance perspective, do just fine processing basic graph queries, even with large datasets.

Then there will always be the case when users primarily write very different graph queries all day long,
who might prefer a graph query *language*, like SQL/PGQ in SQL:2023, Cypher or Gremlin.

/Joel

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Richard Guo
Дата: 08 июня 2023 г., 12:11:17
Сообщение: Re: Remove WindowClause PARTITION BY items belonging to redundant pathkeys

Следующее

От: Dave Cramer
Дата: 08 июня 2023 г., 12:53:11
Сообщение: Re: Named Prepared statement problems and possible solutions

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Do we want a hashset type?

Предыдущее

Следующее