Re: Unique index on hash of jsonb value - correct solution?

Поиск

Список

Период

Сортировка

От	Laurenz Albe
Тема	Re: Unique index on hash of jsonb value - correct solution?
Дата	19 мая 2020 г. 09:05:00
Msg-id	3dde4e83382ccd371216003f5d486ef49bc3d08c.camel@cybertec.at обсуждение исходный текст
Ответ на	Unique index on hash of jsonb value - correct solution? (Albrecht Dreß <albrecht.dress@arcor.de>)
Ответы	Re: Unique index on hash of jsonb value - correct solution? (Stephen Frost <sfrost@snowman.net>)
Список	pgsql-general

Дерево обсуждения

On Mon, 2020-05-18 at 18:43 +0200, Albrecht Dreß wrote:
> in a database I have a table with a text, a jsonb and a bytea column, which
> together shall be unique, like:
> 
> <snip>
>   Column  |  Type  | Collation | Nullable |               Default
> ---------+--------+-----------+----------+-------------------------------------
>   id      | bigint |           | not null | nextval('demotab_id_seq'::regclass)
>   textval | text   |           | not null |
>   jsonval | jsonb  |           | not null |
>   blobval | bytea  |           | not null |
> Indexes:
>      "demo_idx" UNIQUE, btree (textval, jsonval, blobval)
> </snip>
> 
> This seems to work just fine in most cases, but adding a bigger json value (the text
> and the bytea columns are always short) results in a “index row size 2840 exceeds
> maximum 2712 for index "…"” error.  Following the hint in the error message,
> I replaced the index by
> 
> <snip>
> Indexes:
>      "demo_idx" UNIQUE, btree (textval, md5(jsonval::text), blobval)
> </snip>
> 
> which seems to solve the issue.
> 
> My question: is this approach (a) correct and (b) still safe if the items in the
> jsonb (always a dict in my case) are re-ordered?  I tested a few cases, and trying
> to insert something like e.g. '{"a":1,"b":2}' and '{"b":2,"a":1}' actually does
> produce the same hash (i.e. the 2nd insert attempt is rejected due to the index),
> but is this guaranteed by design for every case?  Or is there a better solution
> for this use case?

"jsonb" uses an internal binary representation that reorders the attributes in
a deterministic fashin, so yes, that is guaranteed.

I would use an "md5" hash for the "bytea" column as well to keep the index smaller.

There can be collisions with an md5 hash, so it is possible for duplicates to
creep in.  Besides, if you have an auto-generated "bigint" column, you should
make that the primary key.  Perhaps then the uniqueness condition is no longer
necessary.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com

В списке pgsql-general по дате отправления:

Предыдущее

От: Adrian Klaver
Дата: 19 мая 2020 г., 05:20:27
Сообщение: Re: template0 needing vacuum freeze?

Следующее

От: Kyotaro Horiguchi
Дата: 19 мая 2020 г., 10:14:43
Сообщение: Re: Hot and PITR backups in same server

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Unique index on hash of jsonb value - correct solution?

Предыдущее

Следующее