Re: Unique index on hash of jsonb value - correct solution?

Поиск
Список
Период
Сортировка
От Laurenz Albe
Тема Re: Unique index on hash of jsonb value - correct solution?
Дата
Msg-id 3dde4e83382ccd371216003f5d486ef49bc3d08c.camel@cybertec.at
обсуждение исходный текст
Ответ на Unique index on hash of jsonb value - correct solution?  (Albrecht Dreß <albrecht.dress@arcor.de>)
Ответы Re: Unique index on hash of jsonb value - correct solution?  (Stephen Frost <sfrost@snowman.net>)
Список pgsql-general
On Mon, 2020-05-18 at 18:43 +0200, Albrecht Dreß wrote:
> in a database I have a table with a text, a jsonb and a bytea column, which
> together shall be unique, like:
> 
> <snip>
>   Column  |  Type  | Collation | Nullable |               Default
> ---------+--------+-----------+----------+-------------------------------------
>   id      | bigint |           | not null | nextval('demotab_id_seq'::regclass)
>   textval | text   |           | not null |
>   jsonval | jsonb  |           | not null |
>   blobval | bytea  |           | not null |
> Indexes:
>      "demo_idx" UNIQUE, btree (textval, jsonval, blobval)
> </snip>
> 
> This seems to work just fine in most cases, but adding a bigger json value (the text
> and the bytea columns are always short) results in a “index row size 2840 exceeds
> maximum 2712 for index "…"” error.  Following the hint in the error message,
> I replaced the index by
> 
> <snip>
> Indexes:
>      "demo_idx" UNIQUE, btree (textval, md5(jsonval::text), blobval)
> </snip>
> 
> which seems to solve the issue.
> 
> My question: is this approach (a) correct and (b) still safe if the items in the
> jsonb (always a dict in my case) are re-ordered?  I tested a few cases, and trying
> to insert something like e.g. '{"a":1,"b":2}' and '{"b":2,"a":1}' actually does
> produce the same hash (i.e. the 2nd insert attempt is rejected due to the index),
> but is this guaranteed by design for every case?  Or is there a better solution
> for this use case?

"jsonb" uses an internal binary representation that reorders the attributes in
a deterministic fashin, so yes, that is guaranteed.

I would use an "md5" hash for the "bytea" column as well to keep the index smaller.

There can be collisions with an md5 hash, so it is possible for duplicates to
creep in.  Besides, if you have an auto-generated "bigint" column, you should
make that the primary key.  Perhaps then the uniqueness condition is no longer
necessary.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




В списке pgsql-general по дате отправления:

Предыдущее
От: Adrian Klaver
Дата:
Сообщение: Re: template0 needing vacuum freeze?
Следующее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: Hot and PITR backups in same server