Re: jsonb and nested hstore

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: jsonb and nested hstore
Дата
Msg-id 531F9516.8080808@fuzzy.cz
обсуждение исходный текст
Ответ на Re: jsonb and nested hstore  (Peter Geoghegan <pg@heroku.com>)
Ответы Re: jsonb and nested hstore  (Peter Geoghegan <pg@heroku.com>)
Список pgsql-hackers
Hi,

I've spent a few hours stress-testing this a bit - loading a mail
archive with ~1M of messages (with headers stored in a jsonb column) and
then doing queries on that. Good news - no crashes or any such issues so
far. The queries that I ran manually seem to return sane results.

The only problem I ran into is with limited index row size with GIN
indexes. I understand it's not a bug, but I admit I haven't realized I
might run into it in this case ...

The data I used for testing is just a bunch of e-mail messages, with
headers stored as jsonb, so each row has something like this in
"headers" column:

{"from" : "John Doe <john@example.com>","to" : ["Jane Doe <jane@example.com>", "Jack Doe <jack@example.com>"],"cc" :
...,"bcc": ...,... various other headers ...
 
}

The snag is that some of the header values may be very long, exceeding
the limit of 1352 bytes and causing errors like this:
 ERROR:  index row size 1416 exceeds maximum 1352 for index "gin_idx"

A good example of such header is "dkim-signature" which basically
contains the whole message digitally signed with DKIM. The signature
tends to be long and non-compressible, thanks to the signature.

I'm wondering what's the best way around this, because I suspect many
new users (especially those attracted by jsonb and GIN improvements)
will run into this. Maybe not immediately, but eventully they'll try to
insert a jsonb with long value, and it will fail ...

With btree indexes on text I would probably create an index on
substr(column,0,1000) or something like that, but doing that with JSON
seems a bit strange.

I assume we need to store the actual values in the GIN index (so a hash
is not sufficient), right?

GIST indexes work, but with that I have to give up the significant
performance gains that we got thanks to Alexander's GIN patches.

regards
Tomas



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: GIN improvements part2: fast scan
Следующее
От: Tom Lane
Дата:
Сообщение: Re: db_user_namespace a "temporary measure"