Re: JSON, JSONB questions
От | Merlin Moncure |
---|---|
Тема | Re: JSON, JSONB questions |
Дата | |
Msg-id | CAHyXU0wnCWb3Ltj8xMAFH4Pdivb3OTF8Ap6KKi6ScCDE3Z8+sA@mail.gmail.com обсуждение исходный текст |
Ответ на | JSON, JSONB questions (Tom C <barteri@gmail.com>) |
Список | pgsql-novice |
On Wed, Sep 28, 2016 at 9:07 PM, Tom C <barteri@gmail.com> wrote: > I'm new to Postgres so forgive me if these are dumb question. > > I'm trying to understanding why you would use JSON over JSONB. From what I > have seen so far, JSONB doesn't take that much more space than JSON and > offers better indexing and search capabilities than plain JSON data type. Is > it really just about preserving whitespace and structure? If that was the > case then why bother having the JSON data type at all? Why not just stick > with varchar This was broadly discussed in the archives. The reasons are: *) pure varchar would drop validation and break code *) json came out first and dropping support would break some edge cases *) yes, whitespace and structure are in important in some cases. *) jsonb is pessimal for pure serialization strategies. If all I do is run various variants of to_json, marshaling the json to a hash table only to immediately re-serialize it to a string is pessimal. There are a lot of uses for json in postgres besides document storage *) some operations are faster textually. The basic rule of thumb is that if you are actually storing data, jsonb is a better choice. At the time the json functionality was being built out, I argued (and still think) the real mistakes made were: *) not packaging functionality into extensions *) using type prefixed function names > Second question is around indexing and searching on data types of JSONB. I > understand that you can create indexes for specific fields within the JSON. > Is there a performance penalty when adding indexes to specific fields in > large JSON payload? For example, let's say I have a very complex nested JSON > payload stored in a JSONB column. All I'm interested in indexing is a > collection array within the JSON payload. Will it be more efficient if I > break out the array into a separate JSONB column? Basically, no. The index lookup will run the same speed either way since (at least for btree) the indexed data is present in the index itself. During the scan you still have to verify the visibility information in the heap but this has to be done regardless. I think (but did not measure) certain special cases (like BRIN index) might benefit from breaking out the data since you always have to refer back to the heap. > Third and final question is how do you create an index into for an array > element within the JSONB structure? postgres=# create table foo (j jsonb); CREATE TABLE postgres=# insert into foo values('[{"a": "b"}]'); INSERT 0 1 postgres=# create index on foo((j->0->>'a')); CREATE INDEX -- disincline the server from correctly preferring seq scans for 1 record tables postgres=# set enable_seqscan to false; SET postgres=# explain select * from foo where j->0->>'a' = 'b'; QUERY PLAN ───────────────────────────────────────────────────────────────────────── Index Scan using foo_expr_idx on foo (cost=0.12..8.14 rows=1 width=32) Index Cond: (((j -> 0) ->> 'a'::text) = 'b'::text) (2 rows) postgres=# select * from foo where j->0->>'a' = 'b'; j ────────────── [{"a": "b"}] merlin
В списке pgsql-novice по дате отправления: