working around JSONB's lack of stats?
| От | Josh Berkus |
|---|---|
| Тема | working around JSONB's lack of stats? |
| Дата | |
| Msg-id | 54C738E1.8080405@agliodbs.com обсуждение исходный текст |
| Ответы |
Re: working around JSONB's lack of stats?
Re: working around JSONB's lack of stats? |
| Список | pgsql-performance |
Folks,
Currently, JSONB fields don't have statistics, and estimate a flat 1%
selectivity. This can result in poor query plans, and I'm wondering if
anyone has a suggested workaround for this short of hacking a new
selectivity function. For example, take the common case of using JSONB
to hold a list of "tags" for tagging documents:
Table "public.doc_tags_json"
Column | Type | Modifiers
--------+---------+-----------
doc_id | integer |
tags | jsonb |
Indexes:
"doc_tags_json_doc_id_idx" UNIQUE, btree (doc_id)
"doc_tags_json_tags_idx" gin (tags)
This query:
select doc_id
from doc_tags_json
where tags @> '[ "math", "physics" ]'
order by doc_id desc limit 25;
Uses this plan:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..709.79 rows=25 width=4) (actual time=24.529..340.499
rows=25 loops=1)
-> Index Scan Backward using doc_tags_json_doc_id_idx on
doc_tags_json (cost=0.43..283740.95 rows=10000 width=4) (actual
time=24.528..340.483 rows=25 loops=1)
Filter: (tags @> '["math", "physics"]'::jsonb)
Rows Removed by Filter: 1011878
Planning time: 0.090 ms
Execution time: 340.528 ms
It does this because it expects @> '["math", "physics"]' to match 10,000
rows, which means that it expects to only scan 25,000 entries in the
doc_id index to return the top 25. However, the matching condition is
much rarer than it thinks, so it's actually far faster to use the index
on the JSONB column:
drop index doc_tags_json_doc_id_idx;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=10517.08..10517.14 rows=25 width=4) (actual
time=7.594..7.602 rows=25 loops=1)
-> Sort (cost=10517.08..10542.08 rows=10000 width=4) (actual
time=7.593..7.596 rows=25 loops=1)
Sort Key: doc_id
Sort Method: top-N heapsort Memory: 26kB
-> Bitmap Heap Scan on doc_tags_json (cost=92.90..10234.89
rows=10000 width=4) (actual time=6.733..7.475 rows=257 loops=1)
Recheck Cond: (tags @> '["math", "physics"]'::jsonb)
Heap Blocks: exact=256
-> Bitmap Index Scan on doc_tags_json_tags_idx
(cost=0.00..90.40 rows=10000 width=0) (actual time=6.695..6.695 rows=257
loops=1)
Index Cond: (tags @> '["math", "physics"]'::jsonb)
Planning time: 0.093 ms
Execution time: 7.632 ms
On a normal column, I'd raise n_distinct to reflect the higher
selecivity of the search terms. However, since @> uses contsel,
n_distinct is ignored. Anyone know a clever workaround I don't
currently see?
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
В списке pgsql-performance по дате отправления: