Re: Indexing on JSONB field not working

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Indexing on JSONB field not working
Дата
Msg-id 20191221114003.yo56lv3o6thdvxtn@development
обсуждение исходный текст
Ответ на Indexing on JSONB field not working  ("Zhihong Zhang" <zhihong@gmail.com>)
Ответы Re: Indexing on JSONB field not working  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
On Fri, Dec 20, 2019 at 08:15:39PM -0500, Zhihong Zhang wrote:
>Run those 2 EXPLAINs with seqscan off and on. See what difference it makes!
>
>
>SET enable_seqscan = off;
>explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision <
3.0limit 100;
 
>
>"Limit  (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)"
>"  ->  Index Scan using assets_floatvalue_idx on assets  (cost=0.43..2714072.57 rows=867607 width=53) (actual
time=0.089..0.230rows=7 loops=1)"
 
>"        Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
>"Planning Time: 1.617 ms"
>"Execution Time: 0.276 ms"
>
>SET enable_seqscan = on;
>explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision <
3.0limit 100;
 
>
>"Limit  (cost=0.00..107.95 rows=100 width=53) (actual time=41021.311..313501.746 rows=7 loops=1)"
>"  ->  Seq Scan on assets  (cost=0.00..936605.40 rows=867607 width=53) (actual time=41021.309..313501.732 rows=7
loops=1)"
>"        Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
>"        Rows Removed by Filter: 2602824"
>"Planning Time: 0.283 ms"
>"Execution Time: 313501.777 ms"
>

Well, this confirms what I suspected before - the optimizer believes the
seqscan plan is a bit cheaper compared to index scan (107 vs. 313) but
that mismatches the actual performance. The question is why ...

For LIMIT queries, I can think of two common issues - the cost estimate
is computed as a simple linear approximation in the input relation. For
example, we know the seqscan is expected to produce 867607 rows with a
total cost of 936605, so the cost of producing just 100 rows is

    100 * 936605 / 867607 = 107.95

But that assumes a number of things: (a) that the seqscan row estimate
is correct, and that (b) the matching rows are uniformly distributed in
the table. If it's misestimated, or if the rows are towards the end of
the relation (i.e. after doing a most of the costed work) this estimate
may be quite off.

Can you do explain analyze of the query without the LIMIT?

BTW a LIMIT without an ORDER BY is a bit strange. Also, maybe you could
do try using a partial index (if the where condition does not change).


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Следующее
От: Joe Conway
Дата:
Сообщение: Re: BUG #16176: NULL value returned by category_sql argument tocrosstab() causes segmentation fault