Thread: performance issue with bitmap index scans on huge amounts of big jsonb documents


performance issue with bitmap index scans on huge amounts of big jsonb documents

From:
Marc-Olaf Jaschke
Date:
Hi,

I have a performance issue with bitmap index scans on huge amounts of big jsonb documents.


===== Background =====

- table with big jsonb documents
- gin index on these documents
- queries using index conditions with low selectivity


===== Example =====

select version();
> PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17), 64-bit

show work_mem;
> 1GB

-- setup test data
create table bitmap_scan_test as
select
    i,
    (select jsonb_agg(jsonb_build_object('x', i % 2, 'filler', md5(j::text)))
     from generate_series(0, 100) j) as big_jsonb
from
    generate_series(0, 100000) i;

create index on bitmap_scan_test using gin (big_jsonb);

analyze bitmap_scan_test;


-- query with bitmap scan
explain analyze
select
    count(*)
from
    bitmap_scan_test
where
    big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';

Aggregate  (cost=272.74..272.75 rows=1 width=8) (actual time=622.272..622.272 rows=1 loops=1)
  ->  Bitmap Heap Scan on bitmap_scan_test  (cost=120.78..272.49 rows=100 width=0) (actual time=16.496..617.431 rows=50000 loops=1)
        Recheck Cond: (big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
        Heap Blocks: exact=637
        ->  Bitmap Index Scan on bitmap_scan_test_big_jsonb_idx  (cost=0.00..120.75 rows=100 width=0) (actual time=16.371..16.371 rows=50000 loops=1)
              Index Cond: (big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
Planning time: 0.106 ms
Execution time: 622.334 ms


perf top -p... shows heavy usage of pglz_decompress:

Overhead  Shared Object     Symbol
  51,06%  postgres          [.] pglz_decompress
   7,33%  libc-2.12.so      [.] memcpy
...

===== End of example =====


I wonder why the bitmap heap scan adds so much time on top of the plain bitmap index scan.
It seems to me that the recheck is active even though all blocks are exact [1], and that PostgreSQL is loading the jsonb
documents for the recheck.

Is this expected behavior?


Regards,
Marc-Olaf


[1] (http://dba.stackexchange.com/questions/106264/recheck-cond-line-in-query-plans-with-a-bitmap-index-scan)

Re: performance issue with bitmap index scans on huge amounts of big jsonb documents

From:
Jeff Janes
Date:

> big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';


> I wonder why the bitmap heap scan adds so much time on top of the plain bitmap index scan.
> It seems to me that the recheck is active even though all blocks are exact [1], and that PostgreSQL is loading the jsonb for the recheck.
>
> Is this expected behavior?


Yes, this is expected.  The gin index is lossy.  It knows that all the elements are present (except when it doesn't--large elements might get hashed down and suffer hash collisions), but it doesn't know what the recursive structure between them is, and has to do a recheck.

For example, if you change your example where clause to:

big_jsonb @> '[{"filler": 1, "x": "cfcd208495d565ef66e7dff9f98764da"}]';

You will see that the index still returns 50,000 rows, but now all of them get rejected upon the recheck.

You could try changing the type of index to jsonb_path_ops.  In your given example, it won't make a difference, because you are actually counting half the table and so half the table needs to be rechecked.  But in my example, jsonb_path_ops successfully rejects all the rows at the index stage.
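A minimal sketch of the jsonb_path_ops variant, using the table from the example above (the drop/create statements are an illustration; the index name is taken from the earlier plan):

```sql
-- Replace the default jsonb_ops GIN index with a jsonb_path_ops one.
-- jsonb_path_ops hashes each complete path (key chain plus value), so a
-- structurally impossible query like the swapped-key example above can be
-- rejected at the index stage instead of during the heap recheck.
drop index bitmap_scan_test_big_jsonb_idx;
create index on bitmap_scan_test using gin (big_jsonb jsonb_path_ops);

explain analyze
select count(*)
from bitmap_scan_test
where big_jsonb @> '[{"filler": 1, "x": "cfcd208495d565ef66e7dff9f98764da"}]';
```

Note that jsonb_path_ops indexes support only the containment operator @>, not the key-existence operators ?, ?| and ?&, so it is not a drop-in replacement for every workload.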

Cheers,

Jeff

Re: performance issue with bitmap index scans on huge amounts of big jsonb documents

From:
Marc-Olaf Jaschke
Date:
Thanks for the explanation!

Best Regards,
Marc-Olaf

Marc-Olaf Jaschke · Software Developer
shopping24 GmbH

Werner-Otto-Straße 1-7 · 22179 Hamburg
Phone: +49 (0) 40 6461 5830 · Fax: +49 (0) 40 64 61 7879
marc-olaf.jaschke@s24.com · www.s24.com
AG Hamburg HRB 63371
represented by Dr. Björn Schäfers and Martin Mildner


Re: [PERFORM] performance issue with bitmap index scans on huge amounts of big jsonb documents

From:
Merlin Moncure
Date:
On Wed, Nov 30, 2016 at 6:26 AM, Marc-Olaf Jaschke
<marc-olaf.jaschke@s24.com> wrote:
> Hi,
>
> i have a performance issue with bitmap index scans on huge amounts of big jsonb documents.
>
> [...]
>
> perf top -p... shows heavy usage of pglz_decompress:
>
> Overhead  Shared Object     Symbol
>   51,06%  postgres          [.] pglz_decompress
>    7,33%  libc-2.12.so      [.] memcpy

Another thing to look at is configuring the column not to compress,
since over half the time is spent decompressing the data.  See:
ALTER TABLE table_name ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

Naturally this is a huge tradeoff, so do some careful analysis before
making the change.
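A sketch of this suggestion applied to the test table above (EXTERNAL keeps large values out-of-line in TOAST but skips compression; the column and table names are from the earlier example):

```sql
-- Store big_jsonb uncompressed so the recheck no longer pays for
-- pglz_decompress, at the cost of more disk space and I/O.
alter table bitmap_scan_test alter column big_jsonb set storage external;

-- SET STORAGE affects only newly written rows; existing rows keep their
-- compressed representation until they are rewritten (e.g., by reloading
-- the table's contents).
```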

merlin