[GENERAL] jsonb case insensitive search

Поиск
Список
Период
Сортировка
От armand pirvu
Тема [GENERAL] jsonb case insensitive search
Дата
Msg-id 287C7A09-A660-49AB-ADA8-866121D5156C@gmail.com
обсуждение исходный текст
Ответы Re: [GENERAL] jsonb case insensitive search  ("David G. Johnston" <david.g.johnston@gmail.com>)
Список pgsql-general


Hi

The goal would be to be able to search case insensitive by any key:value combined with some other columns like outlined below, but initially would be Company:CompuTestSystems

A sample would be 

SELECT * 
FROM cfg_files_data 
WHERE cfg_files_data.show_id = 32 
AND cfg_files_data.file_id = 123
AND lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');
;


- Table definition
                                           Table "csischema.cfg_files_data"
       Column       |           Type           |                               Modifiers                               
--------------------+--------------------------+-----------------------------------------------------------------------
 file_data_id       | bigint                   | not null default nextval('cfg_files_data_file_data_id_seq'::regclass)
 file_id            | bigint                   | not null
 show_id            | bigint                   | not null
 file_data_record   | jsonb                    | not null
 additional_info    | jsonb                    | 
 file_data_add_by   | character varying(100)   | not null
 file_data_add_date | timestamp with time zone | not null default now()
Indexes:
    "cfg_files_data_pkey" PRIMARY KEY, btree (file_data_id, show_id)


- show_id, file_id data distribution

select show_id, file_id, count(*) from cfg_files_data group by show_id, file_id order by 1;
 show_id | file_id | count 
---------+---------+-------
       4 |       9 |     3
       4 |      68 |    22
       4 |       2 |     6
       6 |       3 |    13
       6 |       5 |   215
      13 |      13 |    13
      13 |       8 |    22
      21 |      11 |    13
      21 |      10 |    22
      26 |      12 |    13
      30 |      16 |     6
      32 |     123 | 53731
      35 |     125 |     5
      40 |     122 | 53731
      46 |      69 |     4
      46 |      64 |     4
      46 |      67 |     4
      46 |      70 |     4

For the example mentioned

SELECT * 
FROM cfg_files_data 
WHERE cfg_files_data.show_id = 32 
AND cfg_files_data.file_id = 123
AND lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');
;


create index cfg_files_data_record_idx on cfg_files_data (show_id, file_id, lower(file_data_record::text));

SELECT * 
FROM cfg_files_data 
WHERE cfg_files_data.show_id = 35 
AND cfg_files_data.file_id = 125
AND lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');
;


                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using cfg_files_data_record_idx on cfg_files_data  (cost=0.54..8.57 rows=1 width=359) (actual time=0.287..0.305 rows=5 loops=1)
   Index Cond: ((show_id = 35) AND (file_id = 125))
   Filter: (lower((file_data_record ->> 'Company'::text)) = 'computestsystems'::text)
 Planning time: 0.271 ms
 Execution time: 0.370 ms
(5 rows)


SELECT * 
FROM cfg_files_data 
WHERE cfg_files_data.show_id = 32 
AND cfg_files_data.file_id = 123
AND lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');
;

                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on cfg_files_data  (cost=0.00..7276.20 rows=134 width=359) (actual time=194.817..194.848 rows=2 loops=1)
   Filter: ((show_id = 32) AND (file_id = 123) AND (lower((file_data_record ->> 'Company'::text)) = 'computestsystems'::text))
   Rows Removed by Filter: 107829
 Planning time: 1.006 ms
 Execution time: 194.905 ms

Shall I understand that prior to apply the lower((file_data_record ->> 'Company'::text)) = 'computestsystems'::text) filtering, it goes by show_id and field_id and for 32,123 since it is way above 5% it gets into a seq scan ? Although the end result is just 2 rows


SET enable_seqscan = OFF;
                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on cfg_files_data  (cost=2485.18..8544.40 rows=134 width=359) (actual time=173.314..173.317 rows=2 loops=1)
   Recheck Cond: (show_id = 32)
   Filter: ((file_id = 123) AND (lower((file_data_record ->> 'Company'::text)) = 'computestsystems'::text))
   Rows Removed by Filter: 53729
   Heap Blocks: exact=2437
   ->  Bitmap Index Scan on cfg_files_data_pkey  (cost=0.00..2485.15 rows=53743 width=0) (actual time=12.195..12.195 rows=53731 loops=1)
         Index Cond: (show_id = 32)
 Planning time: 0.232 ms
 Execution time: 173.392 ms
(9 rows)

And that got me to look at 

SELECT * 
FROM cfg_files_data 
WHERE 
lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');
;


drop index cfg_files_data_record_idx ;
create index cfg_files_data_record_idx on cfg_files_data (lower(file_data_record::text));
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Seq Scan on cfg_files_data  (cost=0.00..6737.04 rows=539 width=359) (actual time=194.382..194.468 rows=8 loops=1)
   Filter: (lower((file_data_record ->> 'Company'::text)) = 'computestsystems'::text)
   Rows Removed by Filter: 107823
 Planning time: 0.421 ms
 Execution time: 194.539 ms


Well the end result is 8 rows from a total of 100k+

Not sure why the index is ignored

But is is possible to go for a broader search, aka being able to search by any key:value , efficient and case insensitive ? What am I missing in this picture ?



Thanks for help

Armand



В списке pgsql-general по дате отправления:

Предыдущее
От: Julien Rouhaud
Дата:
Сообщение: Re: [GENERAL] Ora2Pg-Database migration report
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: [GENERAL] jsonb case insensitive search