Обсуждение: indexing elements of a csv ?

Поиск
Список
Период
Сортировка

indexing elements of a csv ?

От
"Gauthier, Dave"
Дата:

Hi:

 

v9.0.1 on linux.

 

I have a table with a column that is a csv.  Users will select records based upon the existence of an element of the csv.  There is an index on that column but I'm thinking that it won't be of much use in this situation.  Is there a way to facilitate these queries?

 

Example:

 

create table foo (col0 text, col1 text);

create index foo_col1 on foo (col1);

insert into foo (col0,col1) values ('moe','aa,bbb,c'),('larry','xxxxx,bbb,yyy'),('curly','m,nnnn,oo');

 

now...

select col0 from foo where <the csv element 'bbb' exists as a csv element of col1>

 

Some attempts, which get the right answers, but which probably won't be very efficient...

 

select col0 from foo where string_to_array('bbb','') <@ string_to_array(col1);

select col0 from foo where ','||col1||','  like  '%,bbb,%';

select col0 from foo where ((col1 like 'bbb,%') or (col1 like '%,bbb,%') or (col1 like '%,bbb'));

 

Long shot, but I thought I'd ask anyway.

Thanks in Advance !

Re: indexing elements of a csv ?

От
Ian Lawrence Barwick
Дата:
2013/3/12 Gauthier, Dave <dave.gauthier@intel.com>:
> Hi:
>
> v9.0.1 on linux.
>
> I have a table with a column that is a csv.  Users will select records based
> upon the existence of an element of the csv.  There is an index on that
> column but I'm thinking that it won't be of much use in this situation.  Is
> there a way to facilitate these queries?
>
> Example:
>
> create table foo (col0 text, col1 text);
>
> create index foo_col1 on foo (col1);
>
> insert into foo (col0,col1) values
> ('moe','aa,bbb,c'),('larry','xxxxx,bbb,yyy'),('curly','m,nnnn,oo');
>
> now...
>
> select col0 from foo where <the csv element 'bbb' exists as a csv element of
> col1>
>
>
> Some attempts, which get the right answers, but which probably won't be very
> efficient...
>
> select col0 from foo where string_to_array('bbb','') <@
> string_to_array(col1);
>
> select col0 from foo where ','||col1||','  like  '%,bbb,%';
>
> select col0 from foo where ((col1 like 'bbb,%') or (col1 like '%,bbb,%') or
> (col1 like '%,bbb'));
>
> Long shot, but I thought I'd ask anyway.

A GIN index might do the trick:

CREATE INDEX ix_col1_ix ON foo USING GIN(string_to_array(col1,','));

(This is assuming the CSV values can be cleanly converted to
an array using "string_to_array()").

You could then query it with:
SELECT col0 FROM foo WHERE string_to_array(col1,',') @>  '{bbb}'::text[];

HTH

Ian Barwick


Re: indexing elements of a csv ?

От
Steve Erickson
Дата:
An option would be to create a column of type tsvector.  That way you could do text searches using partial words or words and get results including those containing forms of the word.

From: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] on behalf of Gauthier, Dave [dave.gauthier@intel.com]
Sent: Tuesday, March 12, 2013 8:50 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] indexing elements of a csv ?

Hi:

 

v9.0.1 on linux.

 

I have a table with a column that is a csv.  Users will select records based upon the existence of an element of the csv.  There is an index on that column but I'm thinking that it won't be of much use in this situation.  Is there a way to facilitate these queries?

 

Example:

 

create table foo (col0 text, col1 text);

create index foo_col1 on foo (col1);

insert into foo (col0,col1) values ('moe','aa,bbb,c'),('larry','xxxxx,bbb,yyy'),('curly','m,nnnn,oo');

 

now...

select col0 from foo where <the csv element 'bbb' exists as a csv element of col1>

 

Some attempts, which get the right answers, but which probably won't be very efficient...

 

select col0 from foo where string_to_array('bbb','') <@ string_to_array(col1);

select col0 from foo where ','||col1||','  like  '%,bbb,%';

select col0 from foo where ((col1 like 'bbb,%') or (col1 like '%,bbb,%') or (col1 like '%,bbb'));

 

Long shot, but I thought I'd ask anyway.

Thanks in Advance !

Re: indexing elements of a csv ?

От
Ian Lawrence Barwick
Дата:
2013/3/13 Ian Lawrence Barwick <barwick@gmail.com>:
> 2013/3/12 Gauthier, Dave <dave.gauthier@intel.com>:
>> Hi:
>>
>> v9.0.1 on linux.
>>
>> I have a table with a column that is a csv.  Users will select records based
>> upon the existence of an element of the csv.  There is an index on that
>> column but I'm thinking that it won't be of much use in this situation.  Is
>> there a way to facilitate these queries?
>>
>> Example:
>>
>> create table foo (col0 text, col1 text);
>>
>> create index foo_col1 on foo (col1);
>>
>> insert into foo (col0,col1) values
>> ('moe','aa,bbb,c'),('larry','xxxxx,bbb,yyy'),('curly','m,nnnn,oo');
>>
>> now...
>>
>> select col0 from foo where <the csv element 'bbb' exists as a csv element of
>> col1>
>>
>>
>> Some attempts, which get the right answers, but which probably won't be very
>> efficient...
>>
>> select col0 from foo where string_to_array('bbb','') <@
>> string_to_array(col1);
>>
>> select col0 from foo where ','||col1||','  like  '%,bbb,%';
>>
>> select col0 from foo where ((col1 like 'bbb,%') or (col1 like '%,bbb,%') or
>> (col1 like '%,bbb'));
>>
>> Long shot, but I thought I'd ask anyway.
>
> A GIN index might do the trick:
>
> CREATE INDEX ix_col1_ix ON foo USING GIN(string_to_array(col1,','));
>
> (This is assuming the CSV values can be cleanly converted to
> an array using "string_to_array()").
>
> You could then query it with:
> SELECT col0 FROM foo WHERE string_to_array(col1,',') @>  '{bbb}'::text[];

Just out of interest, I populated the table with around 1,000,000 rows of
randomly generated data (three items of random upper case characters
in col1), results with and without index below (using an untuned 9.2
installation
on a laptop with a slow hard drive).
Note that adding the index doubled the total table size, which might
be something
to watch out for if the table is very big and you have a lot of unique
values in the
"CSV" column.

Regards

Ian Barwick


testdb=# SELECT * from foo where string_to_array(col1,',') @>  '{PKRY}'::text[];
  col0  |       col1
--------+-------------------
 ARWC   | JIJ,MBDVU,PKRY
 FUNWOA | JKEK,PKRY,MQFUQTJ
 PJGTD  | KSO,HSTB,PKRY
(3 rows)

Time: 1325.536 ms
testdb=# explain SELECT * from foo where string_to_array(col1,',') @>
'{PKRY}'::text[];
                            QUERY PLAN
------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..28400.42 rows=5021 width=76)
   Filter: (string_to_array(col1, ','::text) @> '{PKRY}'::text[])
(2 rows)

testdb=# CREATE INDEX ix_col1_ix ON foo using gin(string_to_array(col1,','));
CREATE INDEX
Time: 170533.158 ms
testdb=# ANALYZE foo;
ANALYZE
Time: 1431.665 ms
testdb=# SELECT * from foo where string_to_array(col1,',') @>  '{PKRY}'::text[];
  col0  |       col1
--------+-------------------
 ARWC   | JIJ,MBDVU,PKRY
 FUNWOA | JKEK,PKRY,MQFUQTJ
 PJGTD  | KSO,HSTB,PKRY
(3 rows)

Time: 0.906 ms
testdb=# explain SELECT * from foo where string_to_array(col1,',') @>
'{PKRY}'::text[];
                                 QUERY PLAN
----------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=20.79..389.58 rows=101 width=24)
   Recheck Cond: (string_to_array(col1, ','::text) @> '{PKRY}'::text[])
   ->  Bitmap Index Scan on ix_col1_ix  (cost=0.00..20.76 rows=101 width=0)
         Index Cond: (string_to_array(col1, ','::text) @> '{PKRY}'::text[])
(4 rows)

Time: 0.377 ms