Обсуждение: Help with Query Tuning
Dear all,
I am facing a problem while creating the index to make the below query run faster. My table size is near about 1065 MB and 428467 rows.
explain analyze select count(*) from page_content where publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and (content like '%Militant%'
OR content like '%jihad%' OR content like '%Mujahid%' OR
content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrorist%' OR
content like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' or content like '%injure%');
Output:
Aggregate (cost=107557.78..107557.79 rows=1 width=0) (actual time=18564.631..18564.631 rows=1 loops=1)
-> Seq Scan on page_content (cost=0.00..107466.82 rows=36381 width=0) (actual time=0.146..18529.371 rows=59918 loops=1)
Filter: ((content IS NOT NULL) AND (publishing_date ~~ '%2010%'::text) AND (content_language = 'en'::bpchar) AND (isprocessable = 1) AND (((content)
::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND (((content)::text ~~ '%Militant%'::text) OR ((content)::text ~~ '%jihad%'::text) OR (
(content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~ '%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text) OR ((content)::text ~~ '%terrori
st%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~ '%civilians%'::text) OR ((content)::text ~~ '%police%'::text) OR ((content)::text
~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR ((content)::text ~~ '%crpf%'::text) OR ((content)::text ~~ '%dsf%'::text) OR ((content)::text
~~ '%ssb%'::text)))
Total runtime: 18564.673 ms
Index on that Table :
CREATE INDEX idx_page_id
ON page_content
USING btree
(crawled_page_id);
Index I create :
CREATE INDEX idx_page_id_content
ON page_content
USING btree
(crawled_page_id,content_language,publishing_date,isprocessable);
Index that fail to create:
CREATE INDEX idx_page_id_content1
ON page_content
USING btree
(crawled_page_id,content);
Error :-ERROR: index row requires 13240 bytes, maximum size is 8191
********** Error **********
ERROR: index row requires 13240 bytes, maximum size is 8191
SQL state: 54000
How to resolve this error
Please give any suggestion to tune the query.
Thanks & best Regards,
Adarsh Sharma
I am facing a problem while creating the index to make the below query run faster. My table size is near about 1065 MB and 428467 rows.
explain analyze select count(*) from page_content where publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and (content like '%Militant%'
OR content like '%jihad%' OR content like '%Mujahid%' OR
content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrorist%' OR
content like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' or content like '%injure%');
Output:
Aggregate (cost=107557.78..107557.79 rows=1 width=0) (actual time=18564.631..18564.631 rows=1 loops=1)
-> Seq Scan on page_content (cost=0.00..107466.82 rows=36381 width=0) (actual time=0.146..18529.371 rows=59918 loops=1)
Filter: ((content IS NOT NULL) AND (publishing_date ~~ '%2010%'::text) AND (content_language = 'en'::bpchar) AND (isprocessable = 1) AND (((content)
::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND (((content)::text ~~ '%Militant%'::text) OR ((content)::text ~~ '%jihad%'::text) OR (
(content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~ '%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text) OR ((content)::text ~~ '%terrori
st%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~ '%civilians%'::text) OR ((content)::text ~~ '%police%'::text) OR ((content)::text
~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR ((content)::text ~~ '%crpf%'::text) OR ((content)::text ~~ '%dsf%'::text) OR ((content)::text
~~ '%ssb%'::text)))
Total runtime: 18564.673 ms
Index on that Table :
CREATE INDEX idx_page_id
ON page_content
USING btree
(crawled_page_id);
Index I create :
CREATE INDEX idx_page_id_content
ON page_content
USING btree
(crawled_page_id,content_language,publishing_date,isprocessable);
Index that fail to create:
CREATE INDEX idx_page_id_content1
ON page_content
USING btree
(crawled_page_id,content);
Error :-ERROR: index row requires 13240 bytes, maximum size is 8191
********** Error **********
ERROR: index row requires 13240 bytes, maximum size is 8191
SQL state: 54000
How to resolve this error
Please give any suggestion to tune the query.
Thanks & best Regards,
Adarsh Sharma
On Wed, Mar 16, 2011 at 02:43:38PM +0530, Adarsh Sharma wrote: > Dear all, > > I am facing a problem while creating the index to make the below query run > faster. My table size is near about 1065 MB and 428467 rows. > > explain analyze select count(*) from page_content where publishing_date > like '%2010%' and content_language='en' and content is not null and > isprocessable = 1 and (content like '%Militant%' > OR content like '%jihad%' OR content like '%Mujahid%' OR > content like '%fedayeen%' OR content like '%insurgent%' OR content like > '%terrorist%' OR > content like '%cadre%' OR content like '%civilians%' OR content like > '%police%' OR content like '%defence%' OR content like '%cops%' OR content > like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content > like '%kill%' or content like '%injure%'); > > *Output: > > * Aggregate (cost=107557.78..107557.79 rows=1 width=0) (actual > time=18564.631..18564.631 rows=1 loops=1) > -> Seq Scan on page_content (cost=0.00..107466.82 rows=36381 width=0) > (actual time=0.146..18529.371 rows=59918 loops=1) > Filter: ((content IS NOT NULL) AND (publishing_date ~~ > '%2010%'::text) AND (content_language = 'en'::bpchar) AND (isprocessable = > 1) AND (((content) > ::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND > (((content)::text ~~ '%Militant%'::text) OR ((content)::text ~~ > '%jihad%'::text) OR ( > (content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~ > '%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text) OR > ((content)::text ~~ '%terrori > st%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~ > '%civilians%'::text) OR ((content)::text ~~ '%police%'::text) OR > ((content)::text > ~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR > ((content)::text ~~ '%crpf%'::text) OR ((content)::text ~~ '%dsf%'::text) > OR ((content)::text > ~~ '%ssb%'::text))) > Total runtime: 18564.673 ms > > > *Index on that Table : > > *CREATE INDEX idx_page_id > ON page_content > USING btree > (crawled_page_id); > > *Index I create :* > CREATE INDEX idx_page_id_content > ON page_content > USING btree > (crawled_page_id,content_language,publishing_date,isprocessable); > > *Index that fail to create: > > *CREATE INDEX idx_page_id_content1 > ON page_content > USING btree > (crawled_page_id,content); > > Error :-ERROR: index row requires 13240 bytes, maximum size is 8191 > ********** Error ********** > > ERROR: index row requires 13240 bytes, maximum size is 8191 > SQL state: 54000 > > How to resolve this error > Please give any suggestion to tune the query. > > Thanks & best Regards, > > Adarsh Sharma > You should probably be looking at using full-text indexing: http://www.postgresql.org/docs/9.0/static/textsearch.html or limit the size of content for the index. Cheers, Ken
On 03/16/2011 05:13 AM, Adarsh Sharma wrote: > Dear all, > > I am facing a problem while creating the index to make the below query run faster. My table size is near about 1065 MBand 428467 rows. > > explain analyze select count(*) from page_content where publishing_date like '%2010%' and content_language='en' and contentis not > null and isprocessable = 1 and (content like '%Militant%' > OR content like '%jihad%' OR content like '%Mujahid%' OR > content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrorist%' OR > content like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR contentlike '%cops%' > OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' or content like '%injure%'); > > *Output: > > * Aggregate (cost=107557.78..107557.79 rows=1 width=0) (actual time=18564.631..18564.631 rows=1 loops=1) > -> Seq Scan on page_content (cost=0.00..107466.82 rows=36381 width=0) (actual time=0.146..18529.371 rows=59918 loops=1) > Filter: ((content IS NOT NULL) AND (publishing_date ~~ '%2010%'::text) AND (content_language = 'en'::bpchar) AND (isprocessable= 1) > AND (((content) > ::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND (((content)::text ~~ '%Militant%'::text) OR ((content)::text > ~~ '%jihad%'::text) OR ( > (content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~ '%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text)OR > ((content)::text ~~ '%terrori > st%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~ '%civilians%'::text) OR ((content)::text ~~ > '%police%'::text) OR ((content)::text > ~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR ((content)::text ~~ '%crpf%'::text) OR ((content)::text~~ > '%dsf%'::text) OR ((content)::text > ~~ '%ssb%'::text))) > Total runtime: 18564.673 ms > You should read the documentation regarding indices and pattern matching as well as fts. http://www.postgresql.org/docs/8.3/static/indexes-types.html The optimizer can also use a B-tree index for queries involving the pattern matching operators LIKE and ~ if the patternis a constant and is anchored to the beginning of the string — for example, col LIKE 'foo%' or col ~ '^foo', but not col LIKE'%bar'. However, if your server does not use the C locale you will need to create the index with a special operator class to support indexing of pattern-matching queries. See Section 11.9 below. It is also possible to use B-tree indexes for ILIKE and ~*,but only if the pattern starts with non-alphabetic characters, i.e. characters that are not affected by upper/lower case conversion. I believe that your query as written using '%pattern%' will always be forced to use sequential scans.
Thanks Marshall, would I need to change the data type of content column to tsvector and create a Gist Index on it.
Best Regards,
Adarsh
Kenneth Marshall wrote:
Best Regards,
Adarsh
Kenneth Marshall wrote:
On Wed, Mar 16, 2011 at 02:43:38PM +0530, Adarsh Sharma wrote:Dear all, I am facing a problem while creating the index to make the below query run faster. My table size is near about 1065 MB and 428467 rows. explain analyze select count(*) from page_content where publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and (content like '%Militant%' OR content like '%jihad%' OR content like '%Mujahid%' OR content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrorist%' ORcontent like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' or content like '%injure%'); *Output: * Aggregate (cost=107557.78..107557.79 rows=1 width=0) (actual time=18564.631..18564.631 rows=1 loops=1) -> Seq Scan on page_content (cost=0.00..107466.82 rows=36381 width=0) (actual time=0.146..18529.371 rows=59918 loops=1) Filter: ((content IS NOT NULL) AND (publishing_date ~~ '%2010%'::text) AND (content_language = 'en'::bpchar) AND (isprocessable = 1) AND (((content) ::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND (((content)::text ~~ '%Militant%'::text) OR ((content)::text ~~ '%jihad%'::text) OR ( (content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~ '%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text) OR ((content)::text ~~ '%terrori st%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~ '%civilians%'::text) OR ((content)::text ~~ '%police%'::text) OR ((content)::text ~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR ((content)::text ~~ '%crpf%'::text) OR ((content)::text ~~ '%dsf%'::text) OR ((content)::text ~~ '%ssb%'::text))) Total runtime: 18564.673 ms *Index on that Table : *CREATE INDEX idx_page_idON page_contentUSING btree(crawled_page_id); *Index I create :* CREATE INDEX idx_page_id_contentON page_contentUSING btree(crawled_page_id,content_language,publishing_date,isprocessable); *Index that fail to create: *CREATE INDEX idx_page_id_content1ON page_contentUSING btree(crawled_page_id,content); Error :-ERROR: index row requires 13240 bytes, maximum size is 8191 ********** Error ********** ERROR: index row requires 13240 bytes, maximum size is 8191 SQL state: 54000 How to resolve this error Please give any suggestion to tune the query. Thanks & best Regards, Adarsh SharmaYou should probably be looking at using full-text indexing: http://www.postgresql.org/docs/9.0/static/textsearch.html or limit the size of content for the index. Cheers, Ken
Thanks, I understand it know :-
But My one doubt which isn't clear :
Original Query :-
select count(*) from page_content where (content like '%Militant%'
OR content like '%jihad%' OR content like '%Mujahid%' OR
content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrORist%' OR
content like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' OR content like '%injure%');
Output :-
count
-------
57061
(1 row)
Time: 19726.555 ms
I need to tune it , use full-text searching as :
Modified Query :-
SELECT count(*) from page_content
WHERE publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and to_tsvectOR('english',content) @@ to_tsquery('english','Mujahid' || 'jihad' || 'Militant' || 'fedayeen' || 'insurgent' || 'terrORist' || 'cadre' || 'civilians' || 'police' || 'defence' || 'cops' || 'crpf' || 'dsf' || 'ssb');
Output :-
count
-------
0
(1 row)
Time: 194685.125 ms
I try, SELECT count(*) from page_content
WHERE publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and to_tsvectOR('english',content) @@ to_tsquery('english','%Mujahid%' || '%jihad%' || '%Militant%' || '%fedayeen%' || '%insurgent%' || '%terrORist%' || '%cadre%' || '%civilians%' || '%police%' || '%defence%' || '%cops%' || '%crpf%' || '%dsf%' || '%ssb%');
count
-------
0
(1 row)
Time: 194722.468 ms
I know I have to create index but index is the next step, first you have to get the correct result .
CREATE INDEX pgweb_idx ON page_content USING gin(to_tsvector('english', content));
Please guide me where I am going wrong.
Thanks & best Regards,
Adarsh Sharma
Kenneth Marshall wrote:
But My one doubt which isn't clear :
Original Query :-
select count(*) from page_content where (content like '%Militant%'
OR content like '%jihad%' OR content like '%Mujahid%' OR
content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrORist%' OR
content like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' OR content like '%injure%');
Output :-
count
-------
57061
(1 row)
Time: 19726.555 ms
I need to tune it , use full-text searching as :
Modified Query :-
SELECT count(*) from page_content
WHERE publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and to_tsvectOR('english',content) @@ to_tsquery('english','Mujahid' || 'jihad' || 'Militant' || 'fedayeen' || 'insurgent' || 'terrORist' || 'cadre' || 'civilians' || 'police' || 'defence' || 'cops' || 'crpf' || 'dsf' || 'ssb');
Output :-
count
-------
0
(1 row)
Time: 194685.125 ms
I try, SELECT count(*) from page_content
WHERE publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and to_tsvectOR('english',content) @@ to_tsquery('english','%Mujahid%' || '%jihad%' || '%Militant%' || '%fedayeen%' || '%insurgent%' || '%terrORist%' || '%cadre%' || '%civilians%' || '%police%' || '%defence%' || '%cops%' || '%crpf%' || '%dsf%' || '%ssb%');
count
-------
0
(1 row)
Time: 194722.468 ms
I know I have to create index but index is the next step, first you have to get the correct result .
CREATE INDEX pgweb_idx ON page_content USING gin(to_tsvector('english', content));
Please guide me where I am going wrong.
Thanks & best Regards,
Adarsh Sharma
Kenneth Marshall wrote:
On Wed, Mar 16, 2011 at 02:43:38PM +0530, Adarsh Sharma wrote:Dear all, I am facing a problem while creating the index to make the below query run faster. My table size is near about 1065 MB and 428467 rows. explain analyze select count(*) from page_content where publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and (content like '%Militant%' OR content like '%jihad%' OR content like '%Mujahid%' OR content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrorist%' ORcontent like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' or content like '%injure%'); *Output: * Aggregate (cost=107557.78..107557.79 rows=1 width=0) (actual time=18564.631..18564.631 rows=1 loops=1) -> Seq Scan on page_content (cost=0.00..107466.82 rows=36381 width=0) (actual time=0.146..18529.371 rows=59918 loops=1) Filter: ((content IS NOT NULL) AND (publishing_date ~~ '%2010%'::text) AND (content_language = 'en'::bpchar) AND (isprocessable = 1) AND (((content) ::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND (((content)::text ~~ '%Militant%'::text) OR ((content)::text ~~ '%jihad%'::text) OR ( (content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~ '%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text) OR ((content)::text ~~ '%terrori st%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~ '%civilians%'::text) OR ((content)::text ~~ '%police%'::text) OR ((content)::text ~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR ((content)::text ~~ '%crpf%'::text) OR ((content)::text ~~ '%dsf%'::text) OR ((content)::text ~~ '%ssb%'::text))) Total runtime: 18564.673 ms *Index on that Table : *CREATE INDEX idx_page_idON page_contentUSING btree(crawled_page_id); *Index I create :* CREATE INDEX idx_page_id_contentON page_contentUSING btree(crawled_page_id,content_language,publishing_date,isprocessable); *Index that fail to create: *CREATE INDEX idx_page_id_content1ON page_contentUSING btree(crawled_page_id,content); Error :-ERROR: index row requires 13240 bytes, maximum size is 8191 ********** Error ********** ERROR: index row requires 13240 bytes, maximum size is 8191 SQL state: 54000 How to resolve this error Please give any suggestion to tune the query. Thanks & best Regards, Adarsh SharmaYou should probably be looking at using full-text indexing: http://www.postgresql.org/docs/9.0/static/textsearch.html or limit the size of content for the index. Cheers, Ken
> *Modified Query :- > > *SELECT count(*) from page_content > WHERE publishing_date like '%2010%' and content_language='en' and > content is not null and isprocessable = 1 and > to_tsvectOR('english',content) @@ to_tsquery('english','Mujahid' || > 'jihad' || 'Militant' || 'fedayeen' || 'insurgent' || 'terrORist' || > 'cadre' || 'civilians' || 'police' || 'defence' || 'cops' || 'crpf' || > 'dsf' || 'ssb'); I guess there should be spaces between the words. This way it's just one very long word 'MujahidjihadMilitantfedayeen....' and I doubt that's what you're looking for. regards Tomas
Thanks , it works now .. :-)
Here is the output :
pdc_uima=# SELECT count(*) from page_content WHERE publishing_date like '%2010%' and
pdc_uima-# content_language='en' and content is not null and isprocessable = 1 and
pdc_uima-# to_tsvector('english',content) @@ to_tsquery('english','Mujahid' || ' | '
pdc_uima(# || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen' || ' | '
pdc_uima(# || 'insurgent' || ' | ' || 'terrORist' || ' | ' || 'cadre' || ' | '
pdc_uima(# || 'civilians' || ' | ' || 'police' || ' | ' || 'cops' || 'crpf' || ' | '
pdc_uima(# || 'defence' || ' | ' || 'dsf' || ' | ' || 'ssb' );
count
--------
137193
(1 row)
Time: 195441.894 ms
But my original query is to use AND also i.e
select count(*) from page_content where publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and (content like '%Militant%'
OR content like '%jihad%' OR content like '%Mujahid%' OR
content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrORist%' OR
content like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' OR content like '%injure%');
count
-------
57061
(1 row)
Time: 19423.087 ms
Now I have to add AND condition ( AND (content like '%kill%' OR content like '%injure%') ) also.
Thanks & Regards,
Adarsh Sharma
tv@fuzzy.cz wrote:
Here is the output :
pdc_uima=# SELECT count(*) from page_content WHERE publishing_date like '%2010%' and
pdc_uima-# content_language='en' and content is not null and isprocessable = 1 and
pdc_uima-# to_tsvector('english',content) @@ to_tsquery('english','Mujahid' || ' | '
pdc_uima(# || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen' || ' | '
pdc_uima(# || 'insurgent' || ' | ' || 'terrORist' || ' | ' || 'cadre' || ' | '
pdc_uima(# || 'civilians' || ' | ' || 'police' || ' | ' || 'cops' || 'crpf' || ' | '
pdc_uima(# || 'defence' || ' | ' || 'dsf' || ' | ' || 'ssb' );
count
--------
137193
(1 row)
Time: 195441.894 ms
But my original query is to use AND also i.e
select count(*) from page_content where publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and (content like '%Militant%'
OR content like '%jihad%' OR content like '%Mujahid%' OR
content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrORist%' OR
content like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' OR content like '%injure%');
count
-------
57061
(1 row)
Time: 19423.087 ms
Now I have to add AND condition ( AND (content like '%kill%' OR content like '%injure%') ) also.
Thanks & Regards,
Adarsh Sharma
tv@fuzzy.cz wrote:
tv@fuzzy.cz wrote:Yes , I think we caught the problem but it results in the below error : SELECT count(*) from page_content WHERE publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and to_tsvector('english',content) @@ to_tsquery('english','Mujahid ' || 'jihad ' || 'Militant ' || 'fedayeen ' || 'insurgent ' || 'terrORist ' || 'cadre ' || 'civilians ' || 'police ' || 'defence ' || 'cops ' || 'crpf ' || 'dsf ' || 'ssb'); ERROR: syntax error in tsquery: "Mujahid jihad Militant fedayeen insurgent terrORist cadre civilians police defence cops crpf dsf ssb"The text passed to to_tsquery has to be a proper query, i.e. single tokens separated by boolean operators. In your case, you should put there '|' (which means OR) to get something like this 'Mujahid | jihad | Militant | ...' or you can use plainto_tsquery() as that accepts simple text, but it puts '&' (AND) between the tokens and I guess that's not what you want. TomasWhat to do to make it satisfies the OR condition to match any of the to_tsquery values as we got it right through like '%Mujahid' or ..... or ....You can't force the plainto_tsquery to somehow use the OR instead of AND. You need to modify the piece of code that produces the search text to put there '|' characters. So do something like this SELECT count(*) from page_content WHERE publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and to_tsvector('english',content) @@ to_tsquery('english','Mujahid' || ' | ' || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen); Not sure where does this text come from, but you can do this in a higher level language, e.g. in PHP. Something like this $words = implode(' | ', explode(' ',$text)); and then pass the $words into the query. Or something like that. Tomas
On 03/18/2011 12:17 AM, Adarsh Sharma wrote: > Thanks , it works now ..:-) > > Here is the output : > > pdc_uima=# SELECT count(*) from page_content WHERE publishing_date like '%2010%' and > pdc_uima-# content_language='en' and content is not null and isprocessable = 1 and > pdc_uima-# to_tsvector('english',content) @@ to_tsquery('english','Mujahid' || ' | ' > pdc_uima(# || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen' || ' | ' > pdc_uima(# || 'insurgent' || ' | ' || 'terrORist' || ' | ' || 'cadre' || ' | ' > pdc_uima(# || 'civilians' || ' | ' || 'police' || ' | ' || 'cops' || 'crpf' || ' | ' > pdc_uima(# || 'defence' || ' | ' || 'dsf' || ' | ' || 'ssb' ); > > count > -------- > 137193 > (1 row) > > Time: 195441.894 ms what is the type/content for column publishing_date? based on what you show above, I assume it's text? -- if so, whats the format of the date string?
> Thanks , it works now .. :-) > > Here is the output : > > pdc_uima=# SELECT count(*) from page_content WHERE publishing_date like > '%2010%' and > pdc_uima-# content_language='en' and content is not null and > isprocessable = 1 and > pdc_uima-# to_tsvector('english',content) @@ > to_tsquery('english','Mujahid' || ' | ' > pdc_uima(# || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen' || ' | > ' > pdc_uima(# || 'insurgent' || ' | ' || 'terrORist' || ' | ' || 'cadre' || > ' | ' > pdc_uima(# || 'civilians' || ' | ' || 'police' || ' | ' || 'cops' || > 'crpf' || ' | ' > pdc_uima(# || 'defence' || ' | ' || 'dsf' || ' | ' || 'ssb' ); > > count > -------- > 137193 > (1 row) > > Time: 195441.894 ms > > > But my original query is to use AND also i.e Hi, just replace "AND" and "OR" (used with LIKE operator) for "&" and "|" (used with to_tsquery). So this (content like '%Militant%' OR content like '%jihad%') AND (content like '%kill%' OR content like '%injure%') becomes to_tsvector('english',content) @@ to_tsquery('english', '(Militant | jihad) & (kill | injure)') BTW it seems you somehow believe you'll get exactly the same result from those two queries (LIKE vs. tsearch) - that's false expectation. I believe the fulltext query is much better and more appropriate in this case, just don't expect the same results. regards Tomas