Thread: Query tuning help


Query tuning help

From: Dan Harris
Date:
Sorry to bother everyone with yet another "my query isn't using an
index" problem, but I am in over my head on this one.  I am open to ways
of restructuring this query to perform better.

I have a table, 'ea', with 22 million rows in it.  VACUUM ANALYZE has
just been run on the table.

This is the result of:

explain analyze
select distinct
    em.incidentid,
    ea.recordtext as retdata,
    eg.long,
    eg.lat
from
    ea, em, eg
where
    em.incidentid = ea.incidentid and
    em.incidentid = eg.incidentid and
    em.entrydate >= '2005-1-1 00:00' and
    em.entrydate <= '2005-5-9 00:00'
    and ea.incidentid in (
        select
            incidentid
        from
            ea
        where
            recordtext like '%RED%'
    )

    and ea.incidentid in (
        select
            incidentid
        from
            ea
        where
            recordtext like '%CORVETTE%'
    )
    and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )
order by em.entrydate


---------------------
ANALYZE RESULTS
---------------------

  Unique  (cost=774693.72..774693.76 rows=1 width=159) (actual time=446787.056..446787.342 rows=72 loops=1)
    ->  Sort  (cost=774693.72..774693.72 rows=1 width=159) (actual time=446787.053..446787.075 rows=72 loops=1)
          Sort Key: em.incidentid, public.ea.recordtext, eg.long, eg.lat
          ->  Nested Loop  (cost=771835.10..774693.71 rows=1 width=159) (actual time=444378.655..446786.746 rows=72 loops=1)
                ->  Nested Loop  (cost=771835.10..774688.81 rows=1 width=148) (actual time=444378.532..446768.381 rows=72 loops=1)
                      ->  Nested Loop IN Join  (cost=771835.10..774678.88 rows=2 width=81) (actual time=444367.080..446191.864 rows=701 loops=1)
                            ->  Nested Loop  (cost=771835.10..774572.05 rows=42 width=64) (actual time=444366.859..445463.232 rows=1011 loops=1)
                                  ->  HashAggregate  (cost=771835.10..771835.10 rows=1 width=17) (actual time=444366.702..444368.583 rows=473 loops=1)
                                        ->  Seq Scan on ea  (cost=0.00..771834.26 rows=335 width=17) (actual time=259.746..444358.837 rows=592 loops=1)
                                              Filter: ((recordtext)::text ~~ '%CORVETTE%'::text)
                                  ->  Index Scan using ea1 on ea  (cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2 loops=473)
                                        Index Cond: ((ea.incidentid)::text = ("outer".incidentid)::text)
                                        Filter: (((recordtext)::text ~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))
                            ->  Index Scan using ea1 on ea  (cost=0.00..2733.81 rows=42 width=17) (actual time=0.703..0.703 rows=1 loops=1011)
                                  Index Cond: (("outer".incidentid)::text = (ea.incidentid)::text)
                                  Filter: ((recordtext)::text ~~ '%RED%'::text)
                      ->  Index Scan using em_incidentid_idx on em  (cost=0.00..4.95 rows=1 width=67) (actual time=0.820..0.821 rows=0 loops=701)
                            Index Cond: (("outer".incidentid)::text = (em.incidentid)::text)
                            Filter: ((entrydate >= '2005-01-01 00:00:00'::timestamp without time zone) AND (entrydate <= '2005-05-09 00:00:00'::timestamp without time zone))
                ->  Index Scan using eg_incidentid_idx on eg  (cost=0.00..4.89 rows=1 width=79) (actual time=0.245..0.246 rows=1 loops=72)
                      Index Cond: (("outer".incidentid)::text = (eg.incidentid)::text)
  Total runtime: 446871.880 ms
(22 rows)


-------------------------
EXPLANATION
-------------------------
The reason for the redundant LIKE clause is that first, I only want
those "incidentid"s that contain the words 'RED' and 'CORVETTE'.  BUT,
those two words may exist across multiple records with the same
incidentid.  Then, I only want to actually work with the rows that
contain one of the words.  This query will repeat the same logic for
however many keywords are entered by the user.  I have investigated
text searching options and have not found them to be congruous with my
application.

Why is it choosing a sequential scan for one part of the query when
searching for the words, yet using an index scan for another part of
it?  Is there a better way to structure the query to give it better
hints?

I'm using 8.0.1 on a 4-way Opteron with beefy RAID-10 and 12GB of RAM.

Thank you for any advice.

-Dan




Re: Query tuning help

From: Josh Berkus
Date:
Dan,

>         and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )  

It is simply not possible to use B-tree indexes for this kind of text query.
B-trees require you to start at the "left" side of the field, because B-trees
locate records via less-than/greater-than comparisons.  "Anywhere in the field"
text search requires a Full Text Index.
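
To illustrate (a rough sketch, not from the original post; the index name is
made up, and it assumes a btree built with text_pattern_ops so LIKE can use it
at all):

CREATE INDEX ea_recordtext_idx ON ea (recordtext text_pattern_ops);
-- left-anchored: can be turned into an index range scan
SELECT incidentid FROM ea WHERE recordtext LIKE 'RED%';
-- wildcard at the front: the btree cannot help, every row must be checked
SELECT incidentid FROM ea WHERE recordtext LIKE '%RED%';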

> The reason for the redundant LIKE clause is that first, I only want
> those "incidentid"s that contain the words 'RED' and 'CORVETTE'.  BUT,
> those two words may exist across multiple records with the same
> incidentid.  Then, I only want to actually work with the rows that
> contain one of the words.  This query will repeat the same logic for
> however many keywords are entered by the user.  I have investigated
> text searching options and have not found them to be congruous with my
> application.

Sounds like you either need to restructure your application, restructure your
database (so that you're not doing "anywhere in field" searches), or buy 32GB
of RAM so that you can cache the whole table.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Query tuning help

From: Russell Smith
Date:
On Mon, 9 May 2005 09:20 am, Dan Harris wrote:
> Sorry to bother everyone with yet another "my query isn't using an
> index" problem but I am over my head on this one..  I am open to ways
> of restructuring this query to perform better.
>
> I have a table, 'ea', with 22 million rows in it.  VACUUM ANALYZE has
> been just run on the table.
>
> This is the result of:
>
> explain analyze
> select distinct
>  em.incidentid,
>  ea.recordtext as retdata,
>  eg.long,
>  eg.lat
> from
>  ea, em, eg
> where
>  em.incidentid = ea.incidentid and
>  em.incidentid = eg.incidentid and
>  em.entrydate >= '2005-1-1 00:00' and
>  em.entrydate <= '2005-5-9 00:00'
>  and ea.incidentid in (
>   select
>    incidentid
>   from
>    ea
>   where
>    recordtext like '%RED%'
>  )
>
>  and ea.incidentid in (
>   select
>    incidentid
>   from
>    ea
>   where
>    recordtext like '%CORVETTE%'
>  )
>  and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )
> order by em.entrydate
>
You cannot use an index for %CORVETTE% or %RED%.  There is no way
for the index to know whether a row has that text in the middle without
scanning the whole index, so it's much cheaper to do a sequential scan.

One possible way to make the query faster is to limit based on date, as you will only get about 700 rows.
Also, don't use the subselects, as they are doing full sequential scans.  I think the query below does what you do
above, and I think it will be faster, but I don't know.

select distinct em.incidentid, ea.recordtext as retdata, eg.long,  eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%' AND ea.recordtext like '%CORVETTE%')
JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like '%RED%' or recordtext like '%CORVETTE%'  );

>
> ---------------------
> ANALYZE RESULTS
> ---------------------
>
>   Unique  (cost=774693.72..774693.76 rows=1 width=159) (actual time=446787.056..446787.342 rows=72 loops=1)
>     ->  Sort  (cost=774693.72..774693.72 rows=1 width=159) (actual time=446787.053..446787.075 rows=72 loops=1)
>           Sort Key: em.incidentid, public.ea.recordtext, eg.long, eg.lat
>           ->  Nested Loop  (cost=771835.10..774693.71 rows=1 width=159) (actual time=444378.655..446786.746 rows=72 loops=1)
>                 ->  Nested Loop  (cost=771835.10..774688.81 rows=1 width=148) (actual time=444378.532..446768.381 rows=72 loops=1)
>                       ->  Nested Loop IN Join  (cost=771835.10..774678.88 rows=2 width=81) (actual time=444367.080..446191.864 rows=701 loops=1)
>                             ->  Nested Loop  (cost=771835.10..774572.05 rows=42 width=64) (actual time=444366.859..445463.232 rows=1011 loops=1)
>                                   ->  HashAggregate  (cost=771835.10..771835.10 rows=1 width=17) (actual time=444366.702..444368.583 rows=473 loops=1)
>                                         ->  Seq Scan on ea  (cost=0.00..771834.26 rows=335 width=17) (actual time=259.746..444358.837 rows=592 loops=1)
>                                               Filter: ((recordtext)::text ~~ '%CORVETTE%'::text)
>                                   ->  Index Scan using ea1 on ea  (cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2 loops=473)
>                                         Index Cond: ((ea.incidentid)::text = ("outer".incidentid)::text)
>                                         Filter: (((recordtext)::text ~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))
>                             ->  Index Scan using ea1 on ea  (cost=0.00..2733.81 rows=42 width=17) (actual time=0.703..0.703 rows=1 loops=1011)
>                                   Index Cond: (("outer".incidentid)::text = (ea.incidentid)::text)
>                                   Filter: ((recordtext)::text ~~ '%RED%'::text)
>                       ->  Index Scan using em_incidentid_idx on em  (cost=0.00..4.95 rows=1 width=67) (actual time=0.820..0.821 rows=0 loops=701)
>                             Index Cond: (("outer".incidentid)::text = (em.incidentid)::text)
>                             Filter: ((entrydate >= '2005-01-01 00:00:00'::timestamp without time zone) AND (entrydate <= '2005-05-09 00:00:00'::timestamp without time zone))
>                 ->  Index Scan using eg_incidentid_idx on eg  (cost=0.00..4.89 rows=1 width=79) (actual time=0.245..0.246 rows=1 loops=72)
>                       Index Cond: (("outer".incidentid)::text = (eg.incidentid)::text)
>   Total runtime: 446871.880 ms
> (22 rows)
>
>
> -------------------------
> EXPLANATION
> -------------------------
> The reason for the redundant LIKE clause is that first, I only want
> those "incidentid"s that contain the words 'RED' and 'CORVETTE'.  BUT,
> those two words may exist across multiple records with the same
> incidentid.  Then, I only want to actually work with the rows that
> contain one of the words.  This query will repeat the same logic for
> however many keywords are entered by the user.  I have investigated
> text searching options and have not found them to be congruous with my
> application.
>
> Why is it choosing a sequential scan one part of the query when
> searching for the words, yet using an index scan for another part of
> it?  Is there a better way to structure the query to give it better
> hints?
>
> I'm using 8.0.1 on a 4-way Opteron with beefy RAID-10 and 12GB of RAM.
>
> Thank you for any advice.
>
> -Dan
>
>
>
>

Re: Query tuning help

From: Tom Lane
Date:
Russell Smith <mr-russ@pws.com.au> writes:
> On Mon, 9 May 2005 09:20 am, Dan Harris wrote:
>> and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )
>>
> You cannot use an index for %CORVETTE%, or %RED%.

Not a btree index anyway.  Dan might have some success here with a
full-text-indexing package (eg, contrib/tsearch2)

            regards, tom lane

Re: Query tuning help

From: Dan Harris
Date:
On May 8, 2005, at 6:51 PM, Russell Smith wrote:

> On Mon, 9 May 2005 09:20 am, Dan Harris wrote:
> You cannot use an index for %CORVETTE%, or %RED%.  There is no way
> for the index to know if a row had that in the middle without scanning
> the whole
> index.  So it's much cheaper to do a sequence scan.
>

While I believe you, I'm confused by this line in my original EXPLAIN
ANALYZE:

>> ->  Index Scan using ea1 on ea  (cost=0.00..2736.43 rows=42 width=47)
>> (actual time=2.085..2.309 rows=2 loops=473)
>>                                         Index Cond:
>> ((ea.incidentid)::text = ("outer".incidentid)::text)
>>                                         Filter: (((recordtext)::text
>> ~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))

Doesn't that mean it was using an index to filter?  Along those lines,
before I created index 'ea1', the query was much much slower.  So, it
seemed like creating this index made a difference.

> One possible way to make the query faster is to limit based on date,
> as you will only get about 700 rows.
> And then don't use subselects, as they are doing full sequence scans.
> I think this query does what you do
> above, and I think it will be faster, but I don't know.
>

I REALLY like this idea! If I could just filter by date first and then
sequential scan through those, it should be very manageable.  Hopefully
I can keep this goal while still accommodating the requirement listed
in my next paragraph.

> select distinct em.incidentid, ea.recordtext as retdata, eg.long,
> eg.lat
> FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >=
> '2005-1-1 00:00'
> AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%'
> AND ea.recordtext like '%CORVETTE%')
> JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like
> '%RED%' or recordtext like '%CORVETTE%'  );
>

I have run this, and while it is very fast, I'm concerned it's not
doing what I need.  Here's the situation:

Due to the format of the systems with which I integrate ( I have no
control over these formats ), we will get these 'recordtext' values one
line at a time, accumulating over time.  The only way I can find to
make this work is to insert a new record for each line.  The problem
is that when someone wants to search multiple keywords, they expect
these words to be matched across multiple records with a given incident
number.

  For a very simple example:

IncidentID   Date               Recordtext
----------   ----------------   -----------------------------
11111        2005-05-01 14:21   blah blah blah RED blah blah
2222         2005-05-01 14:23   not what we are looking for
11111        2005-05-02 02:05   blah CORVETTE blah blah

So, doing a search with an 'and' condition, e.g. WHERE RECORDTEXT LIKE
'%RED%' AND RECORDTEXT LIKE '%CORVETTE%' , will not match because the
condition will only be applied to a single row of recordtext at a time,
not a whole group with the same incident number.

If I were to use tsearch2 for full-text indexing, would I need to
create another table that merges all of my recordtext rows into a
single 'text' field type?  If so, this is where I run into problems, as
my logic also needs to match multiple words in their original order.  I
may also receive additional updates to the previous data.  In that
case, I need to replace the original record with the latest version of
it.  If I have already concatenated these rows into a single field, the
logic to in-line replace only the old text that has changed is very
very difficult at best.  So, that's the reason I had to do two
subqueries in my example.  Please tell me if I misunderstood your logic
and it really will match given my condition above, but it didn't seem
like it would.

Thanks again for the quick responses!  This list has been a great
resource for me.

-Dan


Re: Query tuning help

From: Josh Berkus
Date:
Dan,

> While I believe you, I'm confused by this line in my original EXPLAIN
>
> ANALYZE:
> >> ->  Index Scan using ea1 on ea  (cost=0.00..2736.43 rows=42 width=47)
> >> (actual time=2.085..2.309 rows=2 loops=473)
> >>                                         Index Cond:
> >> ((ea.incidentid)::text = ("outer".incidentid)::text)
> >>                                         Filter: (((recordtext)::text
> >> ~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))

The named index is matching based on incidentid -- the join condition.  The
"filter" is applied against the rows that the index scan returns, i.e. a scan of those rows.

> If I were to use tsearch2 for full-text indexing, would I need to
> create another table that merges all of my recordtext rows into a
> single 'text' field type?

No.   Read the OpenFTS docs, they are fairly clear on how to set up a simple
FTS index. (TSearch2 ~~ OpenFTS)
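
Roughly, the setup looks something like this -- an untested sketch, with the
column, index, and trigger names made up (it assumes contrib/tsearch2 is
installed in your database):

ALTER TABLE ea ADD COLUMN recordtext_fti tsvector;
UPDATE ea SET recordtext_fti = to_tsvector('default', recordtext);
CREATE INDEX ea_fti_idx ON ea USING gist (recordtext_fti);
-- keep the column current on future inserts/updates
CREATE TRIGGER ea_fti_update BEFORE INSERT OR UPDATE ON ea
    FOR EACH ROW EXECUTE PROCEDURE tsearch2(recordtext_fti, recordtext);
-- then searches can use the index
SELECT incidentid FROM ea
WHERE recordtext_fti @@ to_tsquery('default', 'RED & CORVETTE');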

> If so, this is where I run into problems, as
> my logic also needs to match multiple words in their original order.

You do that by doubling up ... that is, use the FTS index to pick all rows
that contain "RED" and "CORVETTE", and then check the order.  I'll also note
that your current query is not checking word order.

Example:
WHERE recordtext_fti @@ to_tsquery ('default', 'RED & CORVETTE')
    AND recordtext LIKE '%RED%CORVETTE%'

I'm doing something fairly similar on one of my projects and it works very
well.

The limitations on TSearch2 indexes are:
1) they are expensive to update, so your data loads would be noticeably slower.
2) they are only fast when cached in RAM (and when cached, are *very* fast).
So if you have a variety of other processes that tend to fill up RAM between
searches, you may find them less useful.
3) You have to create a materialized index column next to recordtext, which
will increase the size of the table.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Query tuning help

From: Tom Lane
Date:
Dan Harris <fbsd@drivefaster.net> writes:
>> ->  Index Scan using ea1 on ea  (cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2 loops=473)
>>       Index Cond: ((ea.incidentid)::text = ("outer".incidentid)::text)
>>       Filter: (((recordtext)::text ~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))

> Doesn't that mean it was using an index to filter?

No.  The "Index Cond" shows it is using the index only for the join
condition.  A "Filter" is an additional filter condition that happens to
get applied at this plan node --- but it'll be applied to every row the
index finds for the index condition.
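
As a concrete illustration (a hypothetical query, assuming ea1 indexes only
incidentid):

SELECT * FROM ea WHERE incidentid = 'X' AND recordtext LIKE '%RED%';

Here the incidentid = 'X' part can be the Index Cond, while the LIKE can only
be applied as a Filter to each row the index scan fetches.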

            regards, tom lane

Re: Query tuning help

From: Dan Harris
Date:
On May 8, 2005, at 8:06 PM, Josh Berkus wrote:
>
>> If I were to use tsearch2 for full-text indexing, would I need to
>> create another table that merges all of my recordtext rows into a
>> single 'text' field type?
>
> No.   Read the OpenFTS docs, they are fairly clear on how to set up a
> simple
> FTS index. (TSearch2 ~~ OpenFTS)
>
>> If so, this is where I run into problems, as
>> my logic also needs to match multiple words in their original order.

I have been reading the Tsearch2 docs and either I don't understand
something or I'm not communicating my situation clearly enough.  It
seems that Tsearch2 has a concept of "document".  And, in everything I
am reading, they expect your "document" to be all contained in a single
row.  Since my words can be spread across multiple rows, I don't see
that Tsearch2 will combine all 'recordtext' row values with the same
"incidentid" into a single vector.  Am I overlooking something in the
docs?

>
> I'm doing something fairly similar on one of my projects and it works
> very
> well.
>

I'd be curious what the similarities are.  Is it the searching across
multiple rows or the order of words?

> The limitations on TSearch2 indexes are:
> 1) they are expensive to update, so your data loads would be noticably
> slower.
> 2) they are only fast when cached in RAM (and when cached, are *very*
> fast).
> So if you have a variety of other processes that tend to fill up RAM
> between
> searches, you may find them less useful.
> 3) You have to create a materialized index column next to recordtext,
> which
> will increase the size of the table.

Duly noted.  If this method can search across rows, I'm willing to
accept this overhead for the speed it would add.

In the meantime, is there any way I can reach my goal without Tsearch2
by just restructuring my query to narrow down the results by date
first, then seq scan for the 'likes'?

-Dan


Re: Query tuning help

From: Russell Smith
Date:
On Mon, 9 May 2005 11:49 am, Dan Harris wrote:
>
> On May 8, 2005, at 6:51 PM, Russell Smith wrote:
>
[snip]
> > select distinct em.incidentid, ea.recordtext as retdata, eg.long,
> > eg.lat
> > FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >=
> > '2005-1-1 00:00'
> > AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%'
> > AND ea.recordtext like '%CORVETTE%')
> > JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like
> > '%RED%' or recordtext like '%CORVETTE%'  );
> >
>
> I have run this, and while it is very fast, I'm concerned it's not
> doing what I need.
How fast is very fast?


> Here's the situation:
>
> Due to the format of the systems with which I integrate ( I have no
> control over these formats ), we will get these 'recordtext' values one
> line at a time, accumulating over time.  The only way I can find to
> make this work is to insert a new record for each line.  The problem
> is, that when someone wants to search multiple keywords, they expect
> these words to be matched across multiple records with a given incident
> number.
>
>   For a very simple example:
>
> IncidentID   Date               Recordtext
> ----------   ----------------   -----------------------------
> 11111        2005-05-01 14:21   blah blah blah RED blah blah
> 2222         2005-05-01 14:23   not what we are looking for
> 11111        2005-05-02 02:05   blah CORVETTE blah blah
>
> So, doing a search with an 'and' condition, e.g. WHERE RECORDTEXT LIKE
> '%RED%' AND RECORDTEXT LIKE '%CORVETTE%' , will not match because the
> condition will only be applied to a single row of recordtext at a time,
> not a whole group with the same incident number.
>
> If I were to use tsearch2 for full-text indexing, would I need to
> create another table that merges all of my recordtext rows into a
> single 'text' field type?  If so, this is where I run into problems, as
> my logic also needs to match multiple words in their original order.  I
> may also receive additional updates to the previous data.  In that
> case, I need to replace the original record with the latest version of
> it.  If I have already concatenated these rows into a single field, the
> logic to in-line replace only the old text that has changed is very
> very difficult at best.  So, that's the reason I had to do two
> subqueries in my example.  Please tell me if I misunderstood your logic
> and it really will match given my condition above, but it didn't seem
> like it would.
>
> Thanks again for the quick responses!  This list has been a great
> resource for me.
>
select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND (ea.recordtext like '%RED%' OR ea.recordtext like '%CORVETTE%'))
JOIN eg ON em.incidentid = eg.incidentid WHERE
em.incidentid IN
(select em.incidentid
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%CORVETTE%')) AND
em.incidentid IN
(select em.incidentid
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%'));

This may be more accurate.  However, I would call it VERY NASTY.  Josh's solutions may be better.
However, much of the data should be in memory once the subplans are done, so it may be quite fast.
you may
> >

> -Dan
>
>

Re: Query tuning help

From: Dan Harris
Date:
On May 8, 2005, at 8:32 PM, Russell Smith wrote:
>> I have run this, and while it is very fast, I'm concerned it's not
>> doing what I need.
> How fast is very fast?
>

It took 35 seconds to complete versus ~450 seconds my old way.

>
> select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
> FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 00:00'
> AND em.entrydate <= '2005-5-9 00:00' AND (ea.recordtext like '%RED%' OR ea.recordtext like '%CORVETTE%'))
> JOIN eg ON em.incidentid = eg.incidentid WHERE
> em.incidentid IN
> (select em.incidentid
> FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 00:00'
> AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%CORVETTE%')) AND
> em.incidentid IN
> (select em.incidentid
> FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 00:00'
> AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%'));
>

Yes, it is nasty, but so was my previous query :)  So long as this is
faster, I'm ok with that.  I'll see if I can make this work.  Thank you
very much.

-Dan


Re: Query tuning help

From: Klint Gore
Date:
On Sun, 8 May 2005 20:31:38 -0600, Dan Harris <fbsd@drivefaster.net> wrote:
> Duly noted.  If this method can search across rows, I'm willing to
> accept this overhead for the speed it would add.

You could use intersect to search across rows.  Using tsearch2 would look
up RED and CORVETTE using the index, and intersect would pull out the
common rows.

> In the meantime, is there any way I can reach my goal without Tsearch2
> by just restructuring my query to narrow down the results by date
> first, then seq scan for the 'likes'?


select distinct
    em.incidentid,
    ea.recordtext as retdata,
    eg.long,
    eg.lat
from
    ea, em, eg,
    (
        select
            ea.incidentid
        from
            ea, em
        where
            em.incidentid = ea.incidentid and
            em.entrydate >= '2005-1-1 00:00' and
            em.entrydate <= '2005-5-9 00:00' and
            recordtext like '%RED%'

        intersect

        select
            ea.incidentid
        from
            ea, em
        where
            em.incidentid = ea.incidentid and
            em.entrydate >= '2005-1-1 00:00' and
            em.entrydate <= '2005-5-9 00:00' and
            recordtext like '%CORVETTE%'
    ) as iid
where
    em.incidentid = ea.incidentid and
    em.incidentid = eg.incidentid and
    em.entrydate >= '2005-1-1 00:00' and
    em.entrydate <= '2005-5-9 00:00'
    and ea.incidentid = iid.incidentid
    and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )
order by em.entrydate

klint.

+---------------------------------------+-----------------+
: Klint Gore                            : "Non rhyming    :
: EMail   : kg@kgb.une.edu.au           :  slang - the    :
: Snail   : A.B.R.I.                    :  possibilities  :
: Mail      University of New England   :  are useless"   :
:           Armidale NSW 2351 Australia :     L.J.J.      :
: Fax     : +61 2 6772 5376             :                 :
+---------------------------------------+-----------------+

Re: Query tuning help

From: Harald Fuchs
Date:
In article <7fc67646a961f5ebef90def7aeb95fd0@drivefaster.net>,
Dan Harris <fbsd@drivefaster.net> writes:

> On May 8, 2005, at 8:06 PM, Josh Berkus wrote:
>>
>>> If I were to use tsearch2 for full-text indexing, would I need to
>>> create another table that merges all of my recordtext rows into a
>>> single 'text' field type?
>>
>> No.   Read the OpenFTS docs, they are fairly clear on how to set up
>> a simple
>> FTS index. (TSearch2 ~~ OpenFTS)
>>
>>> If so, this is where I run into problems, as
>>> my logic also needs to match multiple words in their original order.

> I have been reading the Tsearch2 docs and either I don't understand
> something or I'm not communicating my situation clearly enough.  It
> seems that Tsearch2 has a concept of "document".  And, in everything I
> am reading, they expect your "document" to be all contained in a
> single row.  Since my words can be spread across multiple rows, I
> don't see that Tsearch2 will combine all 'recordtext' row values with
> the same "incidentid" into a single vector.  Am I overlooking
> something in the docs?

AFAICS no, but you could create a separate table containing just the
distinct incidentids and the tsearch2 vectors of all recordtexts
matching that incidentid.  This table would get updated solely by
triggers on the original table and would provide a fast way to get all
incidentids for RED and CORVETTE.  The question is: would this reduce
the number of rows to check more than filtering on date?
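
Something along these lines -- completely untested, and all the names here are
invented:

CREATE TABLE ea_fti (
    incidentid varchar PRIMARY KEY,
    fti        tsvector
);
-- maintained by triggers on ea; e.g. an insert trigger could do roughly
--   UPDATE ea_fti SET fti = fti || to_tsvector('default', NEW.recordtext)
--    WHERE incidentid = NEW.incidentid;
-- and the keyword lookup then becomes
SELECT incidentid FROM ea_fti
WHERE fti @@ to_tsquery('default', 'RED & CORVETTE');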

Re: Query tuning help

From: Mischa Sandberg
Date:
Quoting Russell Smith <mr-russ@pws.com.au>:

> On Mon, 9 May 2005 11:49 am, Dan Harris wrote:
> > On May 8, 2005, at 6:51 PM, Russell Smith wrote:
> [snip]
> > select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
> > FROM em
> > JOIN ea ON em.incidentid = ea.incidentid  --- slight paraphrase /Mischa.
> >         AND em.entrydate between '2005-1-1' and '2005-5-9'
> >         AND ea.recordtext like '%RED%'  AND ea.recordtext like '%CORVETTE%'

> > Here's the situation:
> > Due to the format of the systems with which I integrate ( I have no
> > control over these formats ), we will get these 'recordtext' values one
> > line at a time, accumulating over time.  The only way I can find to
> > make this work is to insert a new record for each line.  The problem
> > is, that when someone wants to search multiple keywords, they expect
> > these words to be matched across multiple records with a given incident
> > number.
> >
> >   For a very simple example:
> >
> > IncidentID   Date               Recordtext
> > ----------   ----------------   -----------------------------
> > 11111        2005-05-01 14:21   blah blah blah RED blah blah
> > 2222         2005-05-01 14:23   not what we are looking for
> > 11111        2005-05-02 02:05   blah CORVETTE blah blah

select em.incidentid, ea.recordtext as retdata
from    em
join    ea using (incidentid)
join ( -- equivalent to "where incidentid in (...)", sometimes faster.
      select incidentid
      from  em join  ea using (incidentid)
      where em.entrydate between '2005-1-1' and '2005-5-9'
      group by incidentid
      having 1 = min(case when recordtext like '%RED%' then 1 end)
         and 1 = min(case when recordtext like '%CORVETTE%' then 1 end)
     ) as X using (incidentid);



Re: Query tuning help

From: Ulrich Wisser
Date:
Hi Dan,

I tried to understand your query, but I couldn't get my understanding of
the query and your description in sync.

Why do you use sub selects? Wouldn't a simple "recordtext like '%RED%'"
do the trick too?

You combine all your WHERE conditions with AND.  To me this looks like
you would get only rows containing both RED and CORVETTE.

From your description I would rewrite the query as

explain analyze
select distinct
     em.incidentid,
     ea.recordtext as retdata,
     eg.long,
     eg.lat
from
     ea join em using(incidentid) join eg using(incidentid)
where
         em.entrydate >= '2005-1-1 00:00'::date
     and em.entrydate <= '2005-5-9 00:00'::date
     and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )
order by em.entrydate


That should give you all rows containing one of the words.
Does it work?
Is it faster? Is it fast enough?

Ulrich