Discussion: poor execution plan because column dependence


poor execution plan because column dependence

From:
Václav Ovsík
Date:
Hi,
I have migrated Request Tracker 3.8.9
(http://requesttracker.wikia.com/wiki/HomePage) from MySQL to
PostgreSQL in a testing environment.
The RT schema used can be viewed at
https://github.com/bestpractical/rt/blob/3.8-trunk/etc/schema.Pg.
I have added full text search on the table Attachments based on trigrams
(and am still experimenting with it), but it is not relevant to the
problem (the problem is not directly caused by it).
The full text search alone works quite well. A user testing a new RT instance
reported a performance problem with a slightly more complex query (more
conditions resulting in table joins).
Queries are constructed by the module DBIx::SearchBuilder.
The problematic query logged:

rt=# EXPLAIN ANALYZE SELECT DISTINCT main.* FROM Tickets main
       JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id )
       JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id )
     WHERE (Transactions_1.ObjectType = 'RT::Ticket') AND (main.Status != 'deleted')
       AND (main.Status = 'resolved' AND main.LastUpdated > '2008-12-31 23:00:00'
            AND main.Created > '2005-12-31 23:00:00' AND main.Queue = '15'
            AND ( Attachments_2.trigrams @@ text_to_trgm_tsquery('uir')
                  AND Attachments_2.Content ILIKE '%uir%' ) )
       AND (main.Type = 'ticket') AND (main.EffectiveId = main.id)
     ORDER BY main.id ASC;

                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Unique  (cost=23928.60..23928.67 rows=1 width=162) (actual time=5201.139..5207.965 rows=649 loops=1)
   ->  Sort  (cost=23928.60..23928.61 rows=1 width=162) (actual time=5201.137..5201.983 rows=5280 loops=1)
         Sort Key: main.effectiveid, main.issuestatement, main.resolution, main.owner, main.subject, main.initialpriority, main.finalpriority, main.priority, main.timeestimated, main.timeworked, main.timeleft, main.told, main.starts, main.started, main.due, main.resolved, main.lastupdatedby, main.lastupdated, main.creator, main.created, main.disabled
         Sort Method:  quicksort  Memory: 1598kB
         ->  Nested Loop  (cost=0.00..23928.59 rows=1 width=162) (actual time=10.060..5120.834 rows=5280 loops=1)
               ->  Nested Loop  (cost=0.00..10222.38 rows=1734 width=166) (actual time=8.702..1328.970 rows=417711 loops=1)
                     ->  Seq Scan on tickets main  (cost=0.00..5687.88 rows=85 width=162) (actual time=8.258..94.012 rows=25410 loops=1)
                           Filter: (((status)::text <> 'deleted'::text) AND (lastupdated > '2008-12-31 23:00:00'::timestamp without time zone) AND (created > '2005-12-31 23:00:00'::timestamp without time zone) AND (effectiveid = id) AND (queue = 15) AND ((type)::text = 'ticket'::text) AND ((status)::text = 'resolved'::text))
                     ->  Index Scan using transactions1 on transactions transactions_1  (cost=0.00..53.01 rows=27 width=8) (actual time=0.030..0.039 rows=16 loops=25410)
                           Index Cond: (((transactions_1.objecttype)::text = 'RT::Ticket'::text) AND (transactions_1.objectid = main.effectiveid))
               ->  Index Scan using attachments2 on attachments attachments_2  (cost=0.00..7.89 rows=1 width=4) (actual time=0.008..0.009 rows=0 loops=417711)
                     Index Cond: (attachments_2.transactionid = transactions_1.id)
                     Filter: ((attachments_2.trigrams @@ '''uir'''::tsquery) AND (attachments_2.content ~~* '%uir%'::text))
 Total runtime: 5208.149 ms
(14 rows)

The above times are for already cached data (a repeated query).
I think the execution plan is poor. It would be better to filter the table
attachments first and then join the rest. The reason is a bad estimate of the
number of rows returned from the table tickets (85 estimated vs. 25410 in
reality).
Testing the sub-condition in isolation...


rt=# explain analyze select * from tickets where effectiveid = id;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on tickets  (cost=0.00..4097.40 rows=530 width=162) (actual time=0.019..38.130 rows=101869 loops=1)
   Filter: (effectiveid = id)
 Total runtime: 54.318 ms
(3 rows)

Estimated 530 rows, but in reality there are 101869 rows.

The problem is the strong dependence between id and effectiveid. The RT
documentation says:

    EffectiveId:
    By default, a ticket's EffectiveId is the same as its ID. RT supports the
    ability to merge tickets together. When you merge a ticket into
    another one, RT sets the first ticket's EffectiveId to the second
    ticket's ID. RT uses this data to quickly look up which ticket
    you're really talking about when you reference a merged ticket.
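
For reference, the per-column statistics behind this estimate can be inspected in the pg_stats view; a sketch (not run against this particular database, column names per the RT schema above):

```sql
-- Each column is summarized independently; there is no statistic describing
-- the joint distribution of (id, effectiveid), so the planner cannot know
-- that effectiveid = id holds for nearly every row.
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'tickets'
  AND attname IN ('id', 'effectiveid');
```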


I found the page http://wiki.postgresql.org/wiki/Cross_Columns_Stats

Maybe I have identified an already documented problem. What can I do about
this situation? Is there some workaround?

Thanks in advance for any suggestions.
Best Regards
--
Zito

Re: poor execution plan because column dependence

From:
Bob Lunney
Date:
Zito,

Using psql, log in as the database owner and run "analyze verbose".  Happiness will ensue.

Also, when requesting help with a query it's important to state the database version ("select version();") and what, if
any, configuration changes you have made in postgresql.conf.  Listing only the ones that have changed is sufficient.

Finally, the wiki has some good information on the care and feeding of a PostgreSQL database:

http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT



Bob Lunney

--- On Tue, 4/12/11, Václav Ovsík <vaclav.ovsik@i.cz> wrote:

Re: poor execution plan because column dependence

From:
Tom Lane
Date:
Václav Ovsík <vaclav.ovsik@i.cz> writes:
> I think the execution plan is poor. Better would be to filter table attachments
> at first and then join the rest. The reason is a bad estimate on number of rows
> returned from table tickets (85 estimated -> 25410 in the reality).
> ...
> The problem is the strong dependance between id and effectiveid. The RT
> documentation says:

>     EffectiveId:
>     By default, a ticket's EffectiveId is the same as its ID. RT supports the
>     ability to merge tickets together. When you merge a ticket into
>     another one, RT sets the first ticket's EffectiveId to the second
>     ticket's ID. RT uses this data to quickly look up which ticket
>     you're really talking about when you reference a merged ticket.

> I googled the page http://wiki.postgresql.org/wiki/Cross_Columns_Stats

> Maybe I identified the already documented problem. What I can do with this
> situation? Some workaround?

Yeah, that main.EffectiveId = main.id clause is going to be
underestimated by a factor of about 200, which is most though not all of
your rowcount error for that table.  Not sure whether you can do much
about it, if the query is coming from a query generator that you can't
change.  If you can change it, try replacing main.EffectiveId = main.id
with the underlying function, eg if they're integers use
int4eq(main.EffectiveId, main.id).  This will bypass the overoptimistic
estimator for the "=" operator and get you a default selectivity
estimate of (IIRC) 0.3333.  Which is still off, but only by 3x not 200x,
and that should be close enough to get a decent plan.

            regards, tom lane
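
As a minimal sketch of the suggested substitution (assuming both columns are integer, as in the RT schema):

```sql
-- Generated clause, estimated by the "=" operator's selectivity machinery:
--   ... AND main.EffectiveId = main.id ...
-- Rewritten clause, which falls back to the planner's default selectivity
-- for an opaque boolean-returning function (about 0.3333):
--   ... AND int4eq(main.EffectiveId, main.id) ...
EXPLAIN SELECT count(*) FROM Tickets main WHERE int4eq(main.EffectiveId, main.id);
```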

Re: poor execution plan because column dependence

From:
Václav Ovsík
Date:
Dear Bob,

On Tue, Apr 12, 2011 at 05:14:29PM -0700, Bob Lunney wrote:
> Zito,
>
> Using psql log in as the database owner and run "analyze verbose".  Happiness will ensue.

Unfortunately not. I ran "analyze" with different values of
default_statistics_target, up to 1000, in my first attempts, always with the
same problem as described. I returned the value to the default 100 in the end.

> Also, when requesting help with a query its important to state the
> database version ("select version();") and what, if any, configuration
> changes you have made in postgresql.conf.   Listing ony the ones that
> have changed is sufficient.

You are right. I read about this, but after all the reading, analyzing and
experimenting I finally forgot to mention this basic information :(. Probably
also because I didn't feel it was relevant here. The problem is the planner,
I am afraid.
The application and PostgreSQL are running on a KVM virtual machine hosting
Debian GNU/Linux Squeeze. "select version();" returns:

'PostgreSQL 8.4.7 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit'

Changed interesting parameters in postgresql.conf:

max_connections = 48
shared_buffers = 1024MB
work_mem = 32MB
maintenance_work_mem = 256MB
checkpoint_segments = 24
effective_cache_size = 2048MB
log_min_duration_statement = 500

The virtual machine is the only one currently running on iron Dell
PowerEdge R710, 2 x CPU Xeon L5520 @ 2.27GHz (quad-core), 32GiB RAM.

PostgreSQL package installed is 8.4.7-0squeeze2.

The VM has allocated 6GiB RAM and 2 CPU.


One of my first hopes was that maybe the newer PostgreSQL 9 series would
behave better.  I installed a second virtual machine with Debian GNU/Linux Sid
and the PostgreSQL package version 9.0.3-1. The result was the same.


> Finally, the wiki has some good information on the care and feeding of a PostgreSQL database:
>
> http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT

I read this already.
Thanks
--
Zito

Re: poor execution plan because column dependence

From:
Václav Ovsík
Date:
Dear Tom,

On Tue, Apr 12, 2011 at 08:52:15PM -0400, Tom Lane wrote:
> ...
> Yeah, that main.EffectiveId = main.id clause is going to be
> underestimated by a factor of about 200, which is most though not all of
> your rowcount error for that table.  Not sure whether you can do much
> about it, if the query is coming from a query generator that you can't
> change.  If you can change it, try replacing main.EffectiveId = main.id
> with the underlying function, eg if they're integers use
> int4eq(main.EffectiveId, main.id).  This will bypass the overoptimistic
> estimator for the "=" operator and get you a default selectivity
> estimate of (IIRC) 0.3333.  Which is still off, but only by 3x not 200x,
> and that should be close enough to get a decent plan.

Great idea!

rt=# EXPLAIN ANALYZE SELECT DISTINCT main.* FROM Tickets main
       JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id )
       JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id )
     WHERE (Transactions_1.ObjectType = 'RT::Ticket') AND (main.Status != 'deleted')
       AND (main.Status = 'resolved' AND main.LastUpdated > '2008-12-31 23:00:00'
            AND main.Created > '2005-12-31 23:00:00' AND main.Queue = '15'
            AND ( Attachments_2.trigrams @@ text_to_trgm_tsquery('uir')
                  AND Attachments_2.Content ILIKE '%uir%' ) )
       AND (main.Type = 'ticket') AND int4eq(main.EffectiveId, main.id)
     ORDER BY main.id ASC;

                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=37504.61..37505.00 rows=6 width=162) (actual time=1377.087..1383.844 rows=649 loops=1)
   ->  Sort  (cost=37504.61..37504.62 rows=6 width=162) (actual time=1377.085..1377.973 rows=5280 loops=1)
         Sort Key: main.id, main.effectiveid, main.issuestatement, main.resolution, main.owner, main.subject, main.initialpriority, main.finalpriority, main.priority, main.timeestimated, main.timeworked, main.timeleft, main.told, main.starts, main.started, main.due, main.resolved, main.lastupdatedby, main.lastupdated, main.creator, main.created, main.disabled
         Sort Method:  quicksort  Memory: 1598kB
         ->  Nested Loop  (cost=7615.47..37504.53 rows=6 width=162) (actual time=13.678..1322.292 rows=5280 loops=1)
               ->  Nested Loop  (cost=7615.47..37179.22 rows=74 width=4) (actual time=5.670..1266.703 rows=15593 loops=1)
                     ->  Bitmap Heap Scan on attachments attachments_2  (cost=7615.47..36550.26 rows=74 width=4) (actual time=5.658..1196.160 rows=15593 loops=1)
                           Recheck Cond: (trigrams @@ '''uir'''::tsquery)
                           Filter: (content ~~* '%uir%'::text)
                           ->  Bitmap Index Scan on attachments_textsearch  (cost=0.00..7615.45 rows=8016 width=0) (actual time=3.863..3.863 rows=15972 loops=1)
                                 Index Cond: (trigrams @@ '''uir'''::tsquery)
                     ->  Index Scan using transactions_pkey on transactions transactions_1  (cost=0.00..8.49 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=15593)
                           Index Cond: (transactions_1.id = attachments_2.transactionid)
                           Filter: ((transactions_1.objecttype)::text = 'RT::Ticket'::text)
               ->  Index Scan using tickets5 on tickets main  (cost=0.00..4.38 rows=1 width=162) (actual time=0.003..0.003 rows=0 loops=15593)
                     Index Cond: (main.id = transactions_1.objectid)
                     Filter: (((main.status)::text <> 'deleted'::text) AND (main.lastupdated > '2008-12-31 23:00:00'::timestamp without time zone) AND (main.created > '2005-12-31 23:00:00'::timestamp without time zone) AND int4eq(main.effectiveid, main.id) AND (main.queue = 15) AND ((main.type)::text = 'ticket'::text) AND ((main.status)::text = 'resolved'::text))
 Total runtime: 1384.038 ms
(18 rows)

The desired execution plan! :)

Indexes:
    "tickets_pkey" PRIMARY KEY, btree (id)
    "tickets1" btree (queue, status)
    "tickets2" btree (owner)
    "tickets3" btree (effectiveid)
    "tickets4" btree (id, status)
    "tickets5" btree (id, effectiveid)

Interestingly, the original index tickets5 is still used for
int4eq(main.effectiveid, main.id); no need to build a different one.
Great!

I think it is no problem to put this small hack into SearchBuilder. I already
did one for the full text search, so there will be two hacks :).

Thanks very much.
Best Regards
--
Zito

Re: poor execution plan because column dependence

From:
Tom Lane
Date:
Václav Ovsík <vaclav.ovsik@i.cz> writes:
> On Tue, Apr 12, 2011 at 08:52:15PM -0400, Tom Lane wrote:
>> ... If you can change it, try replacing main.EffectiveId = main.id
>> with the underlying function, eg if they're integers use
>> int4eq(main.EffectiveId, main.id).  This will bypass the overoptimistic
>> estimator for the "=" operator and get you a default selectivity
>> estimate of (IIRC) 0.3333.  Which is still off, but only by 3x not 200x,
>> and that should be close enough to get a decent plan.

> Great idea!

> Interesting the original index tickets5 is still used for
> int4eq(main.effectiveid, main.id), no need to build a different.

Well, no, it won't be.  This hack is entirely dependent on the fact that
the optimizer mostly works with operator expressions, and is blind to
the fact that the underlying functions are really the same thing.
(Which is something I'd like to see fixed someday, but in the meantime
it gives you an escape hatch.)  If you use the int4eq() construct in a
context where you'd like to see it transformed into an index qual, it
won't be.  For this particular case that doesn't matter because there's
no use in using an index for that clause anyway.  But you'll need to be
very careful that your changes in the query generator don't result in
using int4eq() in any contexts other than the "main.EffectiveId=main.id"
check.

            regards, tom lane

Re: poor execution plan because column dependence

From:
Václav Ovsík
Date:
On Wed, Apr 13, 2011 at 12:24:06PM -0400, Tom Lane wrote:
> > Interesting the original index tickets5 is still used for
> > int4eq(main.effectiveid, main.id), no need to build a different.
>
> Well, no, it won't be.  This hack is entirely dependent on the fact that
> the optimizer mostly works with operator expressions, and is blind to
> the fact that the underlying functions are really the same thing.
> (Which is something I'd like to see fixed someday, but in the meantime
> it gives you an escape hatch.)  If you use the int4eq() construct in a
> context where you'd like to see it transformed into an index qual, it
> won't be.  For this particular case that doesn't matter because there's
> no use in using an index for that clause anyway.  But you'll need to be
> very careful that your changes in the query generator don't result in
> using int4eq() in any contexts other than the "main.EffectiveId=main.id"
> check.

Sorry, I'm not certain I understand your paragraph completely...

I perfectly understand that the change from
    A = B    into    int4eq(A, B)
stopped the bad estimate and corrected the execution plan, but that this can
change someday in the future.

I'm not certain about your sentence concerning int4eq() and the index. The
execution plan as shown in my previous mail contains information about
using the index tickets5:

...
               ->  Index Scan using tickets5 on tickets main  (cost=0.00..4.38 rows=1 width=162) (actual time=0.006..0.006 rows=0 loops=15593)
                     Index Cond: (main.id = transactions_1.objectid)
                     Filter: (((main.status)::text <> 'deleted'::text) AND (main.lastupdated > '2008-12-31 23:00:00'::timestamp without time zone) AND (main.created > '2005-12-31 23:00:00'::timestamp without time zone) AND int4eq(main.effectiveid, main.id) AND (main.queue = 15) AND ((main.type)::text = 'ticket'::text) AND ((main.status)::text = 'resolved'::text))
...


The filter condition contains int4eq(main.effectiveid, main.id), and tickets5
is: "tickets5" btree (id, effectiveid)

That means the tickets5 index was used for int4eq(main.effectiveid, main.id).
Is that right? Or am I missing something?

Well, the index will probably not be used in general, because of the
selectivity of int4eq() you mention (33%). The planner then thinks it is
better to use a seq scan. I tried this just now.

I did the hack for this particular case only:


diff --git a/local/lib/DBIx/SearchBuilder.pm b/local/lib/DBIx/SearchBuilder.pm
index f3ee1e1..9e3a6a6 100644
--- a/local/lib/DBIx/SearchBuilder.pm
+++ b/local/lib/DBIx/SearchBuilder.pm
@@ -1040,7 +1040,9 @@ sub _CompileGenericRestrictions {
                 $result .= ' '. $entry . ' ';
             }
             else {
-                $result .= join ' ', @{$entry}{qw(field op value)};
+                my $term = join ' ', @{$entry}{qw(field op value)};
+                $term =~ s/^(main|Tickets_\d+)\.(EffectiveId) = (\1)\.(id)$/int4eq($1.$2, $3.$4)/i;
+                $result .= $term;
             }
         }
         $result .= ')';


It works as expected.
Thanks
Best Regards
--
Zito

Re: poor execution plan because column dependence

From:
Tom Lane
Date:
Václav Ovsík <vaclav.ovsik@i.cz> writes:
> I'm not certain about your sentence touching int4eq() and index. The
> execution plan as show in my previous mail contains information about
> using index tickets5:

>                ->  Index Scan using tickets5 on tickets main  (cost=0.00..4.38 rows=1 width=162) (actual time=0.006..0.006 rows=0 loops=15593)
>                      Index Cond: (main.id = transactions_1.objectid)
>                      Filter: (((main.status)::text <> 'deleted'::text) AND (main.lastupdated > '2008-12-31 23:00:00'::timestamp without time zone) AND (main.created > '2005-12-31 23:00:00'::timestamp without time zone) AND int4eq(main.effectiveid, main.id) AND (main.queue = 15) AND ((main.type)::text = 'ticket'::text) AND ((main.status)::text = 'resolved'::text))

> That means tickets5 index was used for int4eq(main.effectiveid, main.id).
> Is it right? Or am I something missing?

No, the clause that's being used with the index is
    main.id = transactions_1.objectid
The "filter condition" is just along for the ride --- it doesn't matter
what sort of expressions are in there, so long as they only use
variables available at this point in the plan.  But if you had coded
that clause as
    int4eq(main.id, transactions_1.objectid)
it would have been unable to create this plan at all.

            regards, tom lane
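
The distinction can be illustrated with a sketch (not part of the original thread; any integer column with a btree index behaves the same way):

```sql
-- Operator form: matches the btree operator class, so the planner can turn
-- it into an Index Cond on tickets_pkey:
EXPLAIN SELECT * FROM tickets WHERE id = 12345;
-- Bare function form: not recognized as an indexable clause, so it can only
-- appear as a Filter on some other scan:
EXPLAIN SELECT * FROM tickets WHERE int4eq(id, 12345);
```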

Re: poor execution plan because column dependence

From:
Václav Ovsík
Date:
Dear Tom,

On Thu, Apr 14, 2011 at 10:10:44AM -0400, Tom Lane wrote:
> Václav Ovsík <vaclav.ovsik@i.cz> writes:
> > I'm not certain about your sentence touching int4eq() and index. The
> > execution plan as show in my previous mail contains information about
> > using index tickets5:
>
> >                ->  Index Scan using tickets5 on tickets main  (cost=0.00..4.38 rows=1 width=162) (actual time=0.006..0.006 rows=0 loops=15593)
> >                      Index Cond: (main.id = transactions_1.objectid)
> >                      Filter: (((main.status)::text <> 'deleted'::text) AND (main.lastupdated > '2008-12-31 23:00:00'::timestamp without time zone) AND (main.created > '2005-12-31 23:00:00'::timestamp without time zone) AND int4eq(main.effectiveid, main.id) AND (main.queue = 15) AND ((main.type)::text = 'ticket'::text) AND ((main.status)::text = 'resolved'::text))
>
> > That means tickets5 index was used for int4eq(main.effectiveid, main.id).
> > Is it right? Or am I something missing?
>
> No, the clause that's being used with the index is
>     main.id = transactions_1.objectid
> The "filter condition" is just along for the ride --- it doesn't matter
> what sort of expressions are in there, so long as they only use
> variables available at this point in the plan.  But if you had coded
> that clause as
>     int4eq(main.id, transactions_1.objectid)
> it would have been unable to create this plan at all.

Thank you for the explanation and the patience with me. I have read the
chapter "Multicolumn Indexes" in the Pg documentation and discovered new
things. The planner can use a multicolumn index with a condition on the
leftmost index field alone; I had missed this. I understand things a bit
better now.
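
That "Multicolumn Indexes" point, as a sketch using the indexes listed earlier ("tickets5" btree (id, effectiveid)):

```sql
-- A condition on the leftmost column alone can drive a scan of the
-- two-column index tickets5:
EXPLAIN SELECT * FROM tickets WHERE id = 12345;
-- A condition on the second column alone cannot use tickets5 efficiently;
-- here the single-column index tickets3 (effectiveid) would apply instead:
EXPLAIN SELECT * FROM tickets WHERE effectiveid = 12345;
```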
Thanks!
Best Regards
--
Zito