Обсуждение: Sequencial scan in a JOIN

Поиск
Список
Период
Сортировка

Sequencial scan in a JOIN

От
Andrew Jaimes
Дата:
Hi everyone,

I am trying to run the following query:

SELECT  count(1) --DISTINCT l_userqueue.queueid
  FROM e_usersessions
  JOIN l_userqueue
    ON l_userqueue.userid = e_usersessions.entityid
  JOIN a_activity
    ON a_activity.activequeueid  = l_userqueue.queueid
   AND a_activity.vstatus        = 1
   AND a_activity.ventrydate     > 0
   AND a_activity.sbuid          = e_usersessions.sbuid
   AND a_activity.assignedtoid   = 0
   AND a_activity.status        <> '0'   
  WHERE e_usersessions.sessionkeepalivedatetime > 20120605082131943

Explain analyze:
'Aggregate  (cost=100402.10..100402.11 rows=1 width=0) (actual time=2249.051..2249.051 rows=1 loops=1)'
'  ->  Hash Join  (cost=10.93..99795.09 rows=242803 width=0) (actual time=0.541..2249.027 rows=33 loops=1)'
'        Hash Cond: ((a_activity.activequeueid = l_userqueue.queueid) AND (a_activity.sbuid = e_usersessions.sbuid))'
'        ->  Seq Scan on a_activity  (cost=0.00..88462.52 rows=1208167 width=22) (actual time=0.010..1662.142 rows=1207855 loops=1)'
'              Filter: ((ventrydate > 0) AND ((status)::text <> '0'::text) AND (vstatus = 1) AND (assignedtoid = 0::numeric))'
'        ->  Hash  (cost=10.86..10.86 rows=5 width=22) (actual time=0.053..0.053 rows=4 loops=1)'
'              ->  Hash Join  (cost=9.38..10.86 rows=5 width=22) (actual time=0.033..0.048 rows=4 loops=1)'
'                    Hash Cond: (l_userqueue.userid = e_usersessions.entityid)'
'                    ->  Seq Scan on l_userqueue  (cost=0.00..1.23 rows=23 width=27) (actual time=0.003..0.009 rows=23 loops=1)'
'                    ->  Hash  (cost=9.31..9.31 rows=5 width=21) (actual time=0.018..0.018 rows=2 loops=1)'
'                          ->  Index Scan using i06_e_usersessions on e_usersessions  (cost=0.00..9.31 rows=5 width=21) (actual time=0.009..0.012 rows=2 loops=1)'
'                                Index Cond: (sessionkeepalivedatetime > 20120605082131943::bigint)'
'Total runtime: 2249.146 ms'

I am trying to understand the reason why the a sequencial scan is used on a_activity instead of using the index by activequeueid (i08_a_activity). If I run the this other query, I get a complete different results:

SELECT *
 FROM a_activity
 WHERE a_activity.activequeueid  = 123456
   AND a_activity.vstatus        = 1
   AND a_activity.ventrydate     > 0

Explain analyze:
'Index Scan using i08_a_activity on a_activity  (cost=0.00..303.57 rows=162 width=7287) (actual time=0.019..0.019 rows=0 loops=1)'
'  Index Cond: ((activequeueid = 123456::numeric) AND (vstatus = 1) AND (ventrydate > 0))'
'Total runtime: 0.076 ms'


This is the definition of the index :

CREATE INDEX i08_a_activity
  ON a_activity
  USING btree
  (activequeueid , vstatus , ventrydate );


 a_activity table has 1,216,134 rows



Thanks in advance,
Andrew


Re: Sequencial scan in a JOIN

От
Shaun Thomas
Дата:
On 06/05/2012 07:48 AM, Andrew Jaimes wrote:

> '  ->  Hash Join  (cost=10.93..99795.09 rows=242803 width=0) (actual
> time=0.541..2249.027 rows=33 loops=1)'
> '        Hash Cond: ((a_activity.activequeueid = l_userqueue.queueid)
> AND (a_activity.sbuid = e_usersessions.sbuid))'
> '        ->  Seq Scan on a_activity  (cost=0.00..88462.52 rows=1208167
> width=22) (actual time=0.010..1662.142

I'd be willing to bet your stats are way, way off. It expected 242,803
rows in the hash, but only got 33. In that kind of scenario, I could
easily see the planner choosing a sequence scan over an index scan, as
doing that many index seeks would be much more expensive than scanning
the table.

What's your default_statistics_target, and when is the last time you
analyzed the tables in this query?

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

Re: Sequencial scan in a JOIN

От
Andrew Jaimes
Дата:
the default_statistics_target is set to 200, and I have run the analyze and reindex on these tables before writing the email.


Andrew

> Date: Tue, 5 Jun 2012 08:15:45 -0500
> From: sthomas@optionshouse.com
> To: andrewjaimes@hotmail.com
> CC: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Sequencial scan in a JOIN
>
> On 06/05/2012 07:48 AM, Andrew Jaimes wrote:
>
> > ' -> Hash Join (cost=10.93..99795.09 rows=242803 width=0) (actual
> > time=0.541..2249.027 rows=33 loops=1)'
> > ' Hash Cond: ((a_activity.activequeueid = l_userqueue.queueid)
> > AND (a_activity.sbuid = e_usersessions.sbuid))'
> > ' -> Seq Scan on a_activity (cost=0.00..88462.52 rows=1208167
> > width=22) (actual time=0.010..1662.142
>
> I'd be willing to bet your stats are way, way off. It expected 242,803
> rows in the hash, but only got 33. In that kind of scenario, I could
> easily see the planner choosing a sequence scan over an index scan, as
> doing that many index seeks would be much more expensive than scanning
> the table.
>
> What's your default_statistics_target, and when is the last time you
> analyzed the tables in this query?
>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
> 312-444-8534
> sthomas@optionshouse.com
>
> ______________________________________________
>
> See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

Re: Sequencial scan in a JOIN

От
Shaun Thomas
Дата:
On 06/05/2012 08:31 AM, Andrew Jaimes wrote:

> the default_statistics_target is set to 200, and I have run the analyze
> and reindex on these tables before writing the email.

Out of idle curiosity, how do these two variants treat you?

SELECT count(1)
   FROM e_usersessions s
   JOIN l_userqueue q ON (q.userid = s.entityid)
   JOIN a_activity a ON (a.activequeueid = q.queueid)
  WHERE s.sessionkeepalivedatetime > 20120605082131943
    AND a.vstatus        = 1
    AND a.ventrydate     > 0
    AND a.sbuid          = s.sbuid
    AND a.assignedtoid   = 0
    AND a.status        <> '0'

SELECT count(1)
   FROM e_usersessions s
   JOIN l_userqueue q ON (q.userid = s.entityid)
  WHERE s.sessionkeepalivedatetime > 20120605082131943
    AND EXISTS (
          SELECT 1 FROM a_activity a
           WHERE a.activequeueid = q.queueid
             AND a.sbuid         = s.sbuid
             AND a.vstatus       = 1
             AND a.ventrydate    > 0
             AND a.assignedtoid  = 0
             AND a.status       <> '0'
        )

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

Re: Sequencial scan in a JOIN

От
Shaun Thomas
Дата:
On 06/05/2012 09:41 AM, Andrew Jaimes wrote:

> The second query ran better than the first one:

That's what I figured. Ok, so looking back to your original message again:

CREATE INDEX i08_a_activity
   ON a_activity
   USING btree
   (activequeueid , vstatus , ventrydate );

Based on the query here, it doesn't appear that vstatus or ventrydate
are doing you any good in that index. Nor would your query even really
make use of them anyway, considering their catch-all equalities. If you
can make a clone of a_activity, could you try this index instead with
your original query:

CREATE INDEX idx_a_activity_queue
     ON a_activity_clone (activequeueid);

Then compare to this:

CREATE INDEX idx_a_activity_queue_sbuid
     ON a_activity_clone (activequeueid, sbuid);

And the results of this query would also be handy:

SELECT attname, n_distinct
   FROM pg_stats
  WHERE tablename='a_activity';

Generally you want to order your composite indexes in order of
uniqueness, if you even want to make a composite index in the first
place. I noticed in both cases, it's basically ignoring sbuid aside from
the implied hash to exclude non-matches.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

Re: Sequencial scan in a JOIN

От
Robert Haas
Дата:
On Tue, Jun 5, 2012 at 8:48 AM, Andrew Jaimes <andrewjaimes@hotmail.com> wrote:
> Hi everyone,
>
> I am trying to run the following query:
>
> SELECT  count(1) --DISTINCT l_userqueue.queueid
>   FROM e_usersessions
>   JOIN l_userqueue
>     ON l_userqueue.userid = e_usersessions.entityid
>   JOIN a_activity
>     ON a_activity.activequeueid  = l_userqueue.queueid
>    AND a_activity.vstatus        = 1
>    AND a_activity.ventrydate     > 0
>    AND a_activity.sbuid          = e_usersessions.sbuid
>    AND a_activity.assignedtoid   = 0
>    AND a_activity.status        <> '0'
>   WHERE e_usersessions.sessionkeepalivedatetime > 20120605082131943
>
> Explain analyze:
> 'Aggregate  (cost=100402.10..100402.11 rows=1 width=0) (actual
> time=2249.051..2249.051 rows=1 loops=1)'
> '  ->  Hash Join  (cost=10.93..99795.09 rows=242803 width=0) (actual
> time=0.541..2249.027 rows=33 loops=1)'
> '        Hash Cond: ((a_activity.activequeueid = l_userqueue.queueid) AND
> (a_activity.sbuid = e_usersessions.sbuid))'
> '        ->  Seq Scan on a_activity  (cost=0.00..88462.52 rows=1208167
> width=22) (actual time=0.010..1662.142 rows=1207855 loops=1)'
> '              Filter: ((ventrydate > 0) AND ((status)::text <> '0'::text)
> AND (vstatus = 1) AND (assignedtoid = 0::numeric))'
> '        ->  Hash  (cost=10.86..10.86 rows=5 width=22) (actual
> time=0.053..0.053 rows=4 loops=1)'
> '              ->  Hash Join  (cost=9.38..10.86 rows=5 width=22) (actual
> time=0.033..0.048 rows=4 loops=1)'
> '                    Hash Cond: (l_userqueue.userid =
> e_usersessions.entityid)'
> '                    ->  Seq Scan on l_userqueue  (cost=0.00..1.23 rows=23
> width=27) (actual time=0.003..0.009 rows=23 loops=1)'
> '                    ->  Hash  (cost=9.31..9.31 rows=5 width=21) (actual
> time=0.018..0.018 rows=2 loops=1)'
> '                          ->  Index Scan using i06_e_usersessions on
> e_usersessions  (cost=0.00..9.31 rows=5 width=21) (actual time=0.009..0.012
> rows=2 loops=1)'
> '                                Index Cond: (sessionkeepalivedatetime >
> 20120605082131943::bigint)'
> 'Total runtime: 2249.146 ms'
>
> I am trying to understand the reason why the a sequencial scan is used on
> a_activity instead of using the index by activequeueid (i08_a_activity).

I'm chiming in a bit late here, but it seems like you're hoping that
the query plan will form the outer join as a nested loop, with the
inner and outer sides swapped, so that the results of the join between
l_userqueue and e_usersessions are used to drive a series of index
scans on a_activity that avoid scanning the whole table.  PostgreSQL
9.2 will be the first release that has the ability to generate that
kind of plan, so it would be interesting to see what happens if you
try this on 9.2beta.

Older releases should be able consider a nested loop join with
l_userqueue as the inner rel, driving an index scan over a_activity,
and then performing the join to e_usersessions afterwards.  But that
plan might not be nearly as good, since then we'd have to do 23
index-scans on a_activity rather than just 4.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company