Обсуждение: Sequencial scan in a JOIN
Hi everyone,
I am trying to run the following query:
SELECT count(1) --DISTINCT l_userqueue.queueid
FROM e_usersessions
JOIN l_userqueue
ON l_userqueue.userid = e_usersessions.entityid
JOIN a_activity
ON a_activity.activequeueid = l_userqueue.queueid
AND a_activity.vstatus = 1
AND a_activity.ventrydate > 0
AND a_activity.sbuid = e_usersessions.sbuid
AND a_activity.assignedtoid = 0
AND a_activity.status <> '0'
WHERE e_usersessions.sessionkeepalivedatetime > 20120605082131943
Explain analyze:
'Aggregate (cost=100402.10..100402.11 rows=1 width=0) (actual time=2249.051..2249.051 rows=1 loops=1)'
' -> Hash Join (cost=10.93..99795.09 rows=242803 width=0) (actual time=0.541..2249.027 rows=33 loops=1)'
' Hash Cond: ((a_activity.activequeueid = l_userqueue.queueid) AND (a_activity.sbuid = e_usersessions.sbuid))'
' -> Seq Scan on a_activity (cost=0.00..88462.52 rows=1208167 width=22) (actual time=0.010..1662.142 rows=1207855 loops=1)'
' Filter: ((ventrydate > 0) AND ((status)::text <> '0'::text) AND (vstatus = 1) AND (assignedtoid = 0::numeric))'
' -> Hash (cost=10.86..10.86 rows=5 width=22) (actual time=0.053..0.053 rows=4 loops=1)'
' -> Hash Join (cost=9.38..10.86 rows=5 width=22) (actual time=0.033..0.048 rows=4 loops=1)'
' Hash Cond: (l_userqueue.userid = e_usersessions.entityid)'
' -> Seq Scan on l_userqueue (cost=0.00..1.23 rows=23 width=27) (actual time=0.003..0.009 rows=23 loops=1)'
' -> Hash (cost=9.31..9.31 rows=5 width=21) (actual time=0.018..0.018 rows=2 loops=1)'
' -> Index Scan using i06_e_usersessions on e_usersessions (cost=0.00..9.31 rows=5 width=21) (actual time=0.009..0.012 rows=2 loops=1)'
' Index Cond: (sessionkeepalivedatetime > 20120605082131943::bigint)'
'Total runtime: 2249.146 ms'
I am trying to understand the reason why the a sequencial scan is used on a_activity instead of using the index by activequeueid (i08_a_activity). If I run the this other query, I get a complete different results:
SELECT *
FROM a_activity
WHERE a_activity.activequeueid = 123456
AND a_activity.vstatus = 1
AND a_activity.ventrydate > 0
Explain analyze:
'Index Scan using i08_a_activity on a_activity (cost=0.00..303.57 rows=162 width=7287) (actual time=0.019..0.019 rows=0 loops=1)'
' Index Cond: ((activequeueid = 123456::numeric) AND (vstatus = 1) AND (ventrydate > 0))'
'Total runtime: 0.076 ms'
This is the definition of the index :
CREATE INDEX i08_a_activity
ON a_activity
USING btree
(activequeueid , vstatus , ventrydate );
a_activity table has 1,216,134 rows
Thanks in advance,
Andrew
I am trying to run the following query:
SELECT count(1) --DISTINCT l_userqueue.queueid
FROM e_usersessions
JOIN l_userqueue
ON l_userqueue.userid = e_usersessions.entityid
JOIN a_activity
ON a_activity.activequeueid = l_userqueue.queueid
AND a_activity.vstatus = 1
AND a_activity.ventrydate > 0
AND a_activity.sbuid = e_usersessions.sbuid
AND a_activity.assignedtoid = 0
AND a_activity.status <> '0'
WHERE e_usersessions.sessionkeepalivedatetime > 20120605082131943
Explain analyze:
'Aggregate (cost=100402.10..100402.11 rows=1 width=0) (actual time=2249.051..2249.051 rows=1 loops=1)'
' -> Hash Join (cost=10.93..99795.09 rows=242803 width=0) (actual time=0.541..2249.027 rows=33 loops=1)'
' Hash Cond: ((a_activity.activequeueid = l_userqueue.queueid) AND (a_activity.sbuid = e_usersessions.sbuid))'
' -> Seq Scan on a_activity (cost=0.00..88462.52 rows=1208167 width=22) (actual time=0.010..1662.142 rows=1207855 loops=1)'
' Filter: ((ventrydate > 0) AND ((status)::text <> '0'::text) AND (vstatus = 1) AND (assignedtoid = 0::numeric))'
' -> Hash (cost=10.86..10.86 rows=5 width=22) (actual time=0.053..0.053 rows=4 loops=1)'
' -> Hash Join (cost=9.38..10.86 rows=5 width=22) (actual time=0.033..0.048 rows=4 loops=1)'
' Hash Cond: (l_userqueue.userid = e_usersessions.entityid)'
' -> Seq Scan on l_userqueue (cost=0.00..1.23 rows=23 width=27) (actual time=0.003..0.009 rows=23 loops=1)'
' -> Hash (cost=9.31..9.31 rows=5 width=21) (actual time=0.018..0.018 rows=2 loops=1)'
' -> Index Scan using i06_e_usersessions on e_usersessions (cost=0.00..9.31 rows=5 width=21) (actual time=0.009..0.012 rows=2 loops=1)'
' Index Cond: (sessionkeepalivedatetime > 20120605082131943::bigint)'
'Total runtime: 2249.146 ms'
I am trying to understand the reason why the a sequencial scan is used on a_activity instead of using the index by activequeueid (i08_a_activity). If I run the this other query, I get a complete different results:
SELECT *
FROM a_activity
WHERE a_activity.activequeueid = 123456
AND a_activity.vstatus = 1
AND a_activity.ventrydate > 0
Explain analyze:
'Index Scan using i08_a_activity on a_activity (cost=0.00..303.57 rows=162 width=7287) (actual time=0.019..0.019 rows=0 loops=1)'
' Index Cond: ((activequeueid = 123456::numeric) AND (vstatus = 1) AND (ventrydate > 0))'
'Total runtime: 0.076 ms'
This is the definition of the index :
CREATE INDEX i08_a_activity
ON a_activity
USING btree
(activequeueid , vstatus , ventrydate );
a_activity table has 1,216,134 rows
Thanks in advance,
Andrew
On 06/05/2012 07:48 AM, Andrew Jaimes wrote: > ' -> Hash Join (cost=10.93..99795.09 rows=242803 width=0) (actual > time=0.541..2249.027 rows=33 loops=1)' > ' Hash Cond: ((a_activity.activequeueid = l_userqueue.queueid) > AND (a_activity.sbuid = e_usersessions.sbuid))' > ' -> Seq Scan on a_activity (cost=0.00..88462.52 rows=1208167 > width=22) (actual time=0.010..1662.142 I'd be willing to bet your stats are way, way off. It expected 242,803 rows in the hash, but only got 33. In that kind of scenario, I could easily see the planner choosing a sequence scan over an index scan, as doing that many index seeks would be much more expensive than scanning the table. What's your default_statistics_target, and when is the last time you analyzed the tables in this query? -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-444-8534 sthomas@optionshouse.com ______________________________________________ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
the default_statistics_target is set to 200, and I have run the analyze and reindex on these tables before writing the email.
Andrew
Andrew
> Date: Tue, 5 Jun 2012 08:15:45 -0500
> From: sthomas@optionshouse.com
> To: andrewjaimes@hotmail.com
> CC: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Sequencial scan in a JOIN
>
> On 06/05/2012 07:48 AM, Andrew Jaimes wrote:
>
> > ' -> Hash Join (cost=10.93..99795.09 rows=242803 width=0) (actual
> > time=0.541..2249.027 rows=33 loops=1)'
> > ' Hash Cond: ((a_activity.activequeueid = l_userqueue.queueid)
> > AND (a_activity.sbuid = e_usersessions.sbuid))'
> > ' -> Seq Scan on a_activity (cost=0.00..88462.52 rows=1208167
> > width=22) (actual time=0.010..1662.142
>
> I'd be willing to bet your stats are way, way off. It expected 242,803
> rows in the hash, but only got 33. In that kind of scenario, I could
> easily see the planner choosing a sequence scan over an index scan, as
> doing that many index seeks would be much more expensive than scanning
> the table.
>
> What's your default_statistics_target, and when is the last time you
> analyzed the tables in this query?
>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
> 312-444-8534
> sthomas@optionshouse.com
>
> ______________________________________________
>
> See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
> From: sthomas@optionshouse.com
> To: andrewjaimes@hotmail.com
> CC: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Sequencial scan in a JOIN
>
> On 06/05/2012 07:48 AM, Andrew Jaimes wrote:
>
> > ' -> Hash Join (cost=10.93..99795.09 rows=242803 width=0) (actual
> > time=0.541..2249.027 rows=33 loops=1)'
> > ' Hash Cond: ((a_activity.activequeueid = l_userqueue.queueid)
> > AND (a_activity.sbuid = e_usersessions.sbuid))'
> > ' -> Seq Scan on a_activity (cost=0.00..88462.52 rows=1208167
> > width=22) (actual time=0.010..1662.142
>
> I'd be willing to bet your stats are way, way off. It expected 242,803
> rows in the hash, but only got 33. In that kind of scenario, I could
> easily see the planner choosing a sequence scan over an index scan, as
> doing that many index seeks would be much more expensive than scanning
> the table.
>
> What's your default_statistics_target, and when is the last time you
> analyzed the tables in this query?
>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
> 312-444-8534
> sthomas@optionshouse.com
>
> ______________________________________________
>
> See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
On 06/05/2012 08:31 AM, Andrew Jaimes wrote: > the default_statistics_target is set to 200, and I have run the analyze > and reindex on these tables before writing the email. Out of idle curiosity, how do these two variants treat you? SELECT count(1) FROM e_usersessions s JOIN l_userqueue q ON (q.userid = s.entityid) JOIN a_activity a ON (a.activequeueid = q.queueid) WHERE s.sessionkeepalivedatetime > 20120605082131943 AND a.vstatus = 1 AND a.ventrydate > 0 AND a.sbuid = s.sbuid AND a.assignedtoid = 0 AND a.status <> '0' SELECT count(1) FROM e_usersessions s JOIN l_userqueue q ON (q.userid = s.entityid) WHERE s.sessionkeepalivedatetime > 20120605082131943 AND EXISTS ( SELECT 1 FROM a_activity a WHERE a.activequeueid = q.queueid AND a.sbuid = s.sbuid AND a.vstatus = 1 AND a.ventrydate > 0 AND a.assignedtoid = 0 AND a.status <> '0' ) -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-444-8534 sthomas@optionshouse.com ______________________________________________ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
On 06/05/2012 09:41 AM, Andrew Jaimes wrote: > The second query ran better than the first one: That's what I figured. Ok, so looking back to your original message again: CREATE INDEX i08_a_activity ON a_activity USING btree (activequeueid , vstatus , ventrydate ); Based on the query here, it doesn't appear that vstatus or ventrydate are doing you any good in that index. Nor would your query even really make use of them anyway, considering their catch-all equalities. If you can make a clone of a_activity, could you try this index instead with your original query: CREATE INDEX idx_a_activity_queue ON a_activity_clone (activequeueid); Then compare to this: CREATE INDEX idx_a_activity_queue_sbuid ON a_activity_clone (activequeueid, sbuid); And the results of this query would also be handy: SELECT attname, n_distinct FROM pg_stats WHERE tablename='a_activity'; Generally you want to order your composite indexes in order of uniqueness, if you even want to make a composite index in the first place. I noticed in both cases, it's basically ignoring sbuid aside from the implied hash to exclude non-matches. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-444-8534 sthomas@optionshouse.com ______________________________________________ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
On Tue, Jun 5, 2012 at 8:48 AM, Andrew Jaimes <andrewjaimes@hotmail.com> wrote: > Hi everyone, > > I am trying to run the following query: > > SELECT count(1) --DISTINCT l_userqueue.queueid > FROM e_usersessions > JOIN l_userqueue > ON l_userqueue.userid = e_usersessions.entityid > JOIN a_activity > ON a_activity.activequeueid = l_userqueue.queueid > AND a_activity.vstatus = 1 > AND a_activity.ventrydate > 0 > AND a_activity.sbuid = e_usersessions.sbuid > AND a_activity.assignedtoid = 0 > AND a_activity.status <> '0' > WHERE e_usersessions.sessionkeepalivedatetime > 20120605082131943 > > Explain analyze: > 'Aggregate (cost=100402.10..100402.11 rows=1 width=0) (actual > time=2249.051..2249.051 rows=1 loops=1)' > ' -> Hash Join (cost=10.93..99795.09 rows=242803 width=0) (actual > time=0.541..2249.027 rows=33 loops=1)' > ' Hash Cond: ((a_activity.activequeueid = l_userqueue.queueid) AND > (a_activity.sbuid = e_usersessions.sbuid))' > ' -> Seq Scan on a_activity (cost=0.00..88462.52 rows=1208167 > width=22) (actual time=0.010..1662.142 rows=1207855 loops=1)' > ' Filter: ((ventrydate > 0) AND ((status)::text <> '0'::text) > AND (vstatus = 1) AND (assignedtoid = 0::numeric))' > ' -> Hash (cost=10.86..10.86 rows=5 width=22) (actual > time=0.053..0.053 rows=4 loops=1)' > ' -> Hash Join (cost=9.38..10.86 rows=5 width=22) (actual > time=0.033..0.048 rows=4 loops=1)' > ' Hash Cond: (l_userqueue.userid = > e_usersessions.entityid)' > ' -> Seq Scan on l_userqueue (cost=0.00..1.23 rows=23 > width=27) (actual time=0.003..0.009 rows=23 loops=1)' > ' -> Hash (cost=9.31..9.31 rows=5 width=21) (actual > time=0.018..0.018 rows=2 loops=1)' > ' -> Index Scan using i06_e_usersessions on > e_usersessions (cost=0.00..9.31 rows=5 width=21) (actual time=0.009..0.012 > rows=2 loops=1)' > ' Index Cond: (sessionkeepalivedatetime > > 20120605082131943::bigint)' > 'Total runtime: 2249.146 ms' > > I am trying to understand the reason why the a sequencial scan is used on > a_activity instead of using the index by activequeueid (i08_a_activity). I'm chiming in a bit late here, but it seems like you're hoping that the query plan will form the outer join as a nested loop, with the inner and outer sides swapped, so that the results of the join between l_userqueue and e_usersessions are used to drive a series of index scans on a_activity that avoid scanning the whole table. PostgreSQL 9.2 will be the first release that has the ability to generate that kind of plan, so it would be interesting to see what happens if you try this on 9.2beta. Older releases should be able consider a nested loop join with l_userqueue as the inner rel, driving an index scan over a_activity, and then performing the join to e_usersessions afterwards. But that plan might not be nearly as good, since then we'd have to do 23 index-scans on a_activity rather than just 4. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company