Обсуждение: optimizing a (simple?) query on a largeish table
Hi.
I'm just getting started with PostgreSQL. Porting over an huge Oracle
database application for a fun first project :^)
I've got the following query which I'm trying to run on a 4.2 million row table:
SELECT ActionItems.*
FROM ActionItems
WHERE
attn=upper(SESSION_USER)
or attn in (
select upper(groname)
from pg_group
where (select oid from pg_roles where rolname = SESSION_USER) = ANY(grolist)
)
ORDER BY dateTimeCreated
That is, "match any ActionItem directed to me personally, or to the
groups to which I belong".
It currently takes about 8 seconds.
I have indexes on both used columns in the large table ("attn" and
"dateTimeCreated"), but it doesn't seem to be using them --- I've
attached the "EXPLAIN" result below.
Any ideas about what's going on here? How can I reduce the execution time?
Thanks,
Kurt
---
Sort (cost=1242644.46..1247909.54 rows=2106033 width=200)
Sort Key: datetimecreated
-> Seq Scan on actionitems (cost=573.01..186430.80 rows=2106033 width=200)
Filter: (((attn)::text = upper(("session_user"())::text)) OR
(hashed subplan))
SubPlan
-> Seq Scan on pg_authid (cost=5.10..573.01 rows=2 width=64)
Filter: ((NOT rolcanlogin) AND ($0 = ANY ((subplan))))
InitPlan
-> Seq Scan on pg_authid (cost=0.00..5.10 rows=1 width=4)
Filter: (rolname = "session_user"())
SubPlan
-> Seq Scan on pg_auth_members (cost=0.00..4.01
rows=15 width=4)
Filter: (roleid = $1)
"Dr. Kurt Ruff" <kurt.ruff@gmail.com> writes:
> I've got the following query which I'm trying to run on a 4.2 million row table:
> SELECT ActionItems.*
> FROM ActionItems
> WHERE
> attn=upper(SESSION_USER)
> or attn in (
> select upper(groname)
> from pg_group
> where (select oid from pg_roles where rolname = SESSION_USER) = ANY(grolist)
> )
> ORDER BY dateTimeCreated
Replacing the OR with a UNION or UNION ALL might help, though I also
wonder whether you've selected a compatible datatype for "attn".
The upper() calls will yield type TEXT.
[ fools around a bit... ] Another possibility, if you're using PG 8.2
or later, is to replace the "attn IN (sub-SELECT)" with "attn = ANY
(ARRAY(sub-SELECT))". This is a hack --- the planner probably ought to
think of that for itself --- but currently it doesn't.
All this advice is predicated on the assumption that there are few
enough matching rows that multiple indexscans really are a better plan
than one seqscan. Since you didn't say how many rows you expect, it's
not impossible that the plan you've got is in fact the best.
regards, tom lane