Обсуждение: Improving a simple query?
I'm not an SQL or PostgreSQL expert.
I'm getting abysmal performance on a nested query and
need some help on finding ways to improve the performance:
Background:
RH 8.0 dual-CPU machine (1.2GHz athlon)
Postgresql 7.2
1GB ram
(Machine is dedicated to postgres, so there's
not much else running.)
The table has ~500K rows.
Table definition:
lab.devel.configdb=# \d attributes_table
Table "attributes_table"
Column | Type | Modifiers
--------+--------------------------+---------------
id | character varying(64) | not null
name | character varying(64) | not null
units | character varying(32) |
value | text |
time | timestamp with time zone | default now()
Indexes: id_index,
name_index
Primary key: attributes_table_pkey
Triggers: trigger_insert
View definition:
lab.devel.configdb=# \d attributes;
View "attributes"
Column | Type | Modifiers
--------+-----------------------+-----------
id | character varying(64) |
name | character varying(64) |
units | character varying(32) |
value | text |
View definition: SELECT attributes_table.id,
attributes_table.name, attributes_table.units,
attributes_table.value FROM attributes_table;
Query:
select * from attributes_table where id in (select id from
attributes where (name='obsid') and (value='oid00066'));
Now, the inner SELECT is fast:
lab.devel.configdb=# explain analyze select id from attributes
where (name='obsid') and (value='oid00066');
NOTICE: QUERY PLAN:
Index Scan using name_index on attributes_table
(cost=0.00..18187.48 rows=15 width=25)
(actual time=0.33..238.06 rows=2049 loops=1)
Total runtime: 239.28 msec
EXPLAIN
But the outer SELECT insists on using a sequential scan [it should
pick up about 20K-40K rows (normally, access is through a
script].
How slow? Slow enough that:
explain analyze select * from attributes_table where id in
(select id from attributes where (name='obsid') and
(value='oid00066'));
hasn't completed in the last 15 minutes.
Removing the analyze gives:
lab.devel.configdb=# explain select * from attributes_table where
id in (select id from attributes where (name='obsid') and
(value='oid00066'));
NOTICE: QUERY PLAN:
Seq Scan on attributes_table
(cost=100000000.00..8873688920.07 rows=241201 width=59)
SubPlan
-> Materialize (cost=18187.48..18187.48 rows=15 width=25)
-> Index Scan using name_index on attributes_table
(cost=0.00..18187.48 rows=15 width=25)
EXPLAIN
Obviously, something is forcing the outer select into a
sequential scan, which is what I assume is the bottleneck
(see above about lack of expert-ness...).
I've played with the settings in postgresql.conf, using
the on-line performance tuning guide:
shared_buffers = 8192 # 2*max_connections, min 16
max_fsm_relations = 1000 # min 10, fsm is free space map
max_fsm_pages = 10000 # min 1000, fsm is free space map
max_locks_per_transaction = 128 # min 10
wal_buffers = 64 # min 4
sort_mem = 128 # min 32
vacuum_mem = 4096 # min 1024
wal_files = 32 # range 0-64 (default was 0)
effective_cache_size = 96000 # default in 8k pages
random_page_cost = 3
but haven't noticed an significant change with these settings
over more conservative settings.
Any suggestions? Is there a better way to phrase the query
that would provide order-of-magnitude improvement?
Thanks!
Steve
--
Steve Wampler -- swampler@noao.edu
Quantum materiae materietur marmota monax si marmota
monax materiam possit materiari?
> I'm not an SQL or PostgreSQL expert. > > I'm getting abysmal performance on a nested query and > need some help on finding ways to improve the performance: [snip] > select * from attributes_table where id in (select id from > attributes where (name='obsid') and (value='oid00066')); This is the classic IN problem (much improved in 7.4 dev I believe). The recommended approach is to rewrite the query as an EXISTS form if possible. See the mailing list archives for plenty of examples. Could you not rewrite this as a simple join though? - Richard
> select * from attributes_table where id in (select id from > attributes where (name='obsid') and (value='oid00066')); Can you convert it into a join? 'where in' clauses tend to slow pgsql down. -- Mike Nolan
On Sun, Jul 13, 2003 at 08:09:17PM +0100, Richard Huxton wrote:
> > I'm not an SQL or PostgreSQL expert.
> >
> > I'm getting abysmal performance on a nested query and
> > need some help on finding ways to improve the performance:
> [snip]
> > select * from attributes_table where id in (select id from
> > attributes where (name='obsid') and (value='oid00066'));
>
> This is the classic IN problem (much improved in 7.4 dev I believe). The
> recommended approach is to rewrite the query as an EXISTS form if
> possible. See the mailing list archives for plenty of examples.
>
> Could you not rewrite this as a simple join though?
Hmmm, I don't see how. Then again, I'm pretty much the village
idiot w.r.t. SQL...
The inner select is locating a set of (2049) ids (actually from
the same table, since 'attributes' is just a view into
'attributes_table'). The outer select is then locating all
records (~30-40K) that have any of those ids. Is that really
something a JOIN could be used for?
-Steve
--
Steve Wampler -- swampler@noao.edu
Quantum materiae materietur marmota monax si marmota
monax materiam possit materiari?
Steve Wampler kirjutas P, 13.07.2003 kell 23:46:
> On Sun, Jul 13, 2003 at 08:09:17PM +0100, Richard Huxton wrote:
> > > I'm not an SQL or PostgreSQL expert.
> > >
> > > I'm getting abysmal performance on a nested query and
> > > need some help on finding ways to improve the performance:
> > [snip]
> > > select * from attributes_table where id in (select id from
> > > attributes where (name='obsid') and (value='oid00066'));
> >
> > This is the classic IN problem (much improved in 7.4 dev I believe). The
> > recommended approach is to rewrite the query as an EXISTS form if
> > possible. See the mailing list archives for plenty of examples.
> >
> > Could you not rewrite this as a simple join though?
>
> Hmmm, I don't see how. Then again, I'm pretty much the village
> idiot w.r.t. SQL...
>
> The inner select is locating a set of (2049) ids (actually from
> the same table, since 'attributes' is just a view into
> 'attributes_table'). The outer select is then locating all
> records (~30-40K) that have any of those ids. Is that really
> something a JOIN could be used for?
There may be some subtle differences, but most likely the 'join' form
wis like this:
select at.*
from attributes_table at,
attributes a
where at.id = a.id
and a.name='obsid'
and a.value='oid00066'
--------------
Hannu
> > Could you not rewrite this as a simple join though? > > Hmmm, I don't see how. Then again, I'm pretty much the village > idiot w.r.t. SQL... > > The inner select is locating a set of (2049) ids (actually from > the same table, since 'attributes' is just a view into > 'attributes_table'). The outer select is then locating all > records (~30-40K) that have any of those ids. Is that really > something a JOIN could be used for? This may be a question for SQL theoretists, but I don't think I've ever run across a query with a 'where in' clause that couldn't be written as a join. I think linguistically 'where in' may even be a special case of 'join'. Yet another question for the theoretists: Would it be possible to optimize a 'where in' query by rewriting it as a join? -- Mike Nolan
At 01:46 PM 7/13/03 -0700, Steve Wampler wrote: The following left join should work if I've done my select right, you might want to play with a left versus right to see which will give you a better result, but this query should help: SELECT * FROM attributes_table att LEFT JOIN attributes at ON (at.name = 'obsid' AND at.value = 'oid00066') WHERE att.id = at.id; >On Sun, Jul 13, 2003 at 08:09:17PM +0100, Richard Huxton wrote: > > > I'm not an SQL or PostgreSQL expert. > > > > > > I'm getting abysmal performance on a nested query and > > > need some help on finding ways to improve the performance: > > [snip] > > > select * from attributes_table where id in (select id from > > > attributes where (name='obsid') and (value='oid00066')); > > > > This is the classic IN problem (much improved in 7.4 dev I believe). The > > recommended approach is to rewrite the query as an EXISTS form if > > possible. See the mailing list archives for plenty of examples. > > > > Could you not rewrite this as a simple join though? > >Hmmm, I don't see how. Then again, I'm pretty much the village >idiot w.r.t. SQL... > >The inner select is locating a set of (2049) ids (actually from >the same table, since 'attributes' is just a view into >'attributes_table'). The outer select is then locating all >records (~30-40K) that have any of those ids. Is that really >something a JOIN could be used for? > >-Steve >-- >Steve Wampler -- swampler@noao.edu >Quantum materiae materietur marmota monax si marmota > monax materiam possit materiari? > >---------------------------(end of broadcast)--------------------------- >TIP 4: Don't 'kill -9' the postmaster
At 11:31 PM 7/13/03 -0300, Chris Bowlby wrote: Woops, this might not go through via the address I used :> (not subscribed with that address).. >At 01:46 PM 7/13/03 -0700, Steve Wampler wrote: > > The following left join should work if I've done my select right, you > might want to play with a left versus right to see which will give you a > better result, but this query should help: > > SELECT * FROM attributes_table att LEFT JOIN attributes at ON (at.name = > 'obsid' AND at.value = 'oid00066') WHERE att.id = at.id; > >>On Sun, Jul 13, 2003 at 08:09:17PM +0100, Richard Huxton wrote: >> > > I'm not an SQL or PostgreSQL expert. >> > > >> > > I'm getting abysmal performance on a nested query and >> > > need some help on finding ways to improve the performance: >> > [snip] >> > > select * from attributes_table where id in (select id from >> > > attributes where (name='obsid') and (value='oid00066')); >> > >> > This is the classic IN problem (much improved in 7.4 dev I believe). The >> > recommended approach is to rewrite the query as an EXISTS form if >> > possible. See the mailing list archives for plenty of examples. >> > >> > Could you not rewrite this as a simple join though? >> >>Hmmm, I don't see how. Then again, I'm pretty much the village >>idiot w.r.t. SQL... >> >>The inner select is locating a set of (2049) ids (actually from >>the same table, since 'attributes' is just a view into >>'attributes_table'). The outer select is then locating all >>records (~30-40K) that have any of those ids. Is that really >>something a JOIN could be used for? >> >>-Steve >>-- >>Steve Wampler -- swampler@noao.edu >>Quantum materiae materietur marmota monax si marmota >> monax materiam possit materiari? >> >>---------------------------(end of broadcast)--------------------------- >>TIP 4: Don't 'kill -9' the postmaster