Обсуждение: Improving a simple query?

От:
Steve Wampler
Дата:

I'm not an SQL or PostgreSQL expert.

I'm getting abysmal performance on a nested query and
need some help on finding ways to improve the performance:

Background:
              RH 8.0 dual-CPU machine (1.2GHz athlon)
              Postgresql 7.2
              1GB ram
              (Machine is dedicated to postgres, so there's
              not much else running.)

The table has ~500K rows.

Table definition:

     lab.devel.configdb=# \d attributes_table
                  Table "attributes_table"
      Column |           Type           |   Modifiers
     --------+--------------------------+---------------
      id     | character varying(64)    | not null
      name   | character varying(64)    | not null
      units  | character varying(32)    |
      value  | text                     |
      time   | timestamp with time zone | default now()
     Indexes: id_index,
              name_index
     Primary key: attributes_table_pkey
     Triggers: trigger_insert

View definition:
     lab.devel.configdb=# \d attributes;
                  View "attributes"
      Column |         Type          | Modifiers
     --------+-----------------------+-----------
      id     | character varying(64) |
      name   | character varying(64) |
      units  | character varying(32) |
      value  | text                  |
     View definition: SELECT attributes_table.id,
                 attributes_table.name, attributes_table.units,
                 attributes_table.value FROM attributes_table;

Query:

 select * from attributes_table where id in (select id from
      attributes where (name='obsid') and (value='oid00066'));

Now, the inner SELECT is fast:
  lab.devel.configdb=# explain analyze select id from attributes
      where (name='obsid') and (value='oid00066');
  NOTICE:  QUERY PLAN:

  Index Scan using name_index on attributes_table
     (cost=0.00..18187.48 rows=15 width=25)
     (actual time=0.33..238.06 rows=2049 loops=1)
  Total runtime: 239.28 msec

  EXPLAIN

But the outer SELECT insists on using a sequential scan [it should
pick up about 20K-40K rows (normally, access is through a
script].

How slow?  Slow enough that:

  explain analyze select * from attributes_table where id in
    (select id from attributes where (name='obsid') and
                                          (value='oid00066'));

hasn't completed in the last 15 minutes.

Removing the analyze gives:

lab.devel.configdb=# explain select * from attributes_table where
     id in (select id from attributes where (name='obsid') and
                                                  (value='oid00066'));
  NOTICE:  QUERY PLAN:

  Seq Scan on attributes_table
  (cost=100000000.00..8873688920.07   rows=241201 width=59)
    SubPlan
      ->  Materialize  (cost=18187.48..18187.48 rows=15 width=25)
            ->  Index Scan using name_index on attributes_table
                (cost=0.00..18187.48 rows=15 width=25)

  EXPLAIN

Obviously, something is forcing the outer select into a
sequential scan, which is what I assume is the bottleneck
(see above about lack of expert-ness...).

I've played with the settings in postgresql.conf, using
the on-line performance tuning guide:

  shared_buffers = 8192            # 2*max_connections, min 16
  max_fsm_relations = 1000         # min 10, fsm is free space map
  max_fsm_pages = 10000            # min 1000, fsm is free space map
  max_locks_per_transaction = 128 # min 10
  wal_buffers = 64                  # min 4
  sort_mem = 128                    # min 32
  vacuum_mem = 4096                # min 1024
  wal_files = 32 # range 0-64 (default was 0)
  effective_cache_size = 96000  # default in 8k pages
  random_page_cost = 3

but haven't noticed an significant change with these settings
over more conservative settings.

Any suggestions?  Is there a better way to phrase the query
that would provide order-of-magnitude improvement?

Thanks!
Steve

--
Steve Wampler -- 
Quantum materiae materietur marmota monax si marmota
                    monax materiam possit materiari?

От:
"Richard Huxton"
Дата:

> I'm not an SQL or PostgreSQL expert.
>
> I'm getting abysmal performance on a nested query and
> need some help on finding ways to improve the performance:
[snip]
>  select * from attributes_table where id in (select id from
>       attributes where (name='obsid') and (value='oid00066'));

This is the classic IN problem (much improved in 7.4 dev I believe). The
recommended approach is to rewrite the query as an EXISTS form if
possible. See the mailing list archives for plenty of examples.

Could you not rewrite this as a simple join though?

- Richard

От:
nolan@celery.tssi.com
Дата:

>  select * from attributes_table where id in (select id from
>       attributes where (name='obsid') and (value='oid00066'));

Can you convert it into a join?  'where in' clauses tend to slow pgsql
down.
--
Mike Nolan

От:
Steve Wampler
Дата:

On Sun, Jul 13, 2003 at 08:09:17PM +0100, Richard Huxton wrote:
> > I'm not an SQL or PostgreSQL expert.
> >
> > I'm getting abysmal performance on a nested query and
> > need some help on finding ways to improve the performance:
> [snip]
> >  select * from attributes_table where id in (select id from
> >       attributes where (name='obsid') and (value='oid00066'));
>
> This is the classic IN problem (much improved in 7.4 dev I believe). The
> recommended approach is to rewrite the query as an EXISTS form if
> possible. See the mailing list archives for plenty of examples.
>
> Could you not rewrite this as a simple join though?

Hmmm, I don't see how.  Then again, I'm pretty much the village
idiot w.r.t. SQL...

The inner select is locating a set of (2049) ids (actually from
the same table, since 'attributes' is just a view into
'attributes_table').  The outer select is then locating all
records (~30-40K) that have any of those ids.  Is that really
something a JOIN could be used for?

-Steve
--
Steve Wampler -- 
Quantum materiae materietur marmota monax si marmota
                    monax materiam possit materiari?

От:
Hannu Krosing
Дата:

Steve Wampler kirjutas P, 13.07.2003 kell 23:46:
> On Sun, Jul 13, 2003 at 08:09:17PM +0100, Richard Huxton wrote:
> > > I'm not an SQL or PostgreSQL expert.
> > >
> > > I'm getting abysmal performance on a nested query and
> > > need some help on finding ways to improve the performance:
> > [snip]
> > >  select * from attributes_table where id in (select id from
> > >       attributes where (name='obsid') and (value='oid00066'));
> >
> > This is the classic IN problem (much improved in 7.4 dev I believe). The
> > recommended approach is to rewrite the query as an EXISTS form if
> > possible. See the mailing list archives for plenty of examples.
> >
> > Could you not rewrite this as a simple join though?
>
> Hmmm, I don't see how.  Then again, I'm pretty much the village
> idiot w.r.t. SQL...
>
> The inner select is locating a set of (2049) ids (actually from
> the same table, since 'attributes' is just a view into
> 'attributes_table').  The outer select is then locating all
> records (~30-40K) that have any of those ids.  Is that really
> something a JOIN could be used for?

There may be some subtle differences, but most likely the 'join' form
wis like this:

select at.*
  from attributes_table at,
       attributes a
 where at.id = a.id
   and a.name='obsid'
   and a.value='oid00066'

--------------
Hannu


От:
nolan@celery.tssi.com
Дата:

> > Could you not rewrite this as a simple join though?
>
> Hmmm, I don't see how.  Then again, I'm pretty much the village
> idiot w.r.t. SQL...
>
> The inner select is locating a set of (2049) ids (actually from
> the same table, since 'attributes' is just a view into
> 'attributes_table').  The outer select is then locating all
> records (~30-40K) that have any of those ids.  Is that really
> something a JOIN could be used for?

This may be a question for SQL theoretists, but I don't think I've ever
run across a query with a 'where in' clause that couldn't be written
as a join.  I think linguistically 'where in' may even be a special
case of 'join'.

Yet another question for the theoretists:  Would it be possible to optimize
a 'where in' query by rewriting it as a join?
--
Mike Nolan

От:
Chris Bowlby
Дата:

At 01:46 PM 7/13/03 -0700, Steve Wampler wrote:

  The following left join should work if I've done my select right, you
might want to play with a left versus right to see which will give you a
better result, but this query should help:

  SELECT * FROM attributes_table att LEFT JOIN attributes at ON (at.name =
'obsid' AND at.value = 'oid00066') WHERE att.id = at.id;

>On Sun, Jul 13, 2003 at 08:09:17PM +0100, Richard Huxton wrote:
> > > I'm not an SQL or PostgreSQL expert.
> > >
> > > I'm getting abysmal performance on a nested query and
> > > need some help on finding ways to improve the performance:
> > [snip]
> > >  select * from attributes_table where id in (select id from
> > >       attributes where (name='obsid') and (value='oid00066'));
> >
> > This is the classic IN problem (much improved in 7.4 dev I believe). The
> > recommended approach is to rewrite the query as an EXISTS form if
> > possible. See the mailing list archives for plenty of examples.
> >
> > Could you not rewrite this as a simple join though?
>
>Hmmm, I don't see how.  Then again, I'm pretty much the village
>idiot w.r.t. SQL...
>
>The inner select is locating a set of (2049) ids (actually from
>the same table, since 'attributes' is just a view into
>'attributes_table').  The outer select is then locating all
>records (~30-40K) that have any of those ids.  Is that really
>something a JOIN could be used for?
>
>-Steve
>--
>Steve Wampler -- 
>Quantum materiae materietur marmota monax si marmota
>                     monax materiam possit materiari?
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster


От:
Chris Bowlby
Дата:

At 11:31 PM 7/13/03 -0300, Chris Bowlby wrote:

  Woops, this might not go through via the address I used :> (not
subscribed with that address)..

>At 01:46 PM 7/13/03 -0700, Steve Wampler wrote:
>
>  The following left join should work if I've done my select right, you
> might want to play with a left versus right to see which will give you a
> better result, but this query should help:
>
>  SELECT * FROM attributes_table att LEFT JOIN attributes at ON (at.name =
> 'obsid' AND at.value = 'oid00066') WHERE att.id = at.id;
>
>>On Sun, Jul 13, 2003 at 08:09:17PM +0100, Richard Huxton wrote:
>> > > I'm not an SQL or PostgreSQL expert.
>> > >
>> > > I'm getting abysmal performance on a nested query and
>> > > need some help on finding ways to improve the performance:
>> > [snip]
>> > >  select * from attributes_table where id in (select id from
>> > >       attributes where (name='obsid') and (value='oid00066'));
>> >
>> > This is the classic IN problem (much improved in 7.4 dev I believe). The
>> > recommended approach is to rewrite the query as an EXISTS form if
>> > possible. See the mailing list archives for plenty of examples.
>> >
>> > Could you not rewrite this as a simple join though?
>>
>>Hmmm, I don't see how.  Then again, I'm pretty much the village
>>idiot w.r.t. SQL...
>>
>>The inner select is locating a set of (2049) ids (actually from
>>the same table, since 'attributes' is just a view into
>>'attributes_table').  The outer select is then locating all
>>records (~30-40K) that have any of those ids.  Is that really
>>something a JOIN could be used for?
>>
>>-Steve
>>--
>>Steve Wampler -- 
>>Quantum materiae materietur marmota monax si marmota
>>                     monax materiam possit materiari?
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 4: Don't 'kill -9' the postmaster