Обсуждение: IN(subselect returning few values ...)

Поиск
Список
Период
Сортировка

IN(subselect returning few values ...)

От
Stephen Frost
Дата:
All,

* Josh Berkus (josh@agliodbs.com) wrote:
> > The plain non-VALUES list form is also significantly faster than it
> > was, but I think it will only result in a bitmap indexscan plan type.
>
> Yeah, even bitmapscans break down at 1000 values ...

In a similar vein, perhaps 8.2 fixes this but I don't recall seeing
anything where it would...

Working on 8.1 I've recently been annoyed at the need to translate a
sub-select inside an IN () clause into a fixed list of contents (the
results of the sub-select, exactly) in order to get better performance.
If the results of a sub-select are very likely to be less than 1000 (or
what have you) is there a reason not to translate that sub-select into
a VALUES list or IN (constants) set (ie: a nest-loop or a bitmap
indexscan)?

This particular case was involving 9 values from a table which only had
around 250 rows total being used to find a set of records in a much,
much bigger table (60M or so, iirc).  I dislike having to hard-code
those values in the scripts I'm writing, or hack it up to implement
getting the list and then using it as a constant.

A similar case I've seen is that when using a sub-select or similar
instead of a list of constants the 'One-Time Filter: false' doesn't
appear to ever be able to happen.  I might have overlooked something
else which doesn't something similar, but if not this ends up making a
query *much* more expensive when alot of disjoint tables are involved,
most of which don't need to be considered since they're not in the
constants list.
Thanks,
    Stephen

Re: IN(subselect returning few values ...)

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> Working on 8.1 I've recently been annoyed at the need to translate a
> sub-select inside an IN () clause into a fixed list of contents (the
> results of the sub-select, exactly) in order to get better performance.

Better performance than what?  Ever since 7.4 we've converted small IN
sub-selects into plans along the lines of

regression=# explain select * from tenk1 where unique1 in (select f1 from int4_tbl);
QUERYPLAN
 
-----------------------------------------------------------------------------------Nested Loop  (cost=1.06..31.20
rows=5width=244)  ->  HashAggregate  (cost=1.06..1.11 rows=5 width=4)        ->  Seq Scan on int4_tbl  (cost=0.00..1.05
rows=5width=4)  ->  Index Scan using tenk1_unique1 on tenk1  (cost=0.00..6.00 rows=1 width=244)        Index Cond:
(tenk1.unique1= "outer".f1)
 
(5 rows)

which looks OK to me.
        regards, tom lane


Re: IN(subselect returning few values ...)

От
Stephen Frost
Дата:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > Working on 8.1 I've recently been annoyed at the need to translate a
> > sub-select inside an IN () clause into a fixed list of contents (the
> > results of the sub-select, exactly) in order to get better performance.
>
> Better performance than what?  Ever since 7.4 we've converted small IN
> sub-selects into plans along the lines of

Specifically what I had been looking for a change from a HASH IN
plan w/ seq-scan on the big table to a bitmap index scan or a nested
loop index lookup (as you have below).  With the IN(constants) I had
been getting a bitmap-index scan.  I looked a bit closer though and
discovered it was thinking there would be 300+ rows returned from the
query (which would have resulted in a very much larger number of rows
being returned from the large table) instead of just 9, so I ran
analyze on the table and that seemed to fix it up (changed to a nested
loop w/ an index scan, which works nicely).

I've got autovacuum running though and that table hasn't changed in ages
so I'm a bit confused how the stats for it were so far off.  I didn't
expect to have an analyze problem on a database that has autovacuum
running on a table that hasn't changed in a very long time.  Wish I knew
how it'd been missed. :/  I'm running a database-wide analyze, though
that'll probably take a while considering it about 300G.  Makes me
wonder if autovacuum needs to periodically run a check of tables which
havn't been seen to have changed but may have in important ways which
were somehow missed, not unlike how my SAN and RAID systems run monthly
consistancy checks...

Sorry about the noise. :/
Thanks,
    Stephen