Обсуждение: subquery returning array

Поиск
Список
Период
Сортировка

subquery returning array

От
Zeljko Vrba
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi! I have posted this as a user comment in section 9.17:

- ----
= ANY(array expression) doesn't work in the obvious way when the array
expression is a subselect. For example:

select * from stat3 where stat3.id = any ('{4,5,6,7}');

works (and returns the expected tuples). However,

select * from stat3 where stat3.id = any (select stat3 from
helix_request where id=11);

DOESN'T work and complains with an error: operator does not exist:
integer = integer[]. The inner select returns EXACTLY ONE value, namely
the same array as the literal in the first example.

The solution is:

select * from stat3 where (select stat3.id = any (stat3) from
helix_request where id=11);

I thank to Chris Kings-Lynne ("KL") for helping me out with this over IRC.
- ----

KL suggested to mail this question to the hackers list. The problem with
this solution is that postgresql uses sequential scan for the proposed
solution:
Seq Scan on stat3  (cost=0.00..40018.94 rows=3321 width=32) (actual
time=0.112..75.911 rows=4 loops=1)  Filter: (subplan)  SubPlan    ->  Index Scan using helix_request_pkey on
helix_request
(cost=0.00..6.01 rows=1 width=32) (actual time=0.007..0.008 rows=1
loops=6756)          Index Cond: (id = 11)Total runtime: 76.040 ms
(6 rows)

If I write select * from stat3 where id in (4,5,6,7) and index scan is used:Index Scan using stat3_pkey, stat3_pkey,
stat3_pkey,stat3_pkey on
 
stat3  (cost=0.00..23.94 rows=4 width=32) (actual time=0.066..0.091
rows=4 loops=1)  Index Cond: ((id = 4) OR (id = 5) OR (id = 6) OR (id = 7))Total runtime: 0.164 ms

I'm going to have MANY queries of this kind and having sequential scan
for each such query will lead to quadratic performance - unacceptable
for the amount of data I will have (millions of rows).

Is there yet another way of making WHERE field = ANY (subselect
returning an array) work? Or make postgres to use index?

OK, I know the suggestion from the manual: usually it's bad database
design searching through arrays and a separate table is better. however,
to convert {4,5,6,7} into a table would require a table with two
columns, e.g. (1,4), (1,5), (1,6), (1,7) where the first column would be
used to identify elements in the single array). this additional column
is a waste of space.

KL has mentioned a package for converting an array into a tuple. Where
can I find it? Also, what is the limit on the number of elements in the
IN (...) condition before the database resorts to sequential scan?

PS: Please reply also via e-mail as I'm not a regular subscriber of this
list.

Thanks.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.7 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iQCVAwUBQpF0U2nN2B8CwPRmAQKN8gP+JCzLiX5b48kMYmHRwTSFZWN5Jydfw0iH
MABuYj2mKCY9Dgmd4pLK8Xlxhf/tEYzd3N2lcPFYf1vIXCSpbFasRrO3hJ4WjRLr
MZ6MLXCn59Y8wtd8Iz8uug47XuYVGSreZESKA9tRfl+u8t16nPC9nEMyDNDFIRI9
ZOnBMCnQzrY=
=5qmB
-----END PGP SIGNATURE-----


Re: subquery returning array

От
Richard Huxton
Дата:
Zeljko Vrba wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hi! I have posted this as a user comment in section 9.17:
> 
> - ----
> = ANY(array expression) doesn't work in the obvious way when the array
> expression is a subselect. For example:
> 
> select * from stat3 where stat3.id = any ('{4,5,6,7}');
> 
> works (and returns the expected tuples). However,
> 
> select * from stat3 where stat3.id = any (select stat3 from
> helix_request where id=11);
> 
> DOESN'T work and complains with an error: operator does not exist:
> integer = integer[]. The inner select returns EXACTLY ONE value, namely
> the same array as the literal in the first example.

No, it doesn't - hence the error.

What it returns is a SET containing ONE TUPLE which contains ONE ARRAY. 
Here the "any" ranges over the SET not the array, so it tries to compare 
the outer integer with the inner array of integers.

> The solution is:
> 
> select * from stat3 where (select stat3.id = any (stat3) from
> helix_request where id=11);
> 
> I thank to Chris Kings-Lynne ("KL") for helping me out with this over IRC.
> - ----
> 
> KL suggested to mail this question to the hackers list. The problem with
> this solution is that postgresql uses sequential scan for the proposed
> solution:
[snip]
> I'm going to have MANY queries of this kind and having sequential scan
> for each such query will lead to quadratic performance - unacceptable
> for the amount of data I will have (millions of rows).
> 
> Is there yet another way of making WHERE field = ANY (subselect
> returning an array) work? Or make postgres to use index?
> 
> OK, I know the suggestion from the manual: usually it's bad database
> design searching through arrays and a separate table is better. however,
> to convert {4,5,6,7} into a table would require a table with two
> columns, e.g. (1,4), (1,5), (1,6), (1,7) where the first column would be
> used to identify elements in the single array). this additional column
> is a waste of space.

You're trading space for time, and deliberately spoiling your design to 
do so. If you're going to want to access each value separately, design 
your system to reflect that fact.

Have you actually tested your example with a proper two-column table? It 
might be that weaknesses in PostgreSQL force you to compromise your 
design, but I wouldn't start from that point. Try a proper relational 
design, with a bit of tuning and see if that will work for you.

If you really want to use arrays, perhaps see if the contrib/intarray 
module can help you.

--   Richard Huxton  Archonet Ltd


Re: subquery returning array

От
Greg Stark
Дата:
Zeljko Vrba <zvrba@ifi.uio.no> writes:

> Is there yet another way of making WHERE field = ANY (subselect
> returning an array) work? Or make postgres to use index?

You could use the int_array_enum() function from the contrib/int_agg module.

> Also, what is the limit on the number of elements in the IN (...) condition
> before the database resorts to sequential scan?

It depends on the table. If, for example, there's little free space in your
table and the records are very narrow then sequential scans will be especially
efficient since each i/o will read in many records. An index scan of any
significant size would likely have to read nearly every page in multiple
times.

The problem in your case is that Postgres has no idea how large an array it's
going to find when it's doing the planning. I'm not sure what the fallback
logic is, apparently it's assuming a fairly low selectivity for the =ANY
constraint.

Actually I think there was some discussion a while back about making =ANY a
little less pessimistic about the selectivity. Perhaps 8.1 will be better for
this. If you're still in development with a long timeframe you could try a CVS
build to see if that's the case.


-- 
greg