Обсуждение: Multiple IN

Поиск
Список
Период
Сортировка

Multiple IN

От
Leandro Fanzone
Дата:
Maybe this is a silly question. I have a table t with two fields f1 and
f2. Those fields together form the primary key. I want to select a list
of pairs from that table. The simplest SQL clause would be

SELECT * FROM t WHERE (f1 = v1 AND f2 = v2) OR (f1 = v3 AND f2 = v4)
...;

When that list is long, on the one hand I suppose it won't have a good
performance; on the other, the resultant clause is clumsy and too long.
Is there any other (more elegant) solution for this? If the field would
be just one, a simple IN would do better, but as far as I know there is
no IN for multiple fields.
Thank you in advance.

Leandro Fanzone


Re: Multiple IN

От
Bruno Wolff III
Дата:
On Tue, Dec 11, 2001 at 12:43:08PM -0300,
  Leandro Fanzone <leandro@hasar.com> wrote:
>
> When that list is long, on the one hand I suppose it won't have a good
> performance; on the other, the resultant clause is clumsy and too long.
> Is there any other (more elegant) solution for this? If the field would
> be just one, a simple IN would do better, but as far as I know there is
> no IN for multiple fields.

I am not sure if this is a new feature or just more complete documenation,
but you might want to take a look at this from the 7.2 docs:
http://developer.postgresql.org/docs/postgres/functions-subquery.html

Re: Multiple IN

От
"Josh Berkus"
Дата:
Leandro,

> Maybe this is a silly question. I have a table t with two fields f1
> and
> f2. Those fields together form the primary key. I want to select a
> list
> of pairs from that table. The simplest SQL clause would be
>
> SELECT * FROM t WHERE (f1 = v1 AND f2 = v2) OR (f1 = v3 AND f2 = v4)
> ...;

We'll need database schema (table designs) if you want us to help you.
The above isn't enough information to go on.  For example, where did
"v1" come from?  Is this a field?  An interface language variable?

Please be explicit!  Also, consider picking up Joe Celko's SQL for
Smarties, reviewed at http://techdocs.postgresql.org/bookreviews.php

-Josh

______AGLIO DATABASE SOLUTIONS___________________________
                                       Josh Berkus
  Complete information technology      josh@agliodbs.com
   and data management solutions       (415) 565-7293
  for law firms, small businesses        fax 621-2533
    and non-profit organizations.      San Francisco

Re: Multiple IN

От
Leandro Fanzone
Дата:
I'm sorry for not being precise. The table is simple enough: it has only
those two integer fields (f1, f2) and a third text field f3 that has the
value I'm looking for. Fields f1 and f2 together are the primary key. An
example:

f1 f2 f3
1  1  'a'
1  2  'b'
2  1  'c'
2  4  'd'
..  .. ...
8  5  'z'

etc. Now I have a list of integer pairs to match f1 and f2 from that table;
let's say I have those values in a vector. For example:

{ (1, 2), (2, 1), (8, 5) }

I want the corresponding return values for f3:

{ b, c, z }

The problem is that I have to match the pair, so my SQL again would be

SELECT f3 FROM t
WHERE (f1 = 1 AND f2 = 2) OR (f1 = 2 AND f2 = 1) OR (f1 = 8 OR f2 = 5);

for the former example. I can guarantee that all the values I put in the OR
list exist in the table, and I want to know the f3 of each one, exactly. The
values from the list are arbitrary or random, and are not coming from a
SELECTion. I want to avoid the AND/OR list, because the vector could be very
long.
Thank you again for your help.

Leandro Fanzone.

Josh Berkus wrote:

> Leandro,
>
> > Maybe this is a silly question. I have a table t with two fields f1
> > and
> > f2. Those fields together form the primary key. I want to select a
> > list
> > of pairs from that table. The simplest SQL clause would be
> >
> > SELECT * FROM t WHERE (f1 = v1 AND f2 = v2) OR (f1 = v3 AND f2 = v4)
> > ...;
>
> We'll need database schema (table designs) if you want us to help you.
> The above isn't enough information to go on.  For example, where did
> "v1" come from?  Is this a field?  An interface language variable?
>
> Please be explicit!  Also, consider picking up Joe Celko's SQL for
> Smarties, reviewed at http://techdocs.postgresql.org/bookreviews.php
>
> -Josh
>
> ______AGLIO DATABASE SOLUTIONS___________________________
>                                        Josh Berkus
>   Complete information technology      josh@agliodbs.com
>    and data management solutions       (415) 565-7293
>   for law firms, small businesses        fax 621-2533
>     and non-profit organizations.      San Francisco
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)


Re: Multiple IN

От
Tom Lane
Дата:
Leandro Fanzone <leandro@hasar.com> writes:
> The problem is that I have to match the pair, so my SQL again would be
> SELECT f3 FROM t
> WHERE (f1 = 1 AND f2 = 2) OR (f1 = 2 AND f2 = 1) OR (f1 = 8 OR f2 = 5);
> for the former example. I can guarantee that all the values I put in the OR
> list exist in the table, and I want to know the f3 of each one, exactly. The
> values from the list are arbitrary or random, and are not coming from a
> SELECTion. I want to avoid the AND/OR list, because the vector could be very
> long.

Actually, that is the standard way of doing it, and the performance is
likely to be less bad than you think.  If f1/f2 are indexed (which they
would be, since they're the primary key) you should get a plan like
this:

regression=# create table foo (f1 int, f2 int, primary key (f1,f2));
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index 'foo_pkey' for table 'foo'
CREATE
regression=# explain
regression-# SELECT * FROM foo WHERE
regression-# (f1 = 1 and f2 = 2) or
regression-# (f1 = 3 and f2 = 4) or
regression-# (f1 = 1 and f2 = 4);
NOTICE:  QUERY PLAN:

Index Scan using foo_pkey, foo_pkey, foo_pkey on foo  (cost=0.00..14.51 rows=1 width=8)

EXPLAIN

The multiple index scan represents three successive index probes using
the three OR'd parts of the WHERE condition.

You will eventually see it switch over to a seqscan if you get up into
many hundreds or thousands of OR'd conditions, but at that point I'd suggest
that you need to stick the probe values into a temp table and do a join.

            regards, tom lane

Re: Multiple IN

От
"Josh Berkus"
Дата:
Leandro,

There are several ways you can do your 2-column comparison.  as Tom
pointed out, there is nothing wrong with the query as you suggested it.
Let me outline a few of the options:

1. Use your interface language (what are you using?) to generate an
ad-hoc query along the line you already suggested, using a loop in the
interface language to generate the query.
SELECT tablea.text FROM tablea WHERE (f1=v1 and f2=v2) OR (f1=v3 and
f2=v4) OR ...
This query will work fine, with the caveat that a few interface
technologies (Microsoft ODBC for one) will restrict the total length of
your query, so test for that.  Certainly the above query formation is
likely to be the fastest to execute on large tables (> 10000 rows), as
it involves no iterative loops or subqueries.

2. Load all the values into a "lookups" temporary table from the
interface.  Then match against that:
CREATE TEMPORARY TABLE lookups (
    v1 INT, v2 INT );
interface language loop:
INSERT INTO lookups VALUES ( $v1, $v2 )
end loop;
SELECT tablea.text
FROM tablea JOIN lookups ON tablea.f1=lookups.v1 and
tablea.f2=lookups.v2
ORDER BY tablea.text;
This has the advantage of requiring less dynamic query generation from
you.   It also would probably be a better approach if you are going to
display the results in sets (SELECT ... LIMIT 25 OFFSET 50;).  However,
on a straight query, you are likely to get much slower results from this
method; it requires 4 to dozens of database commands instead of one, and
temporary tables cannot be effectively indexed.  There are also some
headaches associated with temporary table management.

3. You could create a PL/pgSQL function which would accept two arrays of
values and return true as soon as it found a match:
CREATE FUNCTION match_dual_array ( INT, INT, INT[], INT[]) RETURNS
BOOLEAN
AS ' ... compare each f1 + f2 pair against each v1[] + v2[] array.
Return True if any match, and false if no match.
Then:
SELECT tablea.text FROM tablea
WHERE match_dual_array(f1, f2, v1[], v2[]);
While this function approach has a certain amount of elegance about it
(much simpler queries, for one thing), performance-wise, it's likely to
be the slowest of the three approaches on very large datasets due to the
necessity of procedurally looping through the array for each row in
tablea.  If, however, you have relatively few rows in tablea (a few
hundred) this might be the approach for you.

FURTHER READING:
Joe Celko's "SQL for Smarties" see:
http://techdocs.postgresql.org/bookreviews.php

Roberto Mello's and others' contributions to PL/pgSQL documentation and
sample functions:
http://techdocs.postgresql.org/

Have fun!

-Josh

______AGLIO DATABASE SOLUTIONS___________________________
                                       Josh Berkus
  Complete information technology      josh@agliodbs.com
   and data management solutions       (415) 565-7293
  for law firms, small businesses        fax 621-2533
    and non-profit organizations.      San Francisco

Re: Multiple IN

От
Leandro Fanzone
Дата:
Well, thank you very much for all your help. I was certainly worried for the
OR list, but I see there's no fundamental problem with it. I think I stick
with the point one of your list. I want it to be as portable as possible, so
that discards PL/pgSQL, and as you said, temporary tables for point two are
tricky. I'm working with the libpq++ interface, so there's no problem on
looping the vector to construct dynamically the query.
Thanks again!

Leandro Fanzone

Josh Berkus wrote:

> Leandro,
>
> There are several ways you can do your 2-column comparison.  as Tom
> pointed out, there is nothing wrong with the query as you suggested it.
> Let me outline a few of the options:
>
> 1. Use your interface language (what are you using?) to generate an
> ad-hoc query along the line you already suggested, using a loop in the
> interface language to generate the query.
> SELECT tablea.text FROM tablea WHERE (f1=v1 and f2=v2) OR (f1=v3 and
> f2=v4) OR ...
> This query will work fine, with the caveat that a few interface
> technologies (Microsoft ODBC for one) will restrict the total length of
> your query, so test for that.  Certainly the above query formation is
> likely to be the fastest to execute on large tables (> 10000 rows), as
> it involves no iterative loops or subqueries.
>
> 2. Load all the values into a "lookups" temporary table from the
> interface.  Then match against that:
> CREATE TEMPORARY TABLE lookups (
>         v1 INT, v2 INT );
> interface language loop:
> INSERT INTO lookups VALUES ( $v1, $v2 )
> end loop;
> SELECT tablea.text
> FROM tablea JOIN lookups ON tablea.f1=lookups.v1 and
> tablea.f2=lookups.v2
> ORDER BY tablea.text;
> This has the advantage of requiring less dynamic query generation from
> you.   It also would probably be a better approach if you are going to
> display the results in sets (SELECT ... LIMIT 25 OFFSET 50;).  However,
> on a straight query, you are likely to get much slower results from this
> method; it requires 4 to dozens of database commands instead of one, and
> temporary tables cannot be effectively indexed.  There are also some
> headaches associated with temporary table management.
>
> 3. You could create a PL/pgSQL function which would accept two arrays of
> values and return true as soon as it found a match:
> CREATE FUNCTION match_dual_array ( INT, INT, INT[], INT[]) RETURNS
> BOOLEAN
> AS ' ... compare each f1 + f2 pair against each v1[] + v2[] array.
> Return True if any match, and false if no match.
> Then:
> SELECT tablea.text FROM tablea
> WHERE match_dual_array(f1, f2, v1[], v2[]);
> While this function approach has a certain amount of elegance about it
> (much simpler queries, for one thing), performance-wise, it's likely to
> be the slowest of the three approaches on very large datasets due to the
> necessity of procedurally looping through the array for each row in
> tablea.  If, however, you have relatively few rows in tablea (a few
> hundred) this might be the approach for you.
>
> FURTHER READING:
> Joe Celko's "SQL for Smarties" see:
> http://techdocs.postgresql.org/bookreviews.php
>
> Roberto Mello's and others' contributions to PL/pgSQL documentation and
> sample functions:
> http://techdocs.postgresql.org/
>
> Have fun!
>
> -Josh
>
> ______AGLIO DATABASE SOLUTIONS___________________________
>                                        Josh Berkus
>   Complete information technology      josh@agliodbs.com
>    and data management solutions       (415) 565-7293
>   for law firms, small businesses        fax 621-2533
>     and non-profit organizations.      San Francisco