Обсуждение: Handling large number of OR/IN conditions

Поиск
Список
Период
Сортировка

Handling large number of OR/IN conditions

От
David Wall
Дата:
We have a database report function that seemed clean when the number of users was small, but as the number of users grow, I was wondering if anybody had any good ideas about how to handle OR or IN for SELECTs. 

The general scenario is that a manager runs reports that list all records that were created by users under his/her oversight.  So, when the number of users is small, we had simple queries like:

SELECT field1, field2 FROM table1 WHERE creator_user_id = 'U1' OR creator_user_id = 'U2';

But when there are thousands of users, and a manager has oversight of 100 of them, the OR construct seems out of whack when you read the query:

WHERE creator_user_id = 'U1' OR creator_user_id = 'U2' ... OR creator_user_id = 'U99' OR creator_user_id = 'U100'


I know it can be shortened with IN using something like, but don't know if it's any more/less efficient or a concern:

WHERE creator_user_id IN ('U1', 'U2', ...., 'U99', 'U100)

How do people tend to handle this sort of thing?  I suspect manager reports against their people must be pretty common.  Are there any good tricks on how to group users like this?  Unfortunately, group membership changes over time, and users may report to more than one manager and thus belong to more than one group, so we can't just have a 'creator_group_id' attribute that is set and then query against that.

Thanks,
David

Re: Handling large number of OR/IN conditions

От
Steve Atkins
Дата:
On May 1, 2009, at 10:49 AM, David Wall wrote:

> We have a database report function that seemed clean when the number
> of users was small, but as the number of users grow, I was wondering
> if anybody had any good ideas about how to handle OR or IN for
> SELECTs.
>
> The general scenario is that a manager runs reports that list all
> records that were created by users under his/her oversight.  So,
> when the number of users is small, we had simple queries like:
>
> SELECT field1, field2 FROM table1 WHERE creator_user_id = 'U1' OR
> creator_user_id = 'U2';
>
> But when there are thousands of users, and a manager has oversight
> of 100 of them, the OR construct seems out of whack when you read
> the query:
>
> WHERE creator_user_id = 'U1' OR creator_user_id = 'U2' ... OR
> creator_user_id = 'U99' OR creator_user_id = 'U100'
>
> I know it can be shortened with IN using something like, but don't
> know if it's any more/less efficient or a concern:
>
> WHERE creator_user_id IN ('U1', 'U2', ...., 'U99', 'U100)
>
> How do people tend to handle this sort of thing?  I suspect manager
> reports against their people must be pretty common.  Are there any
> good tricks on how to group users like this?  Unfortunately, group
> membership changes over time, and users may report to more than one
> manager and thus belong to more than one group, so we can't just
> have a 'creator_group_id' attribute that is set and then query
> against that.


Sounds like a job for a two column table that lists manager and report.

select table1.field1, table2.field2 from table1, reports where
table1.creator_user_id = reports.peon and reports.overlord = 'bob'

Cheers,
   Steve


Re: Handling large number of OR/IN conditions

От
"Chris Spotts"
Дата:

A separate table for managing the relationships.  One column for the manager and one for employee.

Then you end up with a query like this.

 

Select field1,field2 FROM table1 inner join relationships on table1.creator_user_id = relationships.employee WHERE relationships.manager = ?


From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of David Wall
Sent: Friday, May 01, 2009 12:49 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Handling large number of OR/IN conditions

 

We have a database report function that seemed clean when the number of users was small, but as the number of users grow, I was wondering if anybody had any good ideas about how to handle OR or IN for SELECTs. 

The general scenario is that a manager runs reports that list all records that were created by users under his/her oversight.  So, when the number of users is small, we had simple queries like:

SELECT field1, field2 FROM table1 WHERE creator_user_id = 'U1' OR creator_user_id = 'U2';

But when there are thousands of users, and a manager has oversight of 100 of them, the OR construct seems out of whack when you read the query:

WHERE creator_user_id = 'U1' OR creator_user_id = 'U2' ... OR creator_user_id = 'U99' OR creator_user_id = 'U100'


I know it can be shortened with IN using something like, but don't know if it's any more/less efficient or a concern:

WHERE creator_user_id IN ('U1', 'U2', ...., 'U99', 'U100)

How do people tend to handle this sort of thing?  I suspect manager reports against their people must be pretty common.  Are there any good tricks on how to group users like this?  Unfortunately, group membership changes over time, and users may report to more than one manager and thus belong to more than one group, so we can't just have a 'creator_group_id' attribute that is set and then query against that.

Thanks,
David

Re: Handling large number of OR/IN conditions

От
David Wall
Дата:
(quoted from Chris) 
Select field1,field2 FROM table1 inner join relationships on table1.creator_user_id = relationships.employee WHERE relationships.manager = ?

(quoted from Steve) 
select table1.field1, table2.field2 from table1, reports where table1.creator_user_id = reports.peon and reports.overlord = 'bob'

Thanks, Steve and Chris, who both suggested similar things. 

I believe that will work for me in some situations, but currently the report allows the manager to select any combination (from one to all) of users that report to him.  So one solution would be to run the report as you have it and then have the application filter it, but that could be painful if the users "de-selected" from his full team happen to have a large percentage of the matching rows.  Of course, I may see if this is something they really do much of.  I mean, perhaps if they select individual users (just a few), I can just use the OR/IN style, and if they select "all my users" I could rely on the table joins.

Does anybody know if PG will perform better with the table join instead of evaluating the series of OR/IN?  The OR/IN has to be parsed, but the comparisons may be faster than the table join.

Many thanks for your help....

David

Re: Handling large number of OR/IN conditions

От
Steve Atkins
Дата:
On May 1, 2009, at 2:42 PM, David Wall wrote:

> (quoted from Chris)
> Select field1,field2 FROM table1 inner join relationships on
> table1.creator_user_id = relationships.employee WHERE
> relationships.manager = ?
>
> (quoted from Steve)
>> select table1.field1, table2.field2 from table1, reports where
>> table1.creator_user_id = reports.peon and reports.overlord = 'bob'
>
> Thanks, Steve and Chris, who both suggested similar things.
>
> I believe that will work for me in some situations, but currently
> the report allows the manager to select any combination (from one to
> all) of users that report to him.  So one solution would be to run
> the report as you have it and then have the application filter it,
> but that could be painful if the users "de-selected" from his full
> team happen to have a large percentage of the matching rows.  Of
> course, I may see if this is something they really do much of.  I
> mean, perhaps if they select individual users (just a few), I can
> just use the OR/IN style, and if they select "all my users" I could
> rely on the table joins.
>
> Does anybody know if PG will perform better with the table join
> instead of evaluating the series of OR/IN?  The OR/IN has to be
> parsed, but the comparisons may be faster than the table join.

It used to be that populating and then joining with a temporary table
was faster than using IN (1, 3, 5, 7, 9, 11, 13) for all but the
smallest sets. That's no longer true, and IN() is pretty good.

I'd still use a temporary table myself, though. It's cleaner and
easier to populate one than to cleanly produce a statement with a
variable number of identifiers in it. And you can reuse it for
multiple reports, join against it different ways and so on. Also you
can populate it either from your UI or by selecting from the
relationships table suggested above (create temporary table foo as
select peon from reports where overlord in ('bob', 'ben', 'jerry) ),
and still run the same reports against it.

Cheers,
   Steve


Re: Handling large number of OR/IN conditions

От
Tom Lane
Дата:
Steve Atkins <steve@blighty.com> writes:
> On May 1, 2009, at 2:42 PM, David Wall wrote:
>> Does anybody know if PG will perform better with the table join
>> instead of evaluating the series of OR/IN?  The OR/IN has to be
>> parsed, but the comparisons may be faster than the table join.

> It used to be that populating and then joining with a temporary table
> was faster than using IN (1, 3, 5, 7, 9, 11, 13) for all but the
> smallest sets. That's no longer true, and IN() is pretty good.

> I'd still use a temporary table myself, though. It's cleaner and
> easier to populate one than to cleanly produce a statement with a
> variable number of identifiers in it. And you can reuse it for
> multiple reports, join against it different ways and so on. Also you
> can populate it either from your UI or by selecting from the
> relationships table suggested above (create temporary table foo as
> select peon from reports where overlord in ('bob', 'ben', 'jerry) ),
> and still run the same reports against it.

Possibly worth noting: if you're depending on the quality of join plans
involving such a table, it's worth doing an ANALYZE against it after
you populate it.  (Autovacuum won't do that for you, because it can't
access temp tables.)

            regards, tom lane