Обсуждение: Query Problem

Поиск
Список
Период
Сортировка

Query Problem

От
Josh Berkus
Дата:
Folks:

Here's the problem, in abstract:  I need to select every record in table
A that does not have a link in table B Join Table C where Table
C.account = 11

The relevant fields:

Table_A
CaseID
Data

Table_B
GroupID
CaseID
Amount

Table_C
GroupID
AccountID

Thus, I need to select:

SELECT Data FROM Table A 
WHERE CaseID NOT IN (SELECT CaseID FROM Table_B, Table_CWHERE Table_B.GroupID = TableC.GroupID    AND TableC.AccountID
=11)
 

The problem is, since Table_B and Table_C are large (10,000 records +)
this exclusion query takes several *minutes* to run.

I've fooled around with drectional joins, views, and temporary tables,
but I can seem to find anything that works faster.  Suggestions?

-Josh Berkus

-- 
______AGLIO DATABASE SOLUTIONS___________________________                                       Josh Berkus  Complete
informationtechnology      josh@agliodbs.com   and data management solutions       (415) 436-9166  for law firms, small
businesses      fax  436-0137   and non-profit organizations.       pager 338-4078                               San
Francisco


Re: Query Problem

От
Tomas Berndtsson
Дата:
Josh Berkus <josh@agliodbs.com> writes:

> Folks:
> 
> Here's the problem, in abstract:  I need to select every record in table
> A that does not have a link in table B Join Table C where Table
> C.account = 11
> 
> The relevant fields:
> 
> Table_A
> CaseID
> Data
> 
> Table_B
> GroupID
> CaseID
> Amount
> 
> Table_C
> GroupID
> AccountID
> 
> Thus, I need to select:
> 
> SELECT Data FROM Table A 
> WHERE CaseID NOT IN (
>     SELECT CaseID FROM Table_B, Table_C
>     WHERE Table_B.GroupID = TableC.GroupID
>         AND TableC.AccountID = 11)
> 
> The problem is, since Table_B and Table_C are large (10,000 records +)
> this exclusion query takes several *minutes* to run.
> 
> I've fooled around with drectional joins, views, and temporary tables,
> but I can seem to find anything that works faster.  Suggestions?

Without having tried, something like this might be faster:

SELECT Data FROM Table A 
WHERE CaseID NOT IN (SELECT CaseID FROM Table_BWHERE Table_B.GroupID IN (               SELECT GroupID FROM Table_C
WHERETableC.AccountID = 11))
 


Tomas


Re: Query Problem

От
Stephan Szabo
Дата:
What is the explain output for the queries you've tried?

Stephan Szabo
sszabo@bigpanda.com

On Wed, 25 Oct 2000, Josh Berkus wrote:

> 
> Folks:
> 
> Here's the problem, in abstract:  I need to select every record in table
> A that does not have a link in table B Join Table C where Table
> C.account = 11
> 
> The relevant fields:
> 
> Table_A
> CaseID
> Data
> 
> Table_B
> GroupID
> CaseID
> Amount
> 
> Table_C
> GroupID
> AccountID
> 
> Thus, I need to select:
> 
> SELECT Data FROM Table A 
> WHERE CaseID NOT IN (
>     SELECT CaseID FROM Table_B, Table_C
>     WHERE Table_B.GroupID = TableC.GroupID
>         AND TableC.AccountID = 11)
> 
> The problem is, since Table_B and Table_C are large (10,000 records +)
> this exclusion query takes several *minutes* to run.
> 
> I've fooled around with drectional joins, views, and temporary tables,
> but I can seem to find anything that works faster.  Suggestions?
> 
> -Josh Berkus
> 
> -- 
> ______AGLIO DATABASE SOLUTIONS___________________________
>                                         Josh Berkus
>    Complete information technology      josh@agliodbs.com
>     and data management solutions       (415) 436-9166
>    for law firms, small businesses       fax  436-0137
>     and non-profit organizations.       pager 338-4078
>                                 San Francisco
> 



Re: Query Problem

От
Tom Lane
Дата:
Josh Berkus <josh@agliodbs.com> writes:
> Thus, I need to select:

> SELECT Data FROM Table A 
> WHERE CaseID NOT IN (
>     SELECT CaseID FROM Table_B, Table_C
>     WHERE Table_B.GroupID = TableC.GroupID
>         AND TableC.AccountID = 11)

> The problem is, since Table_B and Table_C are large (10,000 records +)
> this exclusion query takes several *minutes* to run.

I don't think there is any good way to make this fast in current
sources.  A partial workaround is to use a temp table:

SELECT CaseID INTO TEMP TABLE mycaseids FROM Table_B, Table_CWHERE Table_B.GroupID = TableC.GroupID    AND
TableC.AccountID= 11;
 

CREATE INDEX mycaseids_idx ON mycaseids(caseid);  -- critical!

SELECT Data FROM TableA upper
WHERE NOT EXISTS (select 1 from mycaseids where caseid = upper.caseid);

You'd need to check with EXPLAIN, but the EXISTS subplan should make
use of the index to probe the temp table, so you get one index lookup
per outer tuple.  Better than a complete scan of the subselect outputs,
which is what you'll get with the NOT IN style.


In 7.1 it'll be possible to do this with an outer join, which should
be a lot quicker:

SELECT Data FROM TableA LEFT JOIN
(SELECT CaseID FROM Table_B, Table_CWHERE Table_B.GroupID = TableC.GroupIDAND TableC.AccountID = 11) subselect
ON (tablea.caseid = subselect.caseid)
WHERE subselect.caseid IS NULL;

ie, do the outer join and then discard the successfully-matched rows.


Further down the pike, we have plans to make the system smart enough to
transform IN and NOT IN constructs into join-like queries automatically.
Right now, though, they're best rewritten into something else when
performance is important.
        regards, tom lane


Re: Query Problem

От
Josh Berkus
Дата:
Michael,

> SELECT Data
> FROM Table A
> WHERE NOT EXISTS (
>     SELECT * FROM Table_B, Table_C
>     WHERE Table_B.GroupID = TableC.GroupID
>     AND TableC.AccountID = 11
> )
> 
> I think that the not exists is a bit quicker than the NOT IN.  Give it
> a whirl.

A *lot* faster.  Like, 7x as fast. I'd forgotten about EXISTS, since I
so seldom have a use for it ... but this is shy it was created, I guess.

Thanks so much for your help!
                -Josh Berkus
-- 
______AGLIO DATABASE SOLUTIONS___________________________                                       Josh Berkus  Complete
informationtechnology      josh@agliodbs.com   and data management solutions       (415) 436-9166  for law firms, small
businesses      fax  436-0137   and non-profit organizations.       pager 338-4078                               San
Francisco


HELP! ... pg_locale ???

От
Sandis Jerics
Дата:
Hello folks,

I got a headache now, cause our admin played with postgres settings,
something about pg_locale, as he says. Perhaps the reason is
elsewhere, i dont know.

As result, now all queries, written inside the php code on multiply lines,
returns the following:
ERROR: parser: parse error at or near " "
The same query, pasted to psql, works well.
The same query written on single line in php code works well.
But i see that my older scripts with multiply line queries works well.

Whats up?! Why?

There is no errors in this code:
------
$result = @pg_exec($db,"SET DATESTYLE = 'ISO';SELECT DISTINCT  meznieciba,  date_part('day',date(datums))   AS dd,
date_part('month',date(datums))AS mm,  date_part('year',date(datums))  AS yyyy,  galvene,  kajene  FROM izsolesWHERE
datums= '$datums'") or die(pg_errormessage());
 
------
it works fine when written on the single line
but now i see the above stupid error message..

it so funny to have to rewrite queries to single line?..

--:)-- 
Best regards,Sandis




Re: HELP! ... pg_locale ???

От
Tom Lane
Дата:
Sandis Jerics <sandis@mediaparks.lv> writes:
> As result, now all queries, written inside the php code on multiply lines,
> returns the following:
>  ERROR: parser: parse error at or near " "

At a guess, you're having trouble with newline representations
(Unix convention is \n only, DOS/Windows convention is \r\n,
and then there's Macintosh which likes \r only).

We've been working to change Postgres to accept all of these
choices, but depending on which version of which tool you are
using, you may need to toe the Unix line faithfully.  The above
message looks a lot like something spitting up on a stray \r.

Dunno what your admin did to make the problem appear where you
hadn't had it before...
        regards, tom lane


Re: HELP! ... pg_locale ???

От
Tom Lane
Дата:
Sandis Jerics <sandis@mediaparks.lv> writes:
> As result, now all queries, written inside the php code on multiply lines,
> returns the following:
>  ERROR: parser: parse error at or near " "

At a guess, you're having trouble with newline representations
(Unix convention is \n only, DOS/Windows convention is \r\n,
and then there's Macintosh which likes \r only).

We've been working to change Postgres to accept all of these
choices, but depending on which version of which tool you are
using, you may need to toe the Unix line faithfully.  The above
message looks a lot like something spitting up on a stray \r.

Dunno what your admin did to make the problem appear where you
hadn't had it before...
        regards, tom lane