Обсуждение: More Performance Questions

Поиск
Список
Период
Сортировка

More Performance Questions

От
Gordan Bobic
Дата:
Hi.

Speed-wise, is there a signifficant performance difference between doing
complex queries in the following forms?

Form 1:
(
    SELECT    Master.*
    FROM    Master,
        MasterFTI
    WHERE    MasterFTI.ID        = Master.ID    AND
        MasterFTI.String    = 'string1'
)
UNION
(
    SELECT    Master.*
    FROM    Master,
        MasterFTI
    WHERE    MasterFTI.ID        = Master.ID    AND
        MasterFTI.String    = 'string2'
)
...
UNION
...;

Form 2:
SELECT DISTINCT    Master.*
FROM        Master,
        MasterFTI
WHERE        MasterFTI.ID        = Master.ID    AND
        (
            MasterFTI.String    = 'string1'    OR
            MasterFTI.String    = 'string2'    OR
            ...
        );

The reason am asking is because I don't know how the back end splits and
executes these queries. Are the UNION/INTERSECT/EXCEPT queries each executed
separately in sequence? Or does the optimizer do some magic and transform
them in a more efficient way that doesn't require multiple passes?

And is the overhead of running multiple UNION queries greater than the
overhead of doing a DISTINCT? I need to sort the records anyway, so the fact
that DISTINCT does a SORT is a bonus in this case.

In an extreme case my dynamically constructed queries (from a CGI) can have
as many as 50 terms in them, which using the UNION method, equates to 50
queries being run (if that is the way it all gets executed). Is there likely
to be a sizeable improvement in using the other method? The reason I am
asking before trying is because I'd like to avoid re-writing my custom->SQL
parser again.

I was hoping that someone with a bit more background knowledge into how
PostgreSQL works could shed some light on it...

Regards.

Gordan

Re: More Performance Questions

От
Gordan Bobic
Дата:
On Wednesday 07 Nov 2001 18:13, Tom Lane wrote:
> Gordan Bobic <gordan@bobich.net> writes:
> > And is the overhead of running multiple UNION queries greater than the
> > overhead of doing a DISTINCT? I need to sort the records anyway, so the
> > fact that DISTINCT does a SORT is a bonus in this case.
>
> UNION implies DISTINCT, so you're going to get sort and uniq steps in
> either case.

Yes, but I thought that if I have 50 UNION queries, that would do a sort +
uniq for each "append" between them, whereas in the distinc case it only gets
done once, albeit on a bigger data set.

> What this is really going to boil down to is how the
> restriction and join steps are done, and you haven't given enough info
> to speculate about that.

Well, what I said is pretty much it.

It's the case of either doing single FTI term search per query and doing
UNION (for OR search) or INTERSECT (for AND search) of multiple queries.

If the search is executed in this way, and each UNION segment is executed in
sequence, then that means N queries, where N is the number of search terms.

In the SELECT DISTINCT case where multiple terms are ORed in the WHERE
clause, it is vaguely concievable that the entire query (at least in the
UNION case) could be executed in a single pass. Is that the how it works? Or
is each OR term located in a separate pass?

What I'm really trying to figure out is if there is an advantage (in theory
at least) in doing one slightly more complex query, or lots of simpler ones.

> Try some experimentation with EXPLAIN to see
> what kinds of plans you get.

Well, all the fields that are searched on are indexed, and for testing I
usually enable_seqscan=off. What I am going to do is re-write my parser/SQL
generator and give it a go - with a bit of luck, there will be a noticeable
difference in performance.

Thanks.

Gordan

Re: More Performance Questions

От
Tom Lane
Дата:
Gordan Bobic <gordan@bobich.net> writes:
> And is the overhead of running multiple UNION queries greater than the
> overhead of doing a DISTINCT? I need to sort the records anyway, so the fact
> that DISTINCT does a SORT is a bonus in this case.

UNION implies DISTINCT, so you're going to get sort and uniq steps in
either case.  What this is really going to boil down to is how the
restriction and join steps are done, and you haven't given enough info
to speculate about that.  Try some experimentation with EXPLAIN to see
what kinds of plans you get.

            regards, tom lane

Re: More Performance Questions

От
"Peter Darley"
Дата:
I don't know about weather this would be faster or not, but you can have all
your criteria in one where segment using IN, i.e.: WHERE Value IN
('Criteria1', 'Criteria2', etc...).  It may also have some benefit in being
shorter.
Peter Darley

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of Gordan Bobic
Sent: Wednesday, November 07, 2001 10:26 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] More Performance Questions


On Wednesday 07 Nov 2001 18:13, Tom Lane wrote:
> Gordan Bobic <gordan@bobich.net> writes:
> > And is the overhead of running multiple UNION queries greater than the
> > overhead of doing a DISTINCT? I need to sort the records anyway, so the
> > fact that DISTINCT does a SORT is a bonus in this case.
>
> UNION implies DISTINCT, so you're going to get sort and uniq steps in
> either case.

Yes, but I thought that if I have 50 UNION queries, that would do a sort +
uniq for each "append" between them, whereas in the distinc case it only
gets
done once, albeit on a bigger data set.

> What this is really going to boil down to is how the
> restriction and join steps are done, and you haven't given enough info
> to speculate about that.

Well, what I said is pretty much it.

It's the case of either doing single FTI term search per query and doing
UNION (for OR search) or INTERSECT (for AND search) of multiple queries.

If the search is executed in this way, and each UNION segment is executed in
sequence, then that means N queries, where N is the number of search terms.

In the SELECT DISTINCT case where multiple terms are ORed in the WHERE
clause, it is vaguely concievable that the entire query (at least in the
UNION case) could be executed in a single pass. Is that the how it works? Or
is each OR term located in a separate pass?

What I'm really trying to figure out is if there is an advantage (in theory
at least) in doing one slightly more complex query, or lots of simpler ones.

> Try some experimentation with EXPLAIN to see
> what kinds of plans you get.

Well, all the fields that are searched on are indexed, and for testing I
usually enable_seqscan=off. What I am going to do is re-write my parser/SQL
generator and give it a go - with a bit of luck, there will be a noticeable
difference in performance.

Thanks.

Gordan

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html