Обсуждение: More Performance Questions
Hi.
Speed-wise, is there a signifficant performance difference between doing
complex queries in the following forms?
Form 1:
(
SELECT Master.*
FROM Master,
MasterFTI
WHERE MasterFTI.ID = Master.ID AND
MasterFTI.String = 'string1'
)
UNION
(
SELECT Master.*
FROM Master,
MasterFTI
WHERE MasterFTI.ID = Master.ID AND
MasterFTI.String = 'string2'
)
...
UNION
...;
Form 2:
SELECT DISTINCT Master.*
FROM Master,
MasterFTI
WHERE MasterFTI.ID = Master.ID AND
(
MasterFTI.String = 'string1' OR
MasterFTI.String = 'string2' OR
...
);
The reason am asking is because I don't know how the back end splits and
executes these queries. Are the UNION/INTERSECT/EXCEPT queries each executed
separately in sequence? Or does the optimizer do some magic and transform
them in a more efficient way that doesn't require multiple passes?
And is the overhead of running multiple UNION queries greater than the
overhead of doing a DISTINCT? I need to sort the records anyway, so the fact
that DISTINCT does a SORT is a bonus in this case.
In an extreme case my dynamically constructed queries (from a CGI) can have
as many as 50 terms in them, which using the UNION method, equates to 50
queries being run (if that is the way it all gets executed). Is there likely
to be a sizeable improvement in using the other method? The reason I am
asking before trying is because I'd like to avoid re-writing my custom->SQL
parser again.
I was hoping that someone with a bit more background knowledge into how
PostgreSQL works could shed some light on it...
Regards.
Gordan
On Wednesday 07 Nov 2001 18:13, Tom Lane wrote: > Gordan Bobic <gordan@bobich.net> writes: > > And is the overhead of running multiple UNION queries greater than the > > overhead of doing a DISTINCT? I need to sort the records anyway, so the > > fact that DISTINCT does a SORT is a bonus in this case. > > UNION implies DISTINCT, so you're going to get sort and uniq steps in > either case. Yes, but I thought that if I have 50 UNION queries, that would do a sort + uniq for each "append" between them, whereas in the distinc case it only gets done once, albeit on a bigger data set. > What this is really going to boil down to is how the > restriction and join steps are done, and you haven't given enough info > to speculate about that. Well, what I said is pretty much it. It's the case of either doing single FTI term search per query and doing UNION (for OR search) or INTERSECT (for AND search) of multiple queries. If the search is executed in this way, and each UNION segment is executed in sequence, then that means N queries, where N is the number of search terms. In the SELECT DISTINCT case where multiple terms are ORed in the WHERE clause, it is vaguely concievable that the entire query (at least in the UNION case) could be executed in a single pass. Is that the how it works? Or is each OR term located in a separate pass? What I'm really trying to figure out is if there is an advantage (in theory at least) in doing one slightly more complex query, or lots of simpler ones. > Try some experimentation with EXPLAIN to see > what kinds of plans you get. Well, all the fields that are searched on are indexed, and for testing I usually enable_seqscan=off. What I am going to do is re-write my parser/SQL generator and give it a go - with a bit of luck, there will be a noticeable difference in performance. Thanks. Gordan
Gordan Bobic <gordan@bobich.net> writes:
> And is the overhead of running multiple UNION queries greater than the
> overhead of doing a DISTINCT? I need to sort the records anyway, so the fact
> that DISTINCT does a SORT is a bonus in this case.
UNION implies DISTINCT, so you're going to get sort and uniq steps in
either case. What this is really going to boil down to is how the
restriction and join steps are done, and you haven't given enough info
to speculate about that. Try some experimentation with EXPLAIN to see
what kinds of plans you get.
regards, tom lane
I don't know about weather this would be faster or not, but you can have all
your criteria in one where segment using IN, i.e.: WHERE Value IN
('Criteria1', 'Criteria2', etc...). It may also have some benefit in being
shorter.
Peter Darley
-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of Gordan Bobic
Sent: Wednesday, November 07, 2001 10:26 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] More Performance Questions
On Wednesday 07 Nov 2001 18:13, Tom Lane wrote:
> Gordan Bobic <gordan@bobich.net> writes:
> > And is the overhead of running multiple UNION queries greater than the
> > overhead of doing a DISTINCT? I need to sort the records anyway, so the
> > fact that DISTINCT does a SORT is a bonus in this case.
>
> UNION implies DISTINCT, so you're going to get sort and uniq steps in
> either case.
Yes, but I thought that if I have 50 UNION queries, that would do a sort +
uniq for each "append" between them, whereas in the distinc case it only
gets
done once, albeit on a bigger data set.
> What this is really going to boil down to is how the
> restriction and join steps are done, and you haven't given enough info
> to speculate about that.
Well, what I said is pretty much it.
It's the case of either doing single FTI term search per query and doing
UNION (for OR search) or INTERSECT (for AND search) of multiple queries.
If the search is executed in this way, and each UNION segment is executed in
sequence, then that means N queries, where N is the number of search terms.
In the SELECT DISTINCT case where multiple terms are ORed in the WHERE
clause, it is vaguely concievable that the entire query (at least in the
UNION case) could be executed in a single pass. Is that the how it works? Or
is each OR term located in a separate pass?
What I'm really trying to figure out is if there is an advantage (in theory
at least) in doing one slightly more complex query, or lots of simpler ones.
> Try some experimentation with EXPLAIN to see
> what kinds of plans you get.
Well, all the fields that are searched on are indexed, and for testing I
usually enable_seqscan=off. What I am going to do is re-write my parser/SQL
generator and give it a go - with a bit of luck, there will be a noticeable
difference in performance.
Thanks.
Gordan
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/users-lounge/docs/faq.html