Обсуждение: 7.3 support for IN

Поиск
Список
Период
Сортировка

7.3 support for IN

От
"Gregory Wood"
Дата:
Before anyone starts thinking it, yes, I am aware that INs are really slow
in PostgreSQL. That being said... I was doing some tests with big IN
statements and various workarounds and was shocked at some of the results.

Here are the test cases I came up with:

1. Single value (16 ms)
2. 2 values using IN (870 ms)
3. 2 values using OR (870 ms)
4. 2 values using EXISTS (840 ms)
5. 2 values using JOIN (760 ms)
6. 2 values using UNION (31 ms)
7. 44 values using IN (960 ms)
8. 44 values using OR (960 ms)
9. 44 values using EXISTS (880 ms)
10. 44 values using JOIN (930 ms)
11. 44 values using UNION (670 ms)

Note that the EXISTS and the JOIN do not count the time needed to execute
the CREATE TABLE or COPY statements needed to populate a separate dataset to
query across.

Now then, it concerns me that doing 44 separate SELECTs and UNIONing the
results together is faster than simply doing an IN or an OR. Heck, it's
significantly faster to do 44 UNIONs than it is to do (Field=Value1 OR
Field=Value2) with just *2* values. Is that really to be expected or am I
doing a really poor job of optimizing/querying? Is this going to be improved
in 7.3?

I've attached the schema and the queries used to do my very rough
benchmarking. The two tables have 591655 and 3044478 rows. If you would like
to see anything else, please let me know.

Greg

Вложения

Re: 7.3 support for IN

От
Martijn van Oosterhout
Дата:
On Thu, Jul 18, 2002 at 01:52:44PM -0400, Gregory Wood wrote:
> Before anyone starts thinking it, yes, I am aware that INs are really slow
> in PostgreSQL. That being said... I was doing some tests with big IN
> statements and various workarounds and was shocked at some of the results.

I hope those values were averaged out over a few runs. Measuring something as
16ms with a single run makes it statistically invalid. Maybe PostgreSQL
happened to lose it's timeslice just as the EXISTS query came along, blowing
any results completely out of the water.
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.

Re: 7.3 support for IN

От
"Gregory Wood"
Дата:
> > Before anyone starts thinking it, yes, I am aware that INs are really
slow
> > in PostgreSQL. That being said... I was doing some tests with big IN
> > statements and various workarounds and was shocked at some of the
results.
>
> I hope those values were averaged out over a few runs. Measuring something
as
> 16ms with a single run makes it statistically invalid. Maybe PostgreSQL
> happened to lose it's timeslice just as the EXISTS query came along,
blowing
> any results completely out of the water.

Actually I ran the queries several times until I got consistent results
three or four times consecutively. In the case of 2 digit times (the 16 and
31 ms) they were within tenths of a second of each other. In the case of the
hundred ms+ times, the range was about 10ms, I tried to pick the 10s value
that they tended to cluster around (for example 930 might have been 934.23,
928.61 and 929.47).

If I got an inconsistent result I would run the query another 5 or 10 times
to be sure that I had what I considered the most consistent times. Very
rough I know, but I'm pretty sure that none of those numbers represents a
'blip' in performance.

Of course multiple runs like this pretty much ensures that everything is
thrown into cache, but I'm hoping to keep most if not all of the DB in
memory anyway.

Greg