Обсуждение: 7.3 support for IN
Before anyone starts thinking it, yes, I am aware that INs are really slow in PostgreSQL. That being said... I was doing some tests with big IN statements and various workarounds and was shocked at some of the results. Here are the test cases I came up with: 1. Single value (16 ms) 2. 2 values using IN (870 ms) 3. 2 values using OR (870 ms) 4. 2 values using EXISTS (840 ms) 5. 2 values using JOIN (760 ms) 6. 2 values using UNION (31 ms) 7. 44 values using IN (960 ms) 8. 44 values using OR (960 ms) 9. 44 values using EXISTS (880 ms) 10. 44 values using JOIN (930 ms) 11. 44 values using UNION (670 ms) Note that the EXISTS and the JOIN do not count the time needed to execute the CREATE TABLE or COPY statements needed to populate a separate dataset to query across. Now then, it concerns me that doing 44 separate SELECTs and UNIONing the results together is faster than simply doing an IN or an OR. Heck, it's significantly faster to do 44 UNIONs than it is to do (Field=Value1 OR Field=Value2) with just *2* values. Is that really to be expected or am I doing a really poor job of optimizing/querying? Is this going to be improved in 7.3? I've attached the schema and the queries used to do my very rough benchmarking. The two tables have 591655 and 3044478 rows. If you would like to see anything else, please let me know. Greg
Вложения
On Thu, Jul 18, 2002 at 01:52:44PM -0400, Gregory Wood wrote: > Before anyone starts thinking it, yes, I am aware that INs are really slow > in PostgreSQL. That being said... I was doing some tests with big IN > statements and various workarounds and was shocked at some of the results. I hope those values were averaged out over a few runs. Measuring something as 16ms with a single run makes it statistically invalid. Maybe PostgreSQL happened to lose it's timeslice just as the EXISTS query came along, blowing any results completely out of the water. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > There are 10 kinds of people in the world, those that can do binary > arithmetic and those that can't.
> > Before anyone starts thinking it, yes, I am aware that INs are really slow > > in PostgreSQL. That being said... I was doing some tests with big IN > > statements and various workarounds and was shocked at some of the results. > > I hope those values were averaged out over a few runs. Measuring something as > 16ms with a single run makes it statistically invalid. Maybe PostgreSQL > happened to lose it's timeslice just as the EXISTS query came along, blowing > any results completely out of the water. Actually I ran the queries several times until I got consistent results three or four times consecutively. In the case of 2 digit times (the 16 and 31 ms) they were within tenths of a second of each other. In the case of the hundred ms+ times, the range was about 10ms, I tried to pick the 10s value that they tended to cluster around (for example 930 might have been 934.23, 928.61 and 929.47). If I got an inconsistent result I would run the query another 5 or 10 times to be sure that I had what I considered the most consistent times. Very rough I know, but I'm pretty sure that none of those numbers represents a 'blip' in performance. Of course multiple runs like this pretty much ensures that everything is thrown into cache, but I'm hoping to keep most if not all of the DB in memory anyway. Greg