IN list processing performance (yet again)

Поиск
Список
Период
Сортировка
Having grepped the web, it's clear that this isn't the first or last
time this issue will be raised.

My application relies heavily on IN lists.  The lists are primarily
constant integers, so queries look like:

SELECT val FROM table WHERE id IN (43, 49, 1001, 100002, ...)

Performance is critical, and the size of these lists depends a lot on
how the larger 3-tier applicaiton is used,
but it wouldn't be out of the question to retrieve 3000-10000 items.

PostgreSQL 7.3.2 seems to have a lot of trouble with large lists.

I ran an experiment that ran queries on a table of two integers  (ID,
VAL), where ID is a primary key and the subject
of IN list predicates.  The test used a table with one million rows  ID
is appropriately indexed,
and I have VACUUMED/analyzed the database after table load.

I ran tests on in-lists from about 100 to 100,000 entries.

I also ran tests where I picked the rows out one-by-one using
parameterized statements, e.g.

SELECT val FROM table WHERE id = ?

I'm trying to decide how much I should use parameterized statements and
when to work around buffer size limitations
in JDBC transport, general query processing, etc.

So here are my questions as a prelude to the data noted below:

1) What is the acceptable limit of jdbc Statement buffer sizes for
executeQuery()?
      (Presumably it's not really a JDBC question, but a PostgreSQL
query/buffering/transport question).

2) What is the expected acceptable limit for the number of items in an
IN list predicate such as
      those used here.  (List of constants, not subselects).

3) What should I expect for performance capabilities/limitations from
PostgreSQL on this type of problem?
      (Set my expectations, perhaps they're unrealistic).

4) What are my alternatives for picking specific rows for thousands of
elements with sub-second response times
      if not IN lists?      (This is crucial!)

---------------------------------------------

Here is a summary of my observations of query times, and I've attached
the test program (more on that below).

1) PostgreSQL exhibits worse-than-linear performance behavior with
respect to IN list size.
      This is bad.

2) Parameterized statements exhibit the expected linear performance
characteristics.

3) The break-even point for using IN lists vs. parameterized statements
in my environment

      (RedHat Linux 9.0, PostgreSQL 7.3.2, 512MB memory, 7200RPM 100UDMA
IDE disks, AMD1600Mhz)

      is about 700 items in the IN list.  Beyond that, IN the IN list
scalability curve makes it impractical.

4)  God help you if you haven't vacuum/analyzed that the newly loaded
table.  Without this,
      IN list processing is even worse!

     For just 10 elements in the IN list:

     *Without* VACUUMDB, IN lists suck beyond measure:
    Elapsed time for 10 IN list elements: 2638 ms
    Elapsed time for 10 parameterized elements: 9 ms

   *With* VACUUMDB: IN lists recover a bit:
   Elapsed time for 10 IN list elements: 10 ms
   Elapsed time for 10 parameterized elements: 24 ms

    However it's VERY interesting to note that parameterized statements
worked well.  That implies
    probable disparity in plan generation.  It's worth noting that it
didn't *feel* like I was getting the same
    delay when I ran the query from the 'psql' client, but since it
doesn't report times I can't be sure.

5) Rest of my results are vacuumed, (and listed in the attached program
in detail).

     The interesting points are:

      For an IN list of 700 elements:

         MySQL 3.23.56 (with INNODB tables)  takes 19ms, 73ms with
parameterized statements.
         PostgreSQL takes 269ms, 263ms with parameterized statements.

      For larger lists, MySQL happily processed a 90,000 element IN list
in 1449ms,
      9283 ms using parameterized statements.

      PostgreSQL  craps out trying to process 8000 elements with the error:
      out of free buffers: time to abort!

      PostgreSQL takes 45,566ms for 7000 elements in an IN list (ouch!)
, and 2027ms for a parameterized statement.
      MySQL easily beats that with 10 times the data.

6) Using a remote client on the lan (10/100) to run the client piece on
a separate machine from the database
     server yielded little change int he results.  Prepared statements
worked pretty well even with actual wire latency,
     surprise!  (Of course it's very little latency on my lan, not like,
say, the database server running in a different city
     which customers have been known to do).

The MySQL and PostgreSQL installations are the default RedHat 9.0
distribution packages,
I haven't tweaked either's installation parameters. (though MySQL was
updated by RedHat from 3.23.54a to 3.23.56
as part of an automatic upgrade).

My goal here isn't to say "why aren't you like MySQL".  I've been using
PostgreSQL for a year as the development database of choice
with satisfactory results.  But I won't be able to support customers who
want to use PostgreSQL on large deployments of my products
if I can't selectively retrieve data for several thousand elements in
sub-second times.

(PostgreSQL devos, if you want a feel good bullet, note that I don't
support MySQL at all since lack of MVCC transactions
is a showstopper from a multi-user performance standpoint).

So I'm looking for (a) solutions, answers to my questions above, and (b)
a statement of "we're on this" or "you're stuck with it" from
PostgreSQL devos who know.

----------------------------------------------------------------
On the attached program. (a Java JDBC program) Sorry, you can't just run
it "as is".  Search for formfeeds (^L) if you want
to skip all the result data I logged.  Compilation is straightforward,
simply supply the location of your JDBC jar file for compiling and running
(mine is noted in the program).

First, move the "if (false)" element below the table creation statements
and run the program to create the table.
Then VACUUM/analyze your database (suuuuure would be nice not to have to
vacuum).
Then move the "if (false)" element above the table creation so you won't
have to do it every time.
Then move past the formfeed and adjust the 'tryAmount' for loop to test
the amounts you're interested.
100 to 1000 by 100's is a good starting point.

Ignore the section (part of the if (false) logic) that attempts to
figure out what the largest query size is.
Unless you want to reproduce a hung postmaster in a CPU loop, which is
what I got when I tried to run that logic,
though whan I ran it I was missing the closing ')' in my IN list, which
I've since added to that code.

Thanks for any help!

Dave
import java.sql.* ;

/*
Query huge inlists, test for buffer overflow and performance using in-lists vs.
parameterized statements, and perhaps batched statements (not yet)

Results:
MySQL blows PostgreSQL away here, and that's AFTER doing a VACUUMDB in PostgreSQL
PostgreSQL does ok with VACUUMDB to 700 entries, the prepared statements are better.
MySQL: IN lists are always superior to any practical limit (tested to 100,000 elements)

Local-to-local on AMD1600Mhz, MySQL 3.23.56 with JDBC driver 3.0.8
[MYSQL was much better here...]

Table update time: 109927 ms

For 10,000 -> 100,000 by 10,000
Elapsed time for 90000 IN list elements: 1449 ms
62 elements/ms
Elapsed time for 90000 parameterized elements: 9283 ms
9 elements/ms

For 100 -> 10,000 by 100
Last received via IN list: 99
Elapsed time for 100 IN list elements: 6 ms
16 elements/ms
Elapsed time for 100 parameterized elements: 20 ms
5 elements/ms
Last received via parameterized elements: 99
Last received via IN list: 199
Elapsed time for 200 IN list elements: 7 ms
28 elements/ms
Elapsed time for 200 parameterized elements: 45 ms
4 elements/ms
Last received via parameterized elements: 199
Last received via IN list: 299
Elapsed time for 300 IN list elements: 22 ms
13 elements/ms
Elapsed time for 300 parameterized elements: 64 ms
4 elements/ms
Last received via parameterized elements: 299
Last received via IN list: 399
Elapsed time for 400 IN list elements: 39 ms
10 elements/ms
Elapsed time for 400 parameterized elements: 63 ms
6 elements/ms
Last received via parameterized elements: 399
Last received via IN list: 499
Elapsed time for 500 IN list elements: 12 ms
41 elements/ms
Elapsed time for 500 parameterized elements: 116 ms
4 elements/ms
Last received via parameterized elements: 499
Last received via IN list: 599
Elapsed time for 600 IN list elements: 8 ms
75 elements/ms
Elapsed time for 600 parameterized elements: 65 ms
9 elements/ms
Last received via parameterized elements: 599
Last received via IN list: 699
Elapsed time for 700 IN list elements: 19 ms
36 elements/ms
Elapsed time for 700 parameterized elements: 73 ms
9 elements/ms
Last received via parameterized elements: 699
Last received via IN list: 799
Elapsed time for 800 IN list elements: 12 ms
66 elements/ms
Elapsed time for 800 parameterized elements: 84 ms
9 elements/ms
Last received via parameterized elements: 799
Last received via IN list: 899
Elapsed time for 900 IN list elements: 12 ms
75 elements/ms
Elapsed time for 900 parameterized elements: 97 ms
9 elements/ms
Last received via parameterized elements: 899
Last received via IN list: 999
Elapsed time for 1000 IN list elements: 15 ms
66 elements/ms
Elapsed time for 1000 parameterized elements: 109 ms
9 elements/ms
Last received via parameterized elements: 999
Last received via IN list: 1099
Elapsed time for 1100 IN list elements: 16 ms
68 elements/ms
Elapsed time for 1100 parameterized elements: 115 ms
9 elements/ms
Last received via parameterized elements: 1099
Last received via IN list: 1199
Elapsed time for 1200 IN list elements: 16 ms
75 elements/ms
Elapsed time for 1200 parameterized elements: 138 ms
8 elements/ms
Last received via parameterized elements: 1199
Last received via IN list: 1299
Elapsed time for 1300 IN list elements: 19 ms
68 elements/ms
Elapsed time for 1300 parameterized elements: 136 ms
9 elements/ms
Last received via parameterized elements: 1299
Last received via IN list: 1399
Elapsed time for 1400 IN list elements: 21 ms
66 elements/ms
Elapsed time for 1400 parameterized elements: 150 ms
9 elements/ms
Last received via parameterized elements: 1399
Last received via IN list: 1499
Elapsed time for 1500 IN list elements: 21 ms
71 elements/ms
Elapsed time for 1500 parameterized elements: 158 ms
9 elements/ms
Last received via parameterized elements: 1499
Last received via IN list: 1599
Elapsed time for 1600 IN list elements: 22 ms
72 elements/ms
Elapsed time for 1600 parameterized elements: 178 ms
8 elements/ms
Last received via parameterized elements: 1599
Last received via IN list: 1699
Elapsed time for 1700 IN list elements: 22 ms
77 elements/ms
Elapsed time for 1700 parameterized elements: 180 ms
9 elements/ms
Last received via parameterized elements: 1699
Last received via IN list: 1799
Elapsed time for 1800 IN list elements: 26 ms
69 elements/ms
Elapsed time for 1800 parameterized elements: 205 ms
8 elements/ms
Last received via parameterized elements: 1799
Last received via IN list: 1899
Elapsed time for 1900 IN list elements: 57 ms
33 elements/ms
Elapsed time for 1900 parameterized elements: 198 ms
9 elements/ms
Last received via parameterized elements: 1899
Last received via IN list: 1999
Elapsed time for 2000 IN list elements: 29 ms
68 elements/ms
Elapsed time for 2000 parameterized elements: 215 ms
9 elements/ms
Last received via parameterized elements: 1999
Last received via IN list: 2099
Elapsed time for 2100 IN list elements: 30 ms
70 elements/ms
Elapsed time for 2100 parameterized elements: 222 ms
9 elements/ms
Last received via parameterized elements: 2099
Last received via IN list: 2199
Elapsed time for 2200 IN list elements: 32 ms
68 elements/ms
Elapsed time for 2200 parameterized elements: 243 ms
9 elements/ms
Last received via parameterized elements: 2199
Last received via IN list: 2299
Elapsed time for 2300 IN list elements: 32 ms
71 elements/ms
Elapsed time for 2300 parameterized elements: 241 ms
9 elements/ms
Last received via parameterized elements: 2299
Last received via IN list: 2399
Elapsed time for 2400 IN list elements: 33 ms
72 elements/ms
Elapsed time for 2400 parameterized elements: 260 ms
9 elements/ms
Last received via parameterized elements: 2399
Last received via IN list: 2499
Elapsed time for 2500 IN list elements: 64 ms
39 elements/ms
Elapsed time for 2500 parameterized elements: 275 ms
9 elements/ms
Last received via parameterized elements: 2499
Last received via IN list: 2599
Elapsed time for 2600 IN list elements: 35 ms
74 elements/ms
Elapsed time for 2600 parameterized elements: 275 ms
9 elements/ms
Last received via parameterized elements: 2599
Last received via IN list: 2699
Elapsed time for 2700 IN list elements: 36 ms
75 elements/ms
Elapsed time for 2700 parameterized elements: 292 ms
9 elements/ms
Last received via parameterized elements: 2699
Last received via IN list: 2799
Elapsed time for 2800 IN list elements: 41 ms
68 elements/ms
Elapsed time for 2800 parameterized elements: 308 ms
9 elements/ms
Last received via parameterized elements: 2799
Last received via IN list: 2899
Elapsed time for 2900 IN list elements: 40 ms
72 elements/ms
Elapsed time for 2900 parameterized elements: 341 ms
8 elements/ms
Last received via parameterized elements: 2899
Last received via IN list: 2999
Elapsed time for 3000 IN list elements: 42 ms
71 elements/ms
Elapsed time for 3000 parameterized elements: 356 ms
8 elements/ms
Last received via parameterized elements: 2999
Last received via IN list: 3099
Elapsed time for 3100 IN list elements: 42 ms
73 elements/ms
Elapsed time for 3100 parameterized elements: 342 ms
9 elements/ms
Last received via parameterized elements: 3099
Last received via IN list: 3199
Elapsed time for 3200 IN list elements: 43 ms
74 elements/ms
Elapsed time for 3200 parameterized elements: 345 ms
9 elements/ms
Last received via parameterized elements: 3199
Last received via IN list: 3299
Elapsed time for 3300 IN list elements: 43 ms
76 elements/ms
Elapsed time for 3300 parameterized elements: 359 ms
9 elements/ms
Last received via parameterized elements: 3299
Last received via IN list: 3399
Elapsed time for 3400 IN list elements: 77 ms
44 elements/ms
Elapsed time for 3400 parameterized elements: 356 ms
9 elements/ms
Last received via parameterized elements: 3399
Last received via IN list: 3499
Elapsed time for 3500 IN list elements: 50 ms
70 elements/ms
Elapsed time for 3500 parameterized elements: 370 ms
9 elements/ms
Last received via parameterized elements: 3499
Last received via IN list: 3599
Elapsed time for 3600 IN list elements: 84 ms
42 elements/ms
Elapsed time for 3600 parameterized elements: 387 ms
9 elements/ms
Last received via parameterized elements: 3599
Last received via IN list: 3699
Elapsed time for 3700 IN list elements: 52 ms
71 elements/ms
Elapsed time for 3700 parameterized elements: 390 ms
9 elements/ms
Last received via parameterized elements: 3699
Last received via IN list: 3799
Elapsed time for 3800 IN list elements: 54 ms
70 elements/ms
Elapsed time for 3800 parameterized elements: 407 ms
9 elements/ms
Last received via parameterized elements: 3799
Last received via IN list: 3899
Elapsed time for 3900 IN list elements: 84 ms
46 elements/ms
Elapsed time for 3900 parameterized elements: 422 ms
9 elements/ms
Last received via parameterized elements: 3899
Last received via IN list: 3999
Elapsed time for 4000 IN list elements: 58 ms
68 elements/ms
Elapsed time for 4000 parameterized elements: 434 ms
9 elements/ms
Last received via parameterized elements: 3999
Last received via IN list: 4099
Elapsed time for 4100 IN list elements: 58 ms
70 elements/ms
Elapsed time for 4100 parameterized elements: 446 ms
9 elements/ms
Last received via parameterized elements: 4099
Last received via IN list: 4199
Elapsed time for 4200 IN list elements: 61 ms
68 elements/ms
Elapsed time for 4200 parameterized elements: 453 ms
9 elements/ms
Last received via parameterized elements: 4199
Last received via IN list: 4299
Elapsed time for 4300 IN list elements: 63 ms
68 elements/ms
Elapsed time for 4300 parameterized elements: 457 ms
9 elements/ms
Last received via parameterized elements: 4299
Last received via IN list: 4399
Elapsed time for 4400 IN list elements: 94 ms
46 elements/ms
Elapsed time for 4400 parameterized elements: 473 ms
9 elements/ms
Last received via parameterized elements: 4399
Last received via IN list: 4499
Elapsed time for 4500 IN list elements: 64 ms
70 elements/ms
Elapsed time for 4500 parameterized elements: 506 ms
8 elements/ms
Last received via parameterized elements: 4499
Last received via IN list: 4599
Elapsed time for 4600 IN list elements: 72 ms
63 elements/ms
Elapsed time for 4600 parameterized elements: 527 ms
8 elements/ms
Last received via parameterized elements: 4599
Last received via IN list: 4699
Elapsed time for 4700 IN list elements: 68 ms
69 elements/ms
Elapsed time for 4700 parameterized elements: 499 ms
9 elements/ms
Last received via parameterized elements: 4699
Last received via IN list: 4799
Elapsed time for 4800 IN list elements: 70 ms
68 elements/ms
Elapsed time for 4800 parameterized elements: 539 ms
8 elements/ms
Last received via parameterized elements: 4799
Last received via IN list: 4899
Elapsed time for 4900 IN list elements: 100 ms
49 elements/ms
Elapsed time for 4900 parameterized elements: 590 ms
8 elements/ms
Last received via parameterized elements: 4899
Last received via IN list: 4999
Elapsed time for 5000 IN list elements: 111 ms
45 elements/ms
Elapsed time for 5000 parameterized elements: 531 ms
9 elements/ms
Last received via parameterized elements: 4999
Last received via IN list: 5099
Elapsed time for 5100 IN list elements: 79 ms
64 elements/ms
Elapsed time for 5100 parameterized elements: 542 ms
9 elements/ms
Last received via parameterized elements: 5099
Last received via IN list: 5199
Elapsed time for 5200 IN list elements: 108 ms
48 elements/ms
Elapsed time for 5200 parameterized elements: 560 ms
9 elements/ms
Last received via parameterized elements: 5199
Last received via IN list: 5299
Elapsed time for 5300 IN list elements: 78 ms
67 elements/ms
Elapsed time for 5300 parameterized elements: 560 ms
9 elements/ms
Last received via parameterized elements: 5299
Last received via IN list: 5399
Elapsed time for 5400 IN list elements: 81 ms
66 elements/ms
Elapsed time for 5400 parameterized elements: 572 ms
9 elements/ms
Last received via parameterized elements: 5399
Last received via IN list: 5499
Elapsed time for 5500 IN list elements: 80 ms
68 elements/ms
Elapsed time for 5500 parameterized elements: 627 ms
8 elements/ms
Last received via parameterized elements: 5499
Last received via IN list: 5599
Elapsed time for 5600 IN list elements: 81 ms
69 elements/ms
Elapsed time for 5600 parameterized elements: 590 ms
9 elements/ms
Last received via parameterized elements: 5599
Last received via IN list: 5699
Elapsed time for 5700 IN list elements: 86 ms
66 elements/ms
Elapsed time for 5700 parameterized elements: 606 ms
9 elements/ms
Last received via parameterized elements: 5699
Last received via IN list: 5799
Elapsed time for 5800 IN list elements: 85 ms
68 elements/ms
Elapsed time for 5800 parameterized elements: 663 ms
8 elements/ms
Last received via parameterized elements: 5799
Last received via IN list: 5899
Elapsed time for 5900 IN list elements: 83 ms
71 elements/ms
Elapsed time for 5900 parameterized elements: 629 ms
9 elements/ms
Last received via parameterized elements: 5899
Last received via IN list: 5999
Elapsed time for 6000 IN list elements: 91 ms
65 elements/ms
Elapsed time for 6000 parameterized elements: 658 ms
9 elements/ms
Last received via parameterized elements: 5999
Last received via IN list: 6099
Elapsed time for 6100 IN list elements: 87 ms
70 elements/ms
Elapsed time for 6100 parameterized elements: 702 ms
8 elements/ms
Last received via parameterized elements: 6099
Last received via IN list: 6199
Elapsed time for 6200 IN list elements: 88 ms
70 elements/ms
Elapsed time for 6200 parameterized elements: 667 ms
9 elements/ms
Last received via parameterized elements: 6199
Last received via IN list: 6299
Elapsed time for 6300 IN list elements: 91 ms
69 elements/ms
Elapsed time for 6300 parameterized elements: 670 ms
9 elements/ms
Last received via parameterized elements: 6299
Last received via IN list: 6399
Elapsed time for 6400 IN list elements: 91 ms
70 elements/ms
Elapsed time for 6400 parameterized elements: 708 ms
9 elements/ms
Last received via parameterized elements: 6399
Last received via IN list: 6499
Elapsed time for 6500 IN list elements: 96 ms
67 elements/ms
Elapsed time for 6500 parameterized elements: 687 ms
9 elements/ms
Last received via parameterized elements: 6499
Last received via IN list: 6599
Elapsed time for 6600 IN list elements: 95 ms
69 elements/ms
Elapsed time for 6600 parameterized elements: 701 ms
9 elements/ms
Last received via parameterized elements: 6599
Last received via IN list: 6699
Elapsed time for 6700 IN list elements: 134 ms
50 elements/ms
Elapsed time for 6700 parameterized elements: 711 ms
9 elements/ms
Last received via parameterized elements: 6699
Last received via IN list: 6799
Elapsed time for 6800 IN list elements: 98 ms
69 elements/ms
Elapsed time for 6800 parameterized elements: 723 ms
9 elements/ms
Last received via parameterized elements: 6799
Last received via IN list: 6899
Elapsed time for 6900 IN list elements: 99 ms
69 elements/ms
Elapsed time for 6900 parameterized elements: 728 ms
9 elements/ms
Last received via parameterized elements: 6899
Last received via IN list: 6999
Elapsed time for 7000 IN list elements: 141 ms
49 elements/ms
Elapsed time for 7000 parameterized elements: 733 ms
9 elements/ms
Last received via parameterized elements: 6999
Last received via IN list: 7099
Elapsed time for 7100 IN list elements: 108 ms
65 elements/ms
Elapsed time for 7100 parameterized elements: 752 ms
9 elements/ms
Last received via parameterized elements: 7099
Last received via IN list: 7199
Elapsed time for 7200 IN list elements: 104 ms
69 elements/ms
Elapsed time for 7200 parameterized elements: 769 ms
9 elements/ms
Last received via parameterized elements: 7199
Last received via IN list: 7299
Elapsed time for 7300 IN list elements: 142 ms
51 elements/ms
Elapsed time for 7300 parameterized elements: 778 ms
9 elements/ms
Last received via parameterized elements: 7299
Last received via IN list: 7399
Elapsed time for 7400 IN list elements: 127 ms
58 elements/ms
Elapsed time for 7400 parameterized elements: 796 ms
9 elements/ms
Last received via parameterized elements: 7399
Last received via IN list: 7499
Elapsed time for 7500 IN list elements: 110 ms
68 elements/ms
Elapsed time for 7500 parameterized elements: 796 ms
9 elements/ms
Last received via parameterized elements: 7499
Last received via IN list: 7599
Elapsed time for 7600 IN list elements: 144 ms
52 elements/ms
Elapsed time for 7600 parameterized elements: 994 ms
7 elements/ms
Last received via parameterized elements: 7599
Last received via IN list: 7699
Elapsed time for 7700 IN list elements: 122 ms
63 elements/ms
Elapsed time for 7700 parameterized elements: 819 ms
9 elements/ms
Last received via parameterized elements: 7699
Last received via IN list: 7799
Elapsed time for 7800 IN list elements: 114 ms
68 elements/ms
Elapsed time for 7800 parameterized elements: 893 ms
8 elements/ms
Last received via parameterized elements: 7799
Last received via IN list: 7899
Elapsed time for 7900 IN list elements: 152 ms
51 elements/ms
Elapsed time for 7900 parameterized elements: 841 ms
9 elements/ms
Last received via parameterized elements: 7899
Last received via IN list: 7999
Elapsed time for 8000 IN list elements: 119 ms
67 elements/ms
Elapsed time for 8000 parameterized elements: 853 ms
9 elements/ms
Last received via parameterized elements: 7999
Last received via IN list: 8099
Elapsed time for 8100 IN list elements: 121 ms
66 elements/ms
Elapsed time for 8100 parameterized elements: 858 ms
9 elements/ms
Last received via parameterized elements: 8099
Last received via IN list: 8199
Elapsed time for 8200 IN list elements: 157 ms
52 elements/ms
Elapsed time for 8200 parameterized elements: 871 ms
9 elements/ms
Last received via parameterized elements: 8199
Last received via IN list: 8299
Elapsed time for 8300 IN list elements: 131 ms
63 elements/ms
Elapsed time for 8300 parameterized elements: 879 ms
9 elements/ms
Last received via parameterized elements: 8299
Last received via IN list: 8399
Elapsed time for 8400 IN list elements: 124 ms
67 elements/ms
Elapsed time for 8400 parameterized elements: 890 ms
9 elements/ms
Last received via parameterized elements: 8399
Last received via IN list: 8499
Elapsed time for 8500 IN list elements: 163 ms
52 elements/ms
Elapsed time for 8500 parameterized elements: 904 ms
9 elements/ms
Last received via parameterized elements: 8499
Last received via IN list: 8599
Elapsed time for 8600 IN list elements: 131 ms
65 elements/ms
Elapsed time for 8600 parameterized elements: 907 ms
9 elements/ms
Last received via parameterized elements: 8599
Last received via IN list: 8699
Elapsed time for 8700 IN list elements: 127 ms
68 elements/ms
Elapsed time for 8700 parameterized elements: 947 ms
9 elements/ms
Last received via parameterized elements: 8699
Last received via IN list: 8799
Elapsed time for 8800 IN list elements: 166 ms
53 elements/ms
Elapsed time for 8800 parameterized elements: 933 ms
9 elements/ms
Last received via parameterized elements: 8799
Last received via IN list: 8899
Elapsed time for 8900 IN list elements: 133 ms
66 elements/ms
Elapsed time for 8900 parameterized elements: 947 ms
9 elements/ms
Last received via parameterized elements: 8899
Last received via IN list: 8999
Elapsed time for 9000 IN list elements: 142 ms
63 elements/ms
Elapsed time for 9000 parameterized elements: 951 ms
9 elements/ms
Last received via parameterized elements: 8999
Last received via IN list: 9099
Elapsed time for 9100 IN list elements: 173 ms
52 elements/ms
Elapsed time for 9100 parameterized elements: 961 ms
9 elements/ms
Last received via parameterized elements: 9099
Last received via IN list: 9199
Elapsed time for 9200 IN list elements: 138 ms
66 elements/ms
Elapsed time for 9200 parameterized elements: 979 ms
9 elements/ms
Last received via parameterized elements: 9199
Last received via IN list: 9299
Elapsed time for 9300 IN list elements: 135 ms
68 elements/ms
Elapsed time for 9300 parameterized elements: 985 ms
9 elements/ms
Last received via parameterized elements: 9299
Last received via IN list: 9399
Elapsed time for 9400 IN list elements: 175 ms
53 elements/ms
Elapsed time for 9400 parameterized elements: 1000 ms
9 elements/ms
Last received via parameterized elements: 9399
Last received via IN list: 9499
Elapsed time for 9500 IN list elements: 136 ms
69 elements/ms
Elapsed time for 9500 parameterized elements: 1010 ms
9 elements/ms
Last received via parameterized elements: 9499
Last received via IN list: 9599
Elapsed time for 9600 IN list elements: 144 ms
66 elements/ms
Elapsed time for 9600 parameterized elements: 1017 ms
9 elements/ms
Last received via parameterized elements: 9599
Last received via IN list: 9699
Elapsed time for 9700 IN list elements: 186 ms
52 elements/ms
Elapsed time for 9700 parameterized elements: 1021 ms
9 elements/ms
Last received via parameterized elements: 9699
Last received via IN list: 9799
Elapsed time for 9800 IN list elements: 142 ms
69 elements/ms
Elapsed time for 9800 parameterized elements: 1048 ms
9 elements/ms
Last received via parameterized elements: 9799
Last received via IN list: 9899
Elapsed time for 9900 IN list elements: 145 ms
68 elements/ms
Elapsed time for 9900 parameterized elements: 1054 ms
9 elements/ms
Last received via parameterized elements: 9899


Local-to-local on AMD 1600Mhz, PGSQL7.3.2, RH9.0
Table update time: 169xxx ms

*Without* VACUUMDB, IN lists suck beyond measure:
For just 10 elements in the IN list:
Elapsed time for 10 IN list elements: 2638 ms
0 elements/ms
Elapsed time for 10 parameterized elements: 9 ms
1 elements/ms

*With* VACUUMDB: IN lists recover a bit:
Elapsed time for 10 IN list elements: 10 ms
1 elements/ms
Elapsed time for 10 parameterized elements: 24 ms
0 elements/ms
[...]
Elapsed time for 99 IN list elements: 16 ms
6 elements/ms
Elapsed time for 99 parameterized elements: 36 ms
2 elements/ms
Last received via parameterized elements: 98


Testing for some larger numbers of elements, however, reveals another story.
Elapsed time for 1000 IN list elements: 676 ms
1 elements/ms
Elapsed time for 1000 parameterized elements: 351 ms
2 elements/ms
Elapsed time for 2000 IN list elements: 3197 ms
0 elements/ms
Elapsed time for 2000 parameterized elements: 713 ms
2 elements/ms
Elapsed time for 3000 IN list elements: 7266 ms
0 elements/ms
Elapsed time for 3000 parameterized elements: 1080 ms
2 elements/ms
Elapsed time for 4000 IN list elements: 13341 ms
0 elements/ms
Elapsed time for 4000 parameterized elements: 1449 ms
2 elements/ms

Based on the 'by-the-1000's approach, IN lists are on a seriously bad curve,
while parameterized statements are linear.

The cutoff on my AMD 1600 Mhz machine is at about 700 elements:
Elapsed time for 600 IN list elements: 157 ms
3 elements/ms
Elapsed time for 600 parameterized elements: 246 ms
2 elements/ms
Elapsed time for 700 IN list elements: 269 ms
2 elements/ms
Elapsed time for 700 parameterized elements: 263 ms
2 elements/ms

Local(Intel PIII-M 1Ghz)-to-REMOTE(AMD1600Mhz), PostgreSQL 7.3.2

100 -> 1000 by 100's.

IN List Query string length is 637
Last received via IN list: 99
Elapsed time for 100 IN list elements: 22 ms
4 elements/ms
Elapsed time for 100 parameterized elements: 117 ms
0 elements/ms
Last received via parameterized elements: 99
IN List Query string length is 1237
Last received via IN list: 199
Elapsed time for 200 IN list elements: 39 ms
5 elements/ms
Elapsed time for 200 parameterized elements: 169 ms
1 elements/ms
Last received via parameterized elements: 199
IN List Query string length is 1837
Last received via IN list: 299
Elapsed time for 300 IN list elements: 81 ms
3 elements/ms
Elapsed time for 300 parameterized elements: 136 ms
2 elements/ms
Last received via parameterized elements: 299
IN List Query string length is 2437
Last received via IN list: 399
Elapsed time for 400 IN list elements: 108 ms
3 elements/ms
Elapsed time for 400 parameterized elements: 157 ms
2 elements/ms
Last received via parameterized elements: 399
IN List Query string length is 3037
Last received via IN list: 499
Elapsed time for 500 IN list elements: 99 ms
5 elements/ms
Elapsed time for 500 parameterized elements: 232 ms
2 elements/ms
Last received via parameterized elements: 499
IN List Query string length is 3637
Last received via IN list: 599
Elapsed time for 600 IN list elements: 150 ms
4 elements/ms
Elapsed time for 600 parameterized elements: 182 ms
3 elements/ms
Last received via parameterized elements: 599
IN List Query string length is 4237
Last received via IN list: 699
Elapsed time for 700 IN list elements: 256 ms
2 elements/ms
Elapsed time for 700 parameterized elements: 219 ms
3 elements/ms
Last received via parameterized elements: 699
IN List Query string length is 4837
Last received via IN list: 799
Elapsed time for 800 IN list elements: 370 ms
2 elements/ms
Elapsed time for 800 parameterized elements: 245 ms
3 elements/ms
Last received via parameterized elements: 799
IN List Query string length is 5437
Last received via IN list: 899
Elapsed time for 900 IN list elements: 511 ms
1 elements/ms
Elapsed time for 900 parameterized elements: 265 ms
3 elements/ms
Last received via parameterized elements: 899

Curiously, the difference between local->local and local->remote here is
small. The breakeven point was about 650 elements, versus slightly closer
to 700 in local->local.


Local(Intel PIII-M 1Ghz)-to-REMOTE(AMD1600Mhz), PostgreSQL 7.3.2

1000 -> 10000 by 1000's.

IN List Query string length is 6037
Last received via IN list: 999
Elapsed time for 1000 IN list elements: 772 ms
1 elements/ms
Elapsed time for 1000 parameterized elements: 495 ms
2 elements/ms
Last received via parameterized elements: 999
IN List Query string length is 12037
Last received via IN list: 1999
Elapsed time for 2000 IN list elements: 3055 ms
0 elements/ms
Elapsed time for 2000 parameterized elements: 681 ms
2 elements/ms
Last received via parameterized elements: 1999
IN List Query string length is 18037
Last received via IN list: 2999
Elapsed time for 3000 IN list elements: 7205 ms
0 elements/ms
Elapsed time for 3000 parameterized elements: 888 ms
3 elements/ms
Last received via parameterized elements: 2999
IN List Query string length is 24037
Last received via IN list: 3999
Elapsed time for 4000 IN list elements: 12912 ms
0 elements/ms
Elapsed time for 4000 parameterized elements: 1162 ms
3 elements/ms
Last received via parameterized elements: 3999
IN List Query string length is 30037
Last received via IN list: 4999
Elapsed time for 5000 IN list elements: 21844 ms
0 elements/ms
Elapsed time for 5000 parameterized elements: 1461 ms
3 elements/ms
Last received via parameterized elements: 4999
IN List Query string length is 36037
Last received via IN list: 5999
Elapsed time for 6000 IN list elements: 32595 ms
0 elements/ms
Elapsed time for 6000 parameterized elements: 1780 ms
3 elements/ms
Last received via parameterized elements: 5999
IN List Query string length is 42037
Last received via IN list: 6999
Elapsed time for 7000 IN list elements: 46566 ms
0 elements/ms
Elapsed time for 7000 parameterized elements: 2027 ms
3 elements/ms
Last received via parameterized elements: 6999
IN List Query string length is 48037
java.sql.SQLException: ERROR:  out of free buffers: time to abort!

    at org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:126)
    at org.postgresql.jdbc1.AbstractJdbc1Connection.ExecSQL(AbstractJdbc1Connection.java:451)
    at org.postgresql.jdbc1.AbstractJdbc1Statement.execute(AbstractJdbc1Statement.java:281)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:48)
    at org.postgresql.jdbc1.AbstractJdbc1Statement.executeQuery(AbstractJdbc1Statement.java:144)
    at org.postgresql.jdbc1.AbstractJdbc1Statement.executeQuery(AbstractJdbc1Statement.java:132)
    at INList.main(INList.java:775)
*/


class INList
{
  public static void main(String[] args) {
    try {
      final int NROWS = 1000000 ;    // one million
      final int IDSTART = 10000 ;    // first ID value

      // ----------------  MySql driver -----------------
      // -classpath .:/usr/java/mysql-connector-java-2.0.14/mysql-connector-java-2.0.14-bin.jar
      // RH9: -classpath .:/home/dave/mysql-connector-java-3.0.8-stable/mysql-connector-java-3.0.8-stable-bin.jar
      //Class.forName("com.mysql.jdbc.Driver") ;
      //Connection conn = DriverManager.getConnection("jdbc:mysql:///test", "dave", "") ;
      //String innodb = " TYPE=INNODB" ;

      // --------------- postgres 7.3.2 distribution JDBC2 driver linked as /usr/java/postgresql-jdbc.jar
      // -classpath .:/usr/java/postgresql-jdbc.jar
      Class.forName("org.postgresql.Driver") ;
      Connection conn = DriverManager.getConnection("jdbc:postgresql:test", "dave", "") ;
      String innodb = "" ;

      conn.setAutoCommit(false) ;

      long oldmillis, newmillis, elapsed ;
      Statement statement = conn.createStatement() ;
      PreparedStatement pstatement ;
      ResultSet rs ;
      int lastReceived ;

      if (false) {
      statement.executeUpdate("CREATE TABLE MILLION (ID INTEGER PRIMARY KEY, VAL INTEGER)" + innodb) ;
      conn.commit() ;

      // Read that "all statements of a connection commit/rollback as a group, API certainly implies this
      // conn.commit() ; pstatement should potentially fail because table isn't there without this!

      // Create the rows
      pstatement = conn.prepareStatement("INSERT INTO MILLION (ID, VAL) VALUES(?, ?)") ;
      oldmillis = System.currentTimeMillis() ;
      for (int i = 0 ; i < NROWS ; i++) {
    pstatement.setInt(1, IDSTART+i) ;
    pstatement.setInt(2, i) ;
    pstatement.executeUpdate() ;
      }
      conn.commit() ;
      newmillis = System.currentTimeMillis() ;
      pstatement.close() ;
      System.out.println("Table update time: " + (newmillis-oldmillis) + " ms") ;
      rs = statement.executeQuery("SELECT COUNT(*) FROM MILLION") ;
      rs.first() ;
      if (rs.getInt(1) != NROWS)
    System.out.println("**** Expected " + (NROWS) + " rows, got " + rs.getInt(1)) ;
      conn.commit() ;

      // Determine largest IN LIST
      StringBuffer sb = new StringBuffer(8000*1024) ;
      boolean successful = false ;
      int tryAmount = NROWS ;
      while (!successful) {
    sb.setLength(0) ;
    sb.append("SELECT MAX(VAL) FROM MILLION WHERE ID IN (") ;
    for (int i = 0 ; i < tryAmount ; i++) {
      sb.append(i+IDSTART) ;
      if (i < tryAmount-1)
        sb.append(',') ;
    }
    sb.append(')') ;
    try {
      rs = statement.executeQuery(sb.toString()) ;
      successful = true ;
    }
    catch (SQLException e) {
      System.out.println("IN list failed for " + tryAmount + " entries, buffer size was " +
                 sb.length()) ;
      tryAmount -= 10000 ;
      if (tryAmount <= 0)
        break ;            // be done with the while loop
    }
      }                    // while (!successful)
      conn.commit() ;            // more for the next txn than this one, eliminate txn caching
      System.out.println("Largest IN list: " + tryAmount + " elements.") ;
      }                    // if (false)

      else {
    for (int tryAmount = 100 ; tryAmount < 1000 ; tryAmount+= 100) {
    StringBuffer sb = new StringBuffer(tryAmount*20) ;
    sb.append("SELECT VAL FROM MILLION WHERE ID IN (") ;
    for (int i = 0 ; i < tryAmount ; i++) {
      sb.append(i+IDSTART) ;
      if (i+1 < tryAmount)
        sb.append(',') ;
    }
    sb.append(')') ;

      // Try query with largest inlist, getting resultset with all values
      String query = sb.toString() ;
      System.out.println("IN List Query string length is " + query.length()) ;
      //query = query.replaceFirst("MAX(VAL)", "VAL") ;
      oldmillis = System.currentTimeMillis() ;
      rs = statement.executeQuery(query) ;
      lastReceived = -1 ;
      while (rs.next())
    lastReceived = rs.getInt(1) ;
      System.out.println("Last received via IN list: " + lastReceived) ;
      conn.commit() ;
      newmillis = System.currentTimeMillis() ;
      elapsed = newmillis - oldmillis ;
      System.out.println("Elapsed time for " + tryAmount + " IN list elements: " + elapsed + " ms") ;
      System.out.println((tryAmount/elapsed) + " elements/ms") ;

      // Try the query with prepared statements
      lastReceived = -1 ;
      pstatement = conn.prepareStatement("SELECT VAL FROM MILLION WHERE ID = ?") ;
      oldmillis = System.currentTimeMillis() ;
      for (int i = 0 ; i < tryAmount ; i++) {
    pstatement.setInt(1, i+IDSTART) ;
    rs = pstatement.executeQuery() ;
    if (!rs.first())
      System.out.println("Pstatement query failed") ;
    else
      lastReceived = rs.getInt(1) ;
      }
      conn.commit() ;
      newmillis = System.currentTimeMillis() ;
      pstatement.close() ;
      elapsed = newmillis - oldmillis ;
      System.out.println("Elapsed time for " + tryAmount + " parameterized elements: " + elapsed + " ms") ;
      System.out.println((tryAmount/elapsed) + " elements/ms") ;
      System.out.println("Last received via parameterized elements: " + lastReceived) ;
    } // for (tryAmount ...)
      }                    // else (if (false))

      // Done, delete our table
      //      statement.executeUpdate("DROP TABLE MILLION") ;
      conn.commit() ;
    }
    catch (Exception e) {
      e.printStackTrace() ;
    }
  }                    // main()
}                    // class INList

В списке pgsql-performance по дате отправления:

Предыдущее
От: Peter Lavender
Дата:
Сообщение: ...
Следующее
От: Shridhar Daithankar
Дата:
Сообщение: Re: IN list processing performance (yet again)