Discussion: Maybe a strange question, but: "How long may a cursor live?"
Hello everybody...

I'm working on a web based application (using gnuJSP / JDBC) that needs to do queries like

SELECT count(textattrib1), text_cat(textattrib1 || ' '), func1(textattrib2)
FROM table1 WHERE textattrib2 >= 'foo' GROUP BY textattrib2
ORDER BY textattrib2 ASC LIMIT somelimit;

with table1 holding roughly 80000 rows (probably growing some day); textattrib2 is indexed of course.

Unfortunately the query above is intolerably slow (up to 30 seconds or so; 2 would be OK...). Just declaring a cursor for

SELECT count(textattrib1), text_cat(textattrib1 || ' '), func1(textattrib2)
FROM table1 GROUP BY textattrib2 ORDER BY textattrib2 ASC;

didn't help much: fetching my entries from such a cursor is very fast indeed, but the *first* access to the cursor is still much too slow.

So I'm wondering about alternatives. Maybe somebody can comment on these:

1. Pooling cursors

I've been playing with the idea of writing a bean that lives in the JSP application scope (and gets started with Apache). That bean would declare a pool of cursors for the query shown above, and each client (read: session scope bean...) could request one for its session. A (very) crude prototype of this is running right now, and yes: it is *very* fast. I'm worried, however, about the effects of having a number of cursors/transactions hanging around for a potentially very long time. Will temporary files eat my disk space away in a few hours? What about RAM and CPU resources? Restarting Apache and Postgres (and that cursor pool with them) every night and having a few minutes of downtime would not be a big problem here... but still I'm feeling quite uncomfortable with the whole affair.

2. Splitting up table1 by substr(textattrib2, 1, 2)

This would result in smaller (faster) tables table1_aa, table1_ab, ... BUT: in case table1_fo contains fewer than somelimit entries satisfying the query, I want to union these entries with those from table1_fp, ... until I reach somelimit. So I have to store information somewhere on what table comes next in the chain. Taking into account that entries are not distributed evenly with respect to substr(textattrib2, 1, 2), and that I don't want to have too many entries in each table to keep performance high... I guess I would pretty soon end up implementing trees (and tree traversals). I must confess that I'm somewhat shying away from that work ;-) (It could probably be easier if PL/pgSQL allowed for 'computed' table names?)

3. A *good* solution I've not found yet?

Maybe I'm just thinking too complicated??? Any other ideas for this problem?

Thanks in advance...
Christian Fritze

--
"The sky above the port was the color of television, tuned to a dead channel."
  -- W.G.
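[Editor's note: idea 1, a pool of long-lived cursors handed out to session beans, can be sketched roughly as below. This is a minimal, hypothetical sketch: the class name CursorPool is made up, and the pooled objects are placeholder strings standing in for open JDBC connections with server-side cursors declared on them.]

```java
import java.util.LinkedList;

// Hypothetical sketch of the cursor-pool bean from idea 1. The pool hands
// out pre-opened cursors and takes them back when a session is done. Here
// the pooled objects are just placeholder strings; in the real bean each
// entry would wrap a JDBC Connection with autocommit off on which
//   DECLARE cur_i CURSOR FOR SELECT ... GROUP BY textattrib2 ORDER BY ...;
// has been issued.
class CursorPool {
    private final LinkedList<String> idle = new LinkedList<String>();

    CursorPool(int size) {
        for (int i = 0; i < size; i++) {
            idle.add("cur_" + i);  // placeholder for an open cursor
        }
    }

    // Block until a cursor is free, then hand it out.
    synchronized String borrow() {
        while (idle.isEmpty()) {
            try {
                wait();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return idle.removeFirst();
    }

    // Return a cursor to the pool and wake up one waiting session.
    synchronized void giveBack(String cursor) {
        idle.addLast(cursor);
        notify();
    }
}
```

Note that each pooled cursor pins its transaction open for as long as it lives, which is exactly the resource concern raised in the question.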
A couple of ideas for you:

Experiment with doing the GROUP BY within your code. Depending on a number of factors, it's sometimes faster.

Experiment with doing the ORDER BY within your code. I've seen several cases where pulling the data into memory and sorting there was much, much faster than having the database do the sorting.

Experiment with moving the text_cat and func1 operations into your Java code (rather than asking the DB to do them). This may or may not help, depending on a number of factors.

The startup cost you're seeing is basically PG generating all of the results, GROUP BY, and ORDER BY, and tucking the final answer away somewhere temporarily. The cursor then just steps through this temporary table.

You'd probably gain a lot by building such a temporary table explicitly. That is, create a new "summary" table and periodically do the SELECT below and put the results into the temporary table. (You can even use a separate program outside of your web application that's run from 'cron'.) That way, you avoid the overhead of summarizing 80,000 results on every query. I've used this approach to provide rapid access to web site logs (tens of millions of records summarized down to a few thousand entries).

You mention that table1 has 80000 rows but didn't mention how many rows there were after the aggregation (that is, how many distinct values of textattrib2 there were).

For high-performance web applications, I've been just storing full data results in memory. You've talked about using a bean to keep a DB cursor around; why not just store the results? This looks a lot like a simple SortedMap from textattrib2 to textattrib1/count pairs. That's easy to store in memory. And it's hard to beat the speed.
- Tim Kientzle
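[Editor's note: Tim's closing suggestion, caching the aggregated results in a SortedMap keyed by textattrib2, could look roughly like this. A minimal, hypothetical sketch: the class and method names are made up, and in practice the map would be refilled periodically from a JDBC ResultSet rather than with literal values.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory summary cache: maps each distinct textattrib2
// value to its aggregated textattrib1 result (the count/text_cat output).
class SummaryCache {
    // TreeMap keeps keys in sorted order, giving ORDER BY for free.
    private final TreeMap<String, String> summary = new TreeMap<String, String>();

    void put(String textattrib2, String aggregated) {
        summary.put(textattrib2, aggregated);
    }

    // Equivalent of: WHERE textattrib2 >= from ORDER BY textattrib2 ASC LIMIT limit.
    // tailMap(from) is a view of all keys >= from, already sorted.
    List<String> keysFrom(String from, int limit) {
        List<String> out = new ArrayList<String>();
        for (String key : summary.tailMap(from).keySet()) {
            if (out.size() >= limit) break;
            out.add(key);
        }
        return out;
    }
}
```

The range query on textattrib2 becomes a `tailMap()` view plus a cutoff at `somelimit`, so a lookup never touches more than the entries actually returned.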
> Christian Fritze <The.Finn@sprawl.de> writes:
> >
> > I'm working on a web based application (using gnuJSP / JDBC)
> > that needs to do queries like
> >
> > SELECT count(textattrib1), text_cat(textattrib1 || ' '), func1(textattrib2)
> > FROM table1 WHERE textattrib2 >= 'foo' GROUP BY textattrib2 ORDER BY
> > textattrib2 ASC LIMIT somelimit;
> >
> > with table1 holding roughly 80000 rows (probably growing some day),
> > textattrib2 is indexed of course.
> >
> > Unfortunately the query above is intolerably slow (up to 30 seconds or
> > so, 2 would be OK...).

* Tim Kientzle <kientzle@acm.org> [001109 17:21] wrote:
> A couple of ideas for you:
>
> Experiment with doing the GROUP BY within your code.
> Depending on a number of factors, it's sometimes faster.
>
> Experiment with doing the ORDER BY within your code.
> I've seen several cases where pulling the data into
> memory and sorting there was much, much faster than
> having the database do the sorting.

You can increase postgresql's performance by tuning the amount of shared
memory it allocates, as well as how much memory it will use for "sort
buffers":

  -B 32768        (~256MB shared segment)
  -o "-S 65534"   (increases the size of the sort buffers, not sure how much though)

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."