Re: Critical performance problems on large databases
| From | Gunther Schadow |
|---|---|
| Subject | Re: Critical performance problems on large databases |
| Date | |
| Msg-id | 3CB5B521.4040909@aurora.regenstrief.org |
| In reply to | Critical performance problems on large databases (Gunther Schadow <gunther@aurora.regenstrief.org>) |
| Responses | Re: Critical performance problems on large databases (Tom Lane <tgl@sss.pgh.pa.us>) |
| | Re: Critical performance problems on large databases (Stephan Szabo <sszabo@megazone23.bigpanda.com>) |
| | Re: Critical performance problems on large databases (Lincoln Yeoh <lyeoh@pop.jaring.my>) |
| | Re: Critical performance problems on large databases (will trillich <will@serensoft.com>) |
| List | pgsql-general |
Hi, thanks to everyone who responded. I want to set this whole COUNT(...) issue aside for a moment because I have reported hearsay instead of my own experiences. I apologize for muddying the waters with that COUNT issue. But I still want to respond to the SELECT * FROM Bigtable; issue once more.

I have to say I am a bit confused by some of the responses that basically shrugged the problem off, sending the asker anywhere from back to the schoolbooks to the mental facility. That's not necessary. I am delighted to hear that one respondent has no problem with a 2M-row table and snappy response, but unfortunately he didn't say much about the details: was that really a select * from bigtable; query, or were there WHERE clauses and such that limit the result set considerably? A number of respondents confirmed my observations, so I think the problem is real.

There was one remark about Perl or PHP always loading the complete result set before returning. Bad for them. I don't use either, and I think it's just bad design to do that on the client, but I don't care about bad clients. I care about a good server.

The constructive responses suggested that I use LIMIT/OFFSET and CURSORs. I can see how that could work around the problem, but I still believe that something is wrong with the PostgreSQL query executor. Loading the entire result set into a buffer without need just makes no sense. Good database engines try to provide for parallel execution of the query plan as much as possible, and that implies streaming. There's a crowd of literature testifying to its importance. The main reasons for this are (a) the use on multi-processor machines, where one CPU does one task and another does another task on the same query plan, and the results from CPU 1 are streamed to CPU 2 (hey, I have a 6-processor machine in my basement.)
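[For readers following along, the two workarounds suggested above look roughly like this; the table and cursor names are made up for illustration:]

```sql
-- Paging with LIMIT/OFFSET: each page is a separate query,
-- and the server re-plans and re-scans for every page.
SELECT * FROM bigtable ORDER BY id LIMIT 100 OFFSET 200;

-- Streaming with a cursor: rows are fetched incrementally
-- inside a transaction, without materializing the full result.
BEGIN;
DECLARE big_cur CURSOR FOR SELECT * FROM bigtable;
FETCH 100 FROM big_cur;   -- first 100 rows
FETCH 100 FROM big_cur;   -- next 100 rows
CLOSE big_cur;
COMMIT;
```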
Perhaps more importantly, (b) buffering (without need) is inherently bad, because it wastes memory resources, leads to bursty demand on CPU and network, and slows perceived response times. Buffering is a complete waste if the buffer is being paged out to disk again, and it isn't flexible or scalable if buffer pages are fixed into physical memory. Streaming is especially important if you want to do distributed joins (and though pgsql doesn't support that yet, it would be foolish to close your eyes to a fundamental problem and then be forced to rework this in a hurry when the time comes for distributed PostgreSQL.)

So, while my client application might benefit from such things as cursors and OFFSET/LIMIT, the query planning and execution may suffer from the buffering. And of course, the point is that it makes sense to design the server such that streaming results to the client is transparent, because it automatically relieves the strain on all resources: CPU, storage, and network! Isn't that obvious?

regards
-Gunther

--
Gunther Schadow, M.D., Ph.D. gschadow@regenstrief.org
Medical Information Scientist, Regenstrief Institute for Health Care
Adjunct Assistant Professor, Indiana University School of Medicine
tel:1(317)630-7960 http://aurora.regenstrief.org