Обсуждение: Deathly slow performance on SMP red-hat system
I have a Red Hat ELsmp system running on a dual opteron hp server with 16 gigs of memory. I believe the RH installation is straight out of the box. I've compiled postgres 8.1.4 from sources. The problem is that query performance is horrible, and I can't think why, although it seems clear the problem is outside the query engine. For instance, "select 1" takes on the order of 100ms. Explain analyze doesn't shed much light on why this might be: explain analyze select 1; QUERY PLAN ------------------------------------------------------------------------------------ Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.005..0.006 rows=1 loops=1) Total runtime: 0.059 ms (2 rows) But if I turn on duration logging, I get timings like LOG: duration: 91.480 ms I was able to pg_reload 80 gigs of data in a reasonable amount of time, but individual queries are taking a long time. I'm wondering if anyone's seen anything like this. -- Patrick TJ McPhee North York Canada ptjm@interlog.com
Patrick TJ McPhee wrote: > I have a Red Hat ELsmp system running on a dual opteron hp server with > 16 gigs of memory. I believe the RH installation is straight out of the > box. I've compiled postgres 8.1.4 from sources. > > The problem is that query performance is horrible, and I can't think > why, although it seems clear the problem is outside the query engine. > For instance, "select 1" takes on the order of 100ms. Explain analyze > doesn't shed much light on why this might be: > > explain analyze select 1; > QUERY PLAN > ------------------------------------------------------------------------------------ > Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.005..0.006 rows=1 loops=1) > Total runtime: 0.059 ms > (2 rows) > > But if I turn on duration logging, I get timings like > LOG: duration: 91.480 ms > > I was able to pg_reload 80 gigs of data in a reasonable amount of time, but > individual queries are taking a long time. > > I'm wondering if anyone's seen anything like this. Vacuum? Analyze? Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
Patrick,
Have you modified your postgresql.conf file or is this default out-of-the-box too? Are the database and database logs (pg_xlog) on shared disc space?
Steve
Have you modified your postgresql.conf file or is this default out-of-the-box too? Are the database and database logs (pg_xlog) on shared disc space?
Steve
On 8/28/06, Patrick TJ McPhee <ptjm@interlog.com> wrote:
I have a Red Hat ELsmp system running on a dual opteron hp server with
16 gigs of memory. I believe the RH installation is straight out of the
box. I've compiled postgres 8.1.4 from sources.
The problem is that query performance is horrible, and I can't think
why, although it seems clear the problem is outside the query engine.
For instance, "select 1" takes on the order of 100ms. Explain analyze
doesn't shed much light on why this might be:
explain analyze select 1;
QUERY PLAN
------------------------------------------------------------------------------------
Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.005..0.006 rows=1 loops=1)
Total runtime: 0.059 ms
(2 rows)
But if I turn on duration logging, I get timings like
LOG: duration: 91.480 ms
I was able to pg_reload 80 gigs of data in a reasonable amount of time, but
individual queries are taking a long time.
I'm wondering if anyone's seen anything like this.
--
Patrick TJ McPhee
North York Canada
ptjm@interlog.com
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
Patrick TJ McPhee wrote: > I have a Red Hat ELsmp system running on a dual opteron hp server with > 16 gigs of memory. I believe the RH installation is straight out of the > box. I've compiled postgres 8.1.4 from sources. > > The problem is that query performance is horrible, and I can't think > why, although it seems clear the problem is outside the query engine. > For instance, "select 1" takes on the order of 100ms. Explain analyze > doesn't shed much light on why this might be: I've had this problem on a old version of redhat. It turned out that the driver included with redhat for the raid-controller was very slow - i think that it was fsync-performance which was really bad with that driver. The controller was as adaptec aacraid - in the end, we used debian, and a recent 2.6 kernel, and things were much faster. greetings, Florian Pflug
In article <44F44F44.1010104@commandprompt.com>, Joshua D. Drake <jd@commandprompt.com> wrote: % Patrick TJ McPhee wrote: [...] [the query is "select 1"] % > But if I turn on duration logging, I get timings like % > LOG: duration: 91.480 ms [...] % Vacuum? Analyze? I had autovacuum on initially, but turned it off. The slowness was in evidence from the point the data was loaded, when presumably vacuum would be superfluous. The data is analyzed. Right now, I'm not so worried about my real data as "select 1" being two orders of magnitude slower than I'd expect it to be. Steve Poe asked if I've modified postgresql.conf, and if the database and logs are on separate volumes. The .conf file has the memory parameters (shared buffers, work mem, effective cache size, etc) bumped up quite a bit. We have the block size set to 16k and the statistics target has been increased from the default. I had some of the planner costs adjusted as well, but they don't seem to be material to the problem. It's basically a copy of the .conf file that's working well in production on similar hardware under NetBSD. The logs and data are all one file system, which seems to be on a logical volume with a single disk sitting under it. Florian Pflug reports that he had a similar problem due to a slow RAID controller driver, to which I have no comment. Thanks for your comments. -- Patrick TJ McPhee North York Canada ptjm@interlog.com
Patrick TJ McPhee wrote: > % Patrick TJ McPhee wrote: > [...] > [the query is "select 1"] > % > But if I turn on duration logging, I get timings like > % > LOG: duration: 91.480 ms > > The logs and data are all one file system, which seems to be on a logical > volume with a single disk sitting under it. > > Florian Pflug reports that he had a similar problem due to a slow RAID > controller driver, to which I have no comment. You could try doing: begin; select 1; select 1; ... rollback; If this is faster, than it's committing a transaction which is slow - remember that a statement not wrapped in begin/commit will cause postgres to start a transaction, execute the statement, and commit afterwards. If this is slow too, then I'd suggest playing with postgresql.conf parameters - e.g. try turning any logging, and the statistics collector off. If that doesn't reveal a possible cause, then I'd suggest that you strace the backend you're connected to, and try to see where it's spending it's time. Since 100ms for a simple "select 1;" is way out of bounds, I bet that it's some syscall that's taking up all the time - probably either network or disk related. greetings, Florian Pflug > > Thanks for your comments. >
In article <44F87893.9050400@phlo.org>, Florian G. Pflug <fgp@phlo.org> wrote: % You could try doing: % % begin; % select 1; % select 1; % ... % rollback; Yes, I tried that, but each select is on the order of 100ms. % off. If that doesn't reveal a possible cause, then I'd suggest that % you strace the backend you're connected to, and try to see where it's % spending it's time. Since 100ms for a simple "select 1;" is way out I tried that, but I don't have the results handy, and the machine's not responding now. I remember the first time I looked at the result, I thought it was spending an inordinate amount of time in gettimeofday, but I also had an strace which showed time in send, which is where I think the problem is. I'm pretty strongly convinced that the time is being spent sending data to the client. Actually I measured the full round trip and found it was taking on the order of 200ms, which is consistent with no time at all to perform the select, but 100ms each to send the query to the db and the result to the client. I think I need to checkout the socket layer on this machine. -- Patrick TJ McPhee North York Canada ptjm@interlog.com