Обсуждение: old server, new server, same performance
Hi I have a situation at my work which I simply don't understand and hope that here I can find some explanations. What is on the scene: A - old 'server' PC AMD Athlon64 3000+, 2GB RAM, 1 ATA HDD 150GB, Debian etch, postgresql 8.1.19 B - new server HP DL 360, 12GB RAM, Intel Xeon 8 cores CPU, fast SAS (mirrored) HDDs, Debian 64 bit, lenny, backported postgresql 8.1.19 C - our Windows application based on Postgresql 8.1 (not newer) and second role actors (for pgAdmin) D - my old Windows XP computer, Athlon64 X2 3800+, with 100Mbit ethernet E - new laptop with Ubuntu, 1000Mbit ethernet The goal: migrate postgresql from A to B. Simple and works fine (using pg_dump, psql -d dbname <bakcup_file). So what is the problem? My simple 'benchmarks' I have done with pgAdmin in spare time. pgAdmin is the latest 1.8.2 on both D and E. Using pgAdmin on my (D) computer I have run SELECT * from some_table; and noted the execution time on both A and B servers: - on A (the old one) about 120sec - on B (the new monster) about 120sec (???) (yes, there is almost no difference) On the first test runs the postgresql configs on both servers were the same, so I have started to optimize (according to postgresql wiki) the postgresql on the new (B) server. The difference with my simple select * were close to 0. So this is my first question. Why postgresql behaves so strangely? Why there is no difference in database speed between those two machines? I thought about hardware problem on B, but: hdparm shows 140MB/sec on B and 60MB on A (and buffered reads are 8GB on B and 800MB on A) bonnie++ on B: > Version 1.03d ------Sequential Output------ --Sequential Input- --Random- > -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP > malwa 24G 51269 71 49649 10 34974 6 48969 82 147840 13 1150 1 on A: > Version 1.03 ------Sequential Output------ --Sequential Input- --Random- > -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP > irys 4G 42961 93 41125 13 14414 3 20262 48 38487 5 167.0 0 Here the difference in writings is not so big (wonder why, the price between those machines is huge) but in readings are noticeably better on B. Ok, those were the tests done using my old Windows PC (D) computer. So I have decided to do the same using my new laptop with Ubuntu (E). The results were soooo strange that now I am completely confused. The same SELECT: - on A first (fresh) run 30sec, second (and so on) about 11sec (??!) - on B first run 80sec, second (and so on) about 80sec also What is going on here? About 8x faster on slower machine? One more thing comes to my mind. The A server has iso-8859-2 locale and database is set to latin2, the B server has utf8 locale, but database is still latin2. Does it matter anyway? So here I'm stuck and hope for help. Is there any bottleneck? How to find it? Regards Piotr
2010/5/14 Piotr Legiecki <piotrlg@sci.pam.szczecin.pl>: > So what is the problem? My simple 'benchmarks' I have done with pgAdmin in > spare time. > > pgAdmin is the latest 1.8.2 on both D and E. > Using pgAdmin on my (D) computer I have run SELECT * from some_table; and > noted the execution time on both A and B servers: So, any chance you'll run it like I asked: select count(*) from some_table; ?
On May 14, 2010, at 3:52 PM, Scott Marlowe wrote: > 2010/5/14 Piotr Legiecki <piotrlg@sci.pam.szczecin.pl>: >> So what is the problem? My simple 'benchmarks' I have done with pgAdmin in >> spare time. >> >> pgAdmin is the latest 1.8.2 on both D and E. >> Using pgAdmin on my (D) computer I have run SELECT * from some_table; and >> noted the execution time on both A and B servers: > > So, any chance you'll run it like I asked: > > select count(*) from some_table; > > ? I agree that select * is a very bad test and probably the problem here. Even if you do 'select * from foo' locally to avoidthe network and pipe it to /dev/null, it is _significantly_ slower than count(*) because of all the data serialization. > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance
Scott Marlowe pisze: > 2010/5/14 Piotr Legiecki <piotrlg@sci.pam.szczecin.pl>: >> So what is the problem? My simple 'benchmarks' I have done with pgAdmin in >> spare time. >> >> pgAdmin is the latest 1.8.2 on both D and E. >> Using pgAdmin on my (D) computer I have run SELECT * from some_table; and >> noted the execution time on both A and B servers: > > So, any chance you'll run it like I asked: > > select count(*) from some_table; > Sorry, but it was computer-less weekend ;-) So to answer all questions in one mail: 1. The database is autovacuumed, at first (the default debian setting) every one minute, than I have set it to one hour. 2. select count(*) from some_table; runs in a fraction of a second on the console on both servers (there are only 4000 records, the second longer table has 50000 but it does not matter very much). From pg_admin the results are: - slow server (and the longest table in my db) 938ms (first run) and about 40ms next ones - fast server 110ms first run, about 30ms next ones. Well, finally my new server deservers its name ;-) The later times as I understand are just cache readings from postgresql itself? 3. The configs. As noted earlier, at first test they were the same, later I started to optimize the faster server from the defaults to some higher values, without any significant gain. The package on slower server is just deb package from etch (4.0) repository, the one on fast server (which is newer lenny - 5.0) is compiled from deb source package on that server. fast server: http://pgsql.privatepaste.com/edf2ec36c3 slow server: http://pgsql.privatepaste.com/bdc141f0be 4. Machine. The new server has 5 SAS disks (+ 1 spare), but I don't remember how they are set up now (looks like mirror for system '/' and RAID5 for rest - including DB). size of the DB is 405MB So still I don't get this: select * from table; on old server takes 0,5 sec, on new one takes 6sec. Why there is so big difference? And it does not matter how good or bad select is to measure performance, because I don't measure the performance, I measure the relative difference. Somwhere there is a bottleneck. Regards P.
On Mon, May 17, 2010 at 2:06 AM, Piotr Legiecki <piotrlg@ams.edu.pl> wrote: > 2. select count(*) from some_table; runs in a fraction of a second on the > console on both servers (there are only 4000 records, the second longer > table has 50000 but it does not matter very much). From pg_admin the results > are: > - slow server (and the longest table in my db) 938ms (first run) and about > 40ms next ones > - fast server 110ms first run, about 30ms next ones. > Well, finally my new server deservers its name ;-) The later times as I > understand are just cache readings from postgresql itself? SNIP > So the server itself seems faster. > So still I don't get this: select * from table; on old server takes 0,5 sec, > on new one takes 6sec. Why there is so big difference? And it does not > matter how good or bad select is to measure performance, because I don't > measure the performance, I measure the relative difference. Somwhere there > is a bottleneck. Yep, the network I'd say. How fast are things like scp between the various machines? > 4. Machine. The new server has 5 SAS disks (+ 1 spare), but I don't remember > how they are set up now (looks like mirror for system '/' and RAID5 for rest > - including DB). size of the DB is 405MB Get off of RAID-5 if possible. A 3 Disk RAID-5 is the slowest possible combination for RAID-5 and RAID-5 is generally the poorest choice for a db server.
On Mon, May 17, 2010 at 2:10 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Mon, May 17, 2010 at 2:06 AM, Piotr Legiecki <piotrlg@ams.edu.pl> wrote: >> 2. select count(*) from some_table; runs in a fraction of a second on the >> console on both servers (there are only 4000 records, the second longer >> table has 50000 but it does not matter very much). From pg_admin the results >> are: >> - slow server (and the longest table in my db) 938ms (first run) and about >> 40ms next ones >> - fast server 110ms first run, about 30ms next ones. >> Well, finally my new server deservers its name ;-) The later times as I >> understand are just cache readings from postgresql itself? > SNIP >> So the server itself seems faster. >> So still I don't get this: select * from table; on old server takes 0,5 sec, >> on new one takes 6sec. Why there is so big difference? And it does not >> matter how good or bad select is to measure performance, because I don't >> measure the performance, I measure the relative difference. Somwhere there >> is a bottleneck. > > Yep, the network I'd say. How fast are things like scp between the > various machines? > >> 4. Machine. The new server has 5 SAS disks (+ 1 spare), but I don't remember >> how they are set up now (looks like mirror for system '/' and RAID5 for rest >> - including DB). size of the DB is 405MB > > Get off of RAID-5 if possible. A 3 Disk RAID-5 is the slowest > possible combination for RAID-5 and RAID-5 is generally the poorest > choice for a db server. I refer you to this classic post on the subject: http://www.mail-archive.com/pgsql-general@postgresql.org/msg93043.html
Whoops, wrong thread. On Mon, May 17, 2010 at 2:52 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Mon, May 17, 2010 at 2:10 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote: >> On Mon, May 17, 2010 at 2:06 AM, Piotr Legiecki <piotrlg@ams.edu.pl> wrote: >>> 2. select count(*) from some_table; runs in a fraction of a second on the >>> console on both servers (there are only 4000 records, the second longer >>> table has 50000 but it does not matter very much). From pg_admin the results >>> are: >>> - slow server (and the longest table in my db) 938ms (first run) and about >>> 40ms next ones >>> - fast server 110ms first run, about 30ms next ones. >>> Well, finally my new server deservers its name ;-) The later times as I >>> understand are just cache readings from postgresql itself? >> SNIP >>> So the server itself seems faster. >>> So still I don't get this: select * from table; on old server takes 0,5 sec, >>> on new one takes 6sec. Why there is so big difference? And it does not >>> matter how good or bad select is to measure performance, because I don't >>> measure the performance, I measure the relative difference. Somwhere there >>> is a bottleneck. >> >> Yep, the network I'd say. How fast are things like scp between the >> various machines? >> >>> 4. Machine. The new server has 5 SAS disks (+ 1 spare), but I don't remember >>> how they are set up now (looks like mirror for system '/' and RAID5 for rest >>> - including DB). size of the DB is 405MB >> >> Get off of RAID-5 if possible. A 3 Disk RAID-5 is the slowest >> possible combination for RAID-5 and RAID-5 is generally the poorest >> choice for a db server. > > I refer you to this classic post on the subject: > http://www.mail-archive.com/pgsql-general@postgresql.org/msg93043.html
Scott Marlowe pisze: >>> So still I don't get this: select * from table; on old server takes 0,5 sec, >>> on new one takes 6sec. Why there is so big difference? And it does not >>> matter how good or bad select is to measure performance, because I don't >>> measure the performance, I measure the relative difference. Somwhere there >>> is a bottleneck. >> Yep, the network I'd say. How fast are things like scp between the >> various machines? Sure it is, but not in a way one could expect: - scp from 1000Gbit laptop to old server 27MB/sec - scp from the same laptop to new server 70MB/sec Both servers have 1000Gbit connection. So it is still mysterious why old server makes 9x faster select? I don't claim that something is slow on new (or even older) server. Not at all. the application works fine (still on older machine). I only wonder about those differences. >>> 4. Machine. The new server has 5 SAS disks (+ 1 spare), but I don't remember >>> how they are set up now (looks like mirror for system '/' and RAID5 for rest >>> - including DB). size of the DB is 405MB >> Get off of RAID-5 if possible. A 3 Disk RAID-5 is the slowest >> possible combination for RAID-5 and RAID-5 is generally the poorest >> choice for a db server. Sure I know that RAID-5 is slower than mirror but anyway how much slower? And for sure not as much as single ATA disk. > I refer you to this classic post on the subject: > http://www.mail-archive.com/pgsql-general@postgresql.org/msg93043.html Well, this thread is about benchmarking databases (or even worse, comparison between two RDBMS). I'm not benchmarking anything, just compare one factor. P.
On Mon, May 17, 2010 at 3:52 AM, Piotr Legiecki <piotrlg@ams.edu.pl> wrote: > Scott Marlowe pisze: > >>>> So still I don't get this: select * from table; on old server takes 0,5 >>>> sec, >>>> on new one takes 6sec. Why there is so big difference? And it does not >>>> matter how good or bad select is to measure performance, because I don't >>>> measure the performance, I measure the relative difference. Somwhere >>>> there >>>> is a bottleneck. >>> >>> Yep, the network I'd say. How fast are things like scp between the >>> various machines? > > Sure it is, but not in a way one could expect: > - scp from 1000Gbit laptop to old server 27MB/sec > - scp from the same laptop to new server 70MB/sec > Both servers have 1000Gbit connection. So it is still mysterious why old > server makes 9x faster select? > I don't claim that something is slow on new (or even older) server. Not at > all. the application works fine (still on older machine). I only wonder > about those differences. Is one connecting via SSL? Is this a simple flat switched network, or are these machines on different segments connected via routers? >>>> 4. Machine. The new server has 5 SAS disks (+ 1 spare), but I don't >>>> remember >>>> how they are set up now (looks like mirror for system '/' and RAID5 for >>>> rest >>>> - including DB). size of the DB is 405MB >>> >>> Get off of RAID-5 if possible. A 3 Disk RAID-5 is the slowest >>> possible combination for RAID-5 and RAID-5 is generally the poorest >>> choice for a db server. > > Sure I know that RAID-5 is slower than mirror but anyway how much slower? > And for sure not as much as single ATA disk. Actually, given the amount of read read / write write RAID5 does, it can be slower than a single drive, by quite a bit. A mirror set only reads twice as fast, it writes the same speed as a single disk. RAID-5 is antithetical to good db performance (unless you hardly ever write). > >> I refer you to this classic post on the subject: >> http://www.mail-archive.com/pgsql-general@postgresql.org/msg93043.html > > Well, this thread is about benchmarking databases (or even worse, comparison > between two RDBMS). I'm not benchmarking anything, just compare one factor. That was a mis-post...
Scott Marlowe pisze: > Is one connecting via SSL? Is this a simple flat switched network, or > are these machines on different segments connected via routers? SSL is disabled. It is switched network, all tested computers are in the same segment. Finally I have switched the production database from old server to new one and strange things happened. The same query on new server I have used before with 30sec results now runs about 9 sec (so the same time as the old production server). Hm... Is it possible that database under some load performs better because it makes some something that database not used is not doing? P.