Discussion: Drupal and PostgreSQL - performance issues?
Hi there, I've been toying with using PostgreSQL for some of my Drupal sites for some time, and after his session at OpenSourceDays in Copenhagen last weekend, Magnus Hagander told me that there are quite a few in the PostgreSQL community using Drupal. I have been testing it a bit performance-wise, and the numbers are worrying. In my test, MySQL (using InnoDB) had a 40% lead in performance, but I'm unsure whether this is indicative of PostgreSQL performance in general or perhaps a misconfiguration on my part. In any case, if anyone has any tips, input, etc. on how best to configure PostgreSQL for Drupal, or can find a way to poke holes in my analysis, I would love to hear your insights :) The performance test results can be found on my blog: http://mikkel.hoegh.org/blog/2008/drupal_database_performance_mysql_and_postgresql_compared -- Kind regards, Mikkel Høgh <mikkel@hoegh.org>
On Oct 12, 2008, at 11:57 PM, Mikkel Høgh wrote: > In any case, if anyone has any tips, input, etc. on how best to > configure PostgreSQL for Drupal, or can find a way to poke holes in > my analysis, I would love to hear your insights :) I just came across this article about moving Drupal from MySQL to PostgreSQL because of MyISAM data corruption and InnoDB being too slow. http://groups.drupal.org/node/15793 John DeSoi, Ph.D.
On Sun, Oct 12, 2008 at 9:57 PM, Mikkel Høgh <mikkel@hoegh.org> wrote: > Hi there, > > I've been toying with using PostgreSQL for some of my Drupal sites for some > time, and after his session at OpenSourceDays in Copenhagen last weekend, > Magnus Hagander told me that there are quite a few in the PostgreSQL community > using Drupal. > > I have been testing it a bit performance-wise, and the numbers are worrying. > In my test, MySQL (using InnoDB) had a 40% lead in performance, but I'm > unsure whether this is indicative of PostgreSQL performance in general or > perhaps a misconfiguration on my part. The test you're running is far too simple to tell you which database will actually be faster in real-world usage. No updates, no inserts, no interesting or complex work goes into just delivering the front page over and over. I suggest you invest some time learning how to drive a real load-testing tool like JMeter and build realistic test cases (with insert / update / delete as well as selects) and then see how the databases perform with 1, 2, 5, 10, 50, 100 concurrent threads running at once. Without a realistic test scenario and with no connection pooling and with no performance tuning, I don't think you should make any decisions right now about which is faster. It may well be that in more realistic testing mysql keeps up through 5 or 10 client connections then collapses at 40 or 50, while pgsql keeps climbing in performance. This is the performance curve I'm used to seeing from both dbs under heavy load. In simple terms, you're kicking the tires and making a decision based on that.
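[Editor's note] Scott's suggestion of a mixed read/write load test can be sketched with pgbench and a custom transaction script. The table and column names below are hypothetical stand-ins, not Drupal's actual schema, and the `\set ... random()` syntax assumes a reasonably recent pgbench:

```sql
-- custom.sql: a hypothetical mixed read/write transaction for pgbench.
-- Table and column names are illustrative only, not Drupal's real schema.
\set id random(1, 10000)
SELECT title FROM node WHERE nid = :id;
UPDATE sessions SET expires = now() + interval '1 hour' WHERE uid = :id;
INSERT INTO accesslog (uid, ts) VALUES (:id, now());
```

Run it at increasing concurrency, e.g. `pgbench -f custom.sql -c 1 -T 60`, then `-c 10`, `-c 50`, `-c 100`, and compare the two databases' throughput curves rather than a single number.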
* Mikkel Høgh (mikkel@hoegh.org) wrote: > I have been testing it a bit performance-wise, and the numbers are > worrying. In my test, MySQL (using InnoDB) had a 40% lead in > performance, but I'm unsure whether this is indicative of PostgreSQL > performance in general or perhaps a misconfiguration on my part. The comments left on your blog would probably be a good first step, if you're not doing those things already. Connection pooling could definitely help if you're not already doing it. Drupal's MySQL-isms don't help things either, of course. Also, you don't post anything about the PostgreSQL config, nor the hardware it's running on. The default PostgreSQL config usually isn't appropriate for decent hardware, and that could be a contributing factor here. It would also be useful to make sure you've analyze'd your tables and didn't just do a fresh load without any statistics having been gathered. We run Drupal on PostgreSQL for an internal site and it works reasonably well. We haven't had any performance problems, but it's not a terribly large site either. The issues we've had tend to come from PostgreSQL's somewhat less-than-supported status with Drupal. I've been meaning to look into Drupal's PG support to see about improving it. Perhaps this winter I'll get a chance to. Thanks, Stephen
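[Editor's note] Stephen's point about statistics can be checked directly. A minimal sketch, assuming PostgreSQL 8.3 or later (where `pg_stat_user_tables` carries the `last_analyze`/`last_autoanalyze` columns):

```sql
-- Gather planner statistics after a fresh data load:
VACUUM ANALYZE;

-- Verify when each table was last analyzed (manually or by autovacuum):
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables;
```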
Alright, my benchmarks might have been a bit naïve. When it comes to hardware, my webserver is a SunFire X2100 with an Opteron 1210 Dual Core and 4 GB DDR2 RAM, running 64-bit Ubuntu Linux Server 8.04 LTS. When it comes to the resource usage section of my postgresql.conf, the only settings that are not commented out are: shared_buffers = 24MB max_fsm_pages = 153600 I freely admit that the reason I haven't messed with these values is that I have next to no clue what the different settings do and how they affect performance, so perhaps an apology is in order. As Scott wrote, "Without a realistic test scenario and with no connection pooling and with no performance tuning, I don't think you should make any decisions right now about which is faster". My apologies. -- Kind regards, Mikkel Høgh <mikkel@hoegh.org> On 13/10/2008, at 06.54, Stephen Frost wrote: > * Mikkel Høgh (mikkel@hoegh.org) wrote: >> I have been testing it a bit performance-wise, and the numbers are >> worrying. In my test, MySQL (using InnoDB) had a 40% lead in >> performance, but I'm unsure whether this is indicative of PostgreSQL >> performance in general or perhaps a misconfiguration on my part. > > The comments left on your blog would probably be a good first step, if > you're not doing them already. Connection pooling could definitely > help if you're not already doing it. Drupal's MySQL-isms don't help > things either, of course. > > Also, you don't post anything about the PostgreSQL config, nor the > hardware it's running on. The default PostgreSQL config usually isn't > appropriate for decent hardware and that could be a contributing > factor here. It would also be useful to make sure you've analyze'd your > tables and didn't just do a fresh load w/o any statistics having been > gathered. > > We run Drupal on PostgreSQL for an internal site and it works > reasonably well. We haven't had any performance problems but it's not a terribly > large site either. 
The issues we've had tend to come from > PostgreSQL's > somewhat less-than-supported status with Drupal. > > I've been meaning to look into Drupal's PG support to see about > improving it. Perhaps this winter I'll get a chance to. > > Thanks, > > Stephen
On Sun, 12 Oct 2008, Scott Marlowe wrote: > It may well be that in a more realistic testing that mysql keeps up > through 5 or 10 client connections then collapses at 40 or 50, while > pgsql keeps climbing in performance. One of the best pro-PostgreSQL comparisons showing this behavior is at http://tweakers.net/reviews/649/7 MySQL owns that benchmark until you hit 40 users, then...ouch. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Well, in that benchmark, what you say is only true for the Niagara processors. On the Opteron page, MySQL performance only drops slightly as concurrency passes 50. MySQL might have a problem with Niagara, but it doesn't seem like it has the severe concurrency vulnerability you speak of. There are many reasons to pick PostgreSQL, but this one doesn't seem to be a general thing. In general, MySQL seems to have problems with some kinds of threading, since their performance on Mac OS X is crappy as well for that reason. -- Kind regards, Mikkel Høgh <mikkel@hoegh.org> On 13/10/2008, at 10.43, Greg Smith wrote: > On Sun, 12 Oct 2008, Scott Marlowe wrote: > >> It may well be that in a more realistic testing that mysql keeps up >> through 5 or 10 client connections then collapses at 40 or 50, >> while pgsql keeps climbing in performance. > > One of the best pro-PostgreSQL comparisons showing this behavior is > at http://tweakers.net/reviews/649/7 MySQL owns that benchmark until > you hit 40 users, then...ouch. > > -- > * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com > Baltimore, MD
On Mon, 13 Oct 2008, Mikkel Høgh wrote: > Well, in that benchmark, what you say is only true for the Niagara > processors. On the Opteron page, MySQL performance only drops slightly as > concurrency passes 50. That's partly because the upper limit on the graph only goes to 100 concurrent processes. Since the Opterons are faster, that's not a broad enough scale to see how fast the right edge of the MySQL curve falls. You are right that the Niagara processors have a sharper decline than the more traditional platforms. The MySQL 5.0.20a graphs at http://tweakers.net/reviews/657/6 have a nice comparison showing a few different architectures that's also interesting. Anyway, you don't actually have to believe any of this; you've got a testbed to try for yourself if you just crank the user count up. The main thing I was trying to suggest is that MySQL being a bit faster at 5 users is not unusual, but it's not really representative of which performs better either. > In general, MySQL seems to have problems with some kinds of threading, > since their perfomance on Mac OS X is crappy as well for that reason. One of the reasons (but by no means the only one) that PostgreSQL uses a multi-process architecture instead of a threaded one is because thread library quality varies so much between platforms. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Mon, Oct 13, 2008 at 11:57 AM, Mikkel Høgh <mikkel@hoegh.org> wrote: > In any case, if anyone has any tips, input, etc. on how best to configure > PostgreSQL for Drupal, or can find a way to poke holes in my analysis, I > would love to hear your insights :) It'd be more accurate to configure Drupal for PostgreSQL. We use PostgreSQL for almost everything, including many drupal sites, but the usage pattern of Drupal puts PostgreSQL at a disadvantage. In short, Drupal issues a lot of small, simple SQL queries (100+ is the norm), which makes tuning hard. To make it faster, you'd need to turn on Drupal's caches (and PHP opcode caches) to reduce the number of SQLs issued. To get even better numbers, you'd need to get Drupal to use memcached instead of calling PostgreSQL for the simple lookups. You can use the devel module in Drupal to have a look at the SQLs issued. Not pretty, IMHO. See: http://2bits.com/articles/benchmarking-postgresql-vs-mysql-performance-using-drupal-5x.html http://2bits.com/articles/advcache-and-memcached-benchmarks-with-drupal.html The most promising Drupal performance module looks like: http://drupal.org/project/cacherouter (900 req/s!) but I haven't got the chance to give it a go yet. I'm a die-hard PostgreSQL and Drupal supporter, but in this case, I concede straight up that Drupal+MySQL will always be faster than Drupal+PostgreSQL because of the way Drupal uses the database. We still use PostgreSQL for our Drupal sites though, because while it's slower, it's plenty fast enough.
On Mon, Oct 13, 2008 at 12:00 AM, Mikkel Høgh <mikkel@hoegh.org> wrote: > Alright, my benchmarks might have been a bit naïve. > When it comes to hardware, my webserver is a SunFire X2100 with an Opteron > 1210 Dual Core and 4 GB DDR2 RAM, running 64-bit Ubuntu Linux Server 8.04 > LTS. > > When it comes to the resource usage section of my postgresql.conf, the only > settings that are not commented out are: > shared_buffers = 24MB > max_fsm_pages = 153600 Well, 24MB is pretty small. See if you can increase your system's shared memory and postgresql's shared_buffers to somewhere around 256M to 512M. It likely won't make a big difference in this scenario, but overall it will definitely help. > I freely admit that the reason I haven't messed with these values is that I > have next to no clue what the different things do and how they affect > performance, so perhaps an apology is in order. As Scott wrote, "Without a > realistic test scenario and with no connection pooling and with no > performance tuning, I don't think you should make any decisions right now > about which is faster". My apologies. No need for apologies. You're looking for the best database for drupal, and you're asking questions and trying to test to see which one is best. You just need to look deeper is all. I would, however, posit that you're putting the cart before the horse by looking at performance first, instead of reliability. On a machine with properly functioning hardware, postgresql is nearly indestructible. MySQL has had a lot of instances where, if you pull the plug / lose power, it will scramble your db / lose part or all of your data. Databases are supposed to be durable. InnoDB, the table handler, is pretty good, but it's surrounded by a DB that was designed for speed, not reliability. There was a time when Microsoft was trying to cast IIS as faster than Apache, so they released a benchmark showing IIS being twice as fast as apache at delivering static pages. 
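[Editor's note] Scott's advice might translate into a postgresql.conf fragment like the following. The numbers are illustrative for a dedicated 4 GB machine; effective_cache_size and work_mem are additions of mine rather than anything suggested in the thread, and on systems of that era the kernel's shared memory limit (SHMMAX) may need raising before shared_buffers can grow:

```
# postgresql.conf - illustrative values for a dedicated 4 GB box
shared_buffers = 256MB        # up from the 24MB default discussed above
effective_cache_size = 2GB    # planner hint: approximate size of the OS cache
work_mem = 8MB                # per-sort/hash memory, not a global total
```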
Let's say it was 10 ms for apache and 2 ms for IIS. Seems really fast. Problem is, static pages are cheap to deliver. I can buy a $500 server to serve the static content, and if I need more speed, I can throw more servers at the problem for $500 each, no OS license fees. But for dynamic content, the difference was the other way around, and the delivery times were much higher for IIS, like 50 ms for apache and 250 ms for IIS. Suddenly, a handful of dynamic pages and the IIS server was noticeably slower. The same type of comparison tends to hold true for MySQL versus PostgreSQL. MySQL tends to be very very fast at "select * from table where id=5", while PostgreSQL is much faster at 4-page-long reporting queries with 5 levels of subselects and a couple of unions, the kind of queries that make MySQL run so slow as to be useless. Also, PostgreSQL tends to keep better read performance as the number of writes increases. This is the real test, so the point I was making before about realistic tests is very important. It's about graceful degradation. PostgreSQL has it, and when your site is getting 20 times the traffic you ever tested for, it's a little late to figure out you might have picked the wrong DBMS. Note I'm not saying MySQL is the wrong choice, I'm saying you don't know because you haven't proven it capable.
On Mon, Oct 13, 2008 at 8:19 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > There was a time when Microsoft was trying to cast IIS as faster than > Apache, so they released a benchmark showing IIS being twice as fast > as apache at delivering static pages. Let's say it was 10 ms for > apache and 2 ms for IIS. Dyslexia strikes again! That was supposed to be 5 ms... anywho.
On Monday 13 October 2008 15:19:07 Scott Marlowe wrote: > > > shared_buffers = 24MB > > max_fsm_pages = 153600 > > Well, 24MB is pretty small. See if you can increase your system's > shared memory and postgresql's shared_buffers to somewhere around 256M > to 512M. It likely won't make a big difference in this scenario, but > overall it will definitely help. I noted after reading earlier messages in the thread that my distro documents that the value it defaults to for shared_buffers is rather small. One of our servers is fairly pressed for memory (some of the time). Is there any way to measure the amount of churn in the shared_buffers, as a way of demonstrating that more is needed (or that at this moment more would help)? There are a few very small databases on this server, and one which is 768 MB (still pretty small, but a lot bigger than the rest, most of which is logging information). The only "hot" information is the session table, ~9000 rows, one index on the session id. Can I ask Postgres to tell me, or estimate, how much memory this table would occupy if fully cached in memory? Half the problem in modern computing is knowing what is "slow". In this case, counting the rows of the session table takes about 100 ms. Deleting expired session rows takes about 120 ms, more if it hasn't been done for a while, which is I guess evidence that the table isn't being cached in memory as efficiently as it could be. In this case the server thinks the system I/O is zero for half the tools in use, because of the RAID hardware, so most of the Linux-based tools are useless in this context. At the risk of thread hijacking, for the session table I wonder if we are handling it in the most efficient way. It is just a regular table, indexed on session_id. Each request of note to the server requires retrieval of the session record, and often updating the expiry information. 
Every N requests the application also issues a: DELETE FROM sessions WHERE expires<NOW() OR expires IS NULL; Since there is no index on expires, this sequentially scans and deletes the stale records. I'm thinking that since the table is indexed for regular queries, making N larger has almost no obvious penalty, except that we accumulate a small number of stale records for longer. I'm not sure if an index on expires is worth it; the table is probably too small for it to make much difference either way. As for Drupal on Postgres, it might be worth the effort for big implementations. I did it for a while, but doing it again I'd go with MySQL. Nothing to do with the database, everything to do with support for third-party add-ons. Till Drupal gets to the point of routinely testing these things automatically against different backends and configs... Perhaps that is all that is needed: a service for Drupal authors that tries their plugins against Postgres automatically and complains if it doesn't work?
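[Editor's note] If the cleanup ever does become a problem, the index Simon considers would look like the sketch below. Whether the planner actually uses it depends on the table's size, and on older releases on how the IS NULL clause is handled, so this is an option to benchmark rather than a recommendation:

```sql
-- Hypothetical: let the periodic cleanup find stale rows without a seq scan
CREATE INDEX sessions_expires_idx ON sessions (expires);

-- The existing cleanup statement, unchanged:
DELETE FROM sessions WHERE expires < NOW() OR expires IS NULL;
```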
On Mon, 13 Oct 2008, Simon Waters wrote: > One of our servers is fairly pressed for memory (some of the time). Is there > any way to measure the amount of churn in the shared_buffers, as a way of > demonstrating that more is needed (or at this moment more would help)? If you wander to http://www.westnet.com/~gsmith/content/postgresql/ my "Inside the PostgreSQL Buffer Cache" presentation goes over this topic in extreme detail. > Can I ask Postgres to tell me, or estimate, how much memory this table > would occupy if fully cached in memory? http://wiki.postgresql.org/wiki/Disk_Usage gives an example showing all the biggest tables/indexes in your data, and links to an article giving examples of how to find the size of all sorts of things. One of the queries in my presentation even shows you what % of each table is actually being cached by the dedicated database memory. You also need to consider the OS buffer cache to get the full picture, which is a bit more complicated; http://www.kennygorman.com/wordpress/?p=250 gives an example there you might be able to use. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
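[Editor's note] To answer Simon's sizing question concretely, the size functions documented at the wiki link Greg gives can be used like this (these functions exist since PostgreSQL 8.1; 'sessions' stands in for the real table name):

```sql
-- On-disk size of the session table, without and with its indexes
SELECT pg_size_pretty(pg_relation_size('sessions'))       AS table_only,
       pg_size_pretty(pg_total_relation_size('sessions')) AS with_indexes;
```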