Обсуждение: Strange slow database
Hi all, Recently our databases started to experience a significant slowdown. Queries that were taking 500ms now take up to 20 seconds. An insert can take 150ms or more. This is strange since we are still hitting indexes and we vacuum regularly. Here is the description of our system: A single server has a medium size database with around 30-40 tables. Some tables have ~1000 rows. These tables tend to have lots of inserts and deletes, so we vacuum them regularly (every ten minutes). One table has 15 million rows, but it only every has inserts, never updated or deleted, so this table never gets vacuumed. Vacuuming is very low impact: vacuum_cost_delay = 50 vacuum_cost_limit = 100 We have fsync = true and max_connections = 150. There are 12 servers each with the same schema and config, but different data (although the number of rows in the tables are very similar). We assign users to a server when we set them up so all the data for a ser is on one server. This whole setup typically works well and is speedy, we have checked all the indexes and they are hit when appropriate. The table with 15million is the table on which inserts can take 150ms or more (I've seen 800ms for a single insert before). Strangely, this slowdown is on all 12 servers. Perhaps this is a configuration issue? If anyone has any ideas we'd love to hear them. Since we vacuum regularly and all the indexes get hit I don't know where to go next. Machines are Pentium 4 3.2Ghz running Linux kernel 2.6.10. Postgres version 8.0.1. Machines have 2Gb ram and two 10k RPM disks in a RAID-0 configuration. Regards -- David Mitchell Software Engineer Telogis
On Fri, Nov 18, 2005 at 10:05:47AM +1300, David Mitchell wrote: > Strangely, this slowdown is on all 12 servers. Perhaps this is a > configuration issue? If anyone has any ideas we'd love to hear them. It sounds like a use-pattern issue. Did something in your use change? Any time you get repeatable change like that, look for the common changed factor. A -- Andrew Sullivan | ajs@crankycanuck.ca When my information changes, I alter my conclusions. What do you do sir? --attr. John Maynard Keynes
David Mitchell <david.mitchell@telogis.com> writes: > Recently our databases started to experience a significant slowdown. > Queries that were taking 500ms now take up to 20 seconds. An insert can > take 150ms or more. This is strange since we are still hitting indexes > and we vacuum regularly. Here is the description of our system: Have you checked to see whether you are encountering table or index bloat? Your vacuuming policy doesn't sound unreasonable on its face, but there's no substitute for actually looking at physical file sizes when you aren't sure what's wrong. Try doing a pg_dump and reload into a virgin database, then compare file sizes with the original table-by-table to see if anything is drastically out of line. Don't forget to look at the system catalogs too. regards, tom lane
Tom Lane wrote: > Have you checked to see whether you are encountering table or index > bloat? Your vacuuming policy doesn't sound unreasonable on its face, > but there's no substitute for actually looking at physical file sizes > when you aren't sure what's wrong. Try doing a pg_dump and reload into > a virgin database, then compare file sizes with the original > table-by-table to see if anything is drastically out of line. > Don't forget to look at the system catalogs too. I dumped and loaded then wrote a script to compare the sizes of tables. I noticed that a few of the indexes, especially the indexes on the 15million table shrunk a lot (~50% reduction - ~400Mb). Also, the pg_attribute_relid_attnam_index index shrank loads, to 1% of its original size. From this, we tried reindexing indexes and found we got the same reduction in index size. Strangely though, our queries appear to have sped up not just on the servers we reindexed, but across the whole cluster - so apparently something else is affecting the speed of this all, not just bloat. We're scratching our heads, this is nothing if not a little frustrating. -- David Mitchell Software Engineer Telogis
David Mitchell <david.mitchell@telogis.com> writes: > I dumped and loaded then wrote a script to compare the sizes of tables. > I noticed that a few of the indexes, especially the indexes on the > 15million table shrunk a lot (~50% reduction - ~400Mb). Also, the > pg_attribute_relid_attnam_index index shrank loads, to 1% of its > original size. Hm, better check whether your vacuuming policy is taking care of the system catalogs ... > From this, we tried reindexing indexes and found we got > the same reduction in index size. Strangely though, our queries appear > to have sped up not just on the servers we reindexed, but across the > whole cluster - so apparently something else is affecting the speed of > this all, not just bloat. Odd indeed. regards, tom lane
Tom Lane wrote: >>Also, the >>pg_attribute_relid_attnam_index index shrank loads, to 1% of its >>original size. > > Hm, better check whether your vacuuming policy is taking care of the > system catalogs ... Yes, fair point. So I did: 18/11/2005 03:02:29 INFO: vacuuming "pg_catalog.pg_attribute" 18/11/2005 03:02:29 INFO: index "pg_attribute_relid_attnam_index" now contains 2861 row versions in 11900 pages 18/11/2005 03:02:29 DETAIL: 11834 index pages have been deleted, 11834 are currently reusable. 18/11/2005 03:02:29 CPU 0.11s/0.03u sec elapsed 0.45 sec. 18/11/2005 03:02:30 INFO: index "pg_attribute_relid_attnum_index" now contains 2861 row versions in 2942 pages 18/11/2005 03:02:30 DETAIL: 2917 index pages have been deleted, 2917 are currently reusable. 18/11/2005 03:02:30 CPU 0.01s/0.00u sec elapsed 0.22 sec. 18/11/2005 03:02:30 INFO: "pg_attribute": found 0 removable, 2861 nonremovable row versions in 48 pages 18/11/2005 03:02:30 DETAIL: 0 dead row versions cannot be removed yet. 18/11/2005 03:02:30 There were 19 unused item pointers. 18/11/2005 03:02:30 0 pages are entirely empty. 18/11/2005 03:02:30 CPU 0.12s/0.03u sec elapsed 0.72 sec. 18/11/2005 03:02:30 VACUUM This is from our vacuum process log. The first index there is the one in question. Hmm, any clues? Of course everything is running at full speed currently. -- David Mitchell Software Engineer Telogis
David Mitchell <david.mitchell@telogis.com> writes: > 18/11/2005 03:02:29 INFO: index "pg_attribute_relid_attnam_index" now > contains 2861 row versions in 11900 pages > 18/11/2005 03:02:29 DETAIL: 11834 index pages have been deleted, 11834 > are currently reusable. > 18/11/2005 03:02:30 INFO: index "pg_attribute_relid_attnum_index" now > contains 2861 row versions in 2942 pages > 18/11/2005 03:02:30 DETAIL: 2917 index pages have been deleted, 2917 > are currently reusable. > 18/11/2005 03:02:30 INFO: "pg_attribute": found 0 removable, 2861 > nonremovable row versions in 48 pages Those indexes are pretty bloated :-( but pg_attribute itself seems to be nice and tight. I speculate that you did a VACUUM FULL on it recently. The condition of the indexes suggests strongly that you've not been vacuuming pg_attribute often enough (and perhaps not any of the other system catalogs, either?). Heavy use of temp tables will cause pg_attribute and pg_class to bloat if you don't keep after them. At the moment, a REINDEX seems to be indicated. regards, tom lane
Tom Lane wrote: > I speculate that you did a VACUUM FULL on it recently. You speculate right, we do a vacuum full every sunday night as a safety net. So the vacuum full was a week old. > > The condition of the indexes suggests strongly that you've not been > vacuuming pg_attribute often enough (and perhaps not any of the other > system catalogs, either?). Heavy use of temp tables will cause > pg_attribute and pg_class to bloat if you don't keep after them. That's interesting. We stopped using temporary tables because they were causing us lots of trouble, including table bloat. We vacuum pg_attribute (and every other table with an entry in pg_tables) every ten minutes. What other that temp tables could bloat pg_attribute? We use refcursors to return data from our stored procedures. We also have a few stored procedures that return SETOF. Thanks for helping us out on this. -- David Mitchell Software Engineer Telogis