Displaying accumulated autovacuum cost
From    | Greg Smith
Subject | Displaying accumulated autovacuum cost
Date    |
Msg-id  | 4E4C2CE1.5000801@2ndQuadrant.com
Replies | Re: Displaying accumulated autovacuum cost
          (Euler Taveira de Oliveira <euler@timbira.com>)
List    | pgsql-hackers
Attached is a patch that tracks and displays the accumulated cost when
autovacuum is running.  Code by Noah Misch and myself.  I hope this idea
will bring a formal process to vacuum tuning, which is currently too hard
to do.  I was about to add "without..." to that, but then realized it
needs no disclaimer; it's just too hard, period.  Vacuum issues are enemy
#1 at all the terabyte-scale customer sites I've been fighting with lately.

The patch updates the command string just before the workers sleep to show
how much work they've done so far.  And at the end, it adds a few new
lines to the information written to the logs, when the autovacuum is
notable enough to be logged at all.  The overhead it adds is at most a few
integer operations per buffer processed and a slower title string update
once per sleep.  That's trivial compared to both the vacuum itself and to
the instrumentation's value at sites with vacuum issues.

To demonstrate the patch in action, here's a test case using a 6.4GB
pgbench_accounts table:

$ createdb pgbench
$ pgbench -i -s 500 pgbench
$ psql -d pgbench -c "select pg_relation_size('pgbench_accounts');"
 pg_relation_size
------------------
       6714761216

$ psql -d pgbench -c "select relname,relpages from pg_class where relname='pgbench_accounts';"
     relname      | relpages
------------------+----------
 pgbench_accounts |   819673

$ psql -d pgbench -c "delete from pgbench_accounts where aid<20000000"

You can see the new information in the command string with ps and grep:

$ while [ 1 ] ; do (ps -eaf | grep "[a]utovacuum worker" && sleep 60) ; done
gsmith  2687 17718  0 15:44 ?  00:00:00 postgres: autovacuum worker process h=19 m=14196 d=14185 ...
gsmith  2687 17718  0 15:44 ?  00:00:09 postgres: autovacuum worker process h=182701 m=301515 d=321345 ...
gsmith  2687 17718  1 15:44 ?  00:00:23 postgres: autovacuum worker process h=740359 m=679987 d=617559 ...

That's accumulated hit/miss/dirty counts, the raw numbers.
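A quick sketch of how you might scrape those counters programmatically.
This is hypothetical monitoring tooling, not part of the patch; it only
assumes the h=/m=/d= field format shown in the ps output above:

```python
import re

# One worker line captured from the "ps -eaf" loop above
line = ("gsmith  2687 17718  1 15:44 ?  00:00:23 "
        "postgres: autovacuum worker process h=740359 m=679987 d=617559")

def parse_counters(ps_line):
    """Extract the accumulated hit/miss/dirty buffer counts the patch
    appends to a worker's command string; None if the line lacks them."""
    m = re.search(r"h=(\d+) m=(\d+) d=(\d+)", ps_line)
    if m is None:
        return None
    hits, misses, dirtied = map(int, m.groups())
    return {"hits": hits, "misses": misses, "dirtied": dirtied}

print(parse_counters(line))
# {'hits': 740359, 'misses': 679987, 'dirtied': 617559}
```

Sampling this once a minute and diffing successive results gives the
per-minute work rates discussed later in this message.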
When the autovacuum is finished, those totals appear as new lines in the
log entry:

LOG:  automatic vacuum of table "pgbench.public.pgbench_accounts": index scans: 1
        pages: 0 removed, 819673 remain
        tuples: 19999999 removed, 30000022 remain
        buffer usage: 809537 hits, 749340 misses, 686660 dirtied
        system usage: CPU 5.70s/19.73u sec elapsed 2211.60 sec

To check whether this makes sense, we need the relevant parameters at the
time, which were the defaults (I only tweaked some basic config bits here,
including shared_buffers=400MB so a bit more was cached):

vacuum_cost_page_hit = 1        # 0-10000 credits
vacuum_cost_page_miss = 10      # 0-10000 credits
vacuum_cost_page_dirty = 20     # 0-10000 credits
vacuum_cost_limit = 200         # 1-10000 credits
autovacuum_vacuum_cost_delay = 20ms

A 20ms delay means 50 wakeups/second.  That means the cost should
accumulate at 200 * 50 = 10000/second, or 600K/minute.  That's how fast
the cost should be increasing here.  Given a runtime of 2211.60 seconds,
that's a total estimated cost of 2209.15 * 10000 = 22,091,500.  Now we
check that against the totals printed at the end of the vacuum:

 1 * 809537 hits    =    809,537
10 * 749340 misses  =  7,493,400
20 * 686607 dirtied = 13,732,140

That gives a directly computed total of 22,035,077.  Close enough to show
this is working as expected.  How I computed all that should give you an
idea of how you might use these numbers to extract other useful
statistics, for example if you'd like to tune the balance of the various
cost_page_* parameters.  I have no idea how anyone could ever set those
relative to one another without this data; it would take epic guessing
skills.

What else can you do with this data?

-Figure out if the VACUUM is still making progress when it appears stuck
-Estimate how long it will take to finish, based on current progress and
 whatever total cost was logged last time VACUUM ran against this relation.
-Compute approximate hit rate on the read side.
 OS caching issues and the ring buffer are obviously a problem with that,
 so this one isn't too valuable.
-See how the cost is split when multiple vacuums are running.  This
 problem is why sites can't just use "total time to vacuum" as a useful
 proxy to estimate how long one will take to run.
-Easily track the read/write ratio
-Directly measure the write rate

That last one is, I think, the part people are most perplexed by right
now, and this makes it trivial.  How do you turn all these cost figures
into real-world read/write rates?  It's been hard to do.  Now, you can
take a bunch of samples of the data at 1-minute intervals, like my little
"ps | grep" example above does.  The delta in the "d=" (dirtied) column is
how much was written per minute, in units of 8K (usually) buffers.
Multiply that by 8192/(60*1024*1024), and you get MB/s out of there.

I collected that data for a cleanup run of the pgbench_accounts damage
done above; a CSV file with all the statistics is attached.  I also
collected OS-level stats from Linux about the actual read/write rate of
the process, converted into "Write Mbps" (those are actually in MB/s; the
sloppy capitalization is via OpenOffice "autocorrect").  Those numbers are
close enough to make me confident that the dirty buffer totals tracked
here do turn into useful MB/s values.  Sample of the most interesting
part:

Cost Delta   Dirty Mbps   Write Mbps
   589,890         2.56         2.73
   591,151         2.57         2.73
   589,035         2.56         2.72
   593,775         3.14         0.20
   599,420         2.05         0.00
   598,503         2.05         0.00
   599,421         2.05         0.00
   574,046         0.60         0.01
   574,779         0.64         0.67
   609,140         2.56         2.68
   612,397         2.57         2.69
   611,744         2.57         2.69
   610,008         2.56         2.68

This shows the expected 600K/minute cost accumulation.  And using the d=
numbers to compute MB/s of write speed closely matches the total write
speed of this process.  Some of the difference might be I/O to things
other than the main table here; some of it is just because OS write
caching influences the write rate.
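The arithmetic above is mechanical enough to script.  Here's a minimal
sketch in Python using the parameter values and logged counts from this
test run; the 19660-buffer delta at the end is an illustrative figure
(chosen to land near the 2.56 MB/s rows above), not a value from the
attached CSV:

```python
# Cost parameters in effect for the test run (the defaults)
PAGE_HIT, PAGE_MISS, PAGE_DIRTY = 1, 10, 20
COST_LIMIT = 200
COST_DELAY_MS = 20

# Expected accumulation rate: up to cost_limit credits are spent before
# each cost_delay sleep, and a 20ms delay means 50 wakeups/second.
wakeups_per_sec = 1000 // COST_DELAY_MS            # 50
cost_per_minute = COST_LIMIT * wakeups_per_sec * 60
print(cost_per_minute)                             # 600000, the 600K/minute above

# Direct total from the buffer counts, as multiplied out in the text
hits, misses, dirtied = 809537, 749340, 686607
total = hits * PAGE_HIT + misses * PAGE_MISS + dirtied * PAGE_DIRTY
print(total)                                       # 22035077

# Converting a one-minute delta of the d= (dirtied buffers) counter into
# an MB/s write rate, assuming the usual 8K block size
def dirty_delta_to_mbps(delta_buffers, interval_s=60, block_size=8192):
    return delta_buffers * block_size / (interval_s * 1024 * 1024)

print(round(dirty_delta_to_mbps(19660), 2))        # 2.56
```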
In the spots where the OS value and what's derived from the dirty rate
diverge most, it appears to be because vacuum is filling Linux's write
cache; actual writes accumulated against the process then block for a
while.  It's a small difference most of the time.  I'd be willing to
accept a "Dirty MB/s" figure computed this way as accurate enough for most
purposes.

And this patch lets you get that data, currently unavailable without
poking into the OS statistics (if at all), just by doing a little log file
and/or command string scraping.  Totals at the end or real-time
monitoring, depending on how much work you want to put into it.  For a
busy site where one or more autovacuum processes are running most of the
time, being able to monitor the vacuum portion of the I/O this way will be
a huge improvement over the current state of things.  I already have a
stack of tools built on top of this data that I'm working on, and they're
making it much easier to come up with an iterative tuning process for
autovacuum.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
Attachments