(auto-)analyze causing bloat/load
От | Andres Freund |
---|---|
Тема | (auto-)analyze causing bloat/load |
Дата | |
Msg-id | 20141018174909.GA5790@alap3.anarazel.de обсуждение исходный текст |
Ответы |
Re: (auto-)analyze causing bloat/load
(Tom Lane <tgl@sss.pgh.pa.us>)
|
Список | pgsql-hackers |
Hi, I've noticed an interesting phenomenon. Look at the following pgbench -c 32 -j 32 -P 1 output on a scale 200 database. Then guess where I've started SET vacuum_cost_delay = '20ms'; ANALYZE pgbench_accounts; and when it finished. The cost_delay is to simulate a autovac triggered analyze. progress: 18.0 s, 17018.1 tps, lat 1.879 ms stddev 1.018 progress: 19.0 s, 17039.6 tps, lat 1.876 ms stddev 0.806 progress: 20.0 s, 16892.1 tps, lat 1.893 ms stddev 0.732 progress: 21.0 s, 16636.9 tps, lat 1.922 ms stddev 0.933 progress: 22.0 s, 16198.4 tps, lat 1.974 ms stddev 1.180 progress: 23.0 s, 16362.1 tps, lat 1.954 ms stddev 0.858 progress: 24.0 s, 15926.0 tps, lat 2.007 ms stddev 0.967 progress: 25.0 s, 14550.5 tps, lat 2.197 ms stddev 1.870 progress: 26.0 s, 15042.2 tps, lat 2.125 ms stddev 1.135 progress: 27.0 s, 14579.1 tps, lat 2.196 ms stddev 1.016 progress: 28.0 s, 14145.0 tps, lat 2.261 ms stddev 1.160 progress: 29.0 s, 13734.1 tps, lat 2.328 ms stddev 0.992 progress: 30.0 s, 12127.6 tps, lat 2.565 ms stddev 1.930 progress: 31.0 s, 12375.2 tps, lat 1.807 ms stddev 1.556 progress: 32.0 s, 11939.9 tps, lat 3.551 ms stddev 45.457 progress: 33.0 s, 12466.0 tps, lat 2.566 ms stddev 1.517 progress: 34.0 s, 12166.1 tps, lat 2.634 ms stddev 1.309 progress: 35.0 s, 11743.7 tps, lat 2.723 ms stddev 1.576 progress: 36.0 s, 11623.3 tps, lat 2.752 ms stddev 1.237 progress: 37.0 s, 11341.4 tps, lat 2.817 ms stddev 1.413 progress: 38.0 s, 11130.2 tps, lat 2.875 ms stddev 1.359 progress: 39.0 s, 10988.0 tps, lat 2.910 ms stddev 1.132 progress: 40.0 s, 10681.9 tps, lat 2.993 ms stddev 1.355 progress: 41.0 s, 10573.1 tps, lat 3.027 ms stddev 1.738 progress: 42.0 s, 10174.9 tps, lat 3.143 ms stddev 2.125 progress: 43.0 s, 10190.1 tps, lat 3.136 ms stddev 1.892 progress: 44.0 s, 9943.2 tps, lat 3.219 ms stddev 1.740 progress: 45.0 s, 9792.5 tps, lat 3.265 ms stddev 2.229 progress: 46.0 s, 9701.6 tps, lat 3.299 ms stddev 1.834 progress: 47.0 s, 9376.6 tps, lat 3.410 ms stddev 2.149 progress: 48.0 s, 9217.2 tps, lat 3.471 ms stddev 1.992 progress: 49.0 s, 9265.9 tps, lat 3.449 ms stddev 1.819 progress: 50.0 s, 9090.9 tps, lat 3.521 ms stddev 1.916 progress: 51.0 s, 8933.9 tps, lat 3.578 ms stddev 1.813 progress: 52.0 s, 8849.6 tps, lat 3.616 ms stddev 1.913 progress: 53.0 s, 8767.5 tps, lat 3.649 ms stddev 1.952 progress: 54.0 s, 8521.5 tps, lat 3.753 ms stddev 2.158 progress: 55.0 s, 8382.2 tps, lat 3.816 ms stddev 2.383 progress: 56.0 s, 8298.1 tps, lat 3.848 ms stddev 2.727 progress: 57.0 s, 8369.5 tps, lat 3.825 ms stddev 2.897 progress: 58.0 s, 8205.0 tps, lat 3.902 ms stddev 2.669 progress: 59.0 s, 8048.1 tps, lat 3.974 ms stddev 2.473 progress: 60.0 s, 7886.5 tps, lat 4.050 ms stddev 2.456 progress: 61.0 s, 7539.0 tps, lat 4.251 ms stddev 2.992 progress: 62.0 s, 7815.4 tps, lat 4.090 ms stddev 2.845 progress: 63.0 s, 7863.4 tps, lat 4.062 ms stddev 3.032 progress: 64.0 s, 7750.6 tps, lat 4.135 ms stddev 2.677 progress: 65.0 s, 7580.7 tps, lat 4.217 ms stddev 2.210 progress: 66.0 s, 7164.8 tps, lat 4.466 ms stddev 2.511 progress: 67.0 s, 7504.4 tps, lat 4.265 ms stddev 1.343 progress: 68.0 s, 9530.0 tps, lat 3.360 ms stddev 1.742 progress: 69.0 s, 16751.3 tps, lat 1.909 ms stddev 0.652 progress: 70.0 s, 16123.7 tps, lat 1.983 ms stddev 0.751 progress: 71.0 s, 15122.7 tps, lat 2.115 ms stddev 1.065 Yes. That's right. ANALYZE caused a slowdown of nearly *2. Executing it on a bigger database where it takes longer makes it much worse. Also, a nonuniform access pattern also makes it *much* worse. So, you might ask, why is that? It's because ANALYZE keeps a snapshot around and thus blocks RecentGlobalXmin from advancing. Leading to hot pruning being impossible on "popular" pages. Now, the update rates in this example certainly are above what most sites see, but in practical sites relations are also much larger. And thus autoanalyze also takes longer... Much longer with a large sample size. Interestingly we already set PROC_IN_ANALYZE - but we never actually look at it. I wonder what it'd take to ignore analyze in GetSnapshotData()/GetOldestXmin()... At first sight it looks quite doable to not have a snapshot around (or setup to be ignored) while acquire_sample_rows() is running. As that's the expensive bit... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
В списке pgsql-hackers по дате отправления: