testing ProcArrayLock patches
From | Robert Haas |
---|---|
Subject | testing ProcArrayLock patches |
Date | |
Msg-id | CA+Tgmob5j=UmJKCRQZ5yhy6Fqmp+uZWKBVGEggZ3BQfei48L2Q@mail.gmail.com |
Replies | Re: testing ProcArrayLock patches ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>) |
List | pgsql-hackers |
We have three patches in the hopper that all have the same goal: reduce ProcArrayLock contention. They are:

[1] Pavan's patch (subsequently revised by Heikki) to put the "hot" members of the PGPROC structure into a separate array
http://archives.postgresql.org/message-id/4EB7C4C9.9070309@enterprisedb.com

[2] my FlexLocks patch, and
http://archives.postgresql.org/message-id/CA+Tgmoax_14rbx8Y6mmgvW64gCQL4ZviDzwEObXEMuiV=TwmxQ@mail.gmail.com

[3] my patch to eliminate some snapshots (I think this is also better semantics, but at any rate it also improves performance)
http://archives.postgresql.org/message-id/CA+TgmoYDe3dx7xuK_rCPLWy7P67hp96ozyGe_K6W87kfx3YCGw@mail.gmail.com

Interestingly, these all try to reduce ProcArrayLock contention in different ways: [1] does it by making snapshot-taking scan fewer cache lines, [2] does it by reducing contention for the spinlock protecting ProcArrayLock, and [3] does it by taking fewer snapshots. So you might think that the effects of these patches would add, at least to some degree.

Now the first two patches are the ones that seem to show the most performance improvement, so I tested both patches individually and also a combination of the two patches (the combined patch for this is attached, as there were numerous conflicts). I tested them on two different machines with completely different architectures: Nate Boley's AMD 6128 box (which has 32 cores) and an HP Integrity server (also with 32 cores). On Integrity, I compiled using the aCC compiler, adjusted the resulting binary with chatr +pi L +pd L, and ran both pgbench and the server with rtsched -s SCHED_NOAGE -p 178, which are settings that seem to be necessary for good performance on that platform. pgbench was run locally on the AMD box but from another server over a high-speed network interconnect on the Integrity server.

Both servers were configured with shared_buffers=8GB, checkpoint_segments=300, wal_writer_delay=20ms, and synchronous_commit=off. Some of the other settings were different; on the Integrity server, I had effective_cache_size=340GB, checkpoint_timeout=30min, and wal_buffers=16MB, while on the AMD box I had checkpoint_completion_target=0.9 and maintenance_work_mem=1GB. I doubt that these settings differences were material (except that they probably made reinitializing the database between tests take longer on the Integrity system, since I forgot to set maintenance_work_mem), but I could double-check that if anyone is concerned about it.

The results are below. In a nutshell, either patch by itself is very, very good; and both patches together are somewhat better. Which one helps more individually is somewhat variable.

Lines marked "m" are unpatched master as of commit ff4fd4bf53c5512427f8ecea08d6ca7777efa2c5. "p" is Pavan's PGPROC patch (maybe I should have said ppp...) as revised by Heikki; "f" is the latest version of my FlexLocks patch, and "b" is the combination patch attached herewith. The number immediately following is the number of clients used, each with its own pgbench thread (i.e. -c N -j N). As usual, each number is the median of three five-minute runs at scale factor 100.

Since Pavan's patch has the advantage of being quite simple, I'm thinking we should push that one through to completion first, and then test all the other possible improvements in this area relative to that new baseline.
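For anyone who wants to compare the labelled result lines in this message mechanically, here is a minimal Python sketch. It is not part of the test harness used for these numbers; the function names (parse_result, median_tps, gains_over_master) are made up for illustration. It assumes each line has the form "<build><clients> tps = <value> (including connections establishing)", takes the median of repeated runs the way the reported figures were reduced, and expresses each patched build as a percent change against "m" at the same client count.

```python
import re
from statistics import median

# Labelled result line, e.g.
# "m32 tps = 10800.711985 (including connections establishing)".
# The leading letter is the build (m, p, f, b); the digits are the client count.
RESULT_RE = re.compile(
    r"^([mpfb])(\d+) tps = (\d+(?:\.\d+)?) \(including connections establishing\)$"
)


def parse_result(line):
    """Split one labelled line into (build, clients, tps)."""
    match = RESULT_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized result line: {line!r}")
    build, clients, tps = match.groups()
    return build, int(clients), float(tps)


def median_tps(runs):
    """Reduce the tps figures of repeated runs to the single reported number."""
    return median(runs)


def gains_over_master(lines):
    """Return {(build, clients): percent change in tps versus 'm'}."""
    results = {}
    for line in lines:
        build, clients, tps = parse_result(line)
        results[(build, clients)] = tps
    gains = {}
    for (build, clients), tps in results.items():
        baseline = results.get(("m", clients))
        if build == "m" or baseline is None:
            continue  # nothing to compare against
        gains[(build, clients)] = 100.0 * (tps - baseline) / baseline
    return gains


if __name__ == "__main__":
    # Two lines taken from the AMD unlogged-table results in this message.
    sample = [
        "m32 tps = 10800.711985 (including connections establishing)",
        "b32 tps = 23373.672552 (including connections establishing)",
    ]
    for (build, clients), pct in sorted(gains_over_master(sample).items()):
        print(f"{build}{clients:02d}: {pct:+.1f}% vs master")
```

Feeding one whole section of results to gains_over_master gives a per-client-count view of how much each of "p", "f", and "b" moves throughput relative to master; the individual per-run figures are not included in this message, so median_tps is only useful when re-running the tests.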
== AMD Opteron 6128, 32 cores, Permanent Tables ==

m01 tps = 631.208073 (including connections establishing)
p01 tps = 631.182923 (including connections establishing)
f01 tps = 636.308562 (including connections establishing)
b01 tps = 629.295507 (including connections establishing)
m08 tps = 4516.479854 (including connections establishing)
p08 tps = 4614.772650 (including connections establishing)
f08 tps = 4652.454768 (including connections establishing)
b08 tps = 4679.363474 (including connections establishing)
m16 tps = 7788.615240 (including connections establishing)
p16 tps = 7824.025406 (including connections establishing)
f16 tps = 7841.876146 (including connections establishing)
b16 tps = 7859.334650 (including connections establishing)
m24 tps = 11720.145052 (including connections establishing)
p24 tps = 12782.696214 (including connections establishing)
f24 tps = 12559.765555 (including connections establishing)
b24 tps = 12891.945766 (including connections establishing)
m32 tps = 10223.015618 (including connections establishing)
p32 tps = 11585.902050 (including connections establishing)
f32 tps = 11626.542744 (including connections establishing)
b32 tps = 11866.969986 (including connections establishing)
m80 tps = 7540.482189 (including connections establishing)
p80 tps = 11598.446238 (including connections establishing)
f80 tps = 11529.752081 (including connections establishing)
b80 tps = 11714.364294 (including connections establishing)

== AMD Opteron 6128, 32 cores, Unlogged Tables ==

m01 tps = 680.398630 (including connections establishing)
p01 tps = 673.293390 (including connections establishing)
f01 tps = 679.993953 (including connections establishing)
b01 tps = 679.377600 (including connections establishing)
m08 tps = 4760.964292 (including connections establishing)
p08 tps = 4870.037842 (including connections establishing)
f08 tps = 5028.719509 (including connections establishing)
b08 tps = 4893.439824 (including connections establishing)
m16 tps = 7997.051705 (including connections establishing)
p16 tps = 8218.884377 (including connections establishing)
f16 tps = 8160.373682 (including connections establishing)
b16 tps = 8144.707958 (including connections establishing)
m24 tps = 13066.867858 (including connections establishing)
p24 tps = 14523.109116 (including connections establishing)
f24 tps = 14098.978673 (including connections establishing)
b24 tps = 14526.330294 (including connections establishing)
m32 tps = 10800.711985 (including connections establishing)
p32 tps = 19159.131614 (including connections establishing)
f32 tps = 22224.839905 (including connections establishing)
b32 tps = 23373.672552 (including connections establishing)
m80 tps = 7885.663468 (including connections establishing)
p80 tps = 17760.149440 (including connections establishing)
f80 tps = 19960.356205 (including connections establishing)
b80 tps = 18665.581069 (including connections establishing)

== HP Integrity, 32 cores, Permanent Tables ==

m01 tps = 883.732295 (including connections establishing)
p01 tps = 866.449154 (including connections establishing)
f01 tps = 924.364403 (including connections establishing)
b01 tps = 926.797302 (including connections establishing)
m08 tps = 6098.047731 (including connections establishing)
p08 tps = 6293.537855 (including connections establishing)
f08 tps = 6059.635731 (including connections establishing)
b08 tps = 6250.132288 (including connections establishing)
m16 tps = 9995.755003 (including connections establishing)
p16 tps = 10654.562946 (including connections establishing)
f16 tps = 10258.008496 (including connections establishing)
b16 tps = 10712.776806 (including connections establishing)
m24 tps = 11646.915026 (including connections establishing)
p24 tps = 13483.345338 (including connections establishing)
f24 tps = 12815.456128 (including connections establishing)
b24 tps = 13506.218109 (including connections establishing)
m32 tps = 10433.315312 (including connections establishing)
p32 tps = 14111.719739 (including connections establishing)
f32 tps = 13990.284158 (including connections establishing)
b32 tps = 14697.189751 (including connections establishing)
m80 tps = 8177.428209 (including connections establishing)
p80 tps = 11343.667289 (including connections establishing)
f80 tps = 11651.244256 (including connections establishing)
b80 tps = 12523.308466 (including connections establishing)

== HP Integrity, 32 cores, Unlogged Tables ==

m01 tps = 949.594327 (including connections establishing)
p01 tps = 958.753925 (including connections establishing)
f01 tps = 931.276655 (including connections establishing)
b01 tps = 943.836646 (including connections establishing)
m08 tps = 6211.621726 (including connections establishing)
p08 tps = 6412.267441 (including connections establishing)
f08 tps = 5843.870591 (including connections establishing)
b08 tps = 6428.415940 (including connections establishing)
m16 tps = 10341.538889 (including connections establishing)
p16 tps = 11161.425798 (including connections establishing)
f16 tps = 10545.954472 (including connections establishing)
b16 tps = 11235.441290 (including connections establishing)
m24 tps = 11859.831632 (including connections establishing)
p24 tps = 14380.766878 (including connections establishing)
f24 tps = 13489.351324 (including connections establishing)
b24 tps = 14579.649665 (including connections establishing)
m32 tps = 10716.208372 (including connections establishing)
p32 tps = 15497.819188 (including connections establishing)
f32 tps = 14590.406972 (including connections establishing)
b32 tps = 15991.920288 (including connections establishing)
m80 tps = 8465.159253 (including connections establishing)
p80 tps = 11945.494890 (including connections establishing)
f80 tps = 14676.324769 (including connections establishing)
b80 tps = 15623.109737 (including connections establishing)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company