Re: Speed up transaction completion faster after many relations are accessed in a transaction
From | David Rowley
---|---
Subject | Re: Speed up transaction completion faster after many relations are accessed in a transaction
Date |
Msg-id | CAApHDvoKqWRxw5nnUPZ8+mAJKHPOPxYGoY1gQdh0WeS4+biVhg@mail.gmail.com
In reply to | Re: Speed up transaction completion faster after many relations are accessed in a transaction (David Rowley <david.rowley@2ndquadrant.com>)
Responses | Re: Speed up transaction completion faster after many relations are accessed in a transaction (Zhihong Yu <zyu@yugabyte.com>)
 | Re: Speed up transaction completion faster after many relations are accessed in a transaction (David Rowley <dgrowleyml@gmail.com>)
List | pgsql-hackers
On Wed, 14 Aug 2019 at 19:25, David Rowley <david.rowley@2ndquadrant.com> wrote:
> For now, I'm out of ideas. If anyone else feels like suggesting
> something or picking this up, feel free.

This is a pretty old thread, so we might need a recap:

# Recap

Basically, LockReleaseAll() becomes slow after a large number of locks have all been held at once by a backend. The problem is that the LockMethodLocalHash dynahash table must grow to store all the locks, and when later transactions take only a few locks, LockReleaseAll() is slow because hash_seq_search() must skip over the sparsely filled buckets in the bloated hash table.

The following things were tried on this thread. Each one failed:

1) Use a dlist in LOCALLOCK to track the next and prev LOCALLOCK. Simply loop over the dlist in LockReleaseAll().

2) Try dropping and recreating the dynahash table if it becomes bloated, using some heuristics to decide what "bloated" means and whether recreating is worthwhile.

#1 failed due to concerns about increasing the size of LOCALLOCK to store the dlist pointers. Performance regressions were seen too, possibly due to the size increase or the additional overhead of pushing onto the dlist.

#2 failed because it was difficult to agree on what the heuristics would be, and we had no efficient way to determine the maximum number of locks a given transaction held at any one time. We only know how many were still held at LockReleaseAll().

There were also some suggestions to fix the slowness of dynahash's hash_seq_search(), and a suggestion to try using simplehash.h instead of dynahash.c. Unfortunately, simplehash.h would suffer from the same issue, as it too would have to skip over empty buckets in a sparsely populated hash table.

I'd like to revive this effort as I have a new idea on how to solve the problem.

# Background

Over in [1] I'm trying to improve the performance of smgropen() during recovery. The slowness there comes from the dynahash table lookups to find the correct SMgrRelation. Over there I proposed to use simplehash instead of dynahash because it's quite a bit faster and greatly lessens the hash lookup overhead during recovery.

One problem on that thread is that the relcache keeps a pointer into the SMgrRelation (RelationData.rd_smgr), and because simplehash moves entries around during inserts and deletes, nothing can keep a pointer to a simplehash entry; the entries are unstable. I fixed that over on the other thread by having the simplehash entry point to a palloc'd SMgrRelationData. My problem is, I don't really like that idea, as it means we need to palloc() and pfree() lots of little chunks of memory.

To fix the above, I really think we need a version of simplehash that has stable pointers. Provided that implementation is faster than dynahash, it will help solve the smgropen() slowness during recovery.

# A new hashtable implementation

I ended up thinking of this thread again because the stable-pointer hash table I ended up writing for [1] happens to be lightning fast for hash table sequential scans, even if the table has become bloated. The reason the seq scans are so fast is that the implementation loops over the data arrays, which are tightly packed and store the actual data rather than pointers to the data. The code does not need to loop over the bucket array for this at all, so how large that array has become is irrelevant to hash table seq scan performance.

The patch stores elements in "segments", each of which holds some power-of-2 number of items. When we run out of space to store new items in a segment, we just allocate another segment. When we remove items from the table, new items reuse the first unused slot in the first segment with free space. This helps keep the used elements tightly packed. A segment keeps a bitmap of used items, which makes scanning all used items very fast. If you flip the bits in the used-items bitmap, you get a free list for that segment, so it's also very fast to find a free slot when inserting a new item. The sketch below illustrates the idea.
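To make the segment layout concrete, here's a minimal sketch of the idea in standalone C. The names (gh_segment, ExampleItem, gh_find_free_slot, gh_scan_segment) and the fixed item type are just mine for illustration; the real code is generated per element type with the same sort of preprocessor tricks simplehash.h uses:

#include <stdint.h>

#define GH_ITEMS_PER_SEGMENT 256
#define GH_BITMAP_WORDS (GH_ITEMS_PER_SEGMENT / 64)

typedef struct ExampleItem
{
	uint32_t	key;
	uint32_t	value;
} ExampleItem;

typedef struct gh_segment
{
	/* bit N set means items[N] currently holds an item */
	uint64_t	used_items[GH_BITMAP_WORDS];
	/* tightly packed storage of the actual data, not pointers to it */
	ExampleItem items[GH_ITEMS_PER_SEGMENT];
} gh_segment;

/*
 * Return the first unused slot in 'seg', or -1 if the segment is full.
 * Flipping the bits of the used bitmap turns it into a free bitmap, so
 * a count-trailing-zeros finds a free slot cheaply.
 */
static int
gh_find_free_slot(gh_segment *seg)
{
	for (int w = 0; w < GH_BITMAP_WORDS; w++)
	{
		uint64_t	free_bits = ~seg->used_items[w];

		if (free_bits != 0)
			return w * 64 + __builtin_ctzll(free_bits);
	}
	return -1;					/* caller must allocate a new segment */
}

/*
 * Visit every used item in 'seg'.  Only the bitmap and the packed item
 * array are touched, never the bucket array, which is why seq scan
 * speed doesn't suffer when the bucket array is bloated.
 */
static void
gh_scan_segment(gh_segment *seg, void (*visit) (ExampleItem *item))
{
	for (int w = 0; w < GH_BITMAP_WORDS; w++)
	{
		uint64_t	bits = seg->used_items[w];

		while (bits != 0)
		{
			int			i = __builtin_ctzll(bits);

			visit(&seg->items[w * 64 + i]);
			bits &= bits - 1;	/* clear the lowest set bit */
		}
	}
}

(The bit-flip in gh_find_free_slot is what makes the used bitmap double as a free list, and a scan like gh_scan_segment is why a bloated table no longer hurts LockReleaseAll().)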
I've called the hash table implementation "generichash". It uses the same preprocessor tricks as simplehash.h does and borrows the same linear probing code that simplehash uses. The bucket array just stores the hash value and a uint32 index into the segment item that stores the data. Since segments hold a power-of-2 number of items, we can easily address both the segment number and the item within the segment from the single uint32 index value. The 'index' field just stores a special value when the bucket is empty, so no extra field is needed for that. This means the bucket type is just 8 bytes wide; the sketch below shows the layout.
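As a rough illustration of that bucket layout, here's a sketch; again, the names (gh_bucket, GH_UNUSED_ITEM) and the choice of sentinel value are my assumptions for illustration, not necessarily what the patch defines:

#include <stdint.h>

#define GH_ITEMS_PER_SEGMENT 256		/* must be a power of 2 */
#define GH_UNUSED_ITEM UINT32_MAX		/* assumed sentinel for an empty bucket */

typedef struct gh_bucket
{
	uint32_t	hashvalue;		/* hash of the key stored in the item */
	uint32_t	index;			/* segment slot, or GH_UNUSED_ITEM if empty */
} gh_bucket;					/* 8 bytes: cheap to move during linear probing */

/* Which segment does this index point into? (compiles to a shift) */
static inline uint32_t
gh_segment_number(uint32_t index)
{
	return index / GH_ITEMS_PER_SEGMENT;
}

/* Which slot within that segment? (compiles to a mask) */
static inline uint32_t
gh_item_within_segment(uint32_t index)
{
	return index & (GH_ITEMS_PER_SEGMENT - 1);
}

Because GH_ITEMS_PER_SEGMENT is a power of 2, decoding the index is just a shift and a mask; the cost shows up instead on lookups, which need the extra dereferences into the segment to reach the item.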
One thing I will mention about the new hash table implementation is that GH_ITEMS_PER_SEGMENT is, by default, set to 256, which means that's also the minimum size for the table. I could drop this down to 64, but that's still quite a bit more than the dynahash table's default size of 16. I think 16 is a bit on the low side and that it might be worth making it 64 anyway; I'd just need to lower GH_ITEMS_PER_SEGMENT. The current code does not allow it to go any lower, as I've done nothing to allow partial bitmap words; they're 64 bits on a 64-bit machine.

I've not done too much benchmarking between it and simplehash.h, but I think in some cases it will be faster. Since the bucket type is just 8 bytes, moving buckets around during inserts and deletes will be cheaper than with simplehash. Lookups are likely to be a bit slower due to having to look up the item within the segment, which is a few pointer dereferences away.

A quick test shows an improvement when compared to dynahash:

# select count(pg_try_advisory_lock(99999,99999)) from generate_series(1,1000000);

Master:
Time: 190.257 ms
Time: 189.440 ms
Time: 188.768 ms

Patched:
Time: 182.779 ms
Time: 182.267 ms
Time: 186.902 ms

This is just hitting the local lock table. The advisory lock key is the same each time, so it remains a lock check. It's a pretty small gain, but I'm mostly trying to show that the new hash table is no slower than dynahash when probing for inserts. The real wins come from what we're trying to solve in this thread -- the performance of LockReleaseAll().

Here are benchmark results measuring the TPS of a simple select from an empty table after another transaction has bloated the locallock table:

Master:
127544 tps
113971 tps
123255 tps
121615 tps

Patched:
170803 tps
167305 tps
165002 tps
171976 tps

That's about 38% faster. The benchmark I used was:

t1.sql:
\set p 1
select a from t1 where a = :p

hp.sql:
select count(*) from hp

"hp" is a hash-partitioned table with 10k partitions.

pgbench -j 20 -c 20 -T 60 -M prepared -n -f hp.sql@1 -f t1.sql@100000 postgres

I'm using the query on the hp table to bloat the locallock table; it's only executed once in every 100,000 queries. The tps numbers above are for t1.sql.

I've not quite looked into why yet, but hp.sql's performance improved by 58%, going from an average of 1.061377 in master to an average of 1.683616 in the patched version. I can't quite see where this gain is coming from, and it's pretty hard to get good stable performance results out of this AMD machine, so it might be related to that. That's why I ran 20 threads; that seems slightly better, as the machine seems to have trouble waking up properly for a single thread.

It would be good if someone could repeat the tests to see if the gains appear on other hardware. Also, it would be good to hear what people think about solving the problem this way.

Patch attached.

David

[1] https://www.postgresql.org/message-id/flat/CAApHDvpkWOGLh_bYg7jproXN8B2g2T9dWDcqsmKsXG5+WwZaqw@mail.gmail.com