Discussion: Proposed fix for NOTIFY performance degradation
Hi,

while measuring NOTIFY execution time, I noticed a significant performance drop. Please find a patch attached, together with some tests; more details are shown below.

Best regards,
Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.ciolli@2ndquadrant.it | www.2ndquadrant.it

---8<------8<------8<------8<------8<------8<------8<------8<------8<---

h1. The problem

We record pending notifications in a transaction-based list. Notifications are recorded in the same order as they are issued. Because of PostgreSQL's asynchronous notification semantics, we don't need to record two identical notifications; we record a notification only if there are no duplicates. This is implemented by scanning the list and checking for duplicates.

The list is scanned backwards, because the last element is easily accessible; that is a sensible optimisation in the case of many notifications which are each identical to the previous one (scenario A). However, scanning the whole list is quite expensive in the case of many notifications which are not identical to the previous one (scenario B; see Test 1 below, when m > 1).

h1. Proposed solution

Check only the last element of that list, which is efficient in both scenarios (see Test 2 below).

h1. Tests

"PostgreSQL HEAD" has been fetched as of commit a0e8df52 (Wed Apr 20 22:49:37 2011 -0400). Test 1 has been executed against PostgreSQL HEAD; Test 2 has been executed against the patched version of PostgreSQL HEAD.

In the tables below:

* "n" denotes the number of notifications issued in a single transaction;
* "m" denotes the number of distinct channels used for these notifications;
* "iter" is the number of times each transaction has been repeated (to reduce the importance of occasional spikes);
* "avg_usec" denotes the average time in microseconds required by each NOTIFY statement.

h2. Test 1 - PostgreSQL HEAD

   n   |   m   | iter | avg_usec
-------+-------+------+----------
    10 |     1 |   10 |   43.730
   100 |     1 |   10 |   37.630
  1000 |     1 |   10 |   42.990
 10000 |     1 |   10 |   36.225
    10 |    10 |   10 |   43.960
   100 |   100 |   10 |   46.537
  1000 |  1000 |   10 |  126.115
 10000 | 10000 |   10 |  906.501

h2. Test 2 - patched PostgreSQL

   n   |   m   | iter | avg_usec
-------+-------+------+----------
    10 |     1 |   10 |   43.810
   100 |     1 |   10 |   38.256
  1000 |     1 |   10 |   36.950
 10000 |     1 |   10 |   36.638
    10 |    10 |   10 |   44.830
   100 |   100 |   10 |   38.684
  1000 |  1000 |   10 |   38.924
 10000 | 10000 |   10 |   38.032
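The difference between the two strategies can be sketched in C as follows. This is a minimal, self-contained model of the pending-notification list; the names, the fixed-size array, and channel-only entries are illustrative assumptions, not PostgreSQL's actual async.c structures:

```c
#include <assert.h>
#include <string.h>

#define MAX_PENDING 16          /* illustration only; the real list grows */

/* Hypothetical model of one pending notification (channel only). */
struct Notification {
    const char *channel;
};

static struct Notification pending[MAX_PENDING];
static int npending = 0;

/* Current behaviour: scan the whole list backwards for a duplicate. */
static int is_dup_full_scan(const char *channel)
{
    for (int i = npending - 1; i >= 0; i--)
        if (strcmp(pending[i].channel, channel) == 0)
            return 1;
    return 0;
}

/* Proposed behaviour: compare against the last entry only. */
static int is_dup_last_only(const char *channel)
{
    return npending > 0 &&
           strcmp(pending[npending - 1].channel, channel) == 0;
}

/* Record a NOTIFY unless the chosen policy flags it as a duplicate. */
static void notify(const char *channel, int (*is_dup)(const char *))
{
    if (!is_dup(channel) && npending < MAX_PENDING)
        pending[npending++].channel = channel;
}
```

Under the full scan, a repeated channel is suppressed no matter where it last appeared (scenario A and B both pay O(n) per NOTIFY); under the last-only check, suppression happens only for immediately repeated channels, which is the trade-off the patch makes.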
Attachments
Gianni Ciolli <gianni.ciolli@2ndquadrant.it> writes:
> [ proposes lobotomization of duplicate-elimination behavior in NOTIFY ]

I think this change is likely to be penny-wise and pound-foolish. The reason the duplicate check is in there is that things like triggers may just do "NOTIFY my_table_changed". If the trigger is fired N times in a transaction, and you don't have duplicate-elimination in NOTIFY, then you get N duplicate messages to no purpose. And the expense of actually sending (and processing) those messages is a lot higher than suppressing them would be.

With the proposed change, the simplest case of just one such trigger is still covered, but not two or more. I don't think this is good enough. It's basically throwing the responsibility on the application programmer to avoid duplicates --- and in most scenarios, it will cost much more to suppress duplicates in PL code than to do it here.

When I started to read this patch I was hoping to see some clever scheme for detecting dups at lower cost than what we currently do, like perhaps hashing. I'm not impressed with just abandoning the responsibility, though.

One idea we might consider is to offer two forms of NOTIFY, one that suppresses dups and one that doesn't, so that in cases where the app programmer knows his code doesn't generate (many) dups he can tell us not to bother checking.

			regards, tom lane
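A hash-based duplicate check of the kind Tom alludes to could be sketched roughly as below. This is a hypothetical illustration, not PostgreSQL code: a fixed-size open-addressed table of channel names consulted before appending to the list, making each check O(1) expected instead of O(n). A real version would have to hash the payload too and grow the table:

```c
#include <assert.h>
#include <string.h>

#define TABLE_SIZE 64   /* must be a power of two; fixed size for brevity */

static const char *slots[TABLE_SIZE];

/* djb2 string hash. */
static unsigned long djb2(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char) *s++;
    return h;
}

/* Returns 1 if channel was already recorded this transaction, else
 * records it and returns 0.  Linear probing; no deletion needed,
 * since the set is discarded at end of transaction. */
static int seen_before(const char *channel)
{
    unsigned long i = djb2(channel) & (TABLE_SIZE - 1);

    while (slots[i] != NULL) {
        if (strcmp(slots[i], channel) == 0)
            return 1;                   /* duplicate: suppress */
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    slots[i] = channel;                 /* first occurrence: record */
    return 0;
}
```

With such a side table the trigger case Tom describes ("NOTIFY my_table_changed" fired N times) still collapses to one message, without the O(n) list scan per NOTIFY.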
On Sat, Apr 23, 2011 at 2:57 PM, Gianni Ciolli <gianni.ciolli@2ndquadrant.it> wrote:
> * "avg_usec" denotes the average time in microseconds required by each
> NOTIFY statement.
>
> h2. Test 1 - PostgreSQL HEAD
>
>    n   |   m   | iter | avg_usec
> -------+-------+------+----------
>     10 |     1 |   10 |   43.730
>    100 |     1 |   10 |   37.630
>   1000 |     1 |   10 |   42.990
>  10000 |     1 |   10 |   36.225
>     10 |    10 |   10 |   43.960
>    100 |   100 |   10 |   46.537
>   1000 |  1000 |   10 |  126.115
>  10000 | 10000 |   10 |  906.501

I read that wrong first time around. So the wasted time from duplicate checks is

   n   |   m   | iter | msec
-------+-------+------+------
    10 |    10 |   10 |    0
   100 |   100 |   10 |    0.3
  1000 |  1000 |   10 |   80
 10000 | 10000 |   10 | 8600

So the cost of the duplicate checks only kicks in at about 200 notifies.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Apr 23, 2011 at 8:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Gianni Ciolli <gianni.ciolli@2ndquadrant.it> writes:
>> [ proposes lobotomization of duplicate-elimination behavior in NOTIFY ]
>
> I think this change is likely to be penny-wise and pound-foolish.
> The reason the duplicate check is in there is that things like triggers
> may just do "NOTIFY my_table_changed". If the trigger is fired N times
> in a transaction, and you don't have duplicate-elimination in NOTIFY,
> then you get N duplicate messages to no purpose. And the expense of
> actually sending (and processing) those messages is a lot higher than
> suppressing them would be.
>
> With the proposed change, the simplest case of just one such trigger is
> still covered, but not two or more. I don't think this is good enough.
> It's basically throwing the responsibility on the application programmer
> to avoid duplicates --- and in most scenarios, it will cost much more
> to suppress duplicates in PL code than to do it here.
>
> When I started to read this patch I was hoping to see some clever scheme
> for detecting dups at lower cost than what we currently do, like perhaps
> hashing. I'm not impressed with just abandoning the responsibility,
> though.
>
> One idea we might consider is to offer two forms of NOTIFY, one that
> suppresses dups and one that doesn't, so that in cases where the app
> programmer knows his code doesn't generate (many) dups he can tell us
> not to bother checking.

We could check that with a heuristic. If duplicate checking successfully removes NOTIFYs, then keep doing it as the list grows. Say, suppress the duplicate check when the list length is > 100 and < 10% of checks removed anything.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
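That adaptive policy can be sketched as follows. The structure and function names are hypothetical; the thresholds (list length > 100, hit rate < 10%) are the ones suggested above:

```c
#include <assert.h>

/* Hypothetical bookkeeping for the adaptive duplicate-check policy. */
struct DedupStats {
    int list_len;   /* current length of the pending-notification list */
    int checks;     /* duplicate checks performed so far this xact     */
    int hits;       /* checks that actually removed a duplicate        */
};

/* Keep deduplicating while the list is short, or while at least 10%
 * of the checks done so far have paid off; otherwise give up. */
static int should_check_dups(const struct DedupStats *s)
{
    if (s->list_len <= 100)
        return 1;                       /* short list: check is cheap */
    if (s->checks == 0)
        return 1;                       /* no evidence yet */
    return s->hits * 10 >= s->checks;   /* keep checking iff hit rate >= 10% */
}
```

This preserves the trigger case Tom cares about (many identical NOTIFYs keep a high hit rate, so checking continues) while bounding the cost in Gianni's scenario B, where the list is long and the checks almost never find anything.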