Discussion: Proposed fix for NOTIFY performance degradation

Proposed fix for NOTIFY performance degradation

From
Gianni Ciolli
Date:
Hi,

while measuring NOTIFY execution time, I noticed a significant
performance drop.

Please find a patch attached, together with some tests; more details
are shown below.

Best regards,
Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.ciolli@2ndquadrant.it | www.2ndquadrant.it

---8<------8<------8<------8<------8<------8<------8<------8<------8<---

h1. The problem

We record pending notifications in a transaction-based list.
Notifications are recorded in the same order as they are issued.

Because of PostgreSQL's asynchronous notification semantics, we don't
need to deliver two identical notifications; therefore we record a
notification only if an identical one has not already been recorded.

This is implemented by scanning the list and checking for duplicates.
The list is scanned backwards because the last element is easily
accessible, which is a sensible optimisation in the case of many
notifications which are identical to the previous one (scenario A).

However, scanning the list is quite expensive in the case of many
notifications which are not identical to the previous one (scenario B,
see Test 1 below when m > 1).
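The behaviour described above can be modelled with a short sketch
(illustrative Python, not the actual backend code; all names are made
up for the example):

```python
def notify_with_scan(pending, channel, payload):
    """Record (channel, payload) unless an identical notification is
    already pending.  The list is scanned backwards, so scenario A
    (repeats of the last entry) exits after one comparison, while
    scenario B (all-distinct entries) pays O(len(pending)) per NOTIFY."""
    entry = (channel, payload)
    for existing in reversed(pending):
        if existing == entry:
            return          # duplicate found: do not record it again
    pending.append(entry)
```

In scenario B every call scans the whole list without finding a
duplicate, which is where the quadratic cost in Test 1 comes from.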

h1. Proposed solution

Check only the last element of that list, which is efficient in
both scenarios (see Test 2 below).
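As a sketch of the proposed behaviour (again illustrative Python, not
the patch itself):

```python
def notify_check_last(pending, channel, payload):
    """Record (channel, payload) unless it is identical to the most
    recently recorded notification.  One comparison per NOTIFY, so the
    cost is constant in both scenario A and scenario B."""
    entry = (channel, payload)
    if pending and pending[-1] == entry:
        return              # immediate repeat: suppressed
    pending.append(entry)
```

Note that with this check a duplicate that is not adjacent to its twin
is no longer suppressed; that trade-off is the subject of the replies
below.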

h1. Tests

"PostgreSQL HEAD" has been fetched as of commit a0e8df52 (Wed Apr
20 22:49:37 2011 -0400).

Test 1 has been executed against PostgreSQL HEAD.

Test 2 has been executed against a patched version of PostgreSQL HEAD.

In the tables below:

* "n" denotes the number of notifications issued in a single
  transaction;

* "m" denotes the number of distinct channels used for these
  notifications;

* "iter" is the number of times each transaction has been repeated (to
  reduce the importance of occasional spikes);

* "avg_usec" denotes the average time in microseconds required by each
  NOTIFY statement.

h2. Test 1 - PostgreSQL HEAD

   n   |   m   | iter | avg_usec
-------+-------+------+----------
    10 |     1 |   10 |   43.730
   100 |     1 |   10 |   37.630
  1000 |     1 |   10 |   42.990
 10000 |     1 |   10 |   36.225
    10 |    10 |   10 |   43.960
   100 |   100 |   10 |   46.537
  1000 |  1000 |   10 |  126.115
 10000 | 10000 |   10 |  906.501

h2. Test 2 - patched PostgreSQL

   n   |   m   | iter | avg_usec
-------+-------+------+----------
    10 |     1 |   10 |   43.810
   100 |     1 |   10 |   38.256
  1000 |     1 |   10 |   36.950
 10000 |     1 |   10 |   36.638
    10 |    10 |   10 |   44.830
   100 |   100 |   10 |   38.684
  1000 |  1000 |   10 |   38.924
 10000 | 10000 |   10 |   38.032

Attachments

Re: Proposed fix for NOTIFY performance degradation

From
Tom Lane
Date:
Gianni Ciolli <gianni.ciolli@2ndquadrant.it> writes:
> [ proposes lobotomization of duplicate-elimination behavior in NOTIFY ]

I think this change is likely to be penny-wise and pound-foolish.
The reason the duplicate check is in there is that things like triggers
may just do "NOTIFY my_table_changed".  If the trigger is fired N times
in a transaction, and you don't have duplicate-elimination in NOTIFY,
then you get N duplicate messages to no purpose.  And the expense of
actually sending (and processing) those messages is a lot higher than
suppressing them would be.

With the proposed change, the simplest case of just one such trigger is
still covered, but not two or more.  I don't think this is good enough.
It's basically throwing the responsibility on the application programmer
to avoid duplicates --- and in most scenarios, it will cost much more
to suppress duplicates in PL code than to do it here.

When I started to read this patch I was hoping to see some clever scheme
for detecting dups at lower cost than what we currently do, like perhaps
hashing.  I'm not impressed with just abandoning the responsibility,
though.

One idea we might consider is to offer two forms of NOTIFY, one that
suppresses dups and one that doesn't, so that in cases where the app
programmer knows his code doesn't generate (many) dups he can tell us
not to bother checking.
        regards, tom lane
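The hashing idea mentioned above could, in principle, catch every
duplicate at constant cost per NOTIFY. A minimal sketch (hypothetical
Python, not an actual patch; the auxiliary set is an assumption of the
example):

```python
def notify_hashed(pending, seen, channel, payload):
    """Record (channel, payload) unless an identical notification was
    already recorded in this transaction.  The auxiliary set `seen`
    gives an O(1) duplicate check instead of a list scan."""
    entry = (channel, payload)
    if entry in seen:
        return              # duplicate: suppressed at constant cost
    seen.add(entry)
    pending.append(entry)
```

The price is extra per-transaction memory for the set; a real
implementation would also have to reset it at transaction end.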


Re: Proposed fix for NOTIFY performance degradation

From
Simon Riggs
Date:
On Sat, Apr 23, 2011 at 2:57 PM, Gianni Ciolli
<gianni.ciolli@2ndquadrant.it> wrote:

> * "avg_usec" denotes the average time in microseconds required by each
>  NOTIFY statement.
>
> h2. Test 1 - PostgreSQL HEAD
>
>   n   |   m   | iter | avg_usec
> -------+-------+------+----------
>    10 |     1 |   10 |   43.730
>   100 |     1 |   10 |   37.630
>  1000 |     1 |   10 |   42.990
>  10000 |     1 |   10 |   36.225
>    10 |    10 |   10 |   43.960
>   100 |   100 |   10 |   46.537
>  1000 |  1000 |   10 |  126.115
>  10000 | 10000 |   10 |  906.501

I read that wrong first time around. So the wasted time from duplicate
checks is

   n   |   m   | iter | msec
-------+-------+------+------
    10 |    10 |   10 |     0
   100 |   100 |   10 |   0.3
  1000 |  1000 |   10 |    80
 10000 | 10000 |   10 |  8600

So the cost of the duplicate checks only kicks in at about 200 notifies.
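Those figures can be approximately rederived from Tests 1 and 2 by
taking the per-NOTIFY difference between HEAD and the patched server
and multiplying by n (a rough estimate under the assumption that the
difference is entirely duplicate-check time; the smallest rows are
dominated by measurement noise):

```python
# avg_usec for the n == m rows of Test 1 (HEAD) and Test 2 (patched)
head    = {10: 43.960, 100: 46.537, 1000: 126.115, 10000: 906.501}
patched = {10: 44.830, 100: 38.684, 1000: 38.924, 10000: 38.032}

for n in (10, 100, 1000, 10000):
    # wasted time per transaction: per-NOTIFY overhead times n NOTIFYs
    wasted_msec = (head[n] - patched[n]) * n / 1000.0  # usec -> msec
    print(n, round(wasted_msec, 1))
```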

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Proposed fix for NOTIFY performance degradation

From
Simon Riggs
Date:
On Sat, Apr 23, 2011 at 8:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Gianni Ciolli <gianni.ciolli@2ndquadrant.it> writes:
>> [ proposes lobotomization of duplicate-elimination behavior in NOTIFY ]
>
> I think this change is likely to be penny-wise and pound-foolish.
> The reason the duplicate check is in there is that things like triggers
> may just do "NOTIFY my_table_changed".  If the trigger is fired N times
> in a transaction, and you don't have duplicate-elimination in NOTIFY,
> then you get N duplicate messages to no purpose.  And the expense of
> actually sending (and processing) those messages is a lot higher than
> suppressing them would be.
>
> With the proposed change, the simplest case of just one such trigger is
> still covered, but not two or more.  I don't think this is good enough.
> It's basically throwing the responsibility on the application programmer
> to avoid duplicates --- and in most scenarios, it will cost much more
> to suppress duplicates in PL code than to do it here.
>
> When I started to read this patch I was hoping to see some clever scheme
> for detecting dups at lower cost than what we currently do, like perhaps
> hashing.  I'm not impressed with just abandoning the responsibility,
> though.
>
> One idea we might consider is to offer two forms of NOTIFY, one that
> suppresses dups and one that doesn't, so that in cases where the app
> programmer knows his code doesn't generate (many) dups he can tell us
> not to bother checking.

We could check that with a heuristic: if the duplicate check keeps
removing NOTIFYs, keep doing it as the list grows.

Say, suppress the duplicate check once the list is longer than 100
entries and fewer than 10% of checks have removed anything.
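A sketch of that heuristic (illustrative Python; the class, thresholds
and names are made up for the example, not from any patch):

```python
class PendingNotifications:
    """Keep the duplicate check while it pays off; switch it off once
    the list is long and the hit rate has dropped below the threshold."""

    def __init__(self, max_checked_len=100, min_hit_rate=0.10):
        self.entries = []
        self.checks = 0          # duplicate checks performed so far
        self.hits = 0            # checks that actually removed a NOTIFY
        self.max_checked_len = max_checked_len
        self.min_hit_rate = min_hit_rate
        self.checking = True

    def notify(self, channel, payload):
        entry = (channel, payload)
        if self.checking:
            self.checks += 1
            if entry in self.entries:        # stand-in for the list scan
                self.hits += 1
                return                       # duplicate: suppressed
            if (len(self.entries) > self.max_checked_len
                    and self.hits / self.checks < self.min_hit_rate):
                self.checking = False        # checks are not paying off
        self.entries.append(entry)
```

A trigger-style workload (many repeats of one channel) keeps the check
enabled, while an all-distinct workload disables it once the list
passes the length threshold.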

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services