Обсуждение: NOTIFY performance

Поиск
Список
Период
Сортировка

NOTIFY performance

От
Artur Zając
Дата:
Hello,


I would like to create some application using triggers and LISTEN/NOTIFY
framework. I've tested it, and I noticed that performance of NOTIFY
significally decreases with increasing number of distinct NOTIFIES in
transaction.
I found that function AsyncExistsPendingNotify is responsibe for it. I think
that complexivity of searching duplicates there is O(N^2). Would be possible
to improve performance of it? Maybe by using list for elements precedence
and binary search tree for searching duplicates - with complexivity of
O(Nlog2(N)).

I'v tested with 50000 of NOTICES. Updating table with 20000 NOTICES when
searching is not performed took 1,5 second. With searching it took 28
seconds.

-------------------------------------------
Artur Zajac





Re: NOTIFY performance

От
Merlin Moncure
Дата:
On Fri, Aug 24, 2012 at 1:46 PM, Artur Zając <azajac@ang.com.pl> wrote:
> Hello,
>
>
> I would like to create some application using triggers and LISTEN/NOTIFY
> framework. I've tested it, and I noticed that performance of NOTIFY
> significally decreases with increasing number of distinct NOTIFIES in
> transaction.
> I found that function AsyncExistsPendingNotify is responsibe for it. I think
> that complexivity of searching duplicates there is O(N^2). Would be possible
> to improve performance of it? Maybe by using list for elements precedence
> and binary search tree for searching duplicates - with complexivity of
> O(Nlog2(N)).
>
> I'v tested with 50000 of NOTICES. Updating table with 20000 NOTICES when
> searching is not performed took 1,5 second. With searching it took 28
> seconds.

I've confirmed the n^2 behavior on 9.2:
postgres=# select pg_notify(v::text, null) from generate_series(1,10000) v;
Time: 281.000 ms
postgres=# select pg_notify(v::text, null) from generate_series(1,50000) v;
Time: 7148.000 ms

...but i'm curious if you're going about things the right
way...typically I'd imagine you'd write out actionable items to a
table and issue a much broader NOTIFY which taps listeners on the
table to search the action queue.  Could you describe your problem in
a little more detail?

merlin


Re: NOTIFY performance

От
Artur Zając
Дата:
>> I would like to create some application using triggers and
>> LISTEN/NOTIFY framework. I've tested it, and I noticed that
>> performance of NOTIFY significally decreases with increasing number of
>> distinct NOTIFIES in transaction.
>> I found that function AsyncExistsPendingNotify is responsibe for it. I
>> think that complexivity of searching duplicates there is O(N^2). Would
>> be possible to improve performance of it? Maybe by using list for
>> elements precedence and binary search tree for searching duplicates -
>> with complexivity of O(Nlog2(N)).
>>
>> I'v tested with 50000 of NOTICES. Updating table with 20000 NOTICES
>> when searching is not performed took 1,5 second. With searching it
>> took 28 seconds.
>
>I've confirmed the n^2 behavior on 9.2:
>postgres=# select pg_notify(v::text, null) from generate_series(1,10000) v;
>Time: 281.000 ms
>postgres=# select pg_notify(v::text, null) from generate_series(1,50000) v;
>Time: 7148.000 ms
>
>...but i'm curious if you're going about things the right way...typically I'd imagine you'd write out actionable items
toa table and issue a much broader NOTIFY which taps listeners on the table to search the action queue.  Could you
describeyour problem in >a little more detail? 

When there was only NOTIFY option with simple channel name there was no need to send so many messages - creating 50000
channelswould be really stupid. NOTIFY to channel might only mean that there is sth new in table or sth similar. But
withpayload option it would be possible to make simple system for notify other database clients (or self notify - when
changesare made by triggers) that some single record has changed and it should be invalidated in client cache. I would
made(and I already made) that system (similar to streaming replication :) but more more simple), but unfortunately even
notbig update on table would kill my system with complexivity O(N^2). In general , I know that this system would be not
efficient,but for my application it would simply solve my many problems. 

-------------------------------------------
Artur Zajac







Re: NOTIFY performance

От
Merlin Moncure
Дата:
On Fri, Aug 24, 2012 at 4:11 PM, Artur Zając <azajac@ang.com.pl> wrote:
>>> I would like to create some application using triggers and
>>> LISTEN/NOTIFY framework. I've tested it, and I noticed that
>>> performance of NOTIFY significally decreases with increasing number of
>>> distinct NOTIFIES in transaction.
>>> I found that function AsyncExistsPendingNotify is responsibe for it. I
>>> think that complexivity of searching duplicates there is O(N^2). Would
>>> be possible to improve performance of it? Maybe by using list for
>>> elements precedence and binary search tree for searching duplicates -
>>> with complexivity of O(Nlog2(N)).
>>>
>>> I'v tested with 50000 of NOTICES. Updating table with 20000 NOTICES
>>> when searching is not performed took 1,5 second. With searching it
>>> took 28 seconds.
>>
>>I've confirmed the n^2 behavior on 9.2:
>>postgres=# select pg_notify(v::text, null) from generate_series(1,10000) v;
>>Time: 281.000 ms
>>postgres=# select pg_notify(v::text, null) from generate_series(1,50000) v;
>>Time: 7148.000 ms
>>
>>...but i'm curious if you're going about things the right way...typically I'd imagine you'd write out actionable
itemsto a table and issue a much broader NOTIFY which taps listeners on the table to search the action queue.  Could
youdescribe your problem in >a little more detail? 
>
> When there was only NOTIFY option with simple channel name there was no need to send so many messages - creating
50000channels would be really stupid. NOTIFY to channel might only mean that there is sth new in table or sth similar.
Butwith payload option it would be possible to make simple system for notify other database clients (or self notify -
whenchanges are made by triggers) that some single record has changed and it should be invalidated in client cache. I
wouldmade (and I already made) that system (similar to streaming replication :) but more more simple), but
unfortunatelyeven not big update on table would kill my system with complexivity O(N^2). In general , I know that this
systemwould be not efficient, but for my application it would simply solve my many problems. 

Yeah -- my take is that you're pushing too much information through
the notify.  If I was in your shoes, I'd be notifying the client to
come and check and invalidation queue which would be updated through
some sort of trigger.  The payload options is great in that it can
save you a round trip in some latency sensitive cases but it's not a
replacement for a proper queue.

merlin


Re: NOTIFY performance

От
Jeff Janes
Дата:
On Fri, Aug 24, 2012 at 11:46 AM, Artur Zając <azajac@ang.com.pl> wrote:
> Hello,
>
>
> I would like to create some application using triggers and LISTEN/NOTIFY
> framework. I've tested it, and I noticed that performance of NOTIFY
> significally decreases with increasing number of distinct NOTIFIES in
> transaction.
> I found that function AsyncExistsPendingNotify is responsibe for it. I think
> that complexivity of searching duplicates there is O(N^2). Would be possible
> to improve performance of it? Maybe by using list for elements precedence
> and binary search tree for searching duplicates - with complexivity of
> O(Nlog2(N)).

I wonder if should be trying to drop duplicates at all.  I think that
doing that made a lot more sense before payloads existed.

The docs said that the system "can" drop duplicates, so making it no
longer do so would be backwards compatible.

Maybe drop duplicates where the payload was the empty string, but keep
them otherwise?

Cheers,

Jeff


Re: NOTIFY performance

От
Tom Lane
Дата:
Jeff Janes <jeff.janes@gmail.com> writes:
> I wonder if should be trying to drop duplicates at all.  I think that
> doing that made a lot more sense before payloads existed.

Perhaps, but we have a lot of history to be backwards-compatible with.

> The docs said that the system "can" drop duplicates, so making it no
> longer do so would be backwards compatible.

Maybe compatible from a language-lawyerly point of view, but the
performance characteristics would be hugely different - and since this
complaint is entirely about performance, I don't think it's fair to
ignore that.  We'd be screwing people who've depended on the historical
behavior to accommodate people who expect something that never worked
well before to start working well.

The case that I'm specifically worried about is rules and triggers that
issue NOTIFY without worrying about generating lots of duplicates when
many rows are updated in one command.

> Maybe drop duplicates where the payload was the empty string, but keep
> them otherwise?

Maybe, but that seems pretty weird/unpredictable.  (In particular, if
you have a mixed workload with some of both types of notify, you lose
twice: some of the inserts will need to scan the list, so that cost
is still quadratic, but you still have a huge event list to dump into
the queue when the time comes.)

I seem to recall that we discussed the idea of checking only the last N
notifies for duplicates, for some reasonably small N (somewhere between
10 and 100 perhaps).  That would prevent the quadratic behavior and yet
also eliminate dups in most of the situations where it would matter.
Any N>1 would require a more complicated data structure than is there
now, but it doesn't seem that hard.

The other thing we'd need to find out is whether that's the only problem
for generating bazillions of notify events per transaction.  It won't
help to hack AsyncExistsPendingNotify if dropping the events into the
queue is still too expensive.  I am worried about the overall processing
cost here, consumers and producers both.

            regards, tom lane


Re: NOTIFY performance

От
Jeff Janes
Дата:
On Fri, Aug 31, 2012 at 1:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> I wonder if should be trying to drop duplicates at all.  I think that
>> doing that made a lot more sense before payloads existed.
>
> Perhaps, but we have a lot of history to be backwards-compatible with.
>
>> The docs said that the system "can" drop duplicates, so making it no
>> longer do so would be backwards compatible.
>
> Maybe compatible from a language-lawyerly point of view, but the
> performance characteristics would be hugely different - and since this
> complaint is entirely about performance, I don't think it's fair to
> ignore that.  We'd be screwing people who've depended on the historical
> behavior to accommodate people who expect something that never worked
> well before to start working well.
>
> The case that I'm specifically worried about is rules and triggers that
> issue NOTIFY without worrying about generating lots of duplicates when
> many rows are updated in one command.

Would those ones generally have an empty payload?  I would think they
do, but I have not yet used NOTIFY in anger.

>
>> Maybe drop duplicates where the payload was the empty string, but keep
>> them otherwise?
>
> Maybe, but that seems pretty weird/unpredictable.  (In particular, if
> you have a mixed workload with some of both types of notify, you lose
> twice: some of the inserts will need to scan the list, so that cost
> is still quadratic, but you still have a huge event list to dump into
> the queue when the time comes.)

But only empties would need to do searches, and you would only need to
search a list of channels that have already seen an empty in that same
sub-transaction (with an auxiliary data structure), so it would only
be quadratic if you used a huge number of channels.


Perhaps an adaptive heuristic could be used, where if you sent 1000
notices in this subtransaction and the resulting queue is more than,
say, 900, we stop looking for any more duplicates because there don't
seem to be many.  And even then, still check the most recent one in
the queue, just not the whole list.

I think it would virtually take an act of malice to start out sending
all distinct messages, then to switch to sending mostly replicates,
but with the replicates interleaved.

...

>
> The other thing we'd need to find out is whether that's the only problem
> for generating bazillions of notify events per transaction.  It won't
> help to hack AsyncExistsPendingNotify if dropping the events into the
> queue is still too expensive.  I am worried about the overall processing
> cost here, consumers and producers both.

AsyncExistsPendingNotify is head and shoulders above anything else, as
long as we can assume someone can do something meaningful with the
messages in constant and reasonable time.

The time in milliseconds to send x notifies in one subtrans is about
(valid only for large x):

t = 1.9e-05 * x^2

So to send 10,000,000 would take about 22 days just on the sending side.

If I knock out the AsyncExistsPendingNotify with the attached patch, I
can send 10,000,000 notifies in 50.1 seconds, about 10 seconds on the
notifier side pre-commit and the rest post-commit for the notifier and
on the listen side.   (According to "top", most of the time seems to
be CPU time of psql serving as the listener.  According to opreport,
it is not.)

I used this as the listener:

perl -le 'print "listen foo;"; print "select pg_sleep(1), now();"
foreach 1..10000000'| psql > foo

And this as the notifier:

select now(); select count(pg_notify('foo', gen::text)) from
generate_series(1,10000000) as gen;

Time was measured from when the sender started sending to when the
listener finished writing out async messages and hit the next now().


Cheers,

Jeff

Вложения