Discussion: listen/notify argument (old topic revisited)

listen/notify argument (old topic revisited)

From: Jeff Davis
A while ago, I started a small discussion about passing arguments to a NOTIFY 
so that the listening backend could get more information about the event.

There wasn't exactly a consensus from what I understand, but the last thing I 
remember is that someone intended to speed up the notification process by 
storing the events in shared memory segments (IIRC this was Tom's idea). That 
would create a remote possibility of a spurious notification, but the idea is 
that the listening application can check the status and determine what 
happened.

I looked at the TODO, but I couldn't find anything, nor could I find anything 
in the docs. 

Is someone still interested in implementing this feature? Are there still 
people who disagree with the above implementation strategy?

Regards,
Jeff




Re: listen/notify argument (old topic revisited)

From: nconway@klamath.dyndns.org (Neil Conway)
On Tue, Jul 02, 2002 at 02:37:19AM -0700, Jeff Davis wrote:
> A while ago, I started a small discussion about passing arguments to a NOTIFY 
> so that the listening backend could get more information about the event.

Funny, I was just about to post to -hackers about this.

> There wasn't exactly a consensus from what I understand, but the last thing I 
> remember is that someone intended to speed up the notification process by 
> storing the events in shared memory segments (IIRC this was Tom's idea). That 
> would create a remote possibility of a spurious notification, but the idea is 
> that the listening application can check the status and determine what 
> happened.

Yes, that was Tom Lane. IMHO, we need to replace the existing
pg_listener scheme with an improved model if we want to make any
significant improvements to asynchronous notifications. In summary,
the two designs that have been suggested are:
   pg_notify: a new system catalog, stores notifications only --
   pg_listener stores only listening backends.

   shmem: all notifications are done via shared memory and not stored
   in system catalogs at all, in a manner similar to the cache
   invalidation code that already exists. This avoids the MVCC-induced
   performance problem with storing notifications in system catalogs,
   but can lead to spurious notifications -- the statically sized
   buffer in which notifications are stored can overflow. Applications
   will be able to differentiate between overflow-induced and regular
   messages.

> Is someone still interested in implementing this feature? Are there still 
> people who disagree with the above implementation strategy?

Some people objected to shmem at the time; personally, I'm not really
sure which design is best. Any comments from -hackers?

If there's a consensus on which route to take, I'll probably implement
the preferred design for 7.3. However, I think that a proper
implementation of notify messages will need an FE/BE protocol change,
so that will need to wait for 7.4.

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC




Re: listen/notify argument (old topic revisited)

From: Bruce Momjian
Jeff Davis wrote:
> A while ago, I started a small discussion about passing arguments to a NOTIFY 
> so that the listening backend could get more information about the event.
> 
> There wasn't exactly a consensus from what I understand, but the last thing I 
> remember is that someone intended to speed up the notification process by 
> storing the events in shared memory segments (IIRC this was Tom's idea). That 
> would create a remote possibility of a spurious notification, but the idea is 
> that the listening application can check the status and determine what 
> happened.

I don't see a huge value to using shared memory.   Once we get
auto-vacuum, pg_listener will be fine, and shared memory like SI is just
too hard to get working reliably because of all the backends
reading/writing in there.  We have tables that have the proper sharing
semantics;  I think we should use those and hope we get autovacuum soon.

As far as the message, perhaps passing the oid of the pg_listener row to
the backend would help, and then the backend can look up any message for
that oid in pg_listener.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026




Re: listen/notify argument (old topic revisited)

From: Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Why can't we do efficient indexing, or clear out the table?  I don't
> > remember.
> 
> I don't recall either, but I do recall that we tried to index it and
> backed out the changes.  In any case, a table on disk is just plain
> not the right medium for transitory-by-design notification messages.

OK, I can help here.  I added an index on pg_listener so lookups would
go faster in the backend, but inserts/updates into the table also
require index additions, and your feeling was that the table was small
and we would be better without the index and just sequentially scanning
the table.  I can easily add the index and make sure it is used properly
if you are now concerned about table access time.

I think your issue was that it is only looked up once, and only updated
once, so there wasn't much sense in having that index maintenance
overhead, i.e. you only used the index once per row.

(I remember the item being on TODO for quite a while when we discussed
this.)

Of course, a shared memory system probably is going to either do it
sequentially or have its own index issues, so I don't see a huge
advantage to going to shared memory, and I do see extra code and a queue
limit.

> >> A curious statement considering that PG depends critically on SI
> >> working.  This is a solved problem.
> 
> > My point is that SI was buggy for years until we found all the bugs, so
> > yea, it is a solved problem, but solved with difficulty.
> 
> The SI message mechanism itself was not the source of bugs, as I recall
> it (although certainly the code was incomprehensible in the extreme;
> the original programmer had absolutely no grasp of readable coding style
> IMHO).  The problem was failure to properly design the interactions with
> relcache and catcache, which are pretty complex in their own right.
> An SI-like NOTIFY mechanism wouldn't have those issues.

Oh, OK, interesting.  So _that_ was the issue there.





Re: listen/notify argument (old topic revisited)

From: Bruce Momjian
Let me tell you what would be really interesting.  If we didn't report
the pid of the notifying process and we didn't allow arbitrary strings
for notify (just pg_class relation names), we could just add a counter
to pg_class that is updated for every notify.  If a backend is
listening, it remembers the counter at listen time, and on every commit
checks the pg_class counter to see if it has incremented.  That way,
there is no queue, no shared memory, and there is no scanning. You just
pull up the cache entry for pg_class and look at the counter.

One problem is that pg_class would be updated more frequently.  Anyway,
just an idea.

---------------------------------------------------------------------------

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Is disk i/o a real performance
> > penalty for notify, and is performance a huge issue for notify anyway,
> 
> Yes, and yes.  I have used NOTIFY in production applications, and I know
> that performance is an issue.
> 
> >> The queue limit problem is a valid argument, but it's the only valid
> >> complaint IMHO; and it seems a reasonable tradeoff to make for the
> >> other advantages.
> 
> BTW, it occurs to me that as long as we make this an independent message
> buffer used only for NOTIFY (and *not* try to merge it with SI), we
> don't have to put up with overrun-reset behavior.  The overrun reset
> approach is useful for SI because there are only limited times when
> we are prepared to handle SI notification in the backend work cycle.
> However, I think a self-contained NOTIFY mechanism could be much more
> flexible about when it will remove messages from the shared buffer.
> Consider this:
> 
> 1. To send NOTIFY: grab write lock on shared-memory circular buffer.
> If enough space, insert message, release lock, send signal, done.
> If not enough space, release lock, send signal, sleep some small
> amount of time, and then try again.  (Hard failure would occur only
> if the proposed message size exceeds the buffer size; as long as we
> make the buffer size a parameter, this is the DBA's fault not ours.)
> 
> 2. On receipt of signal: grab read lock on shared-memory circular
> buffer, copy all data up to write pointer into private memory,
> advance my (per-process) read pointer, release lock.  This would be
> safe to do pretty much anywhere we're allowed to malloc more space,
> so it could be done say at the same points where we check for cancel
> interrupts.  Therefore, the expected time before the shared buffer
> is emptied after a signal is pretty small.
> 
> In this design, if someone sits in a transaction for a long time,
> there is no risk of shared memory overflow; that backend's private
> memory for not-yet-reported NOTIFYs could grow large, but that's
> his problem.  (We could avoid unnecessary growth by not storing
> messages that don't correspond to active LISTENs for that backend.
> Indeed, a backend with no active LISTENs could be left out of the
> circular buffer participation list altogether.)
> 
> We'd need to separate this processing from the processing that's used to
> force SI queue reading (dz's old patch), so we'd need one more signal
> code than we use now.  But we do have SIGUSR1 available.
> 
>             regards, tom lane
> 





Re: listen/notify argument (old topic revisited)

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I don't see a huge value to using shared memory.   Once we get
> auto-vacuum, pg_listener will be fine,

No it won't.  The performance of notify is *always* going to suck
as long as it depends on going through a table.  This is particularly
true given the lack of any effective way to index pg_listener; the
more notifications you feed through, the more dead rows there are
with the same key...

> and shared memory like SI is just
> too hard to get working reliably because of all the backends
> reading/writing in there.

A curious statement considering that PG depends critically on SI
working.  This is a solved problem.
        regards, tom lane




Re: listen/notify argument (old topic revisited)

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Why can't we do efficient indexing, or clear out the table?  I don't
> remember.

I don't recall either, but I do recall that we tried to index it and
backed out the changes.  In any case, a table on disk is just plain
not the right medium for transitory-by-design notification messages.

>> A curious statement considering that PG depends critically on SI
>> working.  This is a solved problem.

> My point is that SI was buggy for years until we found all the bugs, so
> yea, it is a solved problem, but solved with difficulty.

The SI message mechanism itself was not the source of bugs, as I recall
it (although certainly the code was incomprehensible in the extreme;
the original programmer had absolutely no grasp of readable coding style
IMHO).  The problem was failure to properly design the interactions with
relcache and catcache, which are pretty complex in their own right.
An SI-like NOTIFY mechanism wouldn't have those issues.
        regards, tom lane




Re: listen/notify argument (old topic revisited)

From: Bruce Momjian
Jeff Davis wrote:
> On Tuesday 02 July 2002 06:03 pm, Bruce Momjian wrote:
> > Let me tell you what would be really interesting.  If we didn't report
> > the pid of the notifying process and we didn't allow arbitrary strings
> > for notify (just pg_class relation names), we could just add a counter
> > to pg_class that is updated for every notify.  If a backend is
> > listening, it remembers the counter at listen time, and on every commit
> > checks the pg_class counter to see if it has incremented.  That way,
> > there is no queue, no shared memory, and there is no scanning. You just
> > pull up the cache entry for pg_class and look at the counter.
> >
> > One problem is that pg_class would be updated more frequently.  Anyway,
> > just an idea.
> 
> I think that currently a lot of people use select() (after all, it's mentioned 
> in the docs) in the frontend to determine when a notify comes into a 
> listening backend. If the backend only checks on commit, and the backend is 
> largely idle except for notify processing, might it be a while before the 
> frontend realizes that a notify was sent?

I meant it to check exactly when it does now: when a query completes.





Re: listen/notify argument (old topic revisited)

From: Jeff Davis
On Tuesday 02 July 2002 06:03 pm, Bruce Momjian wrote:
> Let me tell you what would be really interesting.  If we didn't report
> the pid of the notifying process and we didn't allow arbitrary strings
> for notify (just pg_class relation names), we could just add a counter
> to pg_class that is updated for every notify.  If a backend is
> listening, it remembers the counter at listen time, and on every commit
> checks the pg_class counter to see if it has incremented.  That way,
> there is no queue, no shared memory, and there is no scanning. You just
> pull up the cache entry for pg_class and look at the counter.
>
> One problem is that pg_class would be updated more frequently.  Anyway,
> just an idea.

I think that currently a lot of people use select() (after all, it's mentioned 
in the docs) in the frontend to determine when a notify comes into a 
listening backend. If the backend only checks on commit, and the backend is 
largely idle except for notify processing, might it be a while before the 
frontend realizes that a notify was sent?

Regards,
Jeff



>
> ---------------------------------------------------------------------------
>
> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > Is disk i/o a real performance
> > > penalty for notify, and is performance a huge issue for notify anyway,
> >
> > Yes, and yes.  I have used NOTIFY in production applications, and I know
> > that performance is an issue.
> >
> > >> The queue limit problem is a valid argument, but it's the only valid
> > >> complaint IMHO; and it seems a reasonable tradeoff to make for the
> > >> other advantages.
> >
> > BTW, it occurs to me that as long as we make this an independent message
> > buffer used only for NOTIFY (and *not* try to merge it with SI), we
> > don't have to put up with overrun-reset behavior.  The overrun reset
> > approach is useful for SI because there are only limited times when
> > we are prepared to handle SI notification in the backend work cycle.
> > However, I think a self-contained NOTIFY mechanism could be much more
> > flexible about when it will remove messages from the shared buffer.
> > Consider this:
> >
> > 1. To send NOTIFY: grab write lock on shared-memory circular buffer.
> > If enough space, insert message, release lock, send signal, done.
> > If not enough space, release lock, send signal, sleep some small
> > amount of time, and then try again.  (Hard failure would occur only
> > if the proposed message size exceeds the buffer size; as long as we
> > make the buffer size a parameter, this is the DBA's fault not ours.)
> >
> > 2. On receipt of signal: grab read lock on shared-memory circular
> > buffer, copy all data up to write pointer into private memory,
> > advance my (per-process) read pointer, release lock.  This would be
> > safe to do pretty much anywhere we're allowed to malloc more space,
> > so it could be done say at the same points where we check for cancel
> > interrupts.  Therefore, the expected time before the shared buffer
> > is emptied after a signal is pretty small.
> >
> > In this design, if someone sits in a transaction for a long time,
> > there is no risk of shared memory overflow; that backend's private
> > memory for not-yet-reported NOTIFYs could grow large, but that's
> > his problem.  (We could avoid unnecessary growth by not storing
> > messages that don't correspond to active LISTENs for that backend.
> > Indeed, a backend with no active LISTENs could be left out of the
> > circular buffer participation list altogether.)
> >
> > We'd need to separate this processing from the processing that's used to
> > force SI queue reading (dz's old patch), so we'd need one more signal
> > code than we use now.  But we do have SIGUSR1 available.
> >
> >             regards, tom lane





Re: listen/notify argument (old topic revisited)

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Of course, a shared memory system probably is going to either do it
> sequentially or have its own index issues, so I don't see a huge
> advantage to going to shared memory, and I do see extra code and a queue
> limit.

Disk I/O vs. no disk I/O isn't a huge advantage?  Come now.

A shared memory system would use sequential (well, actually
circular-buffer) access, which is *exactly* what you want given
the inherently sequential nature of the messages.  The reason that
table storage hurts is that we are forced to do searches, which we
could eliminate if we had control of the storage ordering.  Again,
it comes down to the fact that tables don't provide the right
abstraction for this purpose.

The "extra code" argument doesn't impress me either; async.c is
currently 900 lines, about 2.5 times the size of sinvaladt.c which is
the guts of SI message passing.  I think it's a good bet that a SI-like
notify module would be much smaller than async.c is now; it's certainly
unlikely to be significantly larger.

The queue limit problem is a valid argument, but it's the only valid
complaint IMHO; and it seems a reasonable tradeoff to make for the
other advantages.
        regards, tom lane




Re: listen/notify argument (old topic revisited)

From: Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I don't see a huge value to using shared memory.   Once we get
> > auto-vacuum, pg_listener will be fine,
> 
> No it won't.  The performance of notify is *always* going to suck
> as long as it depends on going through a table.  This is particularly
> true given the lack of any effective way to index pg_listener; the
> more notifications you feed through, the more dead rows there are
> with the same key...

Why can't we do efficient indexing, or clear out the table?  I don't
remember.

> > and shared memory like SI is just
> > too hard to get working reliabily because of all the backends
> > reading/writing in there.
> 
> A curious statement considering that PG depends critically on SI
> working.  This is a solved problem.

My point is that SI was buggy for years until we found all the bugs, so
yea, it is a solved problem, but solved with difficulty.

Do we want to add another SI-type capability that could be as difficult
to get working properly, or will the notify piggyback on the existing SI
code?  If the latter, that would be fine with me, but we still have the
overflow queue problem.





Re: listen/notify argument (old topic revisited)

From: Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Of course, a shared memory system probably is going to either do it
> > sequentially or have its own index issues, so I don't see a huge
> > advantage to going to shared memory, and I do see extra code and a queue
> > limit.
> 
> Disk I/O vs. no disk I/O isn't a huge advantage?  Come now.

My assumption is that it throws to disk as backing store, which seems
better to me than dropping the notifies.  Is disk i/o a real performance
penalty for notify, and is performance a huge issue for notify anyway,
assuming autovacuum?

> A shared memory system would use sequential (well, actually
> circular-buffer) access, which is *exactly* what you want given
> the inherently sequential nature of the messages.  The reason that
> table storage hurts is that we are forced to do searches, which we
> could eliminate if we had control of the storage ordering.  Again,
> it comes down to the fact that tables don't provide the right
> abstraction for this purpose.

To me, it just seems like going to shared memory is taking our existing
table structure and moving it to memory.  Yea, there is no tuple header,
and yea we can make a circular list, but we can't index the thing, so is
spinning around a circular list any better than a sequential scan of a
table?  Yea, we can delete stuff better, but autovacuum would help with
that.  It just seems like we are reinventing the wheel.

Are there other uses for this? Can we make use of RAM-only tables?

> The "extra code" argument doesn't impress me either; async.c is
> currently 900 lines, about 2.5 times the size of sinvaladt.c which is
> the guts of SI message passing.  I think it's a good bet that a SI-like
> notify module would be much smaller than async.c is now; it's certainly
> unlikely to be significantly larger.
> 
> The queue limit problem is a valid argument, but it's the only valid
> complaint IMHO; and it seems a reasonable tradeoff to make for the
> other advantages.

I am just not excited about it.  What do others think?





Re: listen/notify argument (old topic revisited)

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Is disk i/o a real performance
> penalty for notify, and is performance a huge issue for notify anyway,

Yes, and yes.  I have used NOTIFY in production applications, and I know
that performance is an issue.

>> The queue limit problem is a valid argument, but it's the only valid
>> complaint IMHO; and it seems a reasonable tradeoff to make for the
>> other advantages.

BTW, it occurs to me that as long as we make this an independent message
buffer used only for NOTIFY (and *not* try to merge it with SI), we
don't have to put up with overrun-reset behavior.  The overrun reset
approach is useful for SI because there are only limited times when
we are prepared to handle SI notification in the backend work cycle.
However, I think a self-contained NOTIFY mechanism could be much more
flexible about when it will remove messages from the shared buffer.
Consider this:

1. To send NOTIFY: grab write lock on shared-memory circular buffer.
If enough space, insert message, release lock, send signal, done.
If not enough space, release lock, send signal, sleep some small
amount of time, and then try again.  (Hard failure would occur only
if the proposed message size exceeds the buffer size; as long as we
make the buffer size a parameter, this is the DBA's fault not ours.)

2. On receipt of signal: grab read lock on shared-memory circular
buffer, copy all data up to write pointer into private memory,
advance my (per-process) read pointer, release lock.  This would be
safe to do pretty much anywhere we're allowed to malloc more space,
so it could be done say at the same points where we check for cancel
interrupts.  Therefore, the expected time before the shared buffer
is emptied after a signal is pretty small.

In this design, if someone sits in a transaction for a long time,
there is no risk of shared memory overflow; that backend's private
memory for not-yet-reported NOTIFYs could grow large, but that's
his problem.  (We could avoid unnecessary growth by not storing
messages that don't correspond to active LISTENs for that backend.
Indeed, a backend with no active LISTENs could be left out of the
circular buffer participation list altogether.)

We'd need to separate this processing from the processing that's used to
force SI queue reading (dz's old patch), so we'd need one more signal
code than we use now.  But we do have SIGUSR1 available.
        regards, tom lane




Re: listen/notify argument (old topic revisited)

From: "Christopher Kings-Lynne"
> Of course, a shared memory system probably is going to either do it
> sequentially or have its own index issues, so I don't see a huge
> advantage to going to shared memory, and I do see extra code and a queue
> limit.

Is a shared memory implementation going to play silly buggers with the Win32
port?

Chris





Re: listen/notify argument (old topic revisited)

From: Hannu Krosing
On Wed, 2002-07-03 at 08:20, Christopher Kings-Lynne wrote:
> > Of course, a shared memory system probably is going to either do it
> > sequentially or have its own index issues, so I don't see a huge
> > advantage to going to shared memory, and I do see extra code and a queue
> > limit.
> 
> Is a shared memory implementation going to play silly buggers with the Win32
> port?

Perhaps this is a good place to introduce anonymous mmap?

Is there a way to grow anonymous mmap on demand?

----------------
Hannu





Re: listen/notify argument (old topic revisited)

From: Hannu Krosing
On Tue, 2002-07-02 at 23:35, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Is disk i/o a real performance
> > penalty for notify, and is performance a huge issue for notify anyway,
> 
> Yes, and yes.  I have used NOTIFY in production applications, and I know
> that performance is an issue.
> 
> >> The queue limit problem is a valid argument, but it's the only valid
> >> complaint IMHO; and it seems a reasonable tradeoff to make for the
> >> other advantages.
> 
> BTW, it occurs to me that as long as we make this an independent message
> buffer used only for NOTIFY (and *not* try to merge it with SI), we
> don't have to put up with overrun-reset behavior.  The overrun reset
> approach is useful for SI because there are only limited times when
> we are prepared to handle SI notification in the backend work cycle.
> However, I think a self-contained NOTIFY mechanism could be much more
> flexible about when it will remove messages from the shared buffer.
> Consider this:
> 
> 1. To send NOTIFY: grab write lock on shared-memory circular buffer.

Are you planning to have one circular buffer per listening backend?

Would that not be a waste of space for a large number of backends with
long notify arguments?

--------------
Hannu





Re: listen/notify argument (old topic revisited)

From: Rod Taylor
On Tue, 2002-07-02 at 17:12, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > Of course, a shared memory system probably is going to either do it
> > > sequentially or have its own index issues, so I don't see a huge
> > > advantage to going to shared memory, and I do see extra code and a queue
> > > limit.
> > 
> > Disk I/O vs. no disk I/O isn't a huge advantage?  Come now.
> 
> My assumption is that it throws to disk as backing store, which seems
> better to me than dropping the notifies.  Is disk i/o a real performance
> penalty for notify, and is performance a huge issue for notify anyway,
> assuming autovacuum?

For me, performance would be one of the only concerns. Currently I use
two methods of finding changes, one is NOTIFY which directs frontends to
reload various sections of data, the second is a table which holds a
QUEUE of actions to be completed (which must be tracked, logged and
completed).

If performance wasn't a concern, I'd simply use more RULES which insert
requests into my queue table.





Re: listen/notify argument (old topic revisited)

From: Tom Lane
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> Is a shared memory implementation going to play silly buggers with the Win32
> port?

No.  Certainly no more so than shared disk buffers or the SI message
facility, both of which are *not* optional.
        regards, tom lane




Re: listen/notify argument (old topic revisited)

From: Tom Lane
Hannu Krosing <hannu@tm.ee> writes:
> Perhaps this is a good place to introduce anonymous mmap ?

I don't think so; it just adds a portability variable without buying
us anything.

> Is there a way to grow anonymous mmap on demand ?

Nope.  Not portably, anyway.  For instance, the HPUX man page for mmap
sayeth:
    If the size of the mapped file changes after the call to mmap(), the    effect of references to portions of the
mappedregion that correspond    to added or removed portions of the file is unspecified.
 

Dynamically re-mmapping after enlarging the file might work, but there
are all sorts of interesting constraints on that too; it looks like
you'd have to somehow synchronize things so that all the backends do it
at the exact same time.

On the whole I see no advantage to be gained here, compared to the
implementation I sketched earlier with a fixed-size shared buffer and
enlargeable internal buffers in backends.
        regards, tom lane




Re: listen/notify argument (old topic revisited)

From: Tom Lane
Hannu Krosing <hannu@tm.ee> writes:
> Are you planning to have one circular buffer per listening backend ?

No; one circular buffer, period.

Each backend would also internally buffer notifies that it hadn't yet
delivered to its client --- but since the time until delivery could vary
drastically across clients, I think that's reasonable.  I'd expect
clients that are using LISTEN to avoid doing long-running transactions,
so under normal circumstances the internal buffer should not grow very
large.
        regards, tom lane




Re: listen/notify argument (old topic revisited)

From: Hannu Krosing
On Wed, 2002-07-03 at 15:51, Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
> > Are you planning to have one circular buffer per listening backend ?
> 
> No; one circular buffer, period.
> 
> Each backend would also internally buffer notifies that it hadn't yet
> delivered to its client --- but since the time until delivery could vary
> drastically across clients, I think that's reasonable.  I'd expect
> clients that are using LISTEN to avoid doing long-running transactions,
> so under normal circumstances the internal buffer should not grow very
> large.
> 
>             regards, tom lane

> 2. On receipt of signal: grab read lock on shared-memory circular
> buffer, copy all data up to write pointer into private memory,
> advance my (per-process) read pointer, release lock.  This would be
> safe to do pretty much anywhere we're allowed to malloc more space,
> so it could be done say at the same points where we check for cancel
> interrupts.  Therefore, the expected time before the shared buffer
> is emptied after a signal is pretty small.
>
> In this design, if someone sits in a transaction for a long time,
> there is no risk of shared memory overflow; that backend's private
> memory for not-yet-reported NOTIFYs could grow large, but that's
> his problem.  (We could avoid unnecessary growth by not storing
> messages that don't correspond to active LISTENs for that backend.
> Indeed, a backend with no active LISTENs could be left out of the
> circular buffer participation list altogether.)

There could be a little more smartness here to avoid unnecessary copying
(not just storing) of not-listened-to data. Perhaps each notify message
could be stored as

(ptr_to_next_blk, name, data)

so that the receiving backend could skip uninteresting (not-listened-to)
messages. 

I guess that depending on the circumstances this can be either faster or
slower than copying them all in one memmove.

It will be slower if all messages are interesting, but an overall win if
there is one backend listening to messages with big data payloads and
lots of other backends listening to relatively small messages.

There are scenarios where some more complex structure will be faster (a
sparse communication structure, say 1000 backends each listening to one
name and notifying ten others - each backend has to (manually ;) check
1000 messages to find the one that is for it), but your proposed
structure seems good enough for most common uses (and definitely better
than the current one).

---------------------
Hannu
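Hannu's (ptr_to_next_blk, name, data) framing can be sketched as length-prefixed records in a flat buffer, so a reader hops over uninteresting payloads instead of copying them. The record layout and function names below are invented for this illustration; they are a model of the idea, not actual PostgreSQL code.

```python
# Hypothetical sketch of length-prefixed notify records: the leading
# "total length" field plays the role of ptr_to_next_blk, letting a
# reader skip a record without touching its payload.
import struct

def encode(name, data):
    nb, db = name.encode(), data.encode()
    # 8-byte header: total record length, then name length
    return struct.pack("!II", 8 + len(nb) + len(db), len(nb)) + nb + db

def scan(buf, interesting):
    """Yield (name, data) only for listened-to names, skipping the rest."""
    pos = 0
    while pos < len(buf):
        total, nlen = struct.unpack_from("!II", buf, pos)
        name = buf[pos + 8 : pos + 8 + nlen].decode()
        if name in interesting:
            data = buf[pos + 8 + nlen : pos + total].decode()
            yield (name, data)
        pos += total            # jump to next record; payload never copied
```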





Re: listen/notify argument (old topic revisited)

From
Tom Lane
Date:
Hannu Krosing <hannu@tm.ee> writes:
> There could be a little more smartness here to avoid unnecessary copying
> (not just storing) of not-listened-to data.

Yeah, I was wondering about that too.

> I guess that depending on the circumstances this can be either faster or
> slower than copying them all in one memmove.

The more interesting question is whether it's better to hold the read
lock on the shared buffer for the minimum possible amount of time; if
so, we'd be better off to pull the data from the buffer as quickly as
possible and then sort it later.  Determining whether we are interested
in a particular notify name will probably take a probe into a (local)
hashtable, so it won't be super-quick.  However, I think we could
arrange for readers to use a sharable lock on the buffer, so having them
expend that processing while holding the read lock might be acceptable.

My guess is that the actual volume of data going through the notify
mechanism isn't going to be all that large, and so avoiding one memcpy
step for it isn't going to be all that exciting.  I think I'd lean
towards minimizing the time spent holding the shared lock, instead.
But it's a judgment call.
        regards, tom lane




Re: listen/notify argument (old topic revisited)

From
Hannu Krosing
Date:
On Wed, 2002-07-03 at 16:30, Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
> > There could be a little more smartness here to avoid unnecessary copying
> > (not just storing) of not-listened-to data.
> 
> Yeah, I was wondering about that too.
> 
> > I guess that depending on the circumstances this can be either faster or
> > slower than copying them all in one memmove.
> 
> The more interesting question is whether it's better to hold the read
> lock on the shared buffer for the minimum possible amount of time;

OTOH, we may decide that getting a notify ASAP is not a priority and
just go on doing what we did before if we can't get the lock, and try
again the next time around.

This may have some pathological behaviours (starving some backends who
always come late ;), but we are already attracting a thundering herd by
sending a signal to all _possibly_ interested backends at the same time.

Keeping a list of who listens to what can solve this problem (but only
in case of sparse listening habits).

-----------------
Hannu





Re: listen/notify argument (old topic revisited)

From
Hannu Krosing
Date:
On Wed, 2002-07-03 at 16:30, Tom Lane wrote:
> My guess is that the actual volume of data going through the notify
> mechanism isn't going to be all that large, and so avoiding one memcpy
> step for it isn't going to be all that exciting. 

It may become large if we have an implementation which can cope
well with large volumes :)

> I think I'd lean towards minimizing the time spent holding the
> shared lock, instead.

If you are waiting for just one message out of 1000, it may still be
faster to do selective copying.

It is possible that 1000 strcmp's + 1000 pointer traversals are faster
than one big memcpy, no?

> But it's a judgment call.

If we have a clean C interface + separate PG binding, we may write
several different modules for different scenarios and let the user
choose (even at startup time) - code optimized for messages that
everybody wants is bound to be suboptimal for the case when they only
want 1 out of 1000 messages. Same for different message sizes.

-------------
Hannu





Re: listen/notify argument (old topic revisited)

From
nconway@klamath.dyndns.org (Neil Conway)
Date:
On Tue, Jul 02, 2002 at 05:35:42PM -0400, Tom Lane wrote:
> 1. To send NOTIFY: grab write lock on shared-memory circular buffer.
> If enough space, insert message, release lock, send signal, done.
> If not enough space, release lock, send signal, sleep some small
> amount of time, and then try again.  (Hard failure would occur only
> if the proposed message size exceeds the buffer size; as long as we
> make the buffer size a parameter, this is the DBA's fault not ours.)

How would this interact with the current transactional behavior of
NOTIFY? At the moment, executing a NOTIFY command only stores the
pending notification in a List in the backend you're connected to;
when the current transaction commits, the NOTIFY is actually
processed (stored in pg_listener, SIGUSR2 sent, etc) -- if the
transaction is rolled back, the NOTIFY isn't sent. If we do the
actual insertion when the NOTIFY is executed, I don't see a simple
way to get this behavior...

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC




Re: listen/notify argument (old topic revisited)

From
Tom Lane
Date:
Hannu Krosing <hannu@tm.ee> writes:
> but we are already attracting a thundering herd by
> sending a signal to all _possibly_ interested backends at the same time

That's why it's so important that the readers use a sharable lock.  The
only thing they'd be locking out is some new writer trying to send (yet
another) notify.

Also, it's a pretty important optimization to avoid signaling backends
that are not listening for any notifies at all.

We could improve on it further by keeping info in shared memory about
which backends are listening for which notify names, but I don't see
any good way to do that in a fixed amount of space.
        regards, tom lane




Re: listen/notify argument (old topic revisited)

From
Tom Lane
Date:
nconway@klamath.dyndns.org (Neil Conway) writes:
> How would this interact with the current transactional behavior of
> NOTIFY?

No change.  Senders would only insert notify messages into the shared
buffer when they commit (uncommitted notifies would live in a list in
the sender, same as now).  Readers would be expected to remove messages
from the shared buffer ASAP after receiving the signal, but they'd
store those messages internally and not forward them to the client until
such time as they're not inside a transaction block.
        regards, tom lane
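Tom's answer to the transactional question can be modeled in a handful of lines: NOTIFY inside a transaction only appends to a backend-local list, and the shared buffer sees the messages at COMMIT and nothing at ROLLBACK. The `Session` class and its method names are invented for this sketch; in the server this is C code hooked into transaction commit.

```python
# Hypothetical model of transactional NOTIFY: uncommitted notifies live
# in a local list; only commit pushes them into the shared buffer.
class Session:
    def __init__(self, shared):
        self.shared = shared        # stand-in for the shared ring buffer
        self.pending = []           # uncommitted notifies live here

    def notify(self, name):
        self.pending.append(name)   # nothing visible to others yet

    def commit(self):
        # In the real design this is also where the signal would be sent.
        self.shared.extend(self.pending)
        self.pending = []

    def rollback(self):
        self.pending = []           # notifies from an aborted xact vanish
```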




Re: listen/notify argument (old topic revisited)

From
Hannu Krosing
Date:
On Wed, 2002-07-03 at 17:48, Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
> > but we are already attracting a thundering herd by
> > sending a signal to all _possibly_ interested backends at the same time
> 
> That's why it's so important that the readers use a sharable lock.  The
> only thing they'd be locking out is some new writer trying to send (yet
> another) notify.

But there must be some way to communicate the positions of all backends'
read pointers for managing the free space; otherwise we cannot know
when the buffer is full.

I imagined that at least this info was kept in shared memory.

> Also, it's a pretty important optimization to avoid signaling backends
> that are not listening for any notifies at all.

But of little help when they are all listening to something ;)

> We could improve on it further by keeping info in shared memory about
> which backends are listening for which notify names, but I don't see
> any good way to do that in a fixed amount of space.

A compromise would be to do it for some fixed amount of memory (say 10
names/backend) and assume "all" if out of that memory.

Notifying everybody has less bad effects when backends listen to more
names, and keeping lists is pure overhead when all listeners listen to
all names.

--------------
Hannu
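Hannu's compromise — a fixed number of listen slots per backend, degrading to "signal me for everything" on overflow — is simple enough to sketch. `MAX_SLOTS` and the class name are invented for this illustration of the idea.

```python
# Hypothetical sketch of bounded per-backend listen slots: overflow
# gives back the slots and falls back to "all", which is safe because a
# spurious signal is merely a wasted check, never a missed notify.
MAX_SLOTS = 10

class ListenSlots:
    def __init__(self):
        self.names = set()
        self.listen_all = False     # overflow flag

    def listen(self, name):
        if self.listen_all:
            return                  # already receiving everything
        if len(self.names) >= MAX_SLOTS:
            self.names.clear()      # release the fixed-size slots
            self.listen_all = True  # assume "all" from now on
        else:
            self.names.add(name)

    def should_signal(self, name):
        return self.listen_all or name in self.names
```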




Re: listen/notify argument (old topic revisited)

From
Tom Lane
Date:
Hannu Krosing <hannu@tm.ee> writes:
> On Wed, 2002-07-03 at 17:48, Tom Lane wrote:
>> That's why it's so important that the readers use a sharable lock.  The
>> only thing they'd be locking out is some new writer trying to send (yet
>> another) notify.

> But there must be some way to communicate the positions of all backends'
> read pointers for managing the free space; otherwise we cannot know
> when the buffer is full.

Right.  But we play similar games already with the existing SI buffer,
to wit:

Writers grab the controlling lock LW_EXCLUSIVE, thereby having sole
access; in this state it's safe for them to examine all the read
pointers as well as examine/update the write pointer (and of course
write data into the buffer itself).  The furthest-back read pointer
limits what they can write.

Readers grab the controlling lock LW_SHARED, thereby ensuring there
is no writer (but there may be other readers).  In this state they
may examine the write pointer (to see how much data there is) and
may examine and update their own read pointer.  This is safe and
useful because no reader cares about any other's read pointer.

>> We could improve on it further by keeping info in shared memory about
>> which backends are listening for which notify names, but I don't see
>> any good way to do that in a fixed amount of space.

> A compromise would be to do it for some fixed amount of memory (say 10
> names/backend) and assume "all" if out of that memory.

I thought of that too, but it's not clear how much it'd help.  The
writer would have to scan through all the per-reader data while holding
the write lock, which is not good for concurrency.  On SMP hardware it
could actually be a net loss.  Might be worth experimenting with though.

You could make a good reduction in the shared-memory space needed by
storing just a hash code for the interesting names, and not the names
themselves.  (I'd also be inclined to include the hash code in the
transmitted message, so that readers could more quickly ignore
uninteresting messages.)
        regards, tom lane
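The hash-code idea at the end of Tom's message — store only hash codes of listened names in the fixed shared space, and carry the same hash in each message so readers can discard uninteresting traffic cheaply — can be sketched as follows. The hash function choice and the `Listener` class are invented for this sketch; a collision merely causes one extra string compare, consistent with the "spurious notification is acceptable" philosophy of the design.

```python
# Hypothetical sketch of hash-based filtering: the writer computes the
# name's hash once; each reader probes a small hash set first and does
# the string compare only on a (possible) match.
import zlib

def name_hash(name):
    return zlib.crc32(name.encode())    # arbitrary 32-bit hash for the sketch

class Listener:
    def __init__(self, names):
        self.names = set(names)
        # This hash set stands in for the fixed-size shared-memory info.
        self.hashes = {name_hash(n) for n in names}

    def wants(self, msg_hash, name):
        # cheap integer probe first; strcmp only when the hash matches
        return msg_hash in self.hashes and name in self.names
```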




Re: listen/notify argument (old topic revisited)

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> themselves.  (I'd also be inclined to include the hash code in the
> >> transmitted message, so that readers could more quickly ignore
> >> uninteresting messages.)
> 
> > Doesn't seem worth it, and how would the user know their hash;
> 
> This is not the user's problem; it is the writing backend's
> responsibility to compute and add the hash.  Basically we trade off some
> space to compute the hash code once at the writer not N times at all the
> readers.

Oh, OK.  When you said "transmitted", I thought you meant transmitted to
the client.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026




Re: listen/notify argument (old topic revisited)

From
Bruce Momjian
Date:
Tom Lane wrote:
> themselves.  (I'd also be inclined to include the hash code in the
> transmitted message, so that readers could more quickly ignore
> uninteresting messages.)

Doesn't seem worth it, and how would the user know their hash?  They
already have a C string for comparison.  Do we have to handle possible
hash collisions?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026




Re: listen/notify argument (old topic revisited)

From
Hannu Krosing
Date:
On Wed, 2002-07-03 at 22:43, Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
> > On Wed, 2002-07-03 at 17:48, Tom Lane wrote:
> >> That's why it's so important that the readers use a sharable lock.  The
> >> only thing they'd be locking out is some new writer trying to send (yet
> >> another) notify.
> 
> > But there must be some way to communicate the positions of all backends'
> > read pointers for managing the free space; otherwise we cannot know
> > when the buffer is full.
> 
> Right.  But we play similar games already with the existing SI buffer,
> to wit:
> 
> Writers grab the controlling lock LW_EXCLUSIVE, thereby having sole
> access; in this state it's safe for them to examine all the read
> pointers as well as examine/update the write pointer (and of course
> write data into the buffer itself).  The furthest-back read pointer
> limits what they can write.

It means a full seq scan over pointers ;)

> Readers grab the controlling lock LW_SHARED, thereby ensuring there
> is no writer (but there may be other readers).  In this state they
> may examine the write pointer (to see how much data there is) and
> may examine and update their own read pointer.  This is safe and
> useful because no reader cares about any other's read pointer.

OK. Now, how will we introduce transactional behaviour to this scheme?

It is easy to save the transaction id with each notify message, but is
there a quick way for backends to learn when these transactions
commit/abort, or whether they have done either in the past?

Is there already a good common facility for that, or do I just need to
examine some random tuples in hope of finding out ;)

--------------
Hannu






Re: listen/notify argument (old topic revisited)

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> themselves.  (I'd also be inclined to include the hash code in the
>> transmitted message, so that readers could more quickly ignore
>> uninteresting messages.)

> Doesn't seem worth it, and how would the user know their hash;

This is not the user's problem; it is the writing backend's
responsibility to compute and add the hash.  Basically we trade off some
space to compute the hash code once at the writer not N times at all the
readers.
        regards, tom lane




Re: listen/notify argument (old topic revisited)

From
Tom Lane
Date:
Hannu Krosing <hannu@tm.ee> writes:
>> Right.  But we play similar games already with the existing SI buffer,
>> to wit:

> It means a full seq scan over pointers ;)

I have not seen any indication that the corresponding scan in the SI
code is a bottleneck --- and that has to scan over *all* backends,
without even the opportunity to skip those that aren't LISTENing.

> OK. Now, how will we introduce transactional behaviour to this scheme ?

It's no different from before --- notify messages don't get into the
buffer at all, until they're committed.  See my earlier response to Neil.
        regards, tom lane