On Sat, Nov 13, 2010 at 10:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> One idea I've had is that we might want to think about defining an
>> operation that is effectively "store, with a memory barrier". For
>> example, on x86, this could be implemented using xchg. I think if you
>> have a single-word variable in shared memory that is always updated
>> using a locked store, then individual backends should be able to read
>> it unlocked without risk of seeing a stale value.
>
> You're still guilty of fuzzy thinking here. What does "stale" mean?
Well, that obviously depends on what algorithm you use to add and
remove items from the array. I know my thinking is fuzzy; I was
trying to say "I'm mulling over whether we could do something like
this" rather than "I know exactly how to make this work".
*thinks it over*
Here's an algorithm that might work. Let's assume that we're still
going to allocate an array of size max_wal_senders, but we want to
make max_wal_senders relatively large butl avoid iterating over the
entire array on each commit. Store a variable in shared memory called
FirstWalSenderOffset which is always updating using a memory barrier
operation. This value is -1 if there are no currently connected WAL
senders, or the array slot of a currently connected walsender if there
is one. Each array slot also has a structure member called
NextWalSenderOffset, so that the list of connect WAL senders is
effectively stored as a linked list, but with array slot numbers
rather than pointers.
To add a WAL sender, we initialize the new array slot, setting
NextWalSenderOffset to FirstWalSenderOffset, and then set
FirstWalSenderOffset to the slot number of the newly allocated slot.
To remove a WAL sender, we loop through the notional linked list and,
when we find the "pointer" to the slot we want to remove, we override
it with a pointer to the slot to which the removed entry points. All
of these stores are done using the store-with-memory-barrier, so that
readers can iterate over the linked list without a lock. However, we
can't let two processes try to MODIFY the list at the same time (at
least, not without a more powerful memory ordering primitive; COMPXCHG
might be enough) so just protect list modification with a global
spinlock; updates won't be frequent.
This algorithm makes the assumption that it's OK for a scan to miss
WAL senders that are added mid-scan. But the current code makes that
assumption, too: a new WAL sender could grab an array slot we've
already passed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company