Re: dynamic background workers

From: Markus Wanner
Subject: Re: dynamic background workers
Msg-id: 51C30A2F.2090203@bluegap.ch
In reply to: dynamic background workers (Robert Haas <robertmhaas@gmail.com>)
Replies: Re: dynamic background workers (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers

Robert,

On 06/14/2013 11:00 PM, Robert Haas wrote:
> Parallel query, or any subset of that project such as parallel sort,
> will require a way to start background workers on demand.

thanks for continuing this, very much appreciated. Postgres-R and thus
TransLattice have successfully used a similar approach for years now.

I have only had a quick glance at the patch so far. Some comments on the design:

> This requires some communication channel from ordinary workers to the
> postmaster, because it is the postmaster that must ultimately start
> the newly-registered workers.  However, that communication channel has
> to be designed pretty carefully, lest a shared memory corruption take
> out the postmaster and lead to inadvertent failure to restart after a
> crash.  Here's how I implemented that: there's an array in shared
> memory of a size equal to max_worker_processes.  This array is
> separate from the backend-private list of workers maintained by the
> postmaster, but the two are kept in sync.  When a new background
> worker registration is added to the shared data structure, the backend
> adding it uses the existing pmsignal mechanism to kick the postmaster,
> which then scans the array for new registrations.

That sounds like a good simplification. Even though the scan is O(n),
n is bounded by max_worker_processes, which has fairly low practical
limits, so it's unlikely to be much of a concern, I guess.
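To make the registration scheme Robert describes concrete, here is a minimal single-process model of it. This is a sketch under stated assumptions, not the patch's code: the names `Slot`, `SharedRegistry`, `register_worker` and `postmaster_scan` are hypothetical, `threading.Lock` stands in for the LWLock that serializes registering backends, and the pmsignal kick is reduced to a boolean flag. In real C shared memory, "fill the fields, then set in_use" would need a write barrier, and the postmaster's read side a matching read barrier.

```python
import threading

MAX_WORKER_PROCESSES = 8  # hypothetical setting of max_worker_processes


class Slot:
    def __init__(self):
        self.in_use = False  # set last by the registering backend ("publish")
        self.name = ""


class SharedRegistry:
    def __init__(self, nslots=MAX_WORKER_PROCESSES):
        self.slots = [Slot() for _ in range(nslots)]
        # Stand-in for the LWLock: taken by registering backends only,
        # never by the postmaster.
        self.lock = threading.Lock()
        # Stand-in for the pmsignal kick to the postmaster.
        self.postmaster_signalled = False


def register_worker(reg, name):
    """Backend side: claim a free slot and publish it. Returns index or -1."""
    with reg.lock:  # keeps two concurrent registrants from stomping on each other
        for i, slot in enumerate(reg.slots):
            if not slot.in_use:
                slot.name = name
                # In C, a write barrier belongs here: fields before in_use.
                slot.in_use = True
                reg.postmaster_signalled = True
                return i
    return -1  # array is full, registration fails


def postmaster_scan(reg, known):
    """Postmaster side: lock-free O(n) scan for newly published slots."""
    reg.postmaster_signalled = False
    new = []
    for i, slot in enumerate(reg.slots):
        if slot.in_use and i not in known:
            # In C, a read barrier belongs here, before copying the fields.
            known.add(i)
            new.append((i, slot.name))
    return new
```

The postmaster never touches the lock, matching the constraint that it cannot take locks, and the cost of each scan is bounded by the array size. Slot reuse after a worker goes away is left out here.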

Back then, I solved it with a "fork request slot". After starting, the
bgworker had to clear that slot and register with a coordinator process
(i.e. the original requestor), so that the coordinator learned its fork
request had succeeded. At some point I expanded that to multiple
request slots to better handle concurrent fork requests. However, it
was difficult to get right and required more IPC than your approach.

On the plus side, the shared memory area used by the postmaster was very
small and read-only to the postmaster. Those were my main goals, though
now that I've read your design, I'm not sure they were the best ones.
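For comparison, the fork-request-slot handshake described above can be sketched roughly as follows. This is an illustrative model only: the class and function names are hypothetical, and the lock guarding the slots is an assumption (the message does not say how the slots were synchronized). It shows the key property: the postmaster only reads the slots, while the started worker clears its slot and reports back to the requesting coordinator, which costs an extra IPC step.

```python
import threading


class ForkRequestSlots:
    def __init__(self, nslots):
        self.lock = threading.Lock()     # assumed guard for the slots
        self.requests = [None] * nslots  # None = free, else requesting coordinator id

    def request_fork(self, coordinator_id):
        """Coordinator side: claim a free request slot. Returns index or -1."""
        with self.lock:
            for i, req in enumerate(self.requests):
                if req is None:
                    self.requests[i] = coordinator_id
                    return i
        return -1

    def pending(self):
        """Postmaster side: read-only view of outstanding fork requests."""
        return [(i, req) for i, req in enumerate(self.requests) if req is not None]


class Coordinator:
    def __init__(self, cid):
        self.cid = cid
        self.registered_workers = []

    def on_worker_registered(self, worker_pid):
        # Only now does the coordinator know its fork request succeeded.
        self.registered_workers.append(worker_pid)


def worker_startup(slots, slot_index, worker_pid, coordinators):
    """Worker side: clear the slot, then register with the original requestor."""
    with slots.lock:
        cid = slots.requests[slot_index]
        slots.requests[slot_index] = None
    coordinators[cid].on_worker_registered(worker_pid)
```

Using multiple slots lets several coordinators have requests in flight at once, at the price of the additional worker-to-coordinator round trip.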

> I have attempted to
> make the code that transfers the shared_memory state into the
> postmaster's private state as paranoid as humanly possible.  The
> precautions taken are documented in the comments.  Conversely, when a
> background worker flagged as BGW_NEVER_RESTART is considered for
> restart (and we decide against it), the corresponding slot in the
> shared memory array is marked as no longer in use, allowing it to be
> reused for a new registration.

It sounds like the postmaster writes to shared memory. Then again, I'm
not sure why I tried so hard to avoid that. After all, the postmaster
can hardly hurt itself by *writing* to shared memory.
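The postmaster-side steps from the quoted paragraph (a paranoid copy of shared state into private memory, then freeing the slot of a never-restarting worker) might look like this in outline. Assumptions to flag: `SharedSlot`, `PrivateWorker`, `copy_slot_paranoid` and `on_worker_exit` are hypothetical names, `NAME_MAX` is an invented field size, and BGW_NEVER_RESTART is modeled as a sentinel restart interval of -1; the patch's actual checks are documented in its own comments and may differ.

```python
from dataclasses import dataclass

BGW_NEVER_RESTART = -1  # modeled as a sentinel restart interval
NAME_MAX = 64           # hypothetical fixed size of the shared name field


@dataclass
class SharedSlot:
    in_use: bool = False
    name: bytes = b""
    restart_time: int = BGW_NEVER_RESTART


@dataclass
class PrivateWorker:
    slot_index: int
    name: str
    restart_time: int


def copy_slot_paranoid(slot, index):
    """Copy shared state into postmaster-private memory, trusting nothing."""
    # Snapshot the fields into locals first, then validate only the copies.
    raw_name, restart = bytes(slot.name), int(slot.restart_time)
    raw_name = raw_name[:NAME_MAX].split(b"\0", 1)[0]  # force termination
    name = raw_name.decode("ascii", errors="replace")
    if restart < BGW_NEVER_RESTART:
        return None  # corrupt value: reject the registration rather than crash
    return PrivateWorker(index, name, restart)


def on_worker_exit(slots, worker):
    """Decide on restart; free the shared slot if the worker never restarts."""
    if worker.restart_time == BGW_NEVER_RESTART:
        slots[worker.slot_index].in_use = False  # the postmaster's only write
        return False
    return True
```

Validating a private copy rather than the live shared slot means a corrupted or concurrently modified segment can at worst cause a rejected registration.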

> Since the postmaster cannot take locks, synchronization between the
> postmaster and other backends using the shared memory segment has to
> be lockless.  This mechanism is also documented in the comments.  An
> lwlock is used to prevent two backends that are both registering a new
> worker at about the same time from stomping on each other, but the
> postmaster need not care about that lwlock.
> 
> This patch also extends worker_spi as a demonstration of the new
> interface.  With this patch, you can CREATE EXTENSION worker_spi and
> then call worker_spi_launch(int4) to launch a new background worker,
> or combine it with generate_series() to launch a bunch at once.  Then
> you can kill them off with pg_terminate_backend() and start some new
> ones.  That, in my humble opinion, is pretty cool.

It definitely is. Thanks again.

Regards

Markus Wanner
