Re: autovacuum process handling

From: Markus Schiltknecht
Subject: Re: autovacuum process handling
Date:
Msg-id: 45B9D1AC.9080804@bluegap.ch
In reply to: Re: autovacuum process handling  (Alvaro Herrera <alvherre@commandprompt.com>)
Responses: Re: autovacuum process handling  (Alvaro Herrera <alvherre@commandprompt.com>)
           Re: autovacuum process handling  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Hi,

Alvaro Herrera wrote:
> Yeah.  For what I need, the launcher just needs to know when a worker
> has finished and how many workers there are.

Oh, so it's not that much less communication after all. My replication 
manager also needs to know when a worker dies. You said you are using a 
signal from the manager to the postmaster to request a worker to be 
forked. How do you do the other part, where the postmaster needs to tell 
the launcher which worker terminated?
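
One way I could imagine closing that loop (purely a sketch on my side, 
not necessarily what you actually do) is to have the postmaster mark the 
dead worker's PID in a shared memory slot and then signal the launcher, 
which reaps the slot:

#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical sketch, not actual PostgreSQL code: how the postmaster
 * could tell a launcher-like process which worker terminated.  All names
 * (WorkerSlot, notify_launcher_of_death, ...) are made up. */
typedef struct WorkerSlot
{
    pid_t   pid;            /* worker PID, 0 if the slot is unused */
    bool    terminated;     /* set by postmaster, cleared by launcher */
} WorkerSlot;

#define MAX_WORKERS 8

static WorkerSlot workerSlots[MAX_WORKERS]; /* would live in shared memory */
static pid_t launcherPid;                   /* PID of the launcher process */

/* Called from the postmaster's SIGCHLD handling when a child exits. */
static void
notify_launcher_of_death(pid_t deadPid)
{
    int     i;

    for (i = 0; i < MAX_WORKERS; i++)
    {
        if (workerSlots[i].pid == deadPid)
        {
            workerSlots[i].terminated = true;
            kill(launcherPid, SIGUSR1);     /* wake up the launcher */
            break;
        }
    }
}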

>> For Postgres-R, I'm currently questioning if I shouldn't merge the 
>> replication manager process with the postmaster. Of course, that would 
>> violate the "postmaster does not touch shared memory" constraint.
> 
> I suggest you don't.  Reliability from Postmaster is very important.

Yes, so? As long as I can't restart the replication manager, yet 
operation of the whole DBMS relies on it, I have to take the postmaster 
down as soon as it detects a crashed replication manager.

So I still argue that reliability would be better than the status quo if 
I merged these two processes (because there would be less code for 
communication between the two).

Of course, the other way to gain reliability would be to make the 
replication manager restartable. But restarting the replication manager 
means recovering data from other nodes in the cluster, thus a lot of 
network traffic. Needless to say, this is quite an expensive operation.

That's why I'm questioning whether that's the behavior we want. Isn't it 
better to force the administrators to look into the issue and probably 
replace a broken node, instead of having one node run amok by requesting 
recovery over and over again, possibly forcing crashes of other nodes, 
too, because of the additional load of recovery?

>> But it would make some things a lot easier:
>>
>>  * What if the launcher/manager dies (but you potentially still have
>>    active workers)?
>>
>>    Maybe, for autovacuum you can simply restart the launcher and that
>>    one detects workers from shmem.
>>
>>    With replication, I certainly have to take down the postmaster as
>>    well, as we are certainly out of sync and can't simply restart the
>>    replication manager. So in that case, no postmaster can run without a
>>    replication manager and vice versa. Why not make it one single
>>    process, then?
> 
> Well, the point of the postmaster is that it can notice when one process
> dies and take appropriate action.  When a backend dies, the postmaster
> closes all others.  But if the postmaster crashes due to a bug in the
> manager (due to both being integrated in a single process), how do you
> close the backends?  There's no one to do it.

That's a point.

But again, as long as the replication manager can't be restarted, you 
gain nothing by closing backends on a crashed node.

> In my case, the launcher is not critical.  It can die and the postmaster
> should just start a new one without much noise.  A worker is critical
> because it's connected to tables; it's as critical as a regular backend.
> So if a worker dies, the postmaster must take everyone down and cause a
> restart.  This is pretty easy to do.

Yeah, that's the main difference, and I see why your approach makes 
perfect sense for the autovacuum case.

In contrast, the replication manager is critical (to one node), and a 
restart is expensive (for the whole cluster).
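
To spell out that difference in handling: I picture the postmaster's 
child-exit decision roughly like the following (a simplified sketch of 
mine, not the real reaper() logic in postmaster.c):

#include <stdbool.h>

/* Simplified sketch of the decision the postmaster makes when a child
 * exits.  The real logic lives in reaper() in postmaster.c and is much
 * more involved; handle_child_exit and friends are made-up names. */
typedef enum
{
    CHILD_LAUNCHER,             /* not connected to shared tables */
    CHILD_WORKER,               /* connected, like a regular backend */
    CHILD_BACKEND
} ChildKind;

static void
start_launcher(void)
{
    /* fork() a fresh launcher, without much noise */
}

static void
crash_recovery_restart(void)
{
    /* SIGQUIT all children, then re-run crash recovery */
}

static void
handle_child_exit(ChildKind kind, bool exitedCleanly)
{
    switch (kind)
    {
        case CHILD_LAUNCHER:
            /* Not critical: just start a new one. */
            start_launcher();
            break;

        case CHILD_WORKER:
        case CHILD_BACKEND:
            /* As critical as a regular backend: an unclean exit may have
             * corrupted shared memory, so take everyone down and restart. */
            if (!exitedCleanly)
                crash_recovery_restart();
            break;
    }
}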

>>  * Startup races: depending on how you start workers, the launcher/
>>    manager may get a "database is starting up" error when requesting
>>    the postmaster to fork backends.
>>    That probably also applies to autovacuum, as those workers shouldn't
>>    work concurrently to a startup process. But maybe there are other
>>    means of ensuring that no autovacuum gets triggered during startup?
> 
> Oh, this is very easy as well.  In my case the launcher just sets a
> database OID to be processed in shared memory, and then calls
> SendPostmasterSignal with a particular value.  The postmaster must only
> check this signal within ServerLoop, which means it won't act on it
> (i.e., won't start a worker) until the startup process has finished.

It seems like your launcher is perfectly fine with requesting workers 
and not getting them. The replication manager currently isn't. Maybe I 
should make it more fault tolerant in that regard...
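
For my own reference, the mechanism you describe seems to map onto the 
existing pmsignal facility roughly as follows. SendPostmasterSignal() and 
CheckPostmasterSignal() are the real primitives from storage/pmsignal.h; 
the shared struct, the reason code and StartWorkerForDatabase() are just 
my guesses at what your patch does:

#include "postgres.h"
#include "storage/pmsignal.h"

/* Sketch of the launcher -> postmaster handshake described above; the
 * names below (other than the pmsignal functions) are placeholders. */
typedef struct
{
    Oid     av_startingDB;      /* database the launcher wants vacuumed */
} AutoVacuumShmemStruct;

static AutoVacuumShmemStruct *AutoVacShmem; /* allocated in shared memory */

static void
StartWorkerForDatabase(Oid dbid)
{
    /* placeholder: fork and run a worker for the given database */
    (void) dbid;
}

/* Launcher side: request a worker for the given database. */
static void
request_worker_for(Oid dbid)
{
    AutoVacShmem->av_startingDB = dbid;
    SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);
}

/* Postmaster side, called only from ServerLoop(), i.e. never while the
 * startup process is still running. */
static void
maybe_start_worker(void)
{
    if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER))
        StartWorkerForDatabase(AutoVacShmem->av_startingDB);
}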

> I guess your problem is that the manager's task is quite a lot more
> involved than my launcher's.  But in that case, it's even more important
> to have them separate.

More involved with what? It does not touch shared memory; it mainly 
keeps track of the backends' states (by getting a notice from the 
postmaster) and does all the necessary forwarding of messages between 
the communication system and the backends. Its main loop is similar to 
the postmaster's, mainly consisting of a select().
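
In pseudo-code, it is essentially the classic forwarding loop (a generic 
sketch, not the actual Postgres-R code; gcs_sock and backend_sock stand 
in for the group communication socket and a backend pipe):

#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Generic sketch of a select()-driven forwarding loop of the kind
 * described above; not actual Postgres-R code. */
static void
manager_main_loop(int gcs_sock, int backend_sock)
{
    for (;;)
    {
        fd_set      rfds;
        int         maxfd = (gcs_sock > backend_sock) ? gcs_sock : backend_sock;
        char        buf[8192];
        ssize_t     n;

        FD_ZERO(&rfds);
        FD_SET(gcs_sock, &rfds);
        FD_SET(backend_sock, &rfds);

        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
            continue;           /* e.g. EINTR when a signal arrives */

        if (FD_ISSET(gcs_sock, &rfds))
        {
            /* message from the communication system: forward to a backend */
            n = read(gcs_sock, buf, sizeof(buf));
            if (n > 0)
                (void) write(backend_sock, buf, n);
        }
        if (FD_ISSET(backend_sock, &rfds))
        {
            /* message from a backend: forward to the communication system */
            n = read(backend_sock, buf, sizeof(buf));
            if (n > 0)
                (void) write(gcs_sock, buf, n);
        }
    }
}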

> I don't understand why the manager talks to postmaster.  If it doesn't,
> well, then there's no concurrency issue gone, because the remote
> backends will be talking to *somebody* anyway; be it postmaster, or
> manager.

As with your launcher, I only send one message: the worker request. But 
the other way around, from the postmaster to the replication manager, 
there are also some messages: a "database is ready" message and a 
"worker terminated" message. Thinking about handling the restarting 
cycle, I would need to add a "database is restarting" message, which 
has to be followed by another "database is ready" message.

For sure, the replication manager needs to keep running during a 
restarting cycle. And it needs to know the database's state, so as to be 
able to decide whether it can request workers or not.
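
Written down, the message set between the postmaster and the replication 
manager would then look roughly like this (just a sketch making the 
protocol above explicit; the names are mine):

/* Sketch of the postmaster <-> replication manager message set described
 * above.  The enum and struct names are made up for illustration. */
typedef enum RmgrMessageType
{
    /* manager -> postmaster */
    RMGR_REQUEST_WORKER,        /* please fork a remote worker backend */

    /* postmaster -> manager */
    RMGR_DATABASE_READY,        /* startup (or a restart) has finished */
    RMGR_DATABASE_RESTARTING,   /* crash recovery running, hold worker requests */
    RMGR_WORKER_TERMINATED      /* a previously requested worker has exited */
} RmgrMessageType;

typedef struct RmgrMessage
{
    RmgrMessageType type;
    int             workerPid;      /* valid for RMGR_WORKER_TERMINATED */
    unsigned int    databaseOid;    /* valid for RMGR_REQUEST_WORKER */
} RmgrMessage;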

> (Maybe your problem is that the manager is not correctly designed.  We
> can talk about checking that code.  I happen to know the Postmaster
> process handling code because of my previous work with Autovacuum and
> because of Mammoth Replicator.)

Thanks for the offer, I'll get back to that.

> I think you're underestimating the postmaster's task.

Maybe, but it certainly loses importance within a cluster, since it 
controls only part of the whole database system.

> Ok.  I have one ready, and it works very well.  It only ever starts one
> worker -- I have constrained that way just to keep the current behavior
> of a single autovacuum process running at any time.  My plan is to get
> it submitted for review, and then start working on having it consider
> multiple workers and introduce more scheduling smarts.

Sounds like a good plan.

Thank you for your input. You made me rethink some issues and pointed 
me to some open questions.

Regards

Markus

