Re: improving wraparound behavior
From | Stephen Frost |
---|---|
Subject | Re: improving wraparound behavior |
Date | |
Msg-id | 20190504030844.GR6197@tamriel.snowman.net |
In reply to | Re: improving wraparound behavior (Andres Freund <andres@anarazel.de>) |
Replies | Re: improving wraparound behavior (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2019-05-03 22:41:11 -0400, Stephen Frost wrote:
> > I suppose it is a pretty big change in the base autovacuum launcher to
> > be something that's run per database instead and then deal with the
> > coordination between the two... but I can't help but feel like it
> > wouldn't be that much *work*.  I'm not against doing something smaller
> > but was something smaller actually proposed for this specific issue..?
>
> I think it'd be fairly significant. And that we should redo it from
> scratch if we go there - because what we have isn't worth using as a
> basis.

Alright, what I'm hearing here is that we should probably have a
dedicated thread for this discussion, if someone has the cycles to spend
on it.  I'm not against that.

> > > I'm thinking that we'd do something roughly like (in actual code) for
> > > GetNewTransactionId():
> > >
> > >     TransactionId dat_limit = ShmemVariableCache->oldestXid;
> > >     TransactionId slot_limit = Min(replication_slot_xmin, replication_slot_catalog_xmin);
> > >     TransactionId walsender_limit;
> > >     TransactionId prepared_xact_limit;
> > >     TransactionId backend_limit;
> > >
> > >     ComputeOldestXminFromProcarray(&walsender_limit, &prepared_xact_limit, &backend_limit);
> > >
> > >     if (IsOldest(dat_limit))
> > >        ereport(elevel,
> > >                errmsg("close to xid wraparound, held back by database %s"),
> > >                errdetail("current xid %u, horizon for database %u, shutting down at %u"),
> > >                errhint("..."));
> > >     else if (IsOldest(slot_limit))
> > >       ereport(elevel, errmsg("close to xid wraparound, held back by replication slot %s"),
> > >               ...);
> > >
> > > where IsOldest wouldn't actually compare plainly numerically, but would
> > > actually prefer showing the slot, backend, walsender, prepared_xact, as
> > > long as they are pretty close to the dat_limit - as in those cases
> > > vacuuming wouldn't actually solve the issue, unless the other problems
> > > are addressed first (as autovacuum won't compute a cutoff horizon that's
> > > newer than any of those).
> >
> > Where the errhint() above includes a recommendation to run the SRF
> > described below, I take it?
>
> Not necessarily. I feel conciseness is important too, and this would be
> the most important thing to tackle.

I'm imagining a relatively rare scenario, just to be clear, where
"pretty close to the dat_limit" would apply to more than just one thing.

> > Also, should this really be an 'else if', or should it be just a set of
> > 'if()'s, thereby giving users more info right up-front?
>
> Possibly? But it'd also make it even harder to read the log / the system
> to keep up with logging, because we already log *so* much when close to
> wraparound.

Yes, we definitely log a *lot*, and probably too much, since other
critical messages might get lost in the noise.

> If we didn't order it, it'd be hard for users to figure out which to
> address first. If we ordered it, people have to look further up in the
> log to figure out which is the most urgent one (unless we reverse the
> order, which is odd too).

This makes me think we should both order it and combine it into one
message... but that'd then be pretty difficult to deal with,
potentially, from a translation standpoint and just from a "wow, that's
a huge log message" standpoint, which is kind of the idea behind the
SRF - to give you all that info in a more easily digestible manner.

Not sure I've got any great ideas on how to improve on this.  I do think
that if we know that there are multiple different things that are within
a small number of xids of the oldest xmin then we should notify the user
about all of them, either directly in the error messages or by referring
them to the SRF, so they have the opportunity to address them all, or at
least know about them all.  As mentioned though, it's likely to be a
quite rare thing to run into, so you'd have to be extra unlucky to even
hit this case and perhaps the extra code complication just isn't worth
it.

Thanks,

Stephen
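
For illustration only, the "prefer the other holders when they are pretty
close to the dat_limit" rule discussed above might be sketched roughly as
follows. This is not code from the thread or from PostgreSQL: the HolderKind
enum, the Holder struct, both function names and the CLOSE_ENOUGH_XIDS margin
are all made up for the example (no margin was named in the discussion), and
special xids are ignored. The only borrowed idea is the modulo-2^32 signed
comparison that xid arithmetic normally uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical labels for the things that can hold back the xid horizon. */
typedef enum
{
    HOLDER_DATABASE,
    HOLDER_SLOT,
    HOLDER_WALSENDER,
    HOLDER_PREPARED_XACT,
    HOLDER_BACKEND
} HolderKind;

typedef struct
{
    HolderKind    kind;
    TransactionId limit;    /* oldest xid this holder still needs */
} Holder;

/* Assumed margin for "pretty close to the dat_limit"; purely illustrative. */
#define CLOSE_ENOUGH_XIDS 1000000

/* Signed distance from 'from' to 'to', computed modulo 2^32. */
static int32_t
xid_distance(TransactionId from, TransactionId to)
{
    return (int32_t) (to - from);
}

/*
 * Choose what to name in the warning.  A non-database holder wins whenever
 * its limit is at most CLOSE_ENOUGH_XIDS newer than dat_limit, since
 * vacuuming alone would not help until that holder is dealt with.  Among
 * several such holders the oldest one is reported.
 */
static HolderKind
pick_holder_to_report(TransactionId dat_limit, const Holder *others, size_t n)
{
    HolderKind  best = HOLDER_DATABASE;
    int32_t     best_dist = 0;
    int         found = 0;

    assert(others != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        int32_t dist = xid_distance(dat_limit, others[i].limit);

        if (dist > CLOSE_ENOUGH_XIDS)
            continue;           /* too far ahead to be the real blocker */
        if (!found || dist < best_dist)
        {
            best = others[i].kind;
            best_dist = dist;
            found = 1;
        }
    }
    return best;
}

/* How many holders are within the margin, i.e. worth telling the user about. */
static size_t
count_close_holders(TransactionId dat_limit, const Holder *others, size_t n)
{
    size_t count = 0;

    assert(others != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
        if (xid_distance(dat_limit, others[i].limit) <= CLOSE_ENOUGH_XIDS)
            count++;
    return count;
}
```

A count greater than one from count_close_holders() is the rare case
mentioned above, where a single message would hide other blockers and
pointing the user at the SRF might be the more digestible option.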