Обсуждение: Unifying VACUUM VERBOSE and log_autovacuum_min_duration output
I think that it's worth unifying VACUUM VERBOSE and log_autovacuum_min_duration output, to remove the redundancy, and to provide more useful VACUUM VERBOSE output. Both variants already output approximately the same things. But, each variant reports on certain details that the other variant lacks. I find the extra information provided by log_autovacuum_min_duration far more useful than the extra details provided by VACUUM VERBOSE. This is probably because we've focussed on improving the former over the latter, probably because autovacuum is much more interesting than VACUUM on average, in practice, to users. Unifying everything cannot be approached mechanically, so doing this requires real buy-in. It's a bit tricky because VACUUM VERBOSE is supposed to show real time information about what just finished, as a kind of rudimentary progress indicator, while log_autovacuum_* summarizes the whole entire VACUUM operation. This difference is most notable when there are multiple index vacuuming passes ("index scans"), or when we truncate the heap relation. My preferred approach to this is simple: redefine VACUUM VERBOSE to not show incremental output, which seems rather unhelpful anyway. This does mean that VACUUM VERBOSE won't show certain information that might occasionally be useful to hackers. For example, there is detailed information about how rel truncation happened in the VERBOSE output, and detailed information about how many index tuples were deleted by each round of index vacuuming, for each individual index. We can keep this extra information as DEBUG2 messages, as in the current !VERBOSE case (or perhaps make some of them DEBUG1). I don't think that we need to keep the getrusage() stuff at all, though. I think that this would significantly improve VACUUM VERBOSE output, especially for users, but also for hackers. Here are my reasons, in detail: * We have pg_stat_progress_vacuum these days. * VACUUM VERBOSE doesn't provide much of the most useful instrumentation that we have available in log_autovacuum_min_duration, and yet produces output that is ludicrously, unmanageably verbose -- lots of pg_rusage_show() information for each and every step, which just isn't useful. * I really miss the extra stuff that log_autovacuum_min_duration provides when I run VACUUM VERBOSE, though. * In practice having multiple rounds of index vacuuming is quite rare these days. And when it does happen it's interesting because it happened at all -- I don't really care about the breakdown beyond that. If I ever do care about the very fine details, I can easily set client_min_messages to DEBUG2 on that one occasion. * The fact that VACUUM VERBOSE will no longer report on IndexBulkDeleteResult.num_index_tuples and IndexBulkDeleteResult.tuples_removed seems like no great loss to me -- the fact that the number might be higher or lower for an index typically means very little these days, with the improvements made to index tuple deletion. VERBOSE will still report on IndexBulkDeleteResult.pages_*, which is what really matters. VERBOSE will also report on LP_DEAD-in-heap items removed (or not removed) directly, which is a generic upper bound on tuples_removed, that applies to all indexes. * The detailed lazy_truncate_heap() instrumentation output by VACUUM VERBOSE just isn't useful outside of debugging scenarios -- it just isn't actionable to users (users only really care about how much smaller the table became through truncation). The low level details could easily be output as DEBUG1 (not DEBUG2) instead. Thoughts? -- Peter Geoghegan
On Fri, Nov 26, 2021 at 12:37:32PM -0800, Peter Geoghegan wrote: > My preferred approach to this is simple: redefine VACUUM VERBOSE to > not show incremental output, which seems rather unhelpful anyway. > I don't think that we need to keep the getrusage() stuff at all, though. +1 > * VACUUM VERBOSE doesn't provide much of the most useful > instrumentation that we have available in log_autovacuum_min_duration, > and yet produces output that is ludicrously, unmanageably verbose -- > lots of pg_rusage_show() information for each and every step, which > just isn't useful. Not only not useful/unhelpful, but confusing. It's what I complained about here. https://www.postgresql.org/message-id/flat/20191220171132.GB30414@telsasoft.com I see that lazy_scan_heap() still has a shadow variable buf... -- Justin
On Fri, Nov 26, 2021 at 12:37 PM Peter Geoghegan <pg@bowt.ie> wrote: > Unifying everything cannot be approached mechanically, so doing this > requires real buy-in. It's a bit tricky because VACUUM VERBOSE is > supposed to show real time information about what just finished, as a > kind of rudimentary progress indicator, while log_autovacuum_* > summarizes the whole entire VACUUM operation. This difference is most > notable when there are multiple index vacuuming passes ("index > scans"), or when we truncate the heap relation. I based these remarks on one sentence about VERBOSE that appears in vacuum.sgml: "When VERBOSE is specified, VACUUM emits progress messages to indicate which table is currently being processed. Various statistics about the tables are printed as well." There is a very similar sentence in analyze.sgml. It seems that I overinterpreted the word "progress" before. I now believe that VACUUM VERBOSE wasn't ever really intended to indicate the progress of one particular vacuumlazy.c-wise operation targeting one particular heap relation with storage. The VERBOSE option gives some necessary table-level structure to an unqualified "VACUUM VERBOSE" -- same as an unqualified "ANALYZE VERBOSE". The progress is explicitly table granularity progress. Nothing more. I definitely need to preserve that aspect of VERBOSE output -- obviously the output must still make it perfectly clear which particular table a given run of information relates to, especially with unqualified "VACUUM VERBOSE". Fortunately, that'll be easy. In fact, my proposal will improve things here, because now there will only be a single extra INFO line per table (so one INFO line for the table name, another with newlines for the instrumentation itself). This matches the current behavior with an unqualified "ANALYZE VERBOSE". -- Peter Geoghegan
On Fri, Nov 26, 2021 at 1:57 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > * VACUUM VERBOSE doesn't provide much of the most useful > > instrumentation that we have available in log_autovacuum_min_duration, > > and yet produces output that is ludicrously, unmanageably verbose -- > > lots of pg_rusage_show() information for each and every step, which > > just isn't useful. > > Not only not useful/unhelpful, but confusing. Also makes testing harder. > It's what I complained about here. > https://www.postgresql.org/message-id/flat/20191220171132.GB30414@telsasoft.com > > I see that lazy_scan_heap() still has a shadow variable buf... I noticed that myself. That function has had many accretions of code, over decades. I often notice things that seem like they once made sense (e.g., before we had HOT), but don't anymore. I hope to be able to pay down more technical debt in this area for Postgres 15. -- Peter Geoghegan
On Fri, Nov 26, 2021 at 12:37 PM Peter Geoghegan <pg@bowt.ie> wrote: > My preferred approach to this is simple: redefine VACUUM VERBOSE to > not show incremental output, which seems rather unhelpful anyway. This > does mean that VACUUM VERBOSE won't show certain information that > might occasionally be useful to hackers. Attached is a WIP patch doing this. One thing that's still unclear is what the new elevel should be for the ereport messages that used to be either LOG (for VACUUM VERBOSE) or DEBUG2 (for everything else) -- what should I change them to now? For now I've done taken the obvious approach of making everything DEBUG2. There is of course no reason why some messages can't be DEBUG1 instead. Some of them do seem more interesting than others (though still not particularly interesting overall). Here is an example of VACUUM VERBOSE on HEAD: pg@regression=# vacuum VERBOSE foo; INFO: vacuuming "public.foo" INFO: table "public.foo": found 0 removable, 54 nonremovable row versions in 1 out of 45 pages DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 770 Skipped 0 pages due to buffer pins, 0 frozen pages. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. VACUUM Here's what a VACUUM VERBOSE against the same table looks like with the patch applied: pg@regression=# vacuum VERBOSE foo; INFO: vacuuming "regression.public.foo" INFO: finished vacuuming "regression.public.foo": index scans: 0 pages: 0 removed, 45 remain, 0 skipped due to pins, 0 skipped frozen tuples: 0 removed, 7042 remain, 0 are dead but not yet removable, oldest xmin: 770 index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed I/O timings: read: 0.065 ms, write: 0.000 ms avg read rate: 147.406 MB/s, avg write rate: 14.741 MB/s buffer usage: 22 hits, 10 misses, 1 dirtied WAL usage: 1 records, 1 full page images, 1401 bytes system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s VACUUM It's easy to produce examples where the patch is somewhat more verbose than HEAD (that's what you see here). It's also easy to produce examples where HEAD is *significantly* more verbose than the patch. Especially when VERBOSE shows many lines of getrusage() output (patch only ever shows one of those, at the end). Another factor is index vacuuming. With the patch, you'll only see one extra line per index, versus several lines on HEAD. I cannot find clear guidelines on multiline INFO messages lines -- all I'm really doing here is selectively making the LOG output from log_autovacuum_min_duration into INFO output for VACUUM VERBOSE (actually there are 2 INFO messages per heap relation processed). It would be nice if there was a clear message style precedent that I could point to for this. -- Peter Geoghegan
Вложения
I think the 2nd chunk here could say "if (instrument)" like the first: > @@ -482,8 +480,10 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, > TransactionId FreezeLimit; > MultiXactId MultiXactCutoff; > > - /* measure elapsed time iff autovacuum logging requires it */ > - if (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0) > + verbose = (params->options & VACOPT_VERBOSE) != 0; > + instrument = (verbose || (IsAutoVacuumWorkerProcess() && > + params->log_min_duration >= 0)); > + if (instrument) ... > @@ -702,12 +705,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, > vacrel->new_dead_tuples); > pgstat_progress_end_command(); > > - /* and log the action if appropriate */ > - if (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0) > - if (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0) > + /* Output instrumentation where appropriate */ > + if (verbose || > + (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0)) Autovacuum's format doesn't show the number of scanned pages ; it shows how many pages were skipped due to frozen bit, but not how many were skipped due to the all visible bit: > INFO: table "public.foo": found 0 removable, 54 nonremovable row versions in 1 out of 45 pages ... > INFO: finished vacuuming "regression.public.foo": index scans: 0 > pages: 0 removed, 45 remain, 0 skipped due to pins, 0 skipped frozen If the format of autovacuum output were to change, maybe it's an opportunity to show some of the stuff Jeff mentioned: |Also, I'd appreciate a report on how many hint-bits were set, and how many |pages were marked all-visible and/or frozen -- Justin
On Mon, Nov 29, 2021 at 8:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > I think the 2nd chunk here could say "if (instrument)" like the first: I agree that that would be clearer. > Autovacuum's format doesn't show the number of scanned pages ; it shows how > many pages were skipped due to frozen bit, but not how many were skipped due to > the all visible bit: That's a weird historical accident. I had planned on fixing that as part of ongoing refactoring work [1]. The short explanation for why it works that way goes like this: while it makes zero practical sense (who wants to see how many frozen pages we skipped, without also seeing merely all-visible pages skipped?), it does make some sense when your starting point is the code itself. > If the format of autovacuum output were to change, maybe it's an opportunity to > show some of the stuff Jeff mentioned: You must be referencing the thread again, from your earlier message -- you must mean Jeff Janes here. Jeff said something about the number of all-visible pages accessed (i.e. not skipped over) being implicit. For what it's worth, that isn't true in the general case -- there simply is no reliable way to see the total number of pages that were skipped using the VM, as of right now. > |Also, I'd appreciate a report on how many hint-bits were set, and how many > |pages were marked all-visible and/or frozen I will probably also add the latter in the Postgres 15 cycle. Hint-bits-set is much harder, and not likely to happen soon. [1] https://postgr.es/m/CAH2-Wznp=c=Opj8Z7RMR3G=ec3_JfGYMN_YvmCEjoPCHzWbx0g@mail.gmail.com -- Peter Geoghegan
Hi, On 2021-11-29 18:51:37 -0800, Peter Geoghegan wrote: > One thing that's still unclear is what the new elevel should be for > the ereport messages that used to be either LOG (for VACUUM VERBOSE) > or DEBUG2 (for everything else) -- what should I change them to now? > For now I've done taken the obvious approach of making everything > DEBUG2. I think some actually ended up being omitted compared to the previous state. E.g. "aggressively vacuuming ...", but I think others as well. > It's easy to produce examples where the patch is somewhat more verbose > than HEAD (that's what you see here). We could make verbose a more complicated parameter if that turns out to be a problem. E.g. controlling whether resource usage is included. > It's also easy to produce examples where HEAD is *significantly* more > verbose than the patch. Especially when VERBOSE shows many lines of > getrusage() output (patch only ever shows one of those, at the end). That's not really equivalent though? It does seem somewhat useful to be able to distinguish the cost of heap and index processing? > I cannot find clear guidelines on multiline INFO messages lines -- all > I'm really doing here is selectively making the LOG output from > log_autovacuum_min_duration into INFO output for VACUUM VERBOSE > (actually there are 2 INFO messages per heap relation processed). Using multiple messages has the clear drawback of including context/statement multiple times... But if part of the point is to be able to analyze what's currently happening there's not really an alternative. However that need probably is lessened now that we have pg_stat_progress_vacuum. > @@ -702,12 +705,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, > vacrel->new_dead_tuples); > pgstat_progress_end_command(); > > - /* and log the action if appropriate */ > - if (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0) > + /* Output instrumentation where appropriate */ > + if (verbose || > + (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0)) > { > TimestampTz endtime = GetCurrentTimestamp(); > > - if (params->log_min_duration == 0 || > + if (verbose || params->log_min_duration == 0 || > TimestampDifferenceExceeds(starttime, endtime, > params->log_min_duration)) > { This is quite the nest of conditions by now. Any chance of cleaning that up? > @@ -3209,7 +3144,7 @@ lazy_truncate_heap(LVRelState *vacrel) > * We failed to establish the lock in the specified number of > * retries. This means we give up truncating. > */ > - ereport(elevel, > + ereport(DEBUG2, > (errmsg("\"%s\": stopping truncate due to conflicting lock request", > vacrel->relname))); > return; > @@ -3279,12 +3214,10 @@ lazy_truncate_heap(LVRelState *vacrel) > vacrel->pages_removed += orig_rel_pages - new_rel_pages; > vacrel->rel_pages = new_rel_pages; > > - ereport(elevel, > + ereport(DEBUG2, > (errmsg("table \"%s\": truncated %u to %u pages", > vacrel->relname, > - orig_rel_pages, new_rel_pages), > - errdetail_internal("%s", > - pg_rusage_show(&ru0)))); > + orig_rel_pages, new_rel_pages))); > orig_rel_pages = new_rel_pages; > } while (new_rel_pages > vacrel->nonempty_pages && lock_waiter_detected); > } These imo are useful. Perhaps we could just make them part of some log message that autovac logging includes as well? Greetings, Andres Freund
On Fri, Dec 10, 2021 at 8:30 PM Andres Freund <andres@anarazel.de> wrote: > I think some actually ended up being omitted compared to the previous > state. E.g. "aggressively vacuuming ...", but I think others as well. The "aggressive-ness" is reported by a distinct ereport() with the patch, so you'll still see that information. You'll still be able to see when each VACUUM begins and ends, which matters in database level "VACUUM" command (a VACUUM that doesn't specify any relation, and so vacuums everything). VACUUM VERBOSE should still work as a progress indicator at the whole-VACUUM-operation level (when there will be more than a single operation per command), but it won't indicate the progress of any individual VACUUM operation anymore. That's the trade-off. To me the most notable loss of VERBOSE information is the number of index tuples deleted in each index. But even that isn't so useful, since you can already see the number of LP_DEAD items, which is a more interesting number (that applies equally to all indexes, and the table itself). > > It's easy to produce examples where the patch is somewhat more verbose > > than HEAD (that's what you see here). > > We could make verbose a more complicated parameter if that turns out to be a > problem. E.g. controlling whether resource usage is included. That's true, but I don't think that it's going to be a problem. I'd rather avoid it if possible. If we need to place some of the stuff that's currently only shown by VERBOSE to be shown by the autovacuum log output too, then that's fine. You said something about showing the number of workers launched in the autovacuum log output (also the new VERBOSE output). That could make sense. But there could be a different number of workers for cleanup and for vacuuming. Should I show both together, or just the high watermark? I think that it needs to be okay to suppress the output in the common case where parallelism isn't used (e.g., in every autovacuum). > > It's also easy to produce examples where HEAD is *significantly* more > > verbose than the patch. Especially when VERBOSE shows many lines of > > getrusage() output (patch only ever shows one of those, at the end). > > That's not really equivalent though? It does seem somewhat useful to be able > to distinguish the cost of heap and index processing? I've personally never used VACUUM VERBOSE like that. I agree that it could be useful, but I would argue that it's not worth it. I'd just use the DEBUG1 version, or more likely use my own custom microbenchmark. > This is quite the nest of conditions by now. Any chance of cleaning that up? Yes, I can simplify that code a little. > > @@ -3279,12 +3214,10 @@ lazy_truncate_heap(LVRelState *vacrel) > These imo are useful. Perhaps we could just make them part of some log > message that autovac logging includes as well? I would argue that it already does -- because you see pages removed (which is heap pages truncation). We do lose the details with the patch, of course -- you'll no longer see the progress of truncation, which works incrementally. But as I said, that's the general trade-off that the patch makes. If you can't truncate the table due to a conflicting lock request, then that might just have been for the last round of truncation. And so reporting that aspect in the whole-autovacuum log output (or in the new format VACUUM VERBOSE output) seems like it could be misleading. I went as far as removing the getrusage stuff for the ereport() messages that get demoted to DEBUG2. What do you think of that aspect? I could add some the getrusage output back where that makes sense. I don't have very strong feelings about that. -- Peter Geoghegan
On 2021-12-11 09:52:29 -0800, Peter Geoghegan wrote: > On Fri, Dec 10, 2021 at 8:30 PM Andres Freund <andres@anarazel.de> wrote: > > I think some actually ended up being omitted compared to the previous > > state. E.g. "aggressively vacuuming ...", but I think others as well. > > The "aggressive-ness" is reported by a distinct ereport() with the > patch, so you'll still see that information. But the ereport is inside an if (verbose), no?
On Sat, Dec 11, 2021 at 12:24 PM Andres Freund <andres@anarazel.de> wrote: > But the ereport is inside an if (verbose), no? Yes -- in order to report aggressiveness in VACUUM VERBOSE. But the autovacuum case still reports verbose-ness, in the same way as it always has -- in that same LOG entry. We don't want to repeat ourselves in the VERBOSE case, which will have already indicated its verboseness in the up-front ereport(). In other words, every distinct case reports on its aggressiveness exactly once per call into lazyvacuum.c. In roughly the same way as it works on HEAD. -- Peter Geoghegan
On Sat, Dec 11, 2021 at 1:13 PM Peter Geoghegan <pg@bowt.ie> wrote: > Yes -- in order to report aggressiveness in VACUUM VERBOSE. But the > autovacuum case still reports verbose-ness, in the same way as it > always has -- in that same LOG entry. We don't want to repeat > ourselves in the VERBOSE case, which will have already indicated its > verboseness in the up-front ereport(). Sorry, I meant "indicated its aggressiveness in the up-front ereport()". -- Peter Geoghegan
Hi, On 2021-12-11 13:13:56 -0800, Peter Geoghegan wrote: > On Sat, Dec 11, 2021 at 12:24 PM Andres Freund <andres@anarazel.de> wrote: > > But the ereport is inside an if (verbose), no? > > Yes -- in order to report aggressiveness in VACUUM VERBOSE. But the > autovacuum case still reports verbose-ness, in the same way as it > always has -- in that same LOG entry. We don't want to repeat > ourselves in the VERBOSE case, which will have already indicated its > verboseness in the up-front ereport(). I feel one, or both, must be missing something here. My point was that you said upthread that the patch doesn't change DEBUG2/non-verbose logging for most messages. But the fact that those messages are only emitted inside and if (verbose) seems to contradict that? Greetings, Andres Freund
On Sat, Dec 11, 2021 at 2:52 PM Andres Freund <andres@anarazel.de> wrote: > I feel one, or both, must be missing something here. My point was that you > said upthread that the patch doesn't change DEBUG2/non-verbose logging for > most messages. But the fact that those messages are only emitted inside and if > (verbose) seems to contradict that? That is technically true, but it's not true in any practical sense. Yes, there are 2 distinct ereports() per vacuumlazy.c call for VACUUM VERBOSE (i.e. 2 per relation processed by the command). Yes, only the second one is actually "shared" with log_autovacuum_* (the first one is just shows that we're processing a new relation, with the aggressiveness). But that's not very significant. The only reason that I did it that way is because there is an expectation that plain "VACUUM VERBOSE" (i.e. no target relation specified) will work as a rudimentary progress indicator at the heap rel granularity -- the VACUUM VERBOSE docs pretty much say so. As I pointed out before, the docs for VERBOSE that appear in vacuum.sgml say: "When VERBOSE is specified, VACUUM emits progress messages to indicate which table is currently being processed. Various statistics about the tables are printed as well." Having 2 ereports (not just 1) isn't essential, but it seems useful because it makes the new VACUUM VERBOSE continue to work like this. But without any of the downsides that go with seeing way too much detail, moment to moment. -- Peter Geoghegan
On Mon, Nov 29, 2021 at 6:51 PM Peter Geoghegan <pg@bowt.ie> wrote: > Attached is a WIP patch doing this. This has bitrot, so I attach v2, mostly just to keep the CFTester status green. The only real change is one minor simplification to how we set everything up, inside heap_vacuum_rel(). -- Peter Geoghegan
Вложения
On Tue, Dec 21, 2021 at 11:57 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > I've looked at the patch and here are comments: Thanks! The patch bitrot again, so attached is a rebased version, v3. > I think we should set message_level. Otherwise, index AM will set an > invalid log level, although any index AM in core seems not to use it. Fixed. > --- > - /* > - * Update error traceback information. This is the > last phase during > - * which we add context information to errors, so we > don't need to > - * revert to the previous phase. > - */ > > Why is this comment removed? ISTM this comment is still valid. We don't "revert to the previous phase" here, which is always VACUUM_ERRCB_PHASE_SCAN_HEAP in practice, per the comment -- but that doesn't seem significant. It's not just unnecessary to do so, as the comment claims -- it actually seems *wrong*. That is, it would be wrong to go back to VACUUM_ERRCB_PHASE_SCAN_HEAP here, since we're completely finished scanning the heap at this point. There is still perhaps a small danger that somebody will forget to add a new VACUUM_ERRCB_PHASE_* for some new kind of work that happens after this point, at the very last moment. But that would be equally true if the new kind of work took place earlier, inside lazy_scan_heap(). And so the last call to update_vacuum_error_info() isn't special compared to any other update_vacuum_error_info() call (or any other call that doesn't set a saved_err_info). -- Peter Geoghegan
Вложения
On Thu, Jan 6, 2022 at 10:21 AM Peter Geoghegan <pg@bowt.ie> wrote: > The patch bitrot again, so attached is a rebased version, v3. I pushed a version of this patch earlier. This final version didn't go quite as far as v3 did: it retained a few VACUUM VERBOSE only INFO messages (it didn't demote them to DEBUG2). See the commit message for details. Thank you for your review work, Masahiko and Andres. -- Peter Geoghegan