Обсуждение: vacuumdb: add --dry-run

Поиск
Список
Период
Сортировка

vacuumdb: add --dry-run

От
Corey Huinker
Дата:
This is a small patch to add a new option to vacuumdb to answer the question "what commands will actually be run by this combination of command-line switches against this database?" without actually running the commands.

Including Nathan because we had previously discussed the utility of just such a thing.
Вложения

Re: vacuumdb: add --dry-run

От
Nathan Bossart
Дата:
On Mon, Nov 10, 2025 at 02:44:41PM -0500, Corey Huinker wrote:
> This is a small patch to add a new option to vacuumdb to answer the
> question "what commands will actually be run by this combination of
> command-line switches against this database?" without actually running the
> commands.

My attempts to test this all got stuck in wait_on_slots().  I haven't
looked too closely, but I suspect the issue is that the socket never
becomes readable because we don't send a query.  If I set free_slot->inUse
to false before printing the command, it no longer hangs.  We probably want
to create a function in parallel_slot.c to mark slots that we don't intend
to give a query as idle.

-- 
nathan



Re: vacuumdb: add --dry-run

От
Corey Huinker
Дата:

My attempts to test this all got stuck in wait_on_slots().  I haven't
looked too closely, but I suspect the issue is that the socket never
becomes readable because we don't send a query.  If I set free_slot->inUse
to false before printing the command, it no longer hangs.  We probably want
to create a function in parallel_slot.c to mark slots that we don't intend
to give a query as idle.

Would that be preferable to skipping the creation of extra connections for parallel workers? I can see it both ways. On the one hand we want to give as true a reflection of "what would happen with these options", and on the other hand one could view the creation of extra workers as "real" vs a dry run.

 

Re: vacuumdb: add --dry-run

От
Nathan Bossart
Дата:
On Mon, Nov 10, 2025 at 05:33:34PM -0500, Corey Huinker wrote:
>> My attempts to test this all got stuck in wait_on_slots().  I haven't
>> looked too closely, but I suspect the issue is that the socket never
>> becomes readable because we don't send a query.  If I set free_slot->inUse
>> to false before printing the command, it no longer hangs.  We probably want
>> to create a function in parallel_slot.c to mark slots that we don't intend
>> to give a query as idle.
> 
> Would that be preferable to skipping the creation of extra connections for
> parallel workers? I can see it both ways. On the one hand we want to give
> as true a reflection of "what would happen with these options", and on the
> other hand one could view the creation of extra workers as "real" vs a dry
> run.

I think what I'm proposing actually does skip creating extra connections.
If we're immediately marking the first connection as idle, each loop
iteration should reuse the same connection.

BTW it might be better to modify run_vacuum_command() to skip running the
command in dry-run mode.  That would also take care of the
ONLY_DATABASE_STATS stuff.  We should probably do something about the
executeCommand() for --analyze-in-stages, too.

-- 
nathan



Re: vacuumdb: add --dry-run

От
Nathan Bossart
Дата:
I created a commitfest entry for this one.

    https://commitfest.postgresql.org/patch/6230/

-- 
nathan



Re: vacuumdb: add --dry-run

От
Chao Li
Дата:

> On Nov 11, 2025, at 06:33, Corey Huinker <corey.huinker@gmail.com> wrote:
>
>
> My attempts to test this all got stuck in wait_on_slots().  I haven't
> looked too closely, but I suspect the issue is that the socket never
> becomes readable because we don't send a query.  If I set free_slot->inUse
> to false before printing the command, it no longer hangs.  We probably want
> to create a function in parallel_slot.c to mark slots that we don't intend
> to give a query as idle.
>
> Would that be preferable to skipping the creation of extra connections for parallel workers? I can see it both ways.
Onthe one hand we want to give as true a reflection of "what would happen with these options", and on the other hand
onecould view the creation of extra workers as "real" vs a dry run. 
>
>

I test the patch, but my first run was with no luck:

```
% vacuumdb --dry-run --echo -d evantest
SELECT pg_catalog.set_config('search_path', '', false);
vacuumdb: vacuuming database "evantest"
RESET search_path;
SELECT c.relname, ns.nspname FROM pg_catalog.pg_class c
 JOIN pg_catalog.pg_namespace ns ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid
 CROSS JOIN LATERAL (SELECT c.relkind IN ('p', 'I')) as p (inherited)
 LEFT JOIN pg_catalog.pg_class t ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid
 WHERE c.relpersistence OPERATOR(pg_catalog.!=) 't'
 AND c.relkind OPERATOR(pg_catalog.=) ANY (array['r', 'm'])
 ORDER BY c.relpages DESC;
SELECT pg_catalog.set_config('search_path', '', false);
VACUUM (SKIP_DATABASE_STATS) pg_catalog.pg_proc; (not executed)

^CCancel request sent
^CCancel request sent
^CCancel request sent
^CCancel request sent
```

After skip the first vacuum command, the process got stuck, and ctrl-c couldn’t break it.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







Re: vacuumdb: add --dry-run

От
Corey Huinker
Дата:
> My attempts to test this all got stuck in wait_on_slots().  I haven't
> looked too closely, but I suspect the issue is that the socket never
> becomes readable because we don't send a query.  If I set free_slot->inUse
> to false before printing the command, it no longer hangs.  We probably want
> to create a function in parallel_slot.c to mark slots that we don't intend
> to give a query as idle.
>
> Would that be preferable to skipping the creation of extra connections for parallel workers? I can see it both ways. On the one hand we want to give as true a reflection of "what would happen with these options", and on the other hand one could view the creation of extra workers as "real" vs a dry run.
>



Now with zero hangs and some test cases. I didn't create a function (yet) as it seemed trivial.
Вложения

Re: vacuumdb: add --dry-run

От
Nathan Bossart
Дата:
On Wed, Nov 19, 2025 at 05:23:48PM -0500, Corey Huinker wrote:
> Now with zero hangs and some test cases. I didn't create a function (yet)
> as it seemed trivial.

I still think it could be worth moving the dry-run code into
run_vacuum_command() (which might entail moving the calls to
ParallelSlotSetHandler() there, too).  We can probably piggy-back on the
"if (echo)" branch in that function.

Also, we can probably skip the executeCommand() calls for
--analyze-in-stages.

-- 
nathan



Re: vacuumdb: add --dry-run

От
Corey Huinker
Дата:
On Wed, Nov 19, 2025 at 5:44 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Wed, Nov 19, 2025 at 05:23:48PM -0500, Corey Huinker wrote:
> Now with zero hangs and some test cases. I didn't create a function (yet)
> as it seemed trivial.

I still think it could be worth moving the dry-run code into
run_vacuum_command() (which might entail moving the calls to
ParallelSlotSetHandler() there, too).  We can probably piggy-back on the
"if (echo)" branch in that function.

We _could_ get away with moving ParallelSlotGetIdle() in there too. The only catch would be that we'd have to refactor prepare_vacuum_command() to take a serverVersionNumber parameter instead of the whole connection. Thoughts?
 

Also, we can probably skip the executeCommand() calls for
--analyze-in-stages.

+1  

Re: vacuumdb: add --dry-run

От
Corey Huinker
Дата:
On Wed, Nov 19, 2025 at 6:55 PM Corey Huinker <corey.huinker@gmail.com> wrote:
On Wed, Nov 19, 2025 at 5:44 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Wed, Nov 19, 2025 at 05:23:48PM -0500, Corey Huinker wrote:
> Now with zero hangs and some test cases. I didn't create a function (yet)
> as it seemed trivial.

I still think it could be worth moving the dry-run code into
run_vacuum_command() (which might entail moving the calls to
ParallelSlotSetHandler() there, too).  We can probably piggy-back on the
"if (echo)" branch in that function.

We _could_ get away with moving ParallelSlotGetIdle() in there too. The only catch would be that we'd have to refactor prepare_vacuum_command() to take a serverVersionNumber parameter instead of the whole connection. Thoughts?
 

Also, we can probably skip the executeCommand() calls for
--analyze-in-stages.

+1  

Here's a shopping list of incremental changes. I'm happy with whatever makes the most sense to you.
 
Вложения

Re: vacuumdb: add --dry-run

От
Nathan Bossart
Дата:
On Wed, Nov 19, 2025 at 07:54:06PM -0500, Corey Huinker wrote:
> Here's a shopping list of incremental changes. I'm happy with whatever
> makes the most sense to you.

Here is a v4 patch set.  I've made a variety of small changes.  I think
there's some room to bike-shed on the messages we send to the user to
assure them we're not actually doing anything, but this is roughly what I
imagined.

-- 
nathan

Вложения

Re: vacuumdb: add --dry-run

От
Corey Huinker
Дата:
On Thu, Nov 20, 2025 at 4:46 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Wed, Nov 19, 2025 at 07:54:06PM -0500, Corey Huinker wrote:
> Here's a shopping list of incremental changes. I'm happy with whatever
> makes the most sense to you.

Here is a v4 patch set.  I've made a variety of small changes.  I think
there's some room to bike-shed on the messages we send to the user to
assure them we're not actually doing anything, but this is roughly what I
imagined.

Everything here looks good to me. I have nothing to add.

I have no objections to, but I am curious about the factors that went into making dry_run an independent boolean rather than part of vacopts.

Re: vacuumdb: add --dry-run

От
Nathan Bossart
Дата:
On Thu, Nov 20, 2025 at 05:09:54PM -0500, Corey Huinker wrote:
> I have no objections to, but I am curious about the factors that went into
> making dry_run an independent boolean rather than part of vacopts.

My thinking was that it's closer to "echo" and "quiet" than anything in
vacuumingOptions.  Most of that stuff seems geared towards controlling the
precise behavior of the commands rather than the behavior of the
application.  TBH I think it'd be fine either way.  We could probably even
move "echo" and "quiet" into vacuumingOptions if we really wanted to.

-- 
nathan



Re: vacuumdb: add --dry-run

От
Nathan Bossart
Дата:
On Thu, Nov 20, 2025 at 04:16:13PM -0600, Nathan Bossart wrote:
> On Thu, Nov 20, 2025 at 05:09:54PM -0500, Corey Huinker wrote:
>> I have no objections to, but I am curious about the factors that went into
>> making dry_run an independent boolean rather than part of vacopts.
> 
> My thinking was that it's closer to "echo" and "quiet" than anything in
> vacuumingOptions.  Most of that stuff seems geared towards controlling the
> precise behavior of the commands rather than the behavior of the
> application.  TBH I think it'd be fine either way.  We could probably even
> move "echo" and "quiet" into vacuumingOptions if we really wanted to.

Yeah, I'm finding myself liking the idea of moving all of these things into
vacuumingOptions so that we don't have to cart around so many bool
arguments.  Here's a new patch set that does it this way.

The remaining question in my mind is where we should let the user know that
we're in dry-run mode.  The three options I see are 1) at the beginning of
vacuumdb execution, 2) in the !quiet block for each database, and 3) in
each command (via a comment).  In v5, I've added a message to all three,
but I'm eager to hear what folks think.

-- 
nathan

Вложения

Re: vacuumdb: add --dry-run

От
Corey Huinker
Дата:
Yeah, I'm finding myself liking the idea of moving all of these things into
vacuumingOptions so that we don't have to cart around so many bool
arguments.  Here's a new patch set that does it this way.

+1. Every time I modify a global variable, I hear my high school CS teacher scolding me.
 
The remaining question in my mind is where we should let the user know that
we're in dry-run mode.  The three options I see are 1) at the beginning of
vacuumdb execution, 2) in the !quiet block for each database, and 3) in
each command (via a comment).  In v5, I've added a message to all three,
but I'm eager to hear what folks think.

Looking at them, I think they're all good. I think #3 is a must-have in all circumstances. I wouldn't be mad if we removed #1 or #2, but I see the value in each of them.

Re: vacuumdb: add --dry-run

От
Nathan Bossart
Дата:
On Thu, Dec 04, 2025 at 06:40:42PM -0500, Corey Huinker wrote:
> Looking at them, I think they're all good. I think #3 is a must-have in all
> circumstances. I wouldn't be mad if we removed #1 or #2, but I see the
> value in each of them.

Alright.  I think these are ready to go, but I'll wait for a bit in case
anyone else has feedback.

-- 
nathan



Re: vacuumdb: add --dry-run

От
Kirill Reshke
Дата:
On Fri, 5 Dec 2025 at 02:52, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Thu, Nov 20, 2025 at 04:16:13PM -0600, Nathan Bossart wrote:
> > On Thu, Nov 20, 2025 at 05:09:54PM -0500, Corey Huinker wrote:
> >> I have no objections to, but I am curious about the factors that went into
> >> making dry_run an independent boolean rather than part of vacopts.
> >
> > My thinking was that it's closer to "echo" and "quiet" than anything in
> > vacuumingOptions.  Most of that stuff seems geared towards controlling the
> > precise behavior of the commands rather than the behavior of the
> > application.  TBH I think it'd be fine either way.  We could probably even
> > move "echo" and "quiet" into vacuumingOptions if we really wanted to.
>
> Yeah, I'm finding myself liking the idea of moving all of these things into
> vacuumingOptions so that we don't have to cart around so many bool
> arguments.  Here's a new patch set that does it this way.
>
> The remaining question in my mind is where we should let the user know that
> we're in dry-run mode.  The three options I see are 1) at the beginning of
> vacuumdb execution, 2) in the !quiet block for each database, and 3) in
> each command (via a comment).  In v5, I've added a message to all three,
> but I'm eager to hear what folks think.
>
> --
> nathan


Hi!

1)
>  +     <varlistentry>
> +      <term><option>--dry-run</option></term>
> +      <listitem>
> +       <para>
> +        Print, but do not execute, the vacuum and analyze commands that would
> +        have been sent to the server (performs a dry run).
> +       </para>
> +      </listitem>
> +     </varlistentry>


I compared this smgl section to analogous sections of server utilities
(pg_dump, pg_resetwal, pg_rewind), none of them mentions "dry run" in
description of what "--dry run" is. So, I think my feeling is that
this `(performs a dry run).` is unneeded is correct.

2)

>  - printf(_("%s: vacuuming database \"%s\"\n"),
> -   progname, PQdb(conn));
> + printf(_("%s: vacuuming database \"%s\"%s\n"),
> +   progname, PQdb(conn),
> +   vacopts->dry_run ? " (dry-run)" : "");


I am also not sure we need this change. Look:

```
reshke@yezzey-cbdb-bench:~/pg$ ./bin/bin/vacuumdb --dry-run
vacuumdb: Executing in dry-run mode.
vacuumdb: vacuuming database "reshke" (dry-run)
```

We have two lines which say the same. Well, maybe there is value in
this change, if we are vacuuming multiple databases, but given that
--dry-run produces a lot of
`VACUUM ...  -- not executed` output, I think It will be obvious that
this vacuumdb run does not modify the system. WDYT?

Overall, 0001 and 0003 are ok, I don't have an opinion on 0002.

-- 
Best regards,
Kirill Reshke



Re: vacuumdb: add --dry-run

От
Nathan Bossart
Дата:
On Sat, Dec 06, 2025 at 12:56:22AM +0500, Kirill Reshke wrote:
> Hi!

Thanks for reviewing.

>> +     <varlistentry>
>> +      <term><option>--dry-run</option></term>
>> +      <listitem>
>> +       <para>
>> +        Print, but do not execute, the vacuum and analyze commands that would
>> +        have been sent to the server (performs a dry run).
>> +       </para>
>> +      </listitem>
>> +     </varlistentry>
> 
> I compared this smgl section to analogous sections of server utilities
> (pg_dump, pg_resetwal, pg_rewind), none of them mentions "dry run" in
> description of what "--dry run" is. So, I think my feeling is that
> this `(performs a dry run).` is unneeded is correct.

I borrowed this from pg_archivecleanup's documentation.  But to your point,
there doesn't seem to be a tremendous amount of consistency in the dry-run
options for various utilities.

>>  - printf(_("%s: vacuuming database \"%s\"\n"),
>> -   progname, PQdb(conn));
>> + printf(_("%s: vacuuming database \"%s\"%s\n"),
>> +   progname, PQdb(conn),
>> +   vacopts->dry_run ? " (dry-run)" : "");
> 
> I am also not sure we need this change. Look:
> 
> ```
> reshke@yezzey-cbdb-bench:~/pg$ ./bin/bin/vacuumdb --dry-run
> vacuumdb: Executing in dry-run mode.
> vacuumdb: vacuuming database "reshke" (dry-run)
> ```
> 
> We have two lines which say the same. Well, maybe there is value in
> this change, if we are vacuuming multiple databases, but given that
> --dry-run produces a lot of
> `VACUUM ...  -- not executed` output, I think It will be obvious that
> this vacuumdb run does not modify the system. WDYT?

I guess we could probably remove the top-level "Executing in dry-run mode"
message, provided we say the same thing in the per-database message.
However, the latter can be turned off with --quiet.  Maybe we should
consider disallowing --quiet and --dry-run.

Overall, I can't claim to have super principled arguments about where I've
added these dry-run messages.  I kind-of just sprinkled them around.

-- 
nathan



Re: vacuumdb: add --dry-run

От
Kirill Reshke
Дата:
On Sat, 6 Dec 2025 at 01:08, Nathan Bossart <nathandbossart@gmail.com> wrote:

>
> Overall, I can't claim to have super principled arguments about where I've
> added these dry-run messages.  I kind-of just sprinkled them around.
>
> --


Yep, sure, just polishing a patch. Let's remove the top-level message, indeed.


-- 
Best regards,
Kirill Reshke



Re: vacuumdb: add --dry-run

От
Corey Huinker
Дата:
I guess we could probably remove the top-level "Executing in dry-run mode"
message, provided we say the same thing in the per-database message.
However, the latter can be turned off with --quiet.  Maybe we should
consider disallowing --quiet and --dry-run.

--quiet does appear to be the sworn enemy of "tell me what you would have done if you were really gonna do it". So yeah, disallowing that combination would make sense.


Re: vacuumdb: add --dry-run

От
Nathan Bossart
Дата:
On Fri, Dec 05, 2025 at 03:34:12PM -0500, Corey Huinker wrote:
>> I guess we could probably remove the top-level "Executing in dry-run mode"
>> message, provided we say the same thing in the per-database message.
>> However, the latter can be turned off with --quiet.  Maybe we should
>> consider disallowing --quiet and --dry-run.
> 
> --quiet does appear to be the sworn enemy of "tell me what you would have
> done if you were really gonna do it". So yeah, disallowing that combination
> would make sense.

On the other hand, --quiet is handy if you're trying to save the output of
--dry-run elsewhere to run later.  (Of course, if you use --all, that
output won't be tremendously useful.)

-- 
nathan



Re: vacuumdb: add --dry-run

От
Chao Li
Дата:
Hi Nathan,

I just reviewed v5, and overall looks very good patch quality. Just a few nit comments on 0001 and 0003.

> On Dec 5, 2025, at 05:52, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
>
> --
> nathan
>
<v5-0001-Move-some-vacuumdb-options-to-vacopts-struct.patch><v5-0002-Add-ParallelSlotSetIdle.patch><v5-0003-Add-dry-run-to-vacuumdb.patch>


1 - 0001
```
    /* initialize options */
memset(&vacopts, 0, sizeof(vacopts));
vacopts.objfilter = 0; /* no filter */
vacopts.parallel_workers = -1;
vacopts.buffer_usage_limit = NULL;
vacopts.no_index_cleanup = false;
vacopts.force_index_cleanup = false;
vacopts.do_truncate = true;
vacopts.process_main = true;
vacopts.process_toast = true;
```

Now echo and print are moved into vacopts and their default values are false. Here, memset() have properly initialized
theirvalues. But this piece of code still explicitly set boolean values to vacopts fields. So, to make it consistent, I
feelwe can also add explicit assignments to echo and print here, or remove those “false” assignments. This is not a
correctnessissue, just to keep in a consistent style. 

2 - 0003
```
+    if (echo || dry_run)
+        printf("%s%s\n", sql, dry_run ? "  -- not executed" : "");
```

There are two white-spaces before “--“, I think one is enough, In the other place of 0003, you just one white-space
before“--“. 

3 - 0003
```
@@ -1001,15 +1009,19 @@ prepare_vacuum_command(PGconn *conn, PQExpBuffer sql,
  * Any errors during command execution are reported to stderr.
  */
 static void
-run_vacuum_command(PGconn *conn, const char *sql, bool echo,
-                   const char *table)
+run_vacuum_command(ParallelSlot *free_slot, const char *sql,
+                   bool echo, bool dry_run, const char *table)
 {
```

Here are two comments:

* As run_vacuum_command() takes both echo and dry_run, and both of them are defined in vcaopts, why not change this
functionto take a const vcaopts * instead of two bools? 

* The function comment needs to be updated. Now it won’t always send a command to server, with “dry_run”, it behaves
differently.

4 - 0003
```
+    if (vacopts.dry_run)
+        pg_log_info("Executing in dry-run mode.”);
```

Feels like “Running” is better than “Executing”.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/