Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows
Дата
Msg-id 20240111.173322.1809044112677090191.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Thanks for restarting this thread.

At Tue, 9 Jan 2024 09:40:23 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Fri, Jan 05, 2024 at 02:58:55PM -0500, Robert Haas wrote:
> > I'm not a Windows expert, but my guess is that 0001 is a very good
> > idea. I hope someone who is a Windows expert will comment on that.
> 
> I am +1 on 0001.  It is just something we've never anticipated when
> these wrappers around cmd in pg_ctl were written.

Thanks for committing it.

> > 0002 seems problematic to me. One potential issue is that it would
> > break if someone renamed postgres.exe to something else -- although
> > that's probably not really a serious problem.
> 
> We do a find_other_exec_or_die() on "postgres" with what could be a
> custom execution path.  So we're actually sure that the binary will be
> there in the start path, no?  I don't like much the hardcoded
> dependency to .exe here.

The patch doesn't care of the path for postgres.exe. If you are referring to the code you cited below, it's for another
reason.I'll describe that there.
 

> > A bigger issue is that
> > it seems like it would break if someone used pg_ctl to start several
> > instances in different data directories on the same machine. If I'm
> > understanding correctly, that currently mostly works, and this would
> > break it.
> 
> Not having the guarantee that a single shell_pid is associated to a
> single postgres.exe would be a problem.  Now the patch includes this
> code:
> +        if (ppe.th32ParentProcessID == shell_pid &&
> +            strcmp("postgres.exe", ppe.szExeFile) == 0)
> +        {
> +            if (pm_pid != ppe.th32ProcessID && pm_pid != 0)
> +                multiple_children = true;
> +            pm_pid = ppe.th32ProcessID;
> +        }
> 
> Which is basically giving this guarantee?  multiple_children should
> never happen once the autorun part is removed.  Is that right?

The patch indeed ensures the relationship between the parent
pg_ctl.exe and postgres.exe.  However, for some reason, in my Windows
11 environment with the /D option specified, I observed that another
cmd.exe is spawned as the second child process of the parent
cmd.exe. This is why there is a need to verify the executable file
name. I have no idea how the second cmd.exe is being spawned.

> +        * The launcher shell might start other cmd.exe instances or programs
> +        * besides postgres.exe. Veryfying the program file name is essential.
> 
> With the autorun part of cmd.exe removed, what's still relevant here?

Yes, if the strcmp() is commented out, multiple_children sometimes
becomes true..

> s/Veryfying/Verifying/.

Oops!

> Perhaps 0002 should make more efforts in documenting things like
> th32ProcessID and th32ParentProcessID.

Is it correct to understand that you are requesting changes as follows?

--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -1995,11 +1995,14 @@ pgwin32_find_postmaster_pid(pid_t shell_pid)
      *
      * Check for duplicate processes to ensure reliability.
      *
-      * The launcher shell might start other cmd.exe instances or programs
-     * besides postgres.exe. Verifying the program file name is essential.
-     *
-     * The launcher shell process isn't checked in this function.  It will be
-     * checked by the caller.
+     * The ppe entry to be examined is identified by th32ParentProcessID, which
+     * should correspond to the cmd.exe process that executes the postgres.exe
+     * binary. Additionally, th32ProcessID in the same entry should be the PID
+     * of the launched postgres.exe. However, even though we have launched the
+     * parent cmd.exe with the /D option specified, it is sometimes observed
+     * that another cmd.exe is launched for unknown reasons. Therefore, it is
+     * crucial to verify the program file name to avoid returning the wrong
+     * PID.
      */


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bertrand Drouvot
Дата:
Сообщение: Re: Synchronizing slots from primary to standby
Следующее
От: Nisha Moond
Дата:
Сообщение: Improve the connection failure error messages