Обсуждение: initdb's -c option behaves wrong way?

Поиск
Список
Период
Сортировка

initdb's -c option behaves wrong way?

От
Kyotaro Horiguchi
Дата:
Hello.

I noticed that -c option of initdb behaves in an unexpected
manner. Identical variable names with variations in letter casing are
treated as distinct variables.

$ initdb -cwork_mem=100 -cWORK_MEM=1000 -cWork_mem=2000
...
$ grep -i 'work_mem ' $PGDATA/postgresql.conf
work_mem = 100                          # min 64kB
#maintenance_work_mem = 64MB            # min 1MB
#autovacuum_work_mem = -1               # min 1MB, or -1 to use maintenance_work_mem
#logical_decoding_work_mem = 64MB       # min 64kB
WORK_MEM = 1000
Work_mem = 2000


The original intention was apparently to overwrite the existing
line. Furthermore, I surmise that preserving the original letter
casing is preferable.

Attached is a patch to address this issue.  To retrieve the variable
name from the existing line, the code is slightly restructured.
Alternatively, should we just down-case the provided variable names?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Вложения

Re: initdb's -c option behaves wrong way?

От
Daniel Gustafsson
Дата:
> On 28 Sep 2023, at 09:49, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

> I noticed that -c option of initdb behaves in an unexpected
> manner. Identical variable names with variations in letter casing are
> treated as distinct variables.
>
> $ initdb -cwork_mem=100 -cWORK_MEM=1000 -cWork_mem=2000

> The original intention was apparently to overwrite the existing
> line. Furthermore, I surmise that preserving the original letter
> casing is preferable.

Circling back to an old thread, I agree that this seems odd and the original
thread [0] makes no mention of it being intentional.

The patch seems fine to me, the attached version is rebased, pgindented and has
a test case added.

--
Daniel Gustafsson

[0] https://www.postgresql.org/message-id/flat/2844176.1674681919%40sss.pgh.pa.us


Вложения

Re: initdb's -c option behaves wrong way?

От
Alvaro Herrera
Дата:
On 2024-Jan-16, Daniel Gustafsson wrote:

> > On 28 Sep 2023, at 09:49, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> 
> > I noticed that -c option of initdb behaves in an unexpected
> > manner. Identical variable names with variations in letter casing are
> > treated as distinct variables.
> > 
> > $ initdb -cwork_mem=100 -cWORK_MEM=1000 -cWork_mem=2000
> 
> > The original intention was apparently to overwrite the existing
> > line. Furthermore, I surmise that preserving the original letter
> > casing is preferable.
> 
> Circling back to an old thread, I agree that this seems odd and the original
> thread [0] makes no mention of it being intentional.

Hmm, how about raising an error if multiple options are given targetting
the same GUC?

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: initdb's -c option behaves wrong way?

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Hmm, how about raising an error if multiple options are given targetting
> the same GUC?

I don't see any reason to do that.  The underlying configuration
files don't complain about duplicate entries, they just take the
last setting.

            regards, tom lane



Re: initdb's -c option behaves wrong way?

От
Daniel Gustafsson
Дата:
> On 17 Jan 2024, at 18:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>> Hmm, how about raising an error if multiple options are given targetting
>> the same GUC?
>
> I don't see any reason to do that.  The underlying configuration
> files don't complain about duplicate entries, they just take the
> last setting.

Agreed, I think the patch as it stands now where it replaces case insensitive,
while keeping the original casing, is the best path forward.  The issue exist
in 16 as well so I propose a backpatch to there.

--
Daniel Gustafsson




Re: initdb's -c option behaves wrong way?

От
Robert Haas
Дата:
On Wed, Jan 17, 2024 at 2:31 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> Agreed, I think the patch as it stands now where it replaces case insensitive,
> while keeping the original casing, is the best path forward.  The issue exist
> in 16 as well so I propose a backpatch to there.

I like that approach, too. I could go either way on back-patching. It
doesn't seem important, but it probably won't break anything, either.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: initdb's -c option behaves wrong way?

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Jan 17, 2024 at 2:31 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>> Agreed, I think the patch as it stands now where it replaces case insensitive,
>> while keeping the original casing, is the best path forward.  The issue exist
>> in 16 as well so I propose a backpatch to there.

> I like that approach, too. I could go either way on back-patching. It
> doesn't seem important, but it probably won't break anything, either.

We just added this switch in 16, so I think backpatching to keep all
the branches consistent is a good idea.

However ... I don't like the patch much.  It seems to have left
the code in a rather random state.  Why, for example, didn't you
keep all the code that constructs the "newline" value together?
I'm also unconvinced by the removal of the "assume there's only
one match" break --- if we need to support multiple matches
I doubt that's sufficient.

            regards, tom lane



Re: initdb's -c option behaves wrong way?

От
Tom Lane
Дата:
I wrote:
> However ... I don't like the patch much.  It seems to have left
> the code in a rather random state.  Why, for example, didn't you
> keep all the code that constructs the "newline" value together?

After thinking about it a bit more, I don't see why you didn't just
s/strncmp/strncasecmp/ and call it good.  The messiness seems to be
a result of your choice to replace the GUC's case as shown in the
file with the case used on the command line, which is not better
IMO.  We don't change our mind about the canonical spelling of a
GUC because somebody varied the case in a SET command.

            regards, tom lane



Re: initdb's -c option behaves wrong way?

От
Daniel Gustafsson
Дата:
> On 17 Jan 2024, at 21:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
>> However ... I don't like the patch much.  It seems to have left
>> the code in a rather random state.  Why, for example, didn't you
>> keep all the code that constructs the "newline" value together?
>
> After thinking about it a bit more, I don't see why you didn't just
> s/strncmp/strncasecmp/ and call it good.  The messiness seems to be
> a result of your choice to replace the GUC's case as shown in the
> file with the case used on the command line, which is not better
> IMO.  We don't change our mind about the canonical spelling of a
> GUC because somebody varied the case in a SET command.

The original patch was preserving the case of the file throwing away the case
from the commandline.  The attached is a slimmed down version which only moves
the assembly of newline to the different cases (replace vs.  new) keeping the
rest of the code intact.  Keeping it in one place gets sort of messy too since
it needs to use different values for a replacement and a new variable.  Not
sure if there is a cleaner approach?

--
Daniel Gustafsson


Вложения

Re: initdb's -c option behaves wrong way?

От
Kyotaro Horiguchi
Дата:
Thank you for upicking this up.

At Wed, 17 Jan 2024 23:47:41 +0100, Daniel Gustafsson <daniel@yesql.se> wrote in 
> > On 17 Jan 2024, at 21:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > 
> > I wrote:
> >> However ... I don't like the patch much.  It seems to have left
> >> the code in a rather random state.  Why, for example, didn't you
> >> keep all the code that constructs the "newline" value together?
> > 
> > After thinking about it a bit more, I don't see why you didn't just
> > s/strncmp/strncasecmp/ and call it good.  The messiness seems to be
> > a result of your choice to replace the GUC's case as shown in the
> > file with the case used on the command line, which is not better
> > IMO.  We don't change our mind about the canonical spelling of a
> > GUC because somebody varied the case in a SET command.
> 
> The original patch was preserving the case of the file throwing away the case
> from the commandline.  The attached is a slimmed down version which only moves
> the assembly of newline to the different cases (replace vs.  new) keeping the
> rest of the code intact.  Keeping it in one place gets sort of messy too since
> it needs to use different values for a replacement and a new variable.  Not
> sure if there is a cleaner approach?

Just to clarify, the current code breaks out after processing the
first matching line. I haven't changed that behavior.  The reason I
moved the rewrite processing code out of the loop was I wanted to
avoid adding more lines that are executed only once into the
loop. However, it is in exchange of additional complexity to remember
the original spelling of the parameter name. Personally, I believe
separating the search and rewrite processing is better, but I'm not
particularly attached to the approach, so either way is fine with me.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: initdb's -c option behaves wrong way?

От
Daniel Gustafsson
Дата:
> On 18 Jan 2024, at 05:49, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>
> Thank you for upicking this up.
>
> At Wed, 17 Jan 2024 23:47:41 +0100, Daniel Gustafsson <daniel@yesql.se> wrote in
>>> On 17 Jan 2024, at 21:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>
>>> I wrote:
>>>> However ... I don't like the patch much.  It seems to have left
>>>> the code in a rather random state.  Why, for example, didn't you
>>>> keep all the code that constructs the "newline" value together?
>>>
>>> After thinking about it a bit more, I don't see why you didn't just
>>> s/strncmp/strncasecmp/ and call it good.  The messiness seems to be
>>> a result of your choice to replace the GUC's case as shown in the
>>> file with the case used on the command line, which is not better
>>> IMO.  We don't change our mind about the canonical spelling of a
>>> GUC because somebody varied the case in a SET command.
>>
>> The original patch was preserving the case of the file throwing away the case
>> from the commandline.  The attached is a slimmed down version which only moves
>> the assembly of newline to the different cases (replace vs.  new) keeping the
>> rest of the code intact.  Keeping it in one place gets sort of messy too since
>> it needs to use different values for a replacement and a new variable.  Not
>> sure if there is a cleaner approach?
>
> Just to clarify, the current code breaks out after processing the
> first matching line. I haven't changed that behavior.

Yup.

> The reason I
> moved the rewrite processing code out of the loop was I wanted to
> avoid adding more lines that are executed only once into the
> loop. However, it is in exchange of additional complexity to remember
> the original spelling of the parameter name. Personally, I believe
> separating the search and rewrite processing is better, but I'm not
> particularly attached to the approach, so either way is fine with me.

I'll give some more time for opinions, then I'll go ahead with one of the
patches with a backpatch to v16.

--
Daniel Gustafsson




Re: initdb's -c option behaves wrong way?

От
Tom Lane
Дата:
Daniel Gustafsson <daniel@yesql.se> writes:
> I'll give some more time for opinions, then I'll go ahead with one of the
> patches with a backpatch to v16.

OK, I take back my previous complaint that just using strncasecmp
would be enough --- I was misremembering how the code worked, and
you're right that it would use the spelling from the command line
rather than that from the file.

However, the v3 patch is flat broken.  You can't assume you have
found a match until you've verified that whitespace and '='
appear next --- otherwise, you'll be fooled by a guc_name that
is a prefix of one that appears in the file.  I think the simplest
change that does it correctly is as attached.

Now, since I was the one who wrote the existing code, I freely
concede that I may have too high an opinion of its readability.
Maybe some larger refactoring is appropriate.  But I didn't
find that what you'd done improved the readability.  I'd still
rather keep the newline-assembly code together as much as possible.
Maybe we should do the search part before we build any of newline?

            regards, tom lane

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ac409b0006..6a620c9d5f 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -484,6 +484,7 @@ replace_guc_value(char **lines, const char *guc_name, const char *guc_value,
     for (i = 0; lines[i]; i++)
     {
         const char *where;
+        const char *namestart;

         /*
          * Look for a line assigning to guc_name.  Typically it will be
@@ -494,15 +495,19 @@ replace_guc_value(char **lines, const char *guc_name, const char *guc_value,
         where = lines[i];
         while (*where == '#' || isspace((unsigned char) *where))
             where++;
-        if (strncmp(where, guc_name, namelen) != 0)
+        if (strncasecmp(where, guc_name, namelen) != 0)
             continue;
+        namestart = where;
         where += namelen;
         while (isspace((unsigned char) *where))
             where++;
         if (*where != '=')
             continue;

-        /* found it -- append the original comment if any */
+        /* found it -- let's use the canonical casing shown in the file */
+        memcpy(&newline->data[mark_as_comment ? 1 : 0], namestart, namelen);
+
+        /* now append the original comment if any */
         where = strrchr(where, '#');
         if (where)
         {
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 03376cc0f7..413a5eca67 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -199,4 +199,17 @@ command_fails(
 command_fails([ 'initdb', '--no-sync', '--set', 'foo=bar', "$tempdir/dataX" ],
     'fails for invalid --set option');

+# Make sure multiple invocations of -c parameters are added case insensitive
+command_ok(
+    [
+        'initdb', '-cwork_mem=128', '-cWork_Mem=256', '-cWORK_MEM=512',
+        "$tempdir/dataY"
+    ],
+    'multiple -c options with different case');
+
+my $conf = slurp_file("$tempdir/dataY/postgresql.conf");
+ok($conf !~ qr/^WORK_MEM = /m, "WORK_MEM should not be configured");
+ok($conf !~ qr/^Work_Mem = /m, "Work_Mem should not be configured");
+ok($conf =~ qr/^work_mem = 512/m, "work_mem should be in config");
+
 done_testing();

Re: initdb's -c option behaves wrong way?

От
Daniel Gustafsson
Дата:
> On 19 Jan 2024, at 17:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>> I'll give some more time for opinions, then I'll go ahead with one of the
>> patches with a backpatch to v16.
>
> OK, I take back my previous complaint that just using strncasecmp
> would be enough --- I was misremembering how the code worked, and
> you're right that it would use the spelling from the command line
> rather than that from the file.
>
> However, the v3 patch is flat broken.  You can't assume you have
> found a match until you've verified that whitespace and '='
> appear next --- otherwise, you'll be fooled by a guc_name that
> is a prefix of one that appears in the file.  I think the simplest
> change that does it correctly is as attached.

The attached v4 looks good to me, I don't think it moves the needle wrt
readability (ie, no need to move the search).  Feel free to go ahead with that
version, or let me know if you want me to deal with it.

--
Daniel Gustafsson




Re: initdb's -c option behaves wrong way?

От
Daniel Gustafsson
Дата:
> On 22 Jan 2024, at 11:09, Daniel Gustafsson <daniel@yesql.se> wrote:

> Feel free to go ahead with that
> version, or let me know if you want me to deal with it.

I took the liberty to add this to the upcoming CF to make sure we don't lose
track of it.

--
Daniel Gustafsson




Re: initdb's -c option behaves wrong way?

От
Tom Lane
Дата:
Daniel Gustafsson <daniel@yesql.se> writes:
> I took the liberty to add this to the upcoming CF to make sure we don't lose
> track of it.

Thanks for doing that, because the cfbot pointed out a problem:
I should have written pg_strncasecmp not strncasecmp.  If this
version tests cleanly, I'll push it.

            regards, tom lane

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ac409b0006..200b2e8e31 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -484,6 +484,7 @@ replace_guc_value(char **lines, const char *guc_name, const char *guc_value,
     for (i = 0; lines[i]; i++)
     {
         const char *where;
+        const char *namestart;

         /*
          * Look for a line assigning to guc_name.  Typically it will be
@@ -494,15 +495,19 @@ replace_guc_value(char **lines, const char *guc_name, const char *guc_value,
         where = lines[i];
         while (*where == '#' || isspace((unsigned char) *where))
             where++;
-        if (strncmp(where, guc_name, namelen) != 0)
+        if (pg_strncasecmp(where, guc_name, namelen) != 0)
             continue;
+        namestart = where;
         where += namelen;
         while (isspace((unsigned char) *where))
             where++;
         if (*where != '=')
             continue;

-        /* found it -- append the original comment if any */
+        /* found it -- let's use the canonical casing shown in the file */
+        memcpy(&newline->data[mark_as_comment ? 1 : 0], namestart, namelen);
+
+        /* now append the original comment if any */
         where = strrchr(where, '#');
         if (where)
         {
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 03376cc0f7..413a5eca67 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -199,4 +199,17 @@ command_fails(
 command_fails([ 'initdb', '--no-sync', '--set', 'foo=bar', "$tempdir/dataX" ],
     'fails for invalid --set option');

+# Make sure multiple invocations of -c parameters are added case insensitive
+command_ok(
+    [
+        'initdb', '-cwork_mem=128', '-cWork_Mem=256', '-cWORK_MEM=512',
+        "$tempdir/dataY"
+    ],
+    'multiple -c options with different case');
+
+my $conf = slurp_file("$tempdir/dataY/postgresql.conf");
+ok($conf !~ qr/^WORK_MEM = /m, "WORK_MEM should not be configured");
+ok($conf !~ qr/^Work_Mem = /m, "Work_Mem should not be configured");
+ok($conf =~ qr/^work_mem = 512/m, "work_mem should be in config");
+
 done_testing();

Re: initdb's -c option behaves wrong way?

От
Daniel Gustafsson
Дата:

> On 4 Mar 2024, at 02:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>> I took the liberty to add this to the upcoming CF to make sure we don't lose
>> track of it.
>
> Thanks for doing that, because the cfbot pointed out a problem:
> I should have written pg_strncasecmp not strncasecmp.  If this
> version tests cleanly, I'll push it.

+1, LGTM.

--
Daniel Gustafsson




Re: initdb's -c option behaves wrong way?

От
Kyotaro Horiguchi
Дата:
At Mon, 4 Mar 2024 09:39:39 +0100, Daniel Gustafsson <daniel@yesql.se> wrote in 
> 
> 
> > On 4 Mar 2024, at 02:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > 
> > Daniel Gustafsson <daniel@yesql.se> writes:
> >> I took the liberty to add this to the upcoming CF to make sure we don't lose
> >> track of it.
> > 
> > Thanks for doing that, because the cfbot pointed out a problem:
> > I should have written pg_strncasecmp not strncasecmp.  If this
> > version tests cleanly, I'll push it.
> 
> +1, LGTM.

Thank you for fixing this, Tom!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center