Thread: Lift line-length limit for pg_service.conf

Lift line-length limit for pg_service.conf

From:
Daniel Gustafsson
Date:
The pg_service.conf parsing thread [0] made me realize that we have a hardwired
line length of max 256 bytes.  Lifting this would be in line with recent work
for ecpg, pg_regress and pg_hba (784b1ba1a2 and 8f8154a50).  The attached moves
pg_service.conf to use the new pg_get_line_append API and a StringInfo to lift
the restriction. Any reason not to do that while we're lifting other such limits?
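
For anyone who hasn't looked at the new API yet, the loop this boils down to
is roughly the following (a minimal sketch assuming the two-argument
pg_get_line_append(FILE *, StringInfo) signature added by the commits above;
the function and variable names are illustrative, not the actual libpq code):

    #include "postgres_fe.h"
    #include <ctype.h>
    #include "common/string.h"      /* pg_get_line_append() */
    #include "lib/stringinfo.h"

    /* Sketch: read a config-style file with no fixed line-length limit. */
    static void
    read_lines_unbounded(FILE *f)
    {
        StringInfoData linebuf;

        initStringInfo(&linebuf);

        for (;;)
        {
            resetStringInfo(&linebuf);          /* discard the previous line */
            if (!pg_get_line_append(f, &linebuf))
                break;                          /* EOF (or read error) */

            /* trim trailing newline and other trailing whitespace */
            while (linebuf.len > 0 &&
                   isspace((unsigned char) linebuf.data[linebuf.len - 1]))
                linebuf.data[--linebuf.len] = '\0';

            /* ... parse linebuf.data here; no 256-byte cap any more ... */
        }

        pfree(linebuf.data);
    }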

cheers ./daniel



[0] CABUevEw_RRRgG2uwsO7CD0TQf+Z=oR=S1=QyifV8D_5hatJ=oQ@mail.gmail.com
Attachments

Re: Lift line-length limit for pg_service.conf

From:
Tom Lane
Date:
Daniel Gustafsson <daniel@yesql.se> writes:
> The pg_service.conf parsing thread [0] made me realize that we have a hardwired
> line length of max 256 bytes.  Lifting this would be in line with recent work
> for ecpg, pg_regress and pg_hba (784b1ba1a2 and 8f8154a50).  The attached moves
> pg_service.conf to use the new pg_get_line_append API and a StringInfo to lift
> the restriction. Any reason not to do that while we're lifting other such limits?

+1.  I'd been thinking of looking around at our fgets calls to see
which ones need this sort of work, but didn't get to it yet.

Personally, I'd avoid depending on StringInfo.cursor here, as the
dependency isn't really buying you anything.  Instead you could do

    line = linebuf.data;

just after the trim-trailing-whitespace loop and then leave the
"ignore leading whitespace too" code as it stands.

Also, the need for inserting the pfree into multiple exit paths kind
of makes me itch.  I wonder if we should change the ending code to
look like

exit:
    fclose(f);
    pfree(linebuf.data);

    return result;

and then the early exit spots would be replaced with "result = x;
goto exit".  (Some of them could use "break", but not all, so
it's probably better to consistently use "goto".)
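
To make the shape concrete, the suggestion amounts to something like this
(simplified and illustrative only, not the actual libpq code; the error
check and the numeric result code are just placeholders):

    #include "postgres_fe.h"
    #include "common/string.h"      /* pg_get_line_append() */
    #include "lib/stringinfo.h"

    /* Sketch of the single-exit pattern with one cleanup point. */
    static int
    parse_file(FILE *f)
    {
        StringInfoData linebuf;
        int         result = 0;

        initStringInfo(&linebuf);

        for (;;)
        {
            resetStringInfo(&linebuf);
            if (!pg_get_line_append(f, &linebuf))
                break;                  /* normal EOF falls through to exit */

            /* illustrative syntax check: line starts with '=' (no key) */
            if (linebuf.data[0] == '=')
            {
                result = 3;             /* placeholder error code */
                goto exit;              /* no per-path fclose/pfree needed */
            }

            /* ... handle the line ... */
        }

    exit:
        fclose(f);
        pfree(linebuf.data);

        return result;
    }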

            regards, tom lane



Re: Lift line-length limit for pg_service.conf

From:
Daniel Gustafsson
Date:
> On 21 Sep 2020, at 17:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>> The pg_service.conf parsing thread [0] made me realize that we have a hardwired
>> line length of max 256 bytes.  Lifting this would be in line with recent work
>> for ecpg, pg_regress and pg_hba (784b1ba1a2 and 8f8154a50).  The attached moves
>> pg_service.conf to use the new pg_get_line_append API and a StringInfo to lift
>> the restriction. Any reason not to do that while we're lifting other such limits?
>
> +1.  I'd been thinking of looking around at our fgets calls to see
> which ones need this sort of work, but didn't get to it yet.

I took a quick look and found the TOC parsing in pg_restore, which used a
100-byte buffer and then did some juggling to find the EOL for lines longer
than 100 bytes.  We won't see a bug report there for exceeded line length, but
simplifying the code seemed like a win to me, so I included that in the
updated patch as well.

> Personally, I'd avoid depending on StringInfo.cursor here, as the
> dependency isn't really buying you anything.

Fair enough; I was mainly a bit excited at finally finding a use for .cursor =)
Fixed.

> Also, the need for inserting the pfree into multiple exit paths kind
> of makes me itch.  I wonder if we should change the ending code to
> look like
>
> exit:
>     fclose(f);
>     pfree(linebuf.data);
>
>     return result;
>
> and then the early exit spots would be replaced with "result = x;
> goto exit".  (Some of them could use "break", but not all, so
> it's probably better to consistently use "goto".)

Agreed, fixed.  I was a bit tempted to use something less magic and more
descriptive than "result = 3", but in the end I opted to keep the changes to
one thing at a time.

cheers ./daniel


Attachments

Re: Lift line-length limit for pg_service.conf

From:
Tom Lane
Date:
Daniel Gustafsson <daniel@yesql.se> writes:
> [ 0001-Refactor-pg_service.conf-and-pg_restore-TOC-file-par.patch ]

I reviewed this and noticed that you'd missed adding resetStringInfo
calls in some code paths, which made me realize that while
pg_get_line_append() is great for its original customer in hba.c,
it kinda sucks for most other callers.  Having to remember to do
resetStringInfo in every path through a loop is too error-prone,
and it's unnecessary.  So I made another subroutine that just adds
that step, and updated the existing callers that could use it.
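
The new subroutine is just a reset-then-read wrapper around
pg_get_line_append(); it appears as pg_get_line_buf() in the follow-up
patches below.  Roughly:

    #include "c.h"
    #include "common/string.h"
    #include "lib/stringinfo.h"

    /*
     * Sketch of the wrapper: like pg_get_line_append(), but the caller
     * doesn't have to remember to resetStringInfo() between lines.
     */
    bool
    pg_get_line_buf(FILE *stream, StringInfo buf)
    {
        /* discard any data left over from the previous call */
        resetStringInfo(buf);

        return pg_get_line_append(stream, buf);
    }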

Pushed with that correction.

            regards, tom lane



Re: Lift line-length limit for pg_service.conf

From:
Tom Lane
Date:
In the same vein, here's a patch to remove the hard-coded line length
limit for tsearch dictionary files.

            regards, tom lane

diff --git a/src/backend/tsearch/dict_thesaurus.c b/src/backend/tsearch/dict_thesaurus.c
index cb0835982d..64c979086d 100644
--- a/src/backend/tsearch/dict_thesaurus.c
+++ b/src/backend/tsearch/dict_thesaurus.c
@@ -286,11 +286,6 @@ thesaurusRead(const char *filename, DictThesaurus *d)
                     (errcode(ERRCODE_CONFIG_FILE_ERROR),
                      errmsg("unexpected end of line")));

-        /*
-         * Note: currently, tsearch_readline can't return lines exceeding 4KB,
-         * so overflow of the word counts is impossible.  But that may not
-         * always be true, so let's check.
-         */
         if (nwrd != (uint16) nwrd || posinsubst != (uint16) posinsubst)
             ereport(ERROR,
                     (errcode(ERRCODE_CONFIG_FILE_ERROR),
diff --git a/src/backend/tsearch/ts_locale.c b/src/backend/tsearch/ts_locale.c
index a916dd6cb6..247180d56e 100644
--- a/src/backend/tsearch/ts_locale.c
+++ b/src/backend/tsearch/ts_locale.c
@@ -14,6 +14,8 @@
 #include "postgres.h"

 #include "catalog/pg_collation.h"
+#include "common/string.h"
+#include "lib/stringinfo.h"
 #include "storage/fd.h"
 #include "tsearch/ts_locale.h"
 #include "tsearch/ts_public.h"
@@ -204,29 +206,41 @@ tsearch_readline_callback(void *arg)
 char *
 t_readline(FILE *fp)
 {
+    StringInfoData buf;
     int            len;
     char       *recoded;
-    char        buf[4096];        /* lines must not be longer than this */

-    if (fgets(buf, sizeof(buf), fp) == NULL)
+    initStringInfo(&buf);
+
+    if (!pg_get_line_buf(fp, &buf))
+    {
+        pfree(buf.data);
         return NULL;
+    }

-    len = strlen(buf);
+    len = buf.len;

     /* Make sure the input is valid UTF-8 */
-    (void) pg_verify_mbstr(PG_UTF8, buf, len, false);
+    (void) pg_verify_mbstr(PG_UTF8, buf.data, len, false);

     /* And convert */
-    recoded = pg_any_to_server(buf, len, PG_UTF8);
-    if (recoded == buf)
+    recoded = pg_any_to_server(buf.data, len, PG_UTF8);
+    if (recoded == buf.data)
     {
         /*
          * conversion didn't pstrdup, so we must. We can use the length of the
          * original string, because no conversion was done.
+         *
+         * Note: it might seem attractive to just return buf.data, and in most
+         * usages that'd be fine.  But a few callers save the returned string
+         * as long-term data, so returning a palloc chunk that's bigger than
+         * necessary is a bad idea.
          */
         recoded = pnstrdup(recoded, len);
     }

+    pfree(buf.data);
+
     return recoded;
 }


Re: Lift line-length limit for pg_service.conf

From:
Daniel Gustafsson
Date:
> On 22 Sep 2020, at 23:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> In the same vein, here's a patch to remove the hard-coded line length
> limit for tsearch dictionary files.

LGTM.  I like the comment on why not to return buf.data directly; that detail
would be easy to miss.

cheers ./daniel



Re: Lift line-length limit for pg_service.conf

From:
Tom Lane
Date:
Daniel Gustafsson <daniel@yesql.se> writes:
>> On 22 Sep 2020, at 23:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In the same vein, here's a patch to remove the hard-coded line length
>> limit for tsearch dictionary files.

> LGTM.  I like the comment on why not to return buf.data directly, that detail
> would be easy to miss.

Yeah.  In a quick scan, it appears that there is only one caller that
tries to save the result directly.  So I considered making that caller
do a pstrdup and eliminating the extra thrashing in t_readline itself.
But it seemed too fragile; somebody would get it wrong and then have
excess space consumption for their dictionary.

            regards, tom lane



Re: Lift line-length limit for pg_service.conf

From:
Tom Lane
Date:
I wrote:
> Yeah.  In a quick scan, it appears that there is only one caller that
> tries to save the result directly.  So I considered making that caller
> do a pstrdup and eliminating the extra thrashing in t_readline itself.
> But it seemed too fragile; somebody would get it wrong and then have
> excess space consumption for their dictionary.

I had a better idea: if we get rid of t_readline() itself, which has
been deprecated for years anyway, we can have the calling layer
tsearch_readline_xxx maintain a StringInfo across the whole file
read process and thus get rid of alloc/free cycles for the StringInfo.

However ... while working on that, I realized that the usage of
the "curline" field in that code is completely unsafe.  We save
a pointer to the result of tsearch_readline() for possible use
by the error context callback, *but the caller is likely to free that
string at some point*.  So there is a window where an error would
result in trying to print already-freed data.

It looks like all of the core-code dictionaries free the result
string at the bottoms of their loops, so that the window for
trouble is pretty much empty.  But contrib/dict_xsyn doesn't
do it like that, and so it's surely at risk.  Any external
dictionaries that have copied that code would also be at risk.
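
Schematically, the risky caller pattern looks like this (illustrative code
only, not copied from dict_xsyn or any core dictionary;
do_more_work_that_might_error() is a hypothetical stand-in):

    #include "postgres.h"
    #include "tsearch/ts_locale.h"

    /* hypothetical stand-in for per-line work that might ereport(ERROR) */
    extern void do_more_work_that_might_error(const char *word);

    static void
    read_dictionary_file(const char *filename)
    {
        tsearch_readline_state trst;
        char       *line;

        if (!tsearch_readline_begin(&trst, filename))
            ereport(ERROR,
                    (errcode(ERRCODE_CONFIG_FILE_ERROR),
                     errmsg("could not open dictionary file \"%s\"", filename)));

        while ((line = tsearch_readline(&trst)) != NULL)
        {
            char       *word = pstrdup(line);

            pfree(line);        /* the caller is entitled to free the line here */

            /*
             * Danger zone (before the fix): curline still points at the string
             * freed above, so an ERROR thrown in here makes the error context
             * callback print already-freed memory.
             */
            do_more_work_that_might_error(word);
        }

        tsearch_readline_end(&trst);
    }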

So the attached adds a pstrdup/pfree to ensure that "curline"
has its own storage, putting us right back at two palloc/pfree
cycles per line.  I don't think there's a lot of choice though;
in fact, I'm leaning to the idea that we need to back-patch
that part of this.  The odds of trouble in a production build
probably aren't high, but still...

            regards, tom lane

diff --git a/src/backend/tsearch/dict_thesaurus.c b/src/backend/tsearch/dict_thesaurus.c
index cb0835982d..64c979086d 100644
--- a/src/backend/tsearch/dict_thesaurus.c
+++ b/src/backend/tsearch/dict_thesaurus.c
@@ -286,11 +286,6 @@ thesaurusRead(const char *filename, DictThesaurus *d)
                     (errcode(ERRCODE_CONFIG_FILE_ERROR),
                      errmsg("unexpected end of line")));

-        /*
-         * Note: currently, tsearch_readline can't return lines exceeding 4KB,
-         * so overflow of the word counts is impossible.  But that may not
-         * always be true, so let's check.
-         */
         if (nwrd != (uint16) nwrd || posinsubst != (uint16) posinsubst)
             ereport(ERROR,
                     (errcode(ERRCODE_CONFIG_FILE_ERROR),
diff --git a/src/backend/tsearch/ts_locale.c b/src/backend/tsearch/ts_locale.c
index a916dd6cb6..c85939631e 100644
--- a/src/backend/tsearch/ts_locale.c
+++ b/src/backend/tsearch/ts_locale.c
@@ -14,6 +14,7 @@
 #include "postgres.h"

 #include "catalog/pg_collation.h"
+#include "common/string.h"
 #include "storage/fd.h"
 #include "tsearch/ts_locale.h"
 #include "tsearch/ts_public.h"
@@ -128,6 +129,7 @@ tsearch_readline_begin(tsearch_readline_state *stp,
         return false;
     stp->filename = filename;
     stp->lineno = 0;
+    initStringInfo(&stp->buf);
     stp->curline = NULL;
     /* Setup error traceback support for ereport() */
     stp->cb.callback = tsearch_readline_callback;
@@ -146,11 +148,44 @@ char *
 tsearch_readline(tsearch_readline_state *stp)
 {
     char       *result;
+    int            len;

+    /* Advance line number to use in error reports */
     stp->lineno++;
-    stp->curline = NULL;
-    result = t_readline(stp->fp);
-    stp->curline = result;
+
+    /* Clear curline, it's no longer relevant */
+    if (stp->curline)
+    {
+        pfree(stp->curline);
+        stp->curline = NULL;
+    }
+
+    /* Collect next line, if there is one */
+    if (!pg_get_line_buf(stp->fp, &stp->buf))
+        return NULL;
+
+    /* Make sure the input is valid UTF-8 */
+    len = stp->buf.len;
+    (void) pg_verify_mbstr(PG_UTF8, stp->buf.data, len, false);
+
+    /* And convert */
+    result = pg_any_to_server(stp->buf.data, len, PG_UTF8);
+    if (result == stp->buf.data)
+    {
+        /*
+         * Conversion didn't pstrdup, so we must.  We can use the length of
+         * the original string, because no conversion was done.
+         */
+        result = pnstrdup(result, len);
+    }
+
+    /*
+     * Save a copy of the line for possible use in error reports.  (We cannot
+     * just save "result", since it's likely to get pfree'd at some point by
+     * the caller; an error after that would try to access freed data.)
+     */
+    stp->curline = pstrdup(result);
+
     return result;
 }

@@ -160,7 +195,16 @@ tsearch_readline(tsearch_readline_state *stp)
 void
 tsearch_readline_end(tsearch_readline_state *stp)
 {
+    /* Suppress use of curline in any error reported below */
+    if (stp->curline)
+    {
+        pfree(stp->curline);
+        stp->curline = NULL;
+    }
+
+    pfree(stp->buf.data);
     FreeFile(stp->fp);
+
     /* Pop the error context stack */
     error_context_stack = stp->cb.previous;
 }
@@ -176,8 +220,7 @@ tsearch_readline_callback(void *arg)

     /*
      * We can't include the text of the config line for errors that occur
-     * during t_readline() itself.  This is only partly a consequence of our
-     * arms-length use of that routine: the major cause of such errors is
+     * during tsearch_readline() itself.  The major cause of such errors is
      * encoding violations, and we daren't try to print error messages
      * containing badly-encoded data.
      */
@@ -193,43 +236,6 @@ tsearch_readline_callback(void *arg)
 }


-/*
- * Read the next line from a tsearch data file (expected to be in UTF-8), and
- * convert it to database encoding if needed. The returned string is palloc'd.
- * NULL return means EOF.
- *
- * Note: direct use of this function is now deprecated.  Go through
- * tsearch_readline() to provide better error reporting.
- */
-char *
-t_readline(FILE *fp)
-{
-    int            len;
-    char       *recoded;
-    char        buf[4096];        /* lines must not be longer than this */
-
-    if (fgets(buf, sizeof(buf), fp) == NULL)
-        return NULL;
-
-    len = strlen(buf);
-
-    /* Make sure the input is valid UTF-8 */
-    (void) pg_verify_mbstr(PG_UTF8, buf, len, false);
-
-    /* And convert */
-    recoded = pg_any_to_server(buf, len, PG_UTF8);
-    if (recoded == buf)
-    {
-        /*
-         * conversion didn't pstrdup, so we must. We can use the length of the
-         * original string, because no conversion was done.
-         */
-        recoded = pnstrdup(recoded, len);
-    }
-
-    return recoded;
-}
-
 /*
  * lowerstr --- fold null-terminated string to lower case
  *
diff --git a/src/include/tsearch/ts_locale.h b/src/include/tsearch/ts_locale.h
index cc4bd9ab20..3c2e635f86 100644
--- a/src/include/tsearch/ts_locale.h
+++ b/src/include/tsearch/ts_locale.h
@@ -15,6 +15,7 @@
 #include <ctype.h>
 #include <limits.h>

+#include "lib/stringinfo.h"
 #include "mb/pg_wchar.h"
 #include "utils/pg_locale.h"

@@ -33,7 +34,8 @@ typedef struct
     FILE       *fp;
     const char *filename;
     int            lineno;
-    char       *curline;
+    StringInfoData buf;            /* current input line, in UTF-8 */
+    char       *curline;        /* input line in DB's encoding, or NULL */
     ErrorContextCallback cb;
 } tsearch_readline_state;

@@ -57,6 +59,4 @@ extern bool tsearch_readline_begin(tsearch_readline_state *stp,
 extern char *tsearch_readline(tsearch_readline_state *stp);
 extern void tsearch_readline_end(tsearch_readline_state *stp);

-extern char *t_readline(FILE *fp);
-
 #endif                            /* __TSLOCALE_H__ */

Re: Lift line-length limit for pg_service.conf

From:
Tom Lane
Date:
I wrote:
> So the attached adds a pstrdup/pfree to ensure that "curline"
> has its own storage, putting us right back at two palloc/pfree
> cycles per line.  I don't think there's a lot of choice though;
> in fact, I'm leaning to the idea that we need to back-patch
> that part of this.  The odds of trouble in a production build
> probably aren't high, but still...

So I did that, but while looking at the main patch I realized that
a few things were still being left on the table:

1. We can save a palloc/pfree cycle in the case where no encoding
conversion need be performed, by allowing "curline" to point at
the StringInfo buffer instead of necessarily being a separate
palloc'd string.  (This seems safe since no other code should
manipulate the StringInfo before the next tsearch_readline call;
so we can't get confused about whether "curline" has its own storage.)

2. In the case where encoding conversion is performed, we still have
to pstrdup the result to have a safe copy for "curline".  But I
realized that we're making a poor choice of which copy to return to
the caller.  The output of encoding conversion is likely to be a good
bit bigger than necessary, cf. pg_do_encoding_conversion.  So if the
caller is one that saves the output string directly into a long-lived
dictionary structure, this wastes space.  We should return the pstrdup
result instead, and keep the conversion result as "curline", where
we'll free it next time through.

3. AFAICT, it's completely useless for tsearch_readline to call
pg_verify_mbstr, because pg_any_to_server will either do that exact
thing itself, or verify the input encoding as part of conversion.
Some quick testing says that you don't even get different error
messages.  So we should just drop that step.  (It's likely that the
separate call was actually needed when this code was written; I think
we tightened up the expectations for verification within encoding
conversion somewhere along the line.  But we demonstrably don't need
it now.)

So those considerations lead me to the attached.

            regards, tom lane

diff --git a/src/backend/tsearch/dict_thesaurus.c b/src/backend/tsearch/dict_thesaurus.c
index cb0835982d..64c979086d 100644
--- a/src/backend/tsearch/dict_thesaurus.c
+++ b/src/backend/tsearch/dict_thesaurus.c
@@ -286,11 +286,6 @@ thesaurusRead(const char *filename, DictThesaurus *d)
                     (errcode(ERRCODE_CONFIG_FILE_ERROR),
                      errmsg("unexpected end of line")));

-        /*
-         * Note: currently, tsearch_readline can't return lines exceeding 4KB,
-         * so overflow of the word counts is impossible.  But that may not
-         * always be true, so let's check.
-         */
         if (nwrd != (uint16) nwrd || posinsubst != (uint16) posinsubst)
             ereport(ERROR,
                     (errcode(ERRCODE_CONFIG_FILE_ERROR),
diff --git a/src/backend/tsearch/ts_locale.c b/src/backend/tsearch/ts_locale.c
index 9b199d0ac1..deaafe57e6 100644
--- a/src/backend/tsearch/ts_locale.c
+++ b/src/backend/tsearch/ts_locale.c
@@ -14,6 +14,7 @@
 #include "postgres.h"

 #include "catalog/pg_collation.h"
+#include "common/string.h"
 #include "storage/fd.h"
 #include "tsearch/ts_locale.h"
 #include "tsearch/ts_public.h"
@@ -128,6 +129,7 @@ tsearch_readline_begin(tsearch_readline_state *stp,
         return false;
     stp->filename = filename;
     stp->lineno = 0;
+    initStringInfo(&stp->buf);
     stp->curline = NULL;
     /* Setup error traceback support for ereport() */
     stp->cb.callback = tsearch_readline_callback;
@@ -145,7 +147,7 @@ tsearch_readline_begin(tsearch_readline_state *stp,
 char *
 tsearch_readline(tsearch_readline_state *stp)
 {
-    char       *result;
+    char       *recoded;

     /* Advance line number to use in error reports */
     stp->lineno++;
@@ -153,23 +155,31 @@ tsearch_readline(tsearch_readline_state *stp)
     /* Clear curline, it's no longer relevant */
     if (stp->curline)
     {
-        pfree(stp->curline);
+        if (stp->curline != stp->buf.data)
+            pfree(stp->curline);
         stp->curline = NULL;
     }

     /* Collect next line, if there is one */
-    result = t_readline(stp->fp);
-    if (!result)
+    if (!pg_get_line_buf(stp->fp, &stp->buf))
         return NULL;

+    /* Validate the input as UTF-8, then convert to DB encoding if needed */
+    recoded = pg_any_to_server(stp->buf.data, stp->buf.len, PG_UTF8);
+
+    /* Save the correctly-encoded string for possible error reports */
+    stp->curline = recoded;        /* might be equal to buf.data */
+
     /*
-     * Save a copy of the line for possible use in error reports.  (We cannot
-     * just save "result", since it's likely to get pfree'd at some point by
-     * the caller; an error after that would try to access freed data.)
+     * We always return a freshly pstrdup'd string.  This is clearly necessary
+     * if pg_any_to_server() returned buf.data, and it is desirable even if
+     * conversion occurred, because the palloc'd result of conversion is
+     * likely to be much bigger than minimally necessary.  Since some callers
+     * save the result string directly into a long-lived dictionary structure,
+     * we don't want it to be a larger palloc chunk than necessary.  We'll
+     * reclaim the conversion result on the next call.
      */
-    stp->curline = pstrdup(result);
-
-    return result;
+    return pstrdup(recoded);
 }

 /*
@@ -181,11 +191,13 @@ tsearch_readline_end(tsearch_readline_state *stp)
     /* Suppress use of curline in any error reported below */
     if (stp->curline)
     {
-        pfree(stp->curline);
+        if (stp->curline != stp->buf.data)
+            pfree(stp->curline);
         stp->curline = NULL;
     }

     /* Release other resources */
+    pfree(stp->buf.data);
     FreeFile(stp->fp);

     /* Pop the error context stack */
@@ -203,8 +215,7 @@ tsearch_readline_callback(void *arg)

     /*
      * We can't include the text of the config line for errors that occur
-     * during t_readline() itself.  This is only partly a consequence of our
-     * arms-length use of that routine: the major cause of such errors is
+     * during tsearch_readline() itself.  The major cause of such errors is
      * encoding violations, and we daren't try to print error messages
      * containing badly-encoded data.
      */
@@ -220,43 +231,6 @@ tsearch_readline_callback(void *arg)
 }


-/*
- * Read the next line from a tsearch data file (expected to be in UTF-8), and
- * convert it to database encoding if needed. The returned string is palloc'd.
- * NULL return means EOF.
- *
- * Note: direct use of this function is now deprecated.  Go through
- * tsearch_readline() to provide better error reporting.
- */
-char *
-t_readline(FILE *fp)
-{
-    int            len;
-    char       *recoded;
-    char        buf[4096];        /* lines must not be longer than this */
-
-    if (fgets(buf, sizeof(buf), fp) == NULL)
-        return NULL;
-
-    len = strlen(buf);
-
-    /* Make sure the input is valid UTF-8 */
-    (void) pg_verify_mbstr(PG_UTF8, buf, len, false);
-
-    /* And convert */
-    recoded = pg_any_to_server(buf, len, PG_UTF8);
-    if (recoded == buf)
-    {
-        /*
-         * conversion didn't pstrdup, so we must. We can use the length of the
-         * original string, because no conversion was done.
-         */
-        recoded = pnstrdup(recoded, len);
-    }
-
-    return recoded;
-}
-
 /*
  * lowerstr --- fold null-terminated string to lower case
  *
diff --git a/src/include/tsearch/ts_locale.h b/src/include/tsearch/ts_locale.h
index cc4bd9ab20..f1669fda21 100644
--- a/src/include/tsearch/ts_locale.h
+++ b/src/include/tsearch/ts_locale.h
@@ -15,6 +15,7 @@
 #include <ctype.h>
 #include <limits.h>

+#include "lib/stringinfo.h"
 #include "mb/pg_wchar.h"
 #include "utils/pg_locale.h"

@@ -33,7 +34,9 @@ typedef struct
     FILE       *fp;
     const char *filename;
     int            lineno;
-    char       *curline;
+    StringInfoData buf;            /* current input line, in UTF-8 */
+    char       *curline;        /* current input line, in DB's encoding */
+    /* curline may be NULL, or equal to buf.data, or a palloc'd string */
     ErrorContextCallback cb;
 } tsearch_readline_state;

@@ -57,6 +60,4 @@ extern bool tsearch_readline_begin(tsearch_readline_state *stp,
 extern char *tsearch_readline(tsearch_readline_state *stp);
 extern void tsearch_readline_end(tsearch_readline_state *stp);

-extern char *t_readline(FILE *fp);
-
 #endif                            /* __TSLOCALE_H__ */

Re: Lift line-length limit for pg_service.conf

From:
Daniel Gustafsson
Date:
> On 23 Sep 2020, at 21:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
>> So the attached adds a pstrdup/pfree to ensure that "curline"
>> has its own storage, putting us right back at two palloc/pfree
>> cycles per line.  I don't think there's a lot of choice though;
>> in fact, I'm leaning to the idea that we need to back-patch
>> that part of this.  The odds of trouble in a production build
>> probably aren't high, but still...
>
> So I did that, but while looking at the main patch I realized that
> a few things were still being left on the table:

> 2. In the case where encoding conversion is performed, we still have
> to pstrdup the result to have a safe copy for "curline".  But I
> realized that we're making a poor choice of which copy to return to
> the caller.  The output of encoding conversion is likely to be a good
> bit bigger than necessary, cf. pg_do_encoding_conversion.  So if the
> caller is one that saves the output string directly into a long-lived
> dictionary structure, this wastes space.  We should return the pstrdup
> result instead, and keep the conversion result as "curline", where
> we'll free it next time through.

I wonder if we have other call sites of pg_any_to_server which could benefit
from a smaller returned allocation.  A quick look didn't spot any, but today
has become yesterday here and tiredness might interfere.

> So those considerations lead me to the attached.

Eyeing the memory used by callers of tsearch_readline validates your
observations; I don't see any case where this patch isn't safe, and nothing
sticks out.  +1 on this, nice one!

cheers ./daniel


Re: Lift line-length limit for pg_service.conf

From:
Tom Lane
Date:
Daniel Gustafsson <daniel@yesql.se> writes:
> On 23 Sep 2020, at 21:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 2. In the case where encoding conversion is performed, we still have
>> to pstrdup the result to have a safe copy for "curline".  But I
>> realized that we're making a poor choice of which copy to return to
>> the caller.  The output of encoding conversion is likely to be a good
>> bit bigger than necessary, cf. pg_do_encoding_conversion.  So if the
>> caller is one that saves the output string directly into a long-lived
>> dictionary structure, this wastes space.  We should return the pstrdup
>> result instead, and keep the conversion result as "curline", where
>> we'll free it next time through.

> I wonder if we have other callsites of pg_any_to_server which could benefit
> from lowering the returned allocation, a quick look didn't spot any but today
> has become yesterday here and tiredness might interfere.

After looking more closely, I've realized that actually none of the
existing core-code callers will save the returned string directly.
readstoplist() could do so, depending on what "wordop" is, but
all its existing callers pass lowerstr() which will always make a
new output string.  (Which itself could be a bit bloated :-()

So the concern I expressed above is really just hypothetical.
Still, the code is simpler this way and no slower, so I still
think it's an improvement.

(The bigger picture here is that the API for dictionary init
methods is pretty seriously misdesigned from a memory-consumption
standpoint.  Running the entire init process in the dictionary's
long-lived context goes against everything we've learned about
avoiding memory leaks.  We should run that code in a short-lived
command execution context, and explicitly copy just the data we
want into the long-lived context.  But changing that would be
a pretty big deal, breaking third-party dictionaries.  So I'm
not sure it's enough of a problem to justify the change.)
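
For illustration, that shape would look roughly like the following (purely
schematic: the dictionary init API does not work this way today, and both
helper functions named here are hypothetical):

    #include "postgres.h"
    #include "utils/memutils.h"

    /* hypothetical stand-ins for the real dictionary-loading work */
    extern void *parse_dictionary_files(void);
    extern void *copy_wanted_data(void *tmp);

    static void *
    init_dict_in_scratch_context(MemoryContext dictCtx)
    {
        MemoryContext scratch;
        MemoryContext oldcxt;
        void       *tmp;
        void       *result;

        /* do all the parsing in a throwaway context; leaks don't matter here */
        scratch = AllocSetContextCreate(CurrentMemoryContext,
                                        "dictionary init scratch",
                                        ALLOCSET_DEFAULT_SIZES);
        oldcxt = MemoryContextSwitchTo(scratch);

        tmp = parse_dictionary_files();

        /* copy only the data we want to keep into the long-lived context */
        MemoryContextSwitchTo(dictCtx);
        result = copy_wanted_data(tmp);

        /* discard the scratch context, reclaiming everything else at once */
        MemoryContextSwitchTo(oldcxt);
        MemoryContextDelete(scratch);

        return result;
    }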

            regards, tom lane