Обсуждение: pgsql: Refactor pg_get_line() to expose an alternative StringInfo-based

Поиск
Список
Период
Сортировка

pgsql: Refactor pg_get_line() to expose an alternative StringInfo-based

От
Tom Lane
Дата:
Refactor pg_get_line() to expose an alternative StringInfo-based API.

Letting the caller provide a StringInfo to read into is helpful when
the caller needs to merge lines or otherwise modify the data after
it's been read.  Notably, now the code added by commit 8f8154a50
can use pg_get_line_append() instead of having its own copy of that
logic.  A follow-on commit will also make use of this.

Also, since StringInfo buffers are a minimum of 1KB long, blindly
using pg_get_line() in a loop can eat a lot more memory than one would
expect.  I discovered for instance that commit e0f05cd5b caused initdb
to consume circa 10MB to read postgres.bki, even though that's under
1MB worth of data.  A less memory-hungry alternative is to re-use the
same StringInfo for all lines and pg_strdup the results.

Discussion: https://postgr.es/m/1315832.1599345736@sss.pgh.pa.us

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/8e3c58e6e459b285d37edb6129e412ed25cd90c1

Modified Files
--------------
src/backend/libpq/hba.c     | 40 +++++++++-----------------
src/bin/initdb/initdb.c     | 12 ++++++--
src/common/pg_get_line.c    | 70 +++++++++++++++++++++++++++++++--------------
src/include/common/string.h |  3 ++
4 files changed, 75 insertions(+), 50 deletions(-)


Re: pgsql: Refactor pg_get_line() to expose an alternative StringInfo-based

От
Pavel Stehule
Дата:
Hi

ne 6. 9. 2020 v 20:13 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Refactor pg_get_line() to expose an alternative StringInfo-based API.

Letting the caller provide a StringInfo to read into is helpful when
the caller needs to merge lines or otherwise modify the data after
it's been read.  Notably, now the code added by commit 8f8154a50
can use pg_get_line_append() instead of having its own copy of that
logic.  A follow-on commit will also make use of this.

Also, since StringInfo buffers are a minimum of 1KB long, blindly
using pg_get_line() in a loop can eat a lot more memory than one would
expect.  I discovered for instance that commit e0f05cd5b caused initdb
to consume circa 10MB to read postgres.bki, even though that's under
1MB worth of data.  A less memory-hungry alternative is to re-use the
same StringInfo for all lines and pg_strdup the results.


I tried  to reuse this new API in pg_dump.c, and I had a problem with private struct StringInfo.

Inside initdb.c there is a access to this structure via #include "access/xlog_internal.h" and it is little bit unreadable

maybe there should be included directly with #include "lib/stringinfo.h" ?

Regards

Pavel


 
Discussion: https://postgr.es/m/1315832.1599345736@sss.pgh.pa.us

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/8e3c58e6e459b285d37edb6129e412ed25cd90c1

Modified Files
--------------
src/backend/libpq/hba.c     | 40 +++++++++-----------------
src/bin/initdb/initdb.c     | 12 ++++++--
src/common/pg_get_line.c    | 70 +++++++++++++++++++++++++++++++--------------
src/include/common/string.h |  3 ++
4 files changed, 75 insertions(+), 50 deletions(-)

Re: pgsql: Refactor pg_get_line() to expose an alternative StringInfo-based

От
Alvaro Herrera
Дата:
On 2020-Sep-07, Pavel Stehule wrote:

> I tried  to reuse this new API in pg_dump.c, and I had a problem with
> private struct StringInfo.

> maybe there should be included directly with #include "lib/stringinfo.h" ?

That's the right thing to do, yes.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: Refactor pg_get_line() to expose an alternative StringInfo-based

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2020-Sep-07, Pavel Stehule wrote:
>> I tried  to reuse this new API in pg_dump.c, and I had a problem with
>> private struct StringInfo.

>> maybe there should be included directly with #include "lib/stringinfo.h" ?

> That's the right thing to do, yes.

Yeah, if you want to use pg_get_line_append, you need to include
both those headers.

            regards, tom lane



Re: pgsql: Refactor pg_get_line() to expose an alternative StringInfo-based

От
Pavel Stehule
Дата:


po 7. 9. 2020 v 16:48 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2020-Sep-07, Pavel Stehule wrote:
>> I tried  to reuse this new API in pg_dump.c, and I had a problem with
>> private struct StringInfo.

>> maybe there should be included directly with #include "lib/stringinfo.h" ?

> That's the right thing to do, yes.

Yeah, if you want to use pg_get_line_append, you need to include
both those headers.

here is a patch

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 37e0d7ceab..0dcc518beb 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -72,6 +72,7 @@
 #include "fe_utils/string_utils.h"
 #include "getaddrinfo.h"
 #include "getopt_long.h"
+#include "lib/stringinfo.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 
Regards

Pavel




                        regards, tom lane

Re: pgsql: Refactor pg_get_line() to expose an alternative StringInfo-based

От
Tom Lane
Дата:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> here is a patch

What exactly is this supposed to fix?

I didn't bother tracking down exactly where initdb.c is getting
stringinfo.h from, but it clearly does import it somewhere in
our rat's nest of headers.

            regards, tom lane



Re: pgsql: Refactor pg_get_line() to expose an alternative StringInfo-based

От
Pavel Stehule
Дата:


po 7. 9. 2020 v 18:58 odesílatel Alvaro Herrera <alvherre@2ndquadrant.com> napsal:
On 2020-Sep-07, Tom Lane wrote:

> Pavel Stehule <pavel.stehule@gmail.com> writes:
> > here is a patch
>
> What exactly is this supposed to fix?
>
> I didn't bother tracking down exactly where initdb.c is getting
> stringinfo.h from, but it clearly does import it somewhere in
> our rat's nest of headers.

You can see it here: https://doxygen.postgresql.org/initdb_8c.html
It's through xlog_internal.h (not that it matters much).

We do the indirect-header thing all over the place.  There's no point in
trying to avoid it completely.

ok

Pavel


--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: pgsql: Refactor pg_get_line() to expose an alternative StringInfo-based

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2020-Sep-07, Tom Lane wrote:
>> I didn't bother tracking down exactly where initdb.c is getting
>> stringinfo.h from, but it clearly does import it somewhere in
>> our rat's nest of headers.

> You can see it here: https://doxygen.postgresql.org/initdb_8c.html
> It's through xlog_internal.h (not that it matters much).

If there were something to be on the warpath about, it's that initdb
is pulling in such an obviously backend-only header as that.  I wonder
if we should refactor to fix that.

Quick experimentation says that the symbols initdb actually needs out
of that header are

DEFAULT_MIN_WAL_SEGS
DEFAULT_MAX_WAL_SEGS
IsValidWalSegSize()

            regards, tom lane



Re: pgsql: Refactor pg_get_line() to expose an alternative StringInfo-based

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2020-Sep-07, Tom Lane wrote:
>> If there were something to be on the warpath about, it's that initdb
>> is pulling in such an obviously backend-only header as that.  I wonder
>> if we should refactor to fix that.

> Hmm, if we wanted to clean things up I think we should look at what
> other things from xlog_internal.h are being used by other frontend
> programs;

Yeah, I was thinking along the same lines, but hadn't done the legwork
to see what else is needed by frontend code.

            regards, tom lane