Обсуждение: pgsql: Refactor pg_get_line() to expose an alternative StringInfo-based
Refactor pg_get_line() to expose an alternative StringInfo-based API. Letting the caller provide a StringInfo to read into is helpful when the caller needs to merge lines or otherwise modify the data after it's been read. Notably, now the code added by commit 8f8154a50 can use pg_get_line_append() instead of having its own copy of that logic. A follow-on commit will also make use of this. Also, since StringInfo buffers are a minimum of 1KB long, blindly using pg_get_line() in a loop can eat a lot more memory than one would expect. I discovered for instance that commit e0f05cd5b caused initdb to consume circa 10MB to read postgres.bki, even though that's under 1MB worth of data. A less memory-hungry alternative is to re-use the same StringInfo for all lines and pg_strdup the results. Discussion: https://postgr.es/m/1315832.1599345736@sss.pgh.pa.us Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/8e3c58e6e459b285d37edb6129e412ed25cd90c1 Modified Files -------------- src/backend/libpq/hba.c | 40 +++++++++----------------- src/bin/initdb/initdb.c | 12 ++++++-- src/common/pg_get_line.c | 70 +++++++++++++++++++++++++++++++-------------- src/include/common/string.h | 3 ++ 4 files changed, 75 insertions(+), 50 deletions(-)
Hi
ne 6. 9. 2020 v 20:13 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Refactor pg_get_line() to expose an alternative StringInfo-based API.
Letting the caller provide a StringInfo to read into is helpful when
the caller needs to merge lines or otherwise modify the data after
it's been read. Notably, now the code added by commit 8f8154a50
can use pg_get_line_append() instead of having its own copy of that
logic. A follow-on commit will also make use of this.
Also, since StringInfo buffers are a minimum of 1KB long, blindly
using pg_get_line() in a loop can eat a lot more memory than one would
expect. I discovered for instance that commit e0f05cd5b caused initdb
to consume circa 10MB to read postgres.bki, even though that's under
1MB worth of data. A less memory-hungry alternative is to re-use the
same StringInfo for all lines and pg_strdup the results.
I tried to reuse this new API in pg_dump.c, and I had a problem with private struct StringInfo.
Inside initdb.c there is a access to this structure via #include "access/xlog_internal.h" and it is little bit unreadable
maybe there should be included directly with #include "lib/stringinfo.h" ?
Regards
Pavel
Discussion: https://postgr.es/m/1315832.1599345736@sss.pgh.pa.us
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/8e3c58e6e459b285d37edb6129e412ed25cd90c1
Modified Files
--------------
src/backend/libpq/hba.c | 40 +++++++++-----------------
src/bin/initdb/initdb.c | 12 ++++++--
src/common/pg_get_line.c | 70 +++++++++++++++++++++++++++++++--------------
src/include/common/string.h | 3 ++
4 files changed, 75 insertions(+), 50 deletions(-)
On 2020-Sep-07, Pavel Stehule wrote: > I tried to reuse this new API in pg_dump.c, and I had a problem with > private struct StringInfo. > maybe there should be included directly with #include "lib/stringinfo.h" ? That's the right thing to do, yes. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2020-Sep-07, Pavel Stehule wrote: >> I tried to reuse this new API in pg_dump.c, and I had a problem with >> private struct StringInfo. >> maybe there should be included directly with #include "lib/stringinfo.h" ? > That's the right thing to do, yes. Yeah, if you want to use pg_get_line_append, you need to include both those headers. regards, tom lane
po 7. 9. 2020 v 16:48 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2020-Sep-07, Pavel Stehule wrote:
>> I tried to reuse this new API in pg_dump.c, and I had a problem with
>> private struct StringInfo.
>> maybe there should be included directly with #include "lib/stringinfo.h" ?
> That's the right thing to do, yes.
Yeah, if you want to use pg_get_line_append, you need to include
both those headers.
here is a patch
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 37e0d7ceab..0dcc518beb 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -72,6 +72,7 @@
#include "fe_utils/string_utils.h"
#include "getaddrinfo.h"
#include "getopt_long.h"
+#include "lib/stringinfo.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
index 37e0d7ceab..0dcc518beb 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -72,6 +72,7 @@
#include "fe_utils/string_utils.h"
#include "getaddrinfo.h"
#include "getopt_long.h"
+#include "lib/stringinfo.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
Regards
Pavel
regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes: > here is a patch What exactly is this supposed to fix? I didn't bother tracking down exactly where initdb.c is getting stringinfo.h from, but it clearly does import it somewhere in our rat's nest of headers. regards, tom lane
po 7. 9. 2020 v 18:58 odesílatel Alvaro Herrera <alvherre@2ndquadrant.com> napsal:
On 2020-Sep-07, Tom Lane wrote:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
> > here is a patch
>
> What exactly is this supposed to fix?
>
> I didn't bother tracking down exactly where initdb.c is getting
> stringinfo.h from, but it clearly does import it somewhere in
> our rat's nest of headers.
You can see it here: https://doxygen.postgresql.org/initdb_8c.html
It's through xlog_internal.h (not that it matters much).
We do the indirect-header thing all over the place. There's no point in
trying to avoid it completely.
ok
Pavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2020-Sep-07, Tom Lane wrote: >> I didn't bother tracking down exactly where initdb.c is getting >> stringinfo.h from, but it clearly does import it somewhere in >> our rat's nest of headers. > You can see it here: https://doxygen.postgresql.org/initdb_8c.html > It's through xlog_internal.h (not that it matters much). If there were something to be on the warpath about, it's that initdb is pulling in such an obviously backend-only header as that. I wonder if we should refactor to fix that. Quick experimentation says that the symbols initdb actually needs out of that header are DEFAULT_MIN_WAL_SEGS DEFAULT_MAX_WAL_SEGS IsValidWalSegSize() regards, tom lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2020-Sep-07, Tom Lane wrote: >> If there were something to be on the warpath about, it's that initdb >> is pulling in such an obviously backend-only header as that. I wonder >> if we should refactor to fix that. > Hmm, if we wanted to clean things up I think we should look at what > other things from xlog_internal.h are being used by other frontend > programs; Yeah, I was thinking along the same lines, but hadn't done the legwork to see what else is needed by frontend code. regards, tom lane