Hi,
On 2022-02-15 23:30:07 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > When I'd looked at this last I apparently (looking at git stash, I ended up
> > discarding this) decided that the best way would be to change replace_token's
> > API to one that just processes one line at a time, with an outer loop that
> > processes all tokens in a line. I'm not really sure why anymore.
>
> Hmm, I did it the other way, as attached.
My goal when I did this was to improve performance rather than to reduce the
memory usage (this was a few months back). It's clearly better to perform all
the replacements on a line close together than to iterate over all the lines
once per replacement; by the time the latter revisits a line, its data is
guaranteed to have fallen out of the L1/L2 caches. But if we move to not
needing replacements for postgres.bki anymore, that stops being an important
goal.
Not that it matters much now, but at least for postgres.bki it doesn't seem to
make sense to use a line-based approach in the first place. It seems like it
should really process the input file as a whole, doing all the replacements
into a resizable output buffer.
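To illustrate what I mean, a minimal sketch of that shape (the function name
and signature are hypothetical, not the actual replace_token API; error
checking is elided):

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: replace every occurrence of "token" in the whole
 * input with "replacement", appending into an output buffer that is
 * doubled in size as needed, instead of scanning line by line.
 * Returns a malloc'd string the caller must free.
 */
static char *
replace_token_whole(const char *input, const char *token,
					const char *replacement)
{
	size_t		toklen = strlen(token);
	size_t		replen = strlen(replacement);
	size_t		cap = strlen(input) + 1;
	size_t		used = 0;
	char	   *out = malloc(cap);
	const char *p = input;
	const char *hit;

	while ((hit = strstr(p, token)) != NULL)
	{
		size_t		chunk = (size_t) (hit - p);

		/* grow the buffer until the chunk plus the replacement fit */
		while (used + chunk + replen + 1 > cap)
		{
			cap *= 2;
			out = realloc(out, cap);
		}
		memcpy(out + used, p, chunk);
		used += chunk;
		memcpy(out + used, replacement, replen);
		used += replen;
		p = hit + toklen;
	}

	/* copy the trailing remainder and terminate */
	size_t		rest = strlen(p);

	while (used + rest + 1 > cap)
	{
		cap *= 2;
		out = realloc(out, cap);
	}
	memcpy(out + used, p, rest);
	used += rest;
	out[used] = '\0';
	return out;
}
```

The point is just that one pass over the whole buffer touches each byte once
per token, with no per-line bookkeeping.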
I didn't review it in detail, but I think your approach makes sense.
> This gets it down to
> ==3254266== LEAK SUMMARY:
> ==3254266== definitely lost: 342 bytes in 17 blocks
> ==3254266== indirectly lost: 152 bytes in 2 blocks
> ==3254266== possibly lost: 0 bytes in 0 blocks
> ==3254266== still reachable: 2,403 bytes in 22 blocks
> ==3254266== suppressed: 0 bytes in 0 blocks
That's presumably due not to the approach, but to doing it for all
replacements rather than just for bootstrap, as I did :)
> It seems that actually the pointer arrays *are* a big chunk of
> the leakage, because the individual strings get freed in the
> output loops!
Right, isn't that what I had said?
Greetings,
Andres Freund