On 07/12/2017 06:30 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> Yes, I can see that happening here too. The problem seems to be that the
>> analyze-function detoasts every row in the sample. Tsvectors can be very
>> large, so it adds up.
>
>> That's pretty easy to fix, the analyze function needs to free the
>> detoasted copies as it goes. But in order to do that, it needs to make
>> copies of all the lexemes stored in the hash table, instead of pointing
>> directly to the detoasted copies.
>
>> Patch attached. I think this counts as a bug, and we should backport this.
>
> +1. I didn't test the patch, but it looks sane to the eyeball.
Ok, committed.
In some quick testing on my laptop, the extra palloc+pfree adds
about 10% overhead in the worst-case scenario where every tsvector in
the sample consists of entirely unique lexemes. That's a bit
unfortunate, but it's a lot better than consuming gigabytes of memory.
- Heikki
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs