On 01/21/2014 04:02 AM, Tomas Vondra wrote:
> On 20.1.2014 19:30, Heikki Linnakangas wrote:
>>
>> Attached is yet another version, with more bugs fixed and more
>> comments added and updated. I would appreciate some heavy testing of
>> this patch now. If you could re-run the tests you've been using,
>> that would be great. I've tested the WAL replay by replicating GIN
>> operations over streaming replication. That doesn't guarantee it's
>> correct, but it's a good smoke test.
>
> I gave it a try - the OOM error seems to be gone, but now I get this:
>
> PANIC: cannot insert duplicate items to GIN index page
>
> This only happens when building the index incrementally (i.e. using a
> sequence of INSERT statements into a table with GIN index). When I
> create a new index on a table (already containing the same dataset) it
> works just fine.
>
> Also, I tried to reproduce the issue by running a simple plpgsql loop
> (instead of a complex python script):
>
> DO LANGUAGE plpgsql $$
> DECLARE
>     r tsvector;
> BEGIN
>     FOR r IN SELECT body_tsvector FROM data_table LOOP
>         INSERT INTO idx_table (body_tsvector) VALUES (r);
>     END LOOP;
> END$$;
>
> where data_table is the table with imported data (the same data I
> mentioned in the post about OOM errors), and idx_table is an empty
> table with a GIN index. And indeed it fails, but only if I run the block
> in multiple sessions in parallel.

Oh, I see what's going on. I had assumed that there cannot be duplicate
insertions into the posting tree, but that's dead wrong. The fast
insertion mechanism relies on a duplicate insertion being a no-op.
Will fix, thanks for the testing!
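To illustrate the behaviour described above, here is a minimal sketch of an insertion that silently skips items already present instead of raising an error. It is not the GIN code from the patch: the function name insert_items, the plain sorted list standing in for a posting tree page, and the (block, offset) tuples standing in for item pointers are all assumptions made for the example.

```python
import bisect


def insert_items(posting, new_items):
    """Insert new_items into the sorted list `posting`, skipping duplicates.

    Returns the number of items actually added. Hypothetical helper,
    not a PostgreSQL function.
    """
    added = 0
    for item in new_items:
        pos = bisect.bisect_left(posting, item)
        if pos < len(posting) and posting[pos] == item:
            # Already present: a duplicate insertion is a no-op.
            continue
        posting.insert(pos, item)
        added += 1
    return added


# Item pointers modelled as (block, offset) tuples (an assumption).
posting = [(1, 1), (1, 3), (2, 5)]
added_first = insert_items(posting, [(1, 2), (2, 5)])
# Replaying the same batch must leave the list unchanged.
added_again = insert_items(posting, [(1, 2), (2, 5)])
```

The point of the sketch is only that running the same batch twice leaves the list unchanged, which is the property the fast insertion path needs.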
- Heikki