Обсуждение: [GENERAL] gin index trouble

Поиск
Список
Период
Сортировка

[GENERAL] gin index trouble

От
Rob Sargent
Дата:
I’ve hit this same message 
Caused by: org.postgresql.util.PSQLException: ERROR: right sibling of GIN page is of different type
in a couple of contexts and I’m starting to get worried. 
  • I’ve rebuilt the index, but will that help? 
  • Is there a way to see what the ‘different type’ is? 
  • Is it caught/clean-up by vacuum analyse or some such?

I’ve had good results using “<@" and “@>” and believe I've defended the use of an array, but I can’t loose three days worth of simulations to this dang wrong sibling.

select version(); — will use production release of 10 next week.
                                                  version                                                   
------------------------------------------------------------------------------------------------------------
 PostgreSQL 10beta3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit


The only gin index I have is in this table definition:
\d sui.probandset
           Table "sui.probandset"
   Column    |       Type       | Modifiers 
-------------+------------------+-----------
 id          | uuid             | not null
 name        | text             | 
 probands    | uuid[]           | not null
 meioses     | integer          | 
 min_kincoef | double precision | 
 max_kincoef | double precision | 
 people_id   | uuid             | not null
Indexes:
    "probandset_pkey" PRIMARY KEY, btree (id)
    "probandsetunique" gin (probands)
Check constraints:
    "sortedset" CHECK (issorteduuids(probands))
Foreign-key constraints:
    "probandset_people_id_fkey" FOREIGN KEY (people_id) REFERENCES base.people(id)
Referenced by:
    TABLE "sui.probandset_group_member" CONSTRAINT "probandset_group_member_member_id_fkey" FOREIGN KEY (member_id) REFERENCES sui.probandset(id)
    TABLE "sui.segment" CONSTRAINT "segment_probandset_id_fkey" FOREIGN KEY (probandset_id) REFERENCES sui.probandset(id)

and I um, er, enabled gin on uuid by copying from a thread in this list, as follows:
create operator class _uuid_ops
default for type _uuid
using gin as
operator 1 &&(anyarray, anyarray)
,operator 2 @>(anyarray, anyarray)
,operator 3 <@(anyarray, anyarray)
,operator 4 =(anyarray, anyarray)
,function 1 uuid_cmp(uuid, uuid)
,function 2 ginarrayextract(anyarray, internal, internal)
,function 3 ginqueryarrayextract(anyarray, internal, smallint, internal, internal, internal, internal)
,function 4 ginarrayconsistent(internal, smallint, anyarray, integer, internal, internal, internal, internal)
,storage uuid;

Re: [GENERAL] gin index trouble

От
Tom Lane
Дата:
Rob Sargent <rsargent@xmission.com> writes:
> I’ve hit this same message
> Caused by: org.postgresql.util.PSQLException: ERROR: right sibling of GIN page is of different type
> in a couple of contexts and I’m starting to get worried.

If you can make a test case that (eventually) hits that, we'd be
interested to see it ...

> and I um, er, enabled gin on uuid by copying from a thread in this list, as follows:
> create operator class _uuid_ops
> default for type _uuid
> using gin as
> operator 1 &&(anyarray, anyarray)
> ,operator 2 @>(anyarray, anyarray)
> ,operator 3 <@(anyarray, anyarray)
> ,operator 4 =(anyarray, anyarray)
> ,function 1 uuid_cmp(uuid, uuid)
> ,function 2 ginarrayextract(anyarray, internal, internal)
> ,function 3 ginqueryarrayextract(anyarray, internal, smallint, internal, internal, internal, internal)
> ,function 4 ginarrayconsistent(internal, smallint, anyarray, integer, internal, internal, internal, internal)
> ,storage uuid;

You should not have needed to do that, I think, as the standard
anyarray GIN opclass should've handled it.  Having said that,
I don't immediately see anything broken about this definition,
so it seems like it should've worked.
        regards, tom lane


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] gin index trouble

От
Rob Sargent
Дата:
> If you can make a test case that (eventually) hits that, we'd be
> interested to see it ...
>

Any hint(s) on what might trigger this sort of thing?  I could duplicate 
the upload, but I doubt you want the 800K records, 200M input file even 
if it did regenerate the problem.

Would select * from <table> order by <gin'd column> show the message?
>> and I um, er, enabled gin on uuid by copying from a thread in this list, as follows:
>> create operator class _uuid_ops
>> default for type _uuid
>> using gin as
>> operator 1 &&(anyarray, anyarray)
>> ,operator 2 @>(anyarray, anyarray)
>> ,operator 3 <@(anyarray, anyarray)
>> ,operator 4 =(anyarray, anyarray)
>> ,function 1 uuid_cmp(uuid, uuid)
>> ,function 2 ginarrayextract(anyarray, internal, internal)
>> ,function 3 ginqueryarrayextract(anyarray, internal, smallint, internal, internal, internal, internal)
>> ,function 4 ginarrayconsistent(internal, smallint, anyarray, integer, internal, internal, internal, internal)
>> ,storage uuid;
> You should not have needed to do that, I think, as the standard
> anyarray GIN opclass should've handled it.  Having said that,
> I don't immediately see anything broken about this definition,
> so it seems like it should've worked.

Good to hear.

Thanks.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] gin index trouble

От
Peter Geoghegan
Дата:
On Mon, Oct 30, 2017 at 7:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Rob Sargent <rsargent@xmission.com> writes:
>> I’ve hit this same message
>> Caused by: org.postgresql.util.PSQLException: ERROR: right sibling of GIN page is of different type
>> in a couple of contexts and I’m starting to get worried.
>
> If you can make a test case that (eventually) hits that, we'd be
> interested to see it ...

I suspect that this is the 9.6 bug that I described on that recent
-bugs thread [1]. It's just another symptom of the same problem.

It's certainly true that we saw a mix of undetectable
deadlocks/lock-ups (as seen on that -bugs thread) and corruption (as
seen on this thread) before commit e2c79e14 tried to address those
problems. Jeff Janes reported both symptoms in the thread leading up
to that commit during the beta period for 9.6. My guess is that that
commit was insufficient, and that we now continue to see the same mix
of symptoms for what is essentially the same bug.

[1] https://postgr.es/m/CAH2-WzmtLXbs8+c19t1T=Rj0KyP7vK9q8hQJULgDLdVMuEeeUw@mail.gmail.com
--
Peter Geoghegan


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] gin index trouble

От
Tom Lane
Дата:
Rob Sargent <robjsargent@gmail.com> writes:
>> If you can make a test case that (eventually) hits that, we'd be
>> interested to see it ...

> Any hint(s) on what might trigger this sort of thing?  I could duplicate 
> the upload, but I doubt you want the 800K records, 200M input file even 
> if it did regenerate the problem.

It's possible you could duplicate the failure with synthetic data
generated by a not-very-long script.  That would beat uploading
a large data file, not to mention possibly needing to sanitize
your data.
        regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] gin index trouble

От
Peter Geoghegan
Дата:
On Fri, Oct 27, 2017 at 3:15 PM, Rob Sargent <rsargent@xmission.com> wrote:
> I’ve hit this same message
>
> Caused by: org.postgresql.util.PSQLException: ERROR: right sibling of GIN
> page is of different type
>
> in a couple of contexts and I’m starting to get worried.
>
> I’ve rebuilt the index, but will that help?
> Is there a way to see what the ‘different type’ is?
> Is it caught/clean-up by vacuum analyse or some such?

Is there a lot of churn on this table? Do you either heavily update or
heavily delete rows in the table? Does vacuum tend to run on the table
rather frequently?

--
Peter Geoghegan


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] gin index trouble

От
Rob Sargent
Дата:

On 10/30/2017 10:32 AM, Peter Geoghegan wrote:
> On Fri, Oct 27, 2017 at 3:15 PM, Rob Sargent <rsargent@xmission.com> wrote:
>> I’ve hit this same message
>>
>> Caused by: org.postgresql.util.PSQLException: ERROR: right sibling of GIN
>> page is of different type
>>
>> in a couple of contexts and I’m starting to get worried.
>>
>> I’ve rebuilt the index, but will that help?
>> Is there a way to see what the ‘different type’ is?
>> Is it caught/clean-up by vacuum analyse or some such?
> Is there a lot of churn on this table? Do you either heavily update or
> heavily delete rows in the table? Does vacuum tend to run on the table
> rather frequently?
>
Peter, you beat me to the punch.  I was just about to say "Having read 
the referenced message I thought I would add that we never delete from 
this table."  In this particular case it was written to record by 
record, in a previous execution and at the time of the error it was only 
being read.  (In case you've been following, the failed execution would 
have added ~1M "segments", each which references an entry in the gin'd 
table "probandsets" - but like a rookie I'm looking up each 
probandset(2^16) individually. Re-working that NOW.)


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] gin index trouble

От
Peter Geoghegan
Дата:
On Mon, Oct 30, 2017 at 9:45 AM, Rob Sargent <robjsargent@gmail.com> wrote:
> Peter, you beat me to the punch.  I was just about to say "Having read the
> referenced message I thought I would add that we never delete from this
> table."  In this particular case it was written to record by record, in a
> previous execution and at the time of the error it was only being read.  (In
> case you've been following, the failed execution would have added ~1M
> "segments", each which references an entry in the gin'd table "probandsets"
> - but like a rookie I'm looking up each probandset(2^16) individually.
> Re-working that NOW.)

It's not surprising that only a SELECT statement could see this
problem. I guess that it's possible that only page deletions used for
the pending list are involved here.

I'm not sure how reliably you can recreate the problem, but if it
doesn't take too long then it would be worth seeing what effect
turning off the FASTUPDATE storage parameter for the GIN index has.
That could prevent the problem from recurring, and would support my
theory about what's up here. (It wouldn't fix the corruption, though.)

Of course, what I'd much prefer is a self-contained test case. But if
you can't manage that, or if reproducing the issue takes hours, then
this simpler experiment might be worthwhile.

-- 
Peter Geoghegan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] gin index trouble

От
Rob Sargent
Дата:

On 10/30/2017 10:56 AM, Peter Geoghegan wrote:
> On Mon, Oct 30, 2017 at 9:45 AM, Rob Sargent <robjsargent@gmail.com> wrote:
>> Peter, you beat me to the punch.  I was just about to say "Having read the
>> referenced message I thought I would add that we never delete from this
>> table."  In this particular case it was written to record by record, in a
>> previous execution and at the time of the error it was only being read.  (In
>> case you've been following, the failed execution would have added ~1M
>> "segments", each which references an entry in the gin'd table "probandsets"
>> - but like a rookie I'm looking up each probandset(2^16) individually.
>> Re-working that NOW.)
> It's not surprising that only a SELECT statement could see this
> problem. I guess that it's possible that only page deletions used for
> the pending list are involved here.
>
> I'm not sure how reliably you can recreate the problem, but if it
> doesn't take too long then it would be worth seeing what effect
> turning off the FASTUPDATE storage parameter for the GIN index has.
> That could prevent the problem from recurring, and would support my
> theory about what's up here. (It wouldn't fix the corruption, though.)
>
> Of course, what I'd much prefer is a self-contained test case. But if
> you can't manage that, or if reproducing the issue takes hours, then
> this simpler experiment might be worthwhile.
>
I can reload the gin'd table repeatedly in a dev environment. Does 
select * from <table> order by <gin'd column> expose the corruption or 
does the load itself necessarily fail at the moment of corruption?



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] gin index trouble

От
Rob Sargent
Дата:



On 10/30/2017 10:56 AM, Peter Geoghegan wrote:
On Mon, Oct 30, 2017 at 9:45 AM, Rob Sargent <robjsargent@gmail.com> wrote:
Peter, you beat me to the punch.  I was just about to say "Having read the
referenced message I thought I would add that we never delete from this
table."  In this particular case it was written to record by record, in a
previous execution and at the time of the error it was only being read.  (In
case you've been following, the failed execution would have added ~1M
"segments", each which references an entry in the gin'd table "probandsets"
- but like a rookie I'm looking up each probandset(2^16) individually.
Re-working that NOW.)
It's not surprising that only a SELECT statement could see this
problem. I guess that it's possible that only page deletions used for
the pending list are involved here.

I'm not sure how reliably you can recreate the problem, but if it
doesn't take too long then it would be worth seeing what effect
turning off the FASTUPDATE storage parameter for the GIN index has.
That could prevent the problem from recurring, and would support my
theory about what's up here. (It wouldn't fix the corruption, though.)

Of course, what I'd much prefer is a self-contained test case. But if
you can't manage that, or if reproducing the issue takes hours, then
this simpler experiment might be worthwhile.

My test database machine is:
Not virtual
Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz (quad core, hyperthreaded)
MemTotal:       16272548 kB
default postgres.conf from yum install postgresql-10*
I've loaded thrice the number of records (190K) into the problem table, but no sign yet of the problem. But unlike the production lookup-notfind-insert (anti)pattern, these were all loaded in a single transaction. I think the following query has to read the gin'd column of every record:
select array_length(probands,1) as heads,
       count(*) as occurs
from tld.probandset
where probands @> '{65fe3b60-1c86-4b14-a85d-21abdf68f9e2,f0963403-3f3c-426d-a828-b5bfff914bb4}' group by array_length(probands,1)
order by array_length(probands,1);

 heads | occurs
-------+--------
     2 |      1
     3 |     14
     4 |     91
     5 |    364
     6 |   1001
     7 |   2002
     8 |   3003
     9 |   3432
    10 |   3003
    11 |   2002
    12 |   1001
    13 |    364
    14 |     91
    15 |     14
    16 |      1
(15 rows)

Time: 17.125 ms
Happy as a clam.

I'll try a run of the antipattern.  I have NOT diddled
FASTUPDATE at all.