Обсуждение: [GENERAL] gin index trouble
I’ve hit this same message
Caused by: org.postgresql.util.PSQLException: ERROR: right sibling of GIN page is of different type
in a couple of contexts and I’m starting to get worried.
- I’ve rebuilt the index, but will that help?
- Is there a way to see what the ‘different type’ is?
- Is it caught/clean-up by vacuum analyse or some such?
I’ve had good results using “<@" and “@>” and believe I've defended the use of an array, but I can’t loose three days worth of simulations to this dang wrong sibling.
select version(); — will use production release of 10 next week.
version
------------------------------------------------------------------------------------------------------------
PostgreSQL 10beta3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit
version
------------------------------------------------------------------------------------------------------------
PostgreSQL 10beta3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit
The only gin index I have is in this table definition:
and I um, er, enabled gin on uuid by copying from a thread in this list, as follows:\d sui.probandsetTable "sui.probandset"Column | Type | Modifiers-------------+------------------+-----------id | uuid | not nullname | text |probands | uuid[] | not nullmeioses | integer |min_kincoef | double precision |max_kincoef | double precision |people_id | uuid | not nullIndexes:"probandset_pkey" PRIMARY KEY, btree (id)"probandsetunique" gin (probands)Check constraints:"sortedset" CHECK (issorteduuids(probands))Foreign-key constraints:"probandset_people_id_fkey" FOREIGN KEY (people_id) REFERENCES base.people(id)Referenced by:TABLE "sui.probandset_group_member" CONSTRAINT "probandset_group_member_member_id_fkey" FOREIGN KEY (member_id) REFERENCES sui.probandset(id)TABLE "sui.segment" CONSTRAINT "segment_probandset_id_fkey" FOREIGN KEY (probandset_id) REFERENCES sui.probandset(id)
create operator class _uuid_opsdefault for type _uuidusing gin asoperator 1 &&(anyarray, anyarray),operator 2 @>(anyarray, anyarray),operator 3 <@(anyarray, anyarray),operator 4 =(anyarray, anyarray),function 1 uuid_cmp(uuid, uuid),function 2 ginarrayextract(anyarray, internal, internal),function 3 ginqueryarrayextract(anyarray, internal, smallint, internal, internal, internal, internal),function 4 ginarrayconsistent(internal, smallint, anyarray, integer, internal, internal, internal, internal),storage uuid;
Rob Sargent <rsargent@xmission.com> writes: > I’ve hit this same message > Caused by: org.postgresql.util.PSQLException: ERROR: right sibling of GIN page is of different type > in a couple of contexts and I’m starting to get worried. If you can make a test case that (eventually) hits that, we'd be interested to see it ... > and I um, er, enabled gin on uuid by copying from a thread in this list, as follows: > create operator class _uuid_ops > default for type _uuid > using gin as > operator 1 &&(anyarray, anyarray) > ,operator 2 @>(anyarray, anyarray) > ,operator 3 <@(anyarray, anyarray) > ,operator 4 =(anyarray, anyarray) > ,function 1 uuid_cmp(uuid, uuid) > ,function 2 ginarrayextract(anyarray, internal, internal) > ,function 3 ginqueryarrayextract(anyarray, internal, smallint, internal, internal, internal, internal) > ,function 4 ginarrayconsistent(internal, smallint, anyarray, integer, internal, internal, internal, internal) > ,storage uuid; You should not have needed to do that, I think, as the standard anyarray GIN opclass should've handled it. Having said that, I don't immediately see anything broken about this definition, so it seems like it should've worked. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
> If you can make a test case that (eventually) hits that, we'd be > interested to see it ... > Any hint(s) on what might trigger this sort of thing? I could duplicate the upload, but I doubt you want the 800K records, 200M input file even if it did regenerate the problem. Would select * from <table> order by <gin'd column> show the message? >> and I um, er, enabled gin on uuid by copying from a thread in this list, as follows: >> create operator class _uuid_ops >> default for type _uuid >> using gin as >> operator 1 &&(anyarray, anyarray) >> ,operator 2 @>(anyarray, anyarray) >> ,operator 3 <@(anyarray, anyarray) >> ,operator 4 =(anyarray, anyarray) >> ,function 1 uuid_cmp(uuid, uuid) >> ,function 2 ginarrayextract(anyarray, internal, internal) >> ,function 3 ginqueryarrayextract(anyarray, internal, smallint, internal, internal, internal, internal) >> ,function 4 ginarrayconsistent(internal, smallint, anyarray, integer, internal, internal, internal, internal) >> ,storage uuid; > You should not have needed to do that, I think, as the standard > anyarray GIN opclass should've handled it. Having said that, > I don't immediately see anything broken about this definition, > so it seems like it should've worked. Good to hear. Thanks. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
On Mon, Oct 30, 2017 at 7:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Rob Sargent <rsargent@xmission.com> writes: >> I’ve hit this same message >> Caused by: org.postgresql.util.PSQLException: ERROR: right sibling of GIN page is of different type >> in a couple of contexts and I’m starting to get worried. > > If you can make a test case that (eventually) hits that, we'd be > interested to see it ... I suspect that this is the 9.6 bug that I described on that recent -bugs thread [1]. It's just another symptom of the same problem. It's certainly true that we saw a mix of undetectable deadlocks/lock-ups (as seen on that -bugs thread) and corruption (as seen on this thread) before commit e2c79e14 tried to address those problems. Jeff Janes reported both symptoms in the thread leading up to that commit during the beta period for 9.6. My guess is that that commit was insufficient, and that we now continue to see the same mix of symptoms for what is essentially the same bug. [1] https://postgr.es/m/CAH2-WzmtLXbs8+c19t1T=Rj0KyP7vK9q8hQJULgDLdVMuEeeUw@mail.gmail.com -- Peter Geoghegan -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Rob Sargent <robjsargent@gmail.com> writes: >> If you can make a test case that (eventually) hits that, we'd be >> interested to see it ... > Any hint(s) on what might trigger this sort of thing? I could duplicate > the upload, but I doubt you want the 800K records, 200M input file even > if it did regenerate the problem. It's possible you could duplicate the failure with synthetic data generated by a not-very-long script. That would beat uploading a large data file, not to mention possibly needing to sanitize your data. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
On Fri, Oct 27, 2017 at 3:15 PM, Rob Sargent <rsargent@xmission.com> wrote: > I’ve hit this same message > > Caused by: org.postgresql.util.PSQLException: ERROR: right sibling of GIN > page is of different type > > in a couple of contexts and I’m starting to get worried. > > I’ve rebuilt the index, but will that help? > Is there a way to see what the ‘different type’ is? > Is it caught/clean-up by vacuum analyse or some such? Is there a lot of churn on this table? Do you either heavily update or heavily delete rows in the table? Does vacuum tend to run on the table rather frequently? -- Peter Geoghegan -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
On 10/30/2017 10:32 AM, Peter Geoghegan wrote: > On Fri, Oct 27, 2017 at 3:15 PM, Rob Sargent <rsargent@xmission.com> wrote: >> I’ve hit this same message >> >> Caused by: org.postgresql.util.PSQLException: ERROR: right sibling of GIN >> page is of different type >> >> in a couple of contexts and I’m starting to get worried. >> >> I’ve rebuilt the index, but will that help? >> Is there a way to see what the ‘different type’ is? >> Is it caught/clean-up by vacuum analyse or some such? > Is there a lot of churn on this table? Do you either heavily update or > heavily delete rows in the table? Does vacuum tend to run on the table > rather frequently? > Peter, you beat me to the punch. I was just about to say "Having read the referenced message I thought I would add that we never delete from this table." In this particular case it was written to record by record, in a previous execution and at the time of the error it was only being read. (In case you've been following, the failed execution would have added ~1M "segments", each which references an entry in the gin'd table "probandsets" - but like a rookie I'm looking up each probandset(2^16) individually. Re-working that NOW.) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
On Mon, Oct 30, 2017 at 9:45 AM, Rob Sargent <robjsargent@gmail.com> wrote: > Peter, you beat me to the punch. I was just about to say "Having read the > referenced message I thought I would add that we never delete from this > table." In this particular case it was written to record by record, in a > previous execution and at the time of the error it was only being read. (In > case you've been following, the failed execution would have added ~1M > "segments", each which references an entry in the gin'd table "probandsets" > - but like a rookie I'm looking up each probandset(2^16) individually. > Re-working that NOW.) It's not surprising that only a SELECT statement could see this problem. I guess that it's possible that only page deletions used for the pending list are involved here. I'm not sure how reliably you can recreate the problem, but if it doesn't take too long then it would be worth seeing what effect turning off the FASTUPDATE storage parameter for the GIN index has. That could prevent the problem from recurring, and would support my theory about what's up here. (It wouldn't fix the corruption, though.) Of course, what I'd much prefer is a self-contained test case. But if you can't manage that, or if reproducing the issue takes hours, then this simpler experiment might be worthwhile. -- Peter Geoghegan -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
On 10/30/2017 10:56 AM, Peter Geoghegan wrote: > On Mon, Oct 30, 2017 at 9:45 AM, Rob Sargent <robjsargent@gmail.com> wrote: >> Peter, you beat me to the punch. I was just about to say "Having read the >> referenced message I thought I would add that we never delete from this >> table." In this particular case it was written to record by record, in a >> previous execution and at the time of the error it was only being read. (In >> case you've been following, the failed execution would have added ~1M >> "segments", each which references an entry in the gin'd table "probandsets" >> - but like a rookie I'm looking up each probandset(2^16) individually. >> Re-working that NOW.) > It's not surprising that only a SELECT statement could see this > problem. I guess that it's possible that only page deletions used for > the pending list are involved here. > > I'm not sure how reliably you can recreate the problem, but if it > doesn't take too long then it would be worth seeing what effect > turning off the FASTUPDATE storage parameter for the GIN index has. > That could prevent the problem from recurring, and would support my > theory about what's up here. (It wouldn't fix the corruption, though.) > > Of course, what I'd much prefer is a self-contained test case. But if > you can't manage that, or if reproducing the issue takes hours, then > this simpler experiment might be worthwhile. > I can reload the gin'd table repeatedly in a dev environment. Does select * from <table> order by <gin'd column> expose the corruption or does the load itself necessarily fail at the moment of corruption? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
On 10/30/2017 10:56 AM, Peter Geoghegan wrote:
My test database machine is:On Mon, Oct 30, 2017 at 9:45 AM, Rob Sargent <robjsargent@gmail.com> wrote:Peter, you beat me to the punch. I was just about to say "Having read the referenced message I thought I would add that we never delete from this table." In this particular case it was written to record by record, in a previous execution and at the time of the error it was only being read. (In case you've been following, the failed execution would have added ~1M "segments", each which references an entry in the gin'd table "probandsets" - but like a rookie I'm looking up each probandset(2^16) individually. Re-working that NOW.)It's not surprising that only a SELECT statement could see this problem. I guess that it's possible that only page deletions used for the pending list are involved here. I'm not sure how reliably you can recreate the problem, but if it doesn't take too long then it would be worth seeing what effect turning off the FASTUPDATE storage parameter for the GIN index has. That could prevent the problem from recurring, and would support my theory about what's up here. (It wouldn't fix the corruption, though.) Of course, what I'd much prefer is a self-contained test case. But if you can't manage that, or if reproducing the issue takes hours, then this simpler experiment might be worthwhile.
Not virtualI've loaded thrice the number of records (190K) into the problem table, but no sign yet of the problem. But unlike the production lookup-notfind-insert (anti)pattern, these were all loaded in a single transaction. I think the following query has to read the gin'd column of every record:
Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz (quad core, hyperthreaded)
MemTotal: 16272548 kB
default postgres.conf from yum install postgresql-10*
select array_length(probands,1) as heads,Happy as a clam.
count(*) as occurs
from tld.probandset
where probands @> '{65fe3b60-1c86-4b14-a85d-21abdf68f9e2,f0963403-3f3c-426d-a828-b5bfff914bb4}' group by array_length(probands,1)
order by array_length(probands,1);
heads | occurs
-------+--------
2 | 1
3 | 14
4 | 91
5 | 364
6 | 1001
7 | 2002
8 | 3003
9 | 3432
10 | 3003
11 | 2002
12 | 1001
13 | 364
14 | 91
15 | 14
16 | 1
(15 rows)
Time: 17.125 ms
I'll try a run of the antipattern. I have NOT diddled FASTUPDATE at all.