Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
От | Pawel Kudzia |
---|---|
Тема | Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows |
Дата | |
Msg-id | CAJYBUS-DHFWJLAYLRSsJ=ZpEOvZNnL=VQd2puPQvhaKiJWvheQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows (Heikki Linnakangas <hlinnaka@iki.fi>) |
Ответы |
Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
|
Список | pgsql-bugs |
On Fri, Jul 23, 2021 at 11:47 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > On 23 July 2021 18:04:50 EEST, Pawel Kudzia <kudzia@gmail.com> wrote: > >On Fri, Jul 23, 2021 at 3:46 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > >> > >> Ok, so the index is missing entries for the correct key. Looks like the > >> index entries were inserted into the wrong subtree, under wrong key. But > >> *how* did that happen? I'm out of ideas, I'm afraid :-(. > > Can you tell more about the workload? What INSERT/UPDATE/DELETE commands do you run? How many concurrent clients? Do yourun vacuum manually or how often does autovacuum kick in? How long did it take for the problem to appear? What queriesdid you run to find the corruption, and how often? > It's a mix with low percentage of simple INSERT INTO entity [ inserting single row within a transaction ], DELETE FROM entity [ deleting single row, selected by primary key ] and high number of UPDATE entity SET .. WHERE primname=?. There are few millions such updates a day. Single transaction has just one such update, update sets most of the table columns although in fact most of their values remain unchanged. Concurrency level - less than 50 parallel connections. Vacuum is run manually extremely rarely [ maybe 2-3x a year ]. Based on recent logs - autovacuum happens every ~ 7 days. Problem was discovered by coincidence - one of users noticed incorrect results returned, this provoked me to run a script that hammered database every hour with set of queries in this form: SELECT primname FROM entity WHERE ( attribute_name_ids && '{{$ani}}' ) AND NOT ( (attribute_name_ids||0) && '{{$ani}}') It would discover ~ 1-5 problematic entries each week, but there were periods of few weeks where new problems were not found [ note that this does not mean there were no such problems, i was hammering only one of GIN columns - where i had smallest universe of values to iterate over ]. > I know it's a big ask, but would it be possible to simplify the test case further, to make it reproduce faster? Ideallyas a self-contained test script with any sensitive data removed. > Sadly - I've put quite a lot of effort to reproduce the issue in test environment and was never able to do it. Problem occured on the production only, after prolonged execution of live update traffic. Note that in the I've moved database to another physical server to eliminate potential hardware malfunction - that did not help, problem continued to occur. I've also upgraded from 12.5 to 13.3. I no longer have a production setup that uses index built in this way: CREATE INDEX entity_attribute_name_ids_gin ON public.entity USING gin (attribute_name_ids public.gin__int_ops); 2 weeks ago i've moved to CREATE INDEX entity_attribute_name_ids_gin ON public.entity USING gin (attribute_name_ids); And have not seen the problem manifest again. Nevertheless - I have 2 file-level dumps of databases where we caught the inconsistency and would like to help with diagnosing the problem. -- regards, Pawel Kudzia
В списке pgsql-bugs по дате отправления: