Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Дата
Msg-id 293815bb-3304-a3a7-01a9-f3033ede0a74@iki.fi
обсуждение исходный текст
Ответ на Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows  (Pawel Kudzia <kudzia@gmail.com>)
Ответы Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows  (Pawel Kudzia <kudzia@gmail.com>)
Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-bugs
On 23/07/2021 18:04, Pawel Kudzia wrote:
> Thanks a lot for your patience and multiple patches that you've
> provided. Please pardon my ignorance - I don't have low-level
> understanding of how the query is being executed - but are you sure
> that index is missing entries and not the other way around - that it
> has too many entries?

To be precise, the index has an extra entry for row (4002784,1) with key 
1373, and it's missing the entry with key 38048120. And for row 
(4002869,14), it has an extra row for key 1373, and it's missing the 
entry for key 95333744.

> To recap - SELECT, answered based on the GIN, reports rows that
> actually do not match the criteria provided in WHERE. Just lowering
> work_mem makes the problem go away, whith GIN still being used.

The reason that lowering work_mem hides the problem is that GIN collects 
all the matches in a so called TID bitmap, and if the bitmap grows too 
large compared to work_mem, it becomes lossy to save memory. When it's 
lossy, it only stores the heap block numbers of the matches. For a 
regular, non-lossy, match, the Bitmap Heap Scan just returns the row 
that the index says is a match. For the lossy matches, the Bitmap Heap 
Scan node needs to check every row on the page to see which ones 
actually match. This re-checking hides the problem that some of the 
matches that the index reported were not real.

Note that you are also getting incorrect results with missing row for 
other queries. You can try it with e.g.:

-- Using the index
set enable_seqscan=off;
set enable_bitmapscan=on;
SELECT count(*) FROM entity WHERE attribute_name_ids && '{95333744}';
SELECT count(*) FROM entity WHERE attribute_name_ids && '{38048120}';

-- Without index
set enable_seqscan=on;
set enable_bitmapscan=off;
SELECT count(*) FROM entity WHERE attribute_name_ids && '{95333744}';
SELECT count(*) FROM entity WHERE attribute_name_ids && '{38048120}';


I'll work on a patch to add more sanity checks to the GIN code when it 
traverses the tree, to catch the case that it accidentally steps on a 
wrong kind of a page (I'm pretty busy next week, so might not get to 
that until the week after though). I don't think that will help here, 
but who knows, and at least we can rule out some kinds of bugs. 
Alexander, can you finish the fixes to the ternary logic? That doesn't 
explain this corruption either, but we should fix it anyway.

- Heikki



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Andrey Borodin
Дата:
Сообщение: Re: BUG #17122: panic on prepare with subsequent pg_advisory_lock() and pg_advisory_xact_lock_shared()
Следующее
От: Pawel Kudzia
Дата:
Сообщение: Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows