IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
От | Heikki Linnakangas |
---|---|
Тема | IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows |
Дата | |
Msg-id | efe09893-8e6d-9a34-8351-3ea3c6ab202a@iki.fi обсуждение исходный текст |
Ответ на | Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows |
Список | pgsql-bugs |
On 17/06/2021 22:49, Tom Lane wrote: > I wrote: >> Pawel Kudzia <kudzia@gmail.com> writes: >>> with help from IRC we've found that decreasing work_mem from 1MB to 256kB >>> or less makes the problem go away: > >> Hmm. So that suggests that the index itself is *not* corrupt, >> but the problem is associated with a bug in the indexscan >> algorithms. > > After staring at the code I think there is at least one bug, and > possibly two, in shimTriConsistentFn. That's likely to be implicated > here because intarray's GIN opclass only provides a bool consistent > function. I'm not very clear on the circumstances that lead to the scan > code inventing GIN_MAYBE inputs, so I haven't been able to construct a > test case. > > The definite bug is triggered because intarray relies on the API > specification that says that if the search mode is > GIN_SEARCH_MODE_DEFAULT, then the consistentFn will only be called > when there's at least one TRUE key: > > case RTOverlapStrategyNumber: > /* result is not lossy */ > *recheck = false; > /* at least one element in check[] is true, so result = true */ > res = true; > break; > > shimTriConsistentFn ignores this and calls it on all-FALSE inputs, > for which it'll incorrectly get a TRUE result, as it will also for > all the following checks. The upshot is that shimTriConsistentFn > will convert any case with a MAYBE input to a hard TRUE result, > with no recheck requirement. This'd easily explain the reported > misbehavior. That's subtle. The documentation says: > If <literal>*searchMode</literal> is set to > <literal>GIN_SEARCH_MODE_DEFAULT</literal> (which is the value it is > initialized to before call), only items that match at least one of > the returned keys are considered candidate matches. I guess you can read that as "consistentFn will only be called when there is at least one matching item", but that didn't occur to me when I read that at first. But I agree we need to fix shimTriConsistentFn to respect that. > The other thing that I'm unsure about, because the code is horribly > underdocumented, is that it's not very clear whether > shimTriConsistentFn is accurately converting between the bool and > the tristate APIs. That's because it's not very clear what the > tristate API actually *is*. In particular, is the end state of > key->recheckCurItem meaningful in the tristate case? If it's not, > then the short-circuit case for no MAYBE inputs is broken, because > it'll return TRUE when the bool consistentFn is trying to tell us > to recheck. But if it is meaningful, then the multiway case is broken, > because it will return the recheckCurItem result set by the last call to > the bool consistentfn; which might be false even though other calls said > true. (So in that case I think we'd need "key->recheckCurItem = recheck" > at the end.) I also wonder how any of that works correctly for real > triconsistent functions, which don't have access to the recheckCurItem > flag. > > Anyway, I'm punting this to the code authors. I'd like to see > some comments clarifying what the API really is, not just a > quick-n-dirty code fix. I've been trying to create a test case for this, but no luck so far. I cannot find a codepath that would hit these bugs with the kind of query that Pawel used. The search condition is very simple: "column && '{constant}'", with only one key and one constant to search for. There are three calls to triConsistentFn: 1. In startScanKey, but that's only reached if (key->nentries > 1), so not reachable with Pawel's query. 2. In keyGetItem, when haveLossyEntry==true. But AFAICS, haveLossyEntry is never true with Pawel's query. That would require curItem to be a lossy pointer, and that in turn can only happen when matchBitmap is used, and that's only used with partial match queries and with queries that need a full-index scan. 3. The second call in keyGetItem is reached, but it is only passed GIN_MAYBE when curItem is lossy. Which isn't possible with this query, see point 2. AFAICS the recheckCurItem bug should also not cause wrong results with that query. I must be missing something. I could probably construct a test case with a different query, one involving multiple elements in the search key, but I'd like to have a solid explanation for the original report. Pawel, is this a production system, or would it be possible for you to build from sources with a patch with some extra debugging output? - Heikki
В списке pgsql-bugs по дате отправления:
Предыдущее
От: dhanabakeeswari vДата:
Сообщение: Re: BUG #17060: ERROR: column "rownum" does not exist
Следующее
От: Pawel KudziaДата:
Сообщение: Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows