Re: [PATCH] fix GIN index search sometimes losing results

Поиск

Список

Период

Сортировка

От	Pavel Borisov
Тема	Re: [PATCH] fix GIN index search sometimes losing results
Дата	17 мая 2020 г. 22:53:21
Msg-id	CALT9ZEEW37sR0rP8K7r1Zjxc2e60xXSv7F8kgsJu5tC-AXyK0g@mail.gmail.com обсуждение исходный текст
Ответ на	Re: [PATCH] fix GIN index search sometimes losing results (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: [PATCH] fix GIN index search sometimes losing results
Список	pgsql-hackers

Дерево обсуждения

Hi, all!

Below is my variant how to patch Gin-Gist weights issue:

1. First of all I propose to shift from previously Gin's own TS_execute variant and leave only two: TS_execute with bool result and bool type callback and ternary TS_execute_recurse with ternary callback. I suppose all legacy consistent callers can still use bool via provided wrapper.

2. I integrated logic for indexes which do not support weights and positions inside (which gives MAYBE in certain cases on negation) inside previous TS_execute_recurse function called with additional flag for this class of indexes.

3. Check function for GIST and GIN now gives ternary result and is called with ternary type callback. I think in future nothing prevents smoothly shifting callback functions, check functions and even TS_execute result to ternary.

So I also send my variant patch for review and discussion.

Regards,

Pavel Borisov

вс, 17 мая 2020 г. в 03:14, Tom Lane <tgl@sss.pgh.pa.us>:

I wrote:
> I think the root of the problem is that if we have a query using
> weights, and we are testing tsvector data that lacks positions/weights,
> we can never say there's definitely a match. I don't see any decently
> clean way to fix this without redefining the TSExecuteCallback API
> to return a tri-state YES/NO/MAYBE result, because really we need to
> decide that it's MAYBE at the level of processing the QI_VAL node,
> not later on. I'd tried to avoid that in e81e5741a, but maybe we
> should just bite that bullet, and not worry about whether there's
> any third-party code providing its own TSExecuteCallback routine.
> codesearch.debian.net suggests that there are no external callers
> of TS_execute, so maybe we can get away with that.

0001 attached is a proposed patch that does it that way. Given the
API break involved, it's not quite clear what to do with this.
ISTM we have three options:

1. Ignore the API issue and back-patch. Given the apparent lack of
external callers of TS_execute, maybe we can get away with that;
but I wonder if we'd get pushback from distros that have automatic
ABI-break detectors in place.

2. Assume we can't backpatch, but it's still OK to slip this into
v13. (This option clearly has a limited shelf life, but I think
we could get away with it until late beta.)

3. Assume we'd better hold this till v14.

I find #3 unduly conservative, seeing that this is clearly a bug
fix, but on the other hand #1 is a bit scary. Aside from the API
issue, it's not impossible that this has introduced some corner
case behavioral changes that we'd consider to be new bugs rather
than bug fixes.

Anyway, some notes for reviewers:

* The core idea of the patch is to make the TS_execute callbacks
have ternary results and to insist they return TS_MAYBE in any
case where the correct result is uncertain.

* That fixes the bug at hand, and it also allows getting rid of
some kluges at higher levels. The GIN code no longer needs its
own TS_execute_ternary implementation, and the GIST code no longer
needs to suppose that it can't trust NOT results.

* I put some effort into not leaking memory within tsvector_op.c's
checkclass_str and checkcondition_str. (The final output array
can still get leaked, I believe. Fixing that seems like material
for a different patch, and it might not be worth any trouble.)

* The new test cases in tstypes.sql are to verify that we didn't
change behavior of the basic tsvector @@ tsquery code. There wasn't
any coverage of these cases before, and the logic for checkclass_str
without position info had to be tweaked to preserve this behavior.

* The new cases in tsearch verify that the GIN and GIST code gives
the same results as the basic operator.

Now, as for the 0002 patch attached: after 0001, the only TS_execute()
callers that are not specifying TS_EXEC_CALC_NOT are hlCover(),
which I'd already complained is probably a bug, and the first of
the two calls in tsrank.c's Cover(). It seems difficult to me to
argue that it's not a bug for Cover() to process NOT in one call
but not the other --- moreover, if there was any argument for that
once upon a time, it probably falls to the ground now that (a) we
have a less buggy implementation of NOT and (b) the presence of
phrase queries significantly raises the importance of not taking
short-cuts. Therefore, 0002 attached rips out the TS_EXEC_CALC_NOT
flag and has TS_execute compute NOT expressions accurately all the
time.

As it stands, 0002 changes no regression test results, which I'm
afraid speaks more to our crummy test coverage than anything else;
tests that exercise those two functions with NOT-using queries
would easily show that there is a difference.

Even if we decide to back-patch 0001, I would not suggest
back-patching 0002, as it's more nearly a definitional change
than a bug fix. But I think it's a good idea anyway.

I'll stick this in the queue for the July commitfest, in case
anybody wants to review it.

regards, tom lane

Вложения

gin-gist-weight-patch-v3.diff

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Alvaro Herrera
Дата: 17 мая 2020 г., 18:28:51
Сообщение: Re: Add A Glossary

Следующее

От: "Joe Wildish"
Дата: 18 мая 2020 г., 02:29:56
Сообщение: [PATCH] Add support to psql for edit-and-execute-command

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [PATCH] fix GIN index search sometimes losing results

Вложения

Предыдущее

Следующее