Обсуждение: BUG #16595: Reads fail with "lost saved point in index" error after writes
BUG #16595: Reads fail with "lost saved point in index" error after writes
От
PG Bug reporting form
Дата:
The following bug has been logged on the website: Bug reference: 16595 Logged by: Jesse Kinkead Email address: jesse@falkon.ai PostgreSQL version: 11.6 Operating system: Linux Description: We've seen this on a few different servers running Postgres 11.6. The table in question has a JSONB column with a GIN index, defined as: "index_name_idx" gin (jsonb_column_name jsonb_path_ops, integer_column, timestamp_with_time_zone_column) Under load (especially writes followed immediately by reads), we occasionally see reads failing an internal exception (error code XX000) with the message "lost saved point in index". This seems to come from this source file: https://doxygen.postgresql.org/ginget_8c_source.html To further complicate matters, this server is using TimescaleDB (https://www.timescale.com/), and has this table configured as a hypertable (https://docs.timescale.com/latest/using-timescaledb/hypertables), which I believe means it's actually multiple tables masquerading as a single table. The errors are quite frequent when running with slow CPUs / disks, but more rare when running with more resources.
PG Bug reporting form <noreply@postgresql.org> writes: > The table in question has a JSONB column with a GIN index, defined as: > "index_name_idx" gin (jsonb_column_name jsonb_path_ops, integer_column, > timestamp_with_time_zone_column) > Under load (especially writes followed immediately by reads), we > occasionally see reads failing an internal exception (error code XX000) with > the message "lost saved point in index". [ stares at that for awhile... ] Hm, I think this is just broken for multi-column GIN indexes. The code is expecting that it can re-find an index item that was there before, and that the item must either be where it was before in the index or somewhere to the right. Both of those assumptions seem valid, since ginvacuum never removes items; but insertions could push the item to the right. But this: if (gintuple_get_attrnum(btree->ginstate, itup) != attnum) elog(ERROR, "lost saved point in index"); /* must not happen !!! */ amounts to an assumption that things can't get pushed so far to the right that our pointer is now pointing at an item with lower attnum. There's no reason for that to be true --- AFAICS, the attnum is the first sort key for the items, but there's not physical segregation into different index pages or anything like that. We could throw an error if we reach a higher attnum, but since the whole thing is supposed to be can't-happen, I doubt it's worth expending code on. What we should do is just advance over any item that hasn't got the right attnum. regards, tom lane
Re: BUG #16595: Reads fail with "lost saved point in index" error after writes
От
Jesse Kinkead
Дата:
Thanks for the quick reply!
It sounds like this isn't a known problem, and that the only workaround might be just dropping the index. Hmm!
So far, we've had fine luck simply retrying on error, and retries typically succeed, so that's also a fine path forward.
Thanks again!
- jesse
On Thu, Aug 27, 2020 at 8:01 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> The table in question has a JSONB column with a GIN index, defined as:
> "index_name_idx" gin (jsonb_column_name jsonb_path_ops, integer_column,
> timestamp_with_time_zone_column)
> Under load (especially writes followed immediately by reads), we
> occasionally see reads failing an internal exception (error code XX000) with
> the message "lost saved point in index".
[ stares at that for awhile... ] Hm, I think this is just broken for
multi-column GIN indexes. The code is expecting that it can re-find
an index item that was there before, and that the item must either be
where it was before in the index or somewhere to the right. Both
of those assumptions seem valid, since ginvacuum never removes items;
but insertions could push the item to the right.
But this:
if (gintuple_get_attrnum(btree->ginstate, itup) != attnum)
elog(ERROR, "lost saved point in index"); /* must not happen !!! */
amounts to an assumption that things can't get pushed so far to the
right that our pointer is now pointing at an item with lower attnum.
There's no reason for that to be true --- AFAICS, the attnum is the
first sort key for the items, but there's not physical segregation
into different index pages or anything like that.
We could throw an error if we reach a higher attnum, but since the
whole thing is supposed to be can't-happen, I doubt it's worth
expending code on. What we should do is just advance over any
item that hasn't got the right attnum.
regards, tom lane
Jesse Kinkead <jesse@falkon.ai> writes: > It sounds like this isn't a known problem, Well, now it is :-). I'm just finishing up testing a patch. > and that the only workaround > might be just dropping the index. Hmm! Single-column GIN indexes aren't subject to the issue, so maybe you could work around it by replacing the multicolumn index with some single-column ones? Also, now that I've studied the code some more, the code path with the issue is only hit for queries that use (a) partial match or (b) an empty query that forces a full-index scan, similar to "arraycol @> '{}'". That might explain how come the bug went undetected for so long. I dunno if that info will help you dodge the bug, but maybe. If you can't find any acceptable workaround, another idea is to build your own server version with the patch applied, to tide you over until there's an official release with the fix. IMO one of the main benefits of using open-source code is that that's normally not very difficult. regards, tom lane
Re: BUG #16595: Reads fail with "lost saved point in index" error after writes
От
Jesse Kinkead
Дата:
On Thu, Aug 27, 2020 at 1:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Jesse Kinkead <jesse@falkon.ai> writes: > > It sounds like this isn't a known problem, > > Well, now it is :-). I'm just finishing up testing a patch. Awesome! > > and that the only workaround > > might be just dropping the index. Hmm! > > Single-column GIN indexes aren't subject to the issue, so maybe you > could work around it by replacing the multicolumn index with some > single-column ones? This might be an option for us to try out; thank you. > Also, now that I've studied the code some more, the code path with > the issue is only hit for queries that use (a) partial match or > (b) an empty query that forces a full-index scan, similar to > "arraycol @> '{}'". That might explain how come the bug went > undetected for so long. I dunno if that info will help you dodge > the bug, but maybe. Interestingly, we're not querying on empty values, but we ARE querying on multiple values at the same time (jsonb_column @> "value1" AND jsonb_column @> "value2"). > If you can't find any acceptable workaround, another idea is to > build your own server version with the patch applied, to tide > you over until there's an official release with the fix. IMO > one of the main benefits of using open-source code is that that's > normally not very difficult. Indeed! And they frequently have very responsive maintainers. :)