Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Дата
Msg-id d4f4d5e2-35ef-58e4-4f31-5fdfb2c146a0@iki.fi
обсуждение исходный текст
Ответ на Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows  (Pawel Kudzia <kudzia@gmail.com>)
Ответы Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows  (Pawel Kudzia <kudzia@gmail.com>)
Список pgsql-bugs
On 15/07/2021 12:38, Pawel Kudzia wrote:
>>> How can I identify relevant pages to make selective dump of them for
>>> you, using method proposed by Peter?
>>
>> It's a bit hard without knowing which index pages are affected. But the
>> attached script (dump-gin-page-info.sql) dumps some summary information
>> about the kind of pages there are. That's a start.
> 
> I'm attaching output of the proposed commands.

Thanks!

>> Let's try one thing first: I rebased a patch to add GIN support to the
>> amcheck extension [1]. I kind of doubt that it will catch the corruption
>> in this case, but you never know. So please apply the attached
>> v2-0001-Amcheck-for-GIN-13stable.patch patch. It is the same patch I
>> posted at [1], but the installation script is slightly different to make
>> it work on PostgreSQL v13. Then compile install it:
>>
>>     cd contrib/amcheck; make install
>>
>> Then in psql:
>>
>>     CREATE EXTENSION amcheck;
>>     gin_index_parent_check('entity_attribute_name_ids_gin');
> 
> Sadly i'm getting this when executing CREATE EXTENSION amcheck;
> 
> ERROR:  extension "amcheck" has no installation script nor update path
> for version "1.2.1"

Hmm, that's odd, it worked for me. Did "make install" copy the 
"amcheck--1.2--1.2.1.sql" file to the right place? It should be 
installed the same directory as "amcheck.control".

Anyway, let's try a different tack. Here's another debugging patch. It 
causes a LOG message to be printed whenever a GIN index page is 
accessed, like this:

2021-07-15 21:22:20.119 EEST [972708] LOG:  ReadBuffer 1663/12678/27035 
read gin blk 0 (ginget.c:1832 scanPendingInsert)
2021-07-15 21:22:20.119 EEST [972708] STATEMENT:  explain analyze select 
* from test_intarray where a @> '{2}';
2021-07-15 21:22:20.119 EEST [972708] LOG:  ReadBuffer 1663/12678/27035 
read gin blk 1 (ginbtree.c:89 ginFindLeafPage)
2021-07-15 21:22:20.119 EEST [972708] STATEMENT:  explain analyze select 
* from test_intarray where a @> '{2}';

Please apply this patch, run the query again, and copy the resulting 
log. I'm also interested in the contents of the pages that are accessed, 
so if you could then run these commands to dump the contents of those 
index pages:

# extract just the block numbers from the log
cat <path to postgres log> | perl -ne '/.*read gin blk (\d+)/ && print 
"$1\n" ' | sort -n |uniq > /tmp/blocknums

# dump contents of each of those index pages to '/tmp/block-contents'
psql postgres -c "create temp table blocknums(blknum int); copy 
blocknums from '/tmp/blocknums'; copy (select blknum, 
get_raw_page('entity_attribute_name_ids_gin', blknum) from blocknums) to 
'/tmp/block-contents';"

Please send over the resulting PostgreSQL log, and 
'/tmp/block-contents', and I'll take a closer look into the contents to 
see if there's something funny. (A private email is OK if you don't feel 
comfortable sharing it with the world).

One more thing, please also add the 'ctid' virtual column to the query, 
like this:

SELECT ctid, entity_id FROM entity WHERE
( attribute_name_ids && '{1737}' ) AND NOT ( (attribute_name_ids||0)
&& '{1737}') LIMIT 10;

Also, did you happen to keep the WAL log from around the time that the 
query started producing wrong results? That would also be useful 
evidence for how it got into this situation.

Thanks in advance! We might need a few more round-trips, but we'll get 
there.

- Heikki

Вложения

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Jeff Janes
Дата:
Сообщение: Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size