Re: 8.4b2 tsearch2 strange error

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: 8.4b2 tsearch2 strange error
Дата
Msg-id 13882.1244242148@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: 8.4b2 tsearch2 strange error  (Tatsuo Ishii <ishii@postgresql.org>)
Ответы Re: 8.4b2 tsearch2 strange error  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
I wrote:
>> I poked around in the dump for awhile.  I still can't reproduce the
>> failure from a standing start.

I've had a bit of success with reproducing the failure from scratch.
After loading the SQL dump that Tatsuo previously supplied into a UTF8,
C-locale database, generate a data-only dump of it, along the lines of

pg_dump -a ... >ishii-data.sql

Then run this psql script in the database (the function is designed to
exercise every entry in the GIN index):

create table msgstats as select * from ts_stat('select body_index from msginfo');

create language plpgsql;
create function testit () returns int as $$
declare r record;       ifail int := 0;
begin for r in select * from msgstats loop   begin   perform count(*) from msginfo where plainto_tsquery(r.word) @@
body_index;  exception when others then     raise notice 'failed for "%": %', r.word, sqlerrm;     ifail := ifail + 1;
end; end loop; return ifail;
 
end $$ language plpgsql;

delete from msginfo;
truncate msg_folderinfo;
\i ishii-data.sql
reset client_min_messages;

select testit();

The results are not very stable, but I fairly often get "tuple offset out
of range" errors, and my last try resulted in an Assert failure in
autovacuum.  What I think is happening is that there's some inadequately
locked page-changing logic in there, and if you can get autovacuum to
happen on msginfo concurrently with the data reload step, it'll mess up.
Given that we've not seen reports of this on 8.3, I'll bet lunch that
the bug is somewhere in the "fast insert" code.  I haven't found it
yet, though.

BTW, I also did more analysis of Tatsuo's binary dump using this
function.  Of some 38000 distinct lexemes in the body_index column,
about 1500 trigger an error in testit(); the errors are either
"tuple offset out of range" or something like "could not open segment 1
of relation base/16384/16497 (target block 955254869): No such file or
directory".  There are about 440 distinct messages that show up,
suggesting that approximately that many pages of the index contain
corrupted TIDs.  Manual examination of the index is consistent with
this, and also indicates that all of the corrupt pages are in the second
half of the table.  That's not too hard to square with the
concurrent-update theory --- autovac might have already done about half
the index before a large data load started.  However it means that
corruption is *very* probable if data insertion is happening
concurrently with a vacuum, else not such a large number of pages would
have gotten messed up.
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Josh Berkus
Дата:
Сообщение: Re: blocking referencing system catalogs in 8.4 breaks my code
Следующее
От: Emmanuel Cecchet
Дата:
Сообщение: Unique namespace per session for temp tables?