Re: Direct I/O

From: Andres Freund
Subject: Re: Direct I/O
Msg-id: 20230408212337.t2uua7lfo6qcjfge@awork3.anarazel.de
In reply to: Re: Direct I/O  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Direct I/O  (Tom Lane <tgl@sss.pgh.pa.us>)
           Re: Direct I/O  (Andrew Dunstan <andrew@dunslane.net>)
List: pgsql-hackers

Hi,

On 2023-04-08 17:10:19 -0400, Tom Lane wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> Now crake is doing this:
> 
> 2023-04-08 16:50:03.177 EDT [2023-04-08 16:50:03 EDT 3257645:3] 004_io_direct.pl LOG:  statement: select count(*) from t1
> 2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:1] ERROR:  invalid page in block 56 of relation base/5/16384
> 2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:2] STATEMENT:  select count(*) from t1
> 2023-04-08 16:50:03.317 EDT [2023-04-08 16:50:03 EDT 3257645:4] 004_io_direct.pl ERROR:  invalid page in block 56 of relation base/5/16384
> 2023-04-08 16:50:03.317 EDT [2023-04-08 16:50:03 EDT 3257645:5] 004_io_direct.pl STATEMENT:  select count(*) from t1
> 2023-04-08 16:50:03.319 EDT [2023-04-08 16:50:02 EDT 3257591:4] LOG:  background worker "parallel worker" (PID 3257646) exited with exit code 1
> 
> The fact that the error is happening in a parallel worker seems
> interesting ...

There were a few prior instances of that error. One that I hadn't seen before
is this:

[11:35:07.190](0.001s) #   Failed test 'read back from shared'
#   at /home/andrew/bf/root/HEAD/pgsql/src/test/modules/test_misc/t/004_io_direct.pl line 43.
[11:35:07.190](0.000s) #          got: '10000'
#     expected: '10098'

For one, it points to the arguments to is() being switched around, but that's a
sideshow.
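
To illustrate the sideshow: Test::More's is() takes the observed value first
and the expected value second. The sketch below is a hypothetical Python
stand-in (is_() is not the real Perl function, and the variable names are
invented) showing that if the two are passed in the wrong order, the "got" and
"expected" labels in the failure report end up reversed. Under that
assumption, the 10098 in the report above would be what the query actually
returned.

```python
def is_(got, expected, name):
    """Hypothetical stand-in for a TAP-style is(got, expected, name) check."""
    if got == expected:
        return f"ok - {name}"
    return (f"not ok - {name}\n"
            f"#          got: '{got}'\n"
            f"#     expected: '{expected}'")


# Assumed values for illustration, taken from the failure report above.
query_result = "10098"  # what the server returned (assumption)
wanted = "10000"        # what the test author wanted (assumption)

# Arguments swapped: the report labels the wanted value as "got".
swapped = is_(wanted, query_result, "read back from shared")

# Arguments in the conventional order: the labels match reality.
correct = is_(query_result, wanted, "read back from shared")

print(swapped)
print(correct)
```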


> (BTW, why are the log lines doubly timestamped?)

It's odd.

It's also odd that it's just crake having the issue. It's just a Linux host,
afaics. Andrew, is there any chance you can run that test in isolation and see
whether it reproduces? If so, does the problem vanish if you comment out the
io_direct= in the test? Curious whether this is actually an O_DIRECT issue, or
whether it's an independent issue exposed by the new test.


I wonder if we should make the test use data checksums - if we continue to see
wrong query results, the corruption is more likely to be in memory.
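
A rough sketch of that reasoning, in Python. This is only an illustration
under stated assumptions: the page layout, the CRC32 checksum and the helper
names (write_page, read_page, flip_byte) are all made up here and are not
PostgreSQL's actual page format or checksum algorithm. The point is just that
a checksum verified at read time catches damage that happened on the way to
or from storage, but not damage that happens to the buffer afterwards.

```python
import zlib

PAGE_SIZE = 8192  # illustrative page size


def write_page(payload: bytes) -> bytes:
    """Build a toy page: 4-byte CRC32 followed by the zero-padded body."""
    body = payload.ljust(PAGE_SIZE - 4, b"\0")
    return zlib.crc32(body).to_bytes(4, "big") + body


def read_page(raw: bytes) -> bytes:
    """Verify the checksum on read, as a checksum-enabled cluster would."""
    stored = int.from_bytes(raw[:4], "big")
    body = raw[4:]
    if zlib.crc32(body) != stored:
        raise ValueError("invalid page: checksum mismatch")
    return body


def flip_byte(buf: bytes, pos: int) -> bytes:
    """Return a copy of buf with one byte inverted, simulating corruption."""
    damaged = bytearray(buf)
    damaged[pos] ^= 0xFF
    return bytes(damaged)


good = write_page(b"rows=10000")

# Case 1: the page is damaged on its way to or from storage.
# The read-time verification notices.
try:
    read_page(flip_byte(good, 100))
    detected_on_read = False
except ValueError:
    detected_on_read = True

# Case 2: the page verifies fine on read and is damaged in memory afterwards.
# No checksum error is raised, yet the contents are wrong.
in_memory = flip_byte(read_page(good), 0)
silently_wrong = in_memory != good[4:]

print(detected_on_read, silently_wrong)
```

So wrong results without any checksum failure would point away from what was
read off disk and towards something going wrong after the read.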

Greetings,

Andres Freund


