Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

From: Thomas Munro
Subject: Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().
Msg-id: CAEepm=2GAgjcnLGCHOCVTD04eVR+rP+8_-6Z4c391-AFo1sL7g@mail.gmail.com
In reply to: Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses: Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Wed, Jun 28, 2017 at 5:19 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Aug 24, 2016 at 2:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Now, for bigger segment sizes, I think there actually could be a
>> little bit of a noticeable performance hit here, because it's not just
>> about total elapsed time.  Even if the code eventually touches all of
>> the memory, it might not touch it all before starting to fire up
>> workers or whatever else it wants to do with the DSM segment.  But I'm
>> thinking we still need to bite the bullet and pay the expense, because
>> crash-and-restart cycles are *really* bad.
>
> Here is a new rebased version of this patch, primarily to poke this
> thread as an unresolved question.  This patch is not committable as is
> though: I discovered that parallel query can cause fallocate to return
> with errno == EINTR.  I haven't yet investigated whether fallocate is
> supposed to be restartable, or signals should be blocked, or something
> else is wrong.  Another question is whether the call to ftruncate() is
> actually necessary before the call to fallocate().

I think this line is saying that it won't restart automatically:

https://github.com/torvalds/linux/blob/590dce2d4934fb909b112cd80c80486362337744/mm/shmem.c#L2884

Compare this patch (not in the kernel tree) that suggests that line
should be changed to cause restart:

https://lkml.org/lkml/2016/3/3/987

- error = -EINTR;
+ error = -ERESTARTSYS;

So I think we either need to mask signals around the call or put in an
explicit retry loop, as shown in the attached version of the patch.
With the v3 patch I posted earlier, I see "interrupted system call"
failures in the select_parallel regression test, but with v4 it
passes.  Thoughts?
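
For readers without the attachment, here is a minimal sketch of what
an explicit retry loop of this kind could look like.  It is not the
code from the v4 patch: the helper name fallocate_retry is made up for
illustration, and it assumes the Linux-specific fallocate() call with
mode 0 rather than whatever the patch actually calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not taken from the patch): reserve backing
 * store for the first "size" bytes of fd, retrying whenever a signal
 * interrupts the call.  The kernel reports the interruption as EINTR
 * and does not restart the call itself, so the caller has to loop.
 *
 * Returns 0 on success, or -1 with errno set on any other failure.
 */
static int
fallocate_retry(int fd, off_t size)
{
    int rc;

    assert(size > 0);

    do
    {
        /* mode 0: allocate blocks and extend the file if needed */
        rc = fallocate(fd, 0, 0, size);
    } while (rc == -1 && errno == EINTR);

    return rc;
}
```

Errors other than EINTR (ENOSPC in particular, which is the case this
thread cares about) fall out of the loop and are returned to the
caller unchanged.  A real version would also need to decide whether
pending interrupts should be serviced between retries.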

> Unfounded speculation: fallocate() might actually *improve*
> performance of DSM segments if your access pattern involves random
> access (just to pick an example out of the air, something like...
> building a hash table), since it's surely easier to allocate a big
> contiguous chunk than a squillion random pages most of which divide an
> existing hole into two smaller holes...

Bleugh... I retract this, of course we initialise the hash table in
order anyway so this doesn't make any sense.

-- 
Thomas Munro
http://www.enterprisedb.com

