Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying toaccess the memory created using dsm_create().

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying toaccess the memory created using dsm_create().
Дата
Msg-id 20170628230458.n5ehizmvhoerr5yq@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying toaccess the memory created using dsm_create().  (Thomas Munro <thomas.munro@enterprisedb.com>)
Ответы Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying toaccess the memory created using dsm_create().  (Thomas Munro <thomas.munro@enterprisedb.com>)
Список pgsql-hackers
On 2017-06-28 19:07:50 +1200, Thomas Munro wrote:
> I think this line is saying that it won't restart automatically:
> 
> https://github.com/torvalds/linux/blob/590dce2d4934fb909b112cd80c80486362337744/mm/shmem.c#L2884

Indeed.


> So I think we either need to mask signals with or put in an explicit
> retry loop, as shown in the attached version of the patch.  With the
> v3 patch I posted earlier, I see interrupted system call failures in
> the select_parallel regression test, but with the v4 it passes.
> Thoughts?

I think a retry loop is a lot better than blocking signals.


> > Unfounded speculation: fallocate() might actually *improve*
> > performance of DSM segments if your access pattern involves random
> > access (just to pick an example out of the air, something like...
> > building a hash table), since it's surely easier to allocate a big
> > contiguous chunk than a squillion random pages most of which divide an
> > existing hole into two smaller holes...
> 
> Bleugh... I retract this, of course we initialise the hash table in
> order anyway so this doesn't make any sense.

It's still faster to create larger mappings at once, rather than through
subsequent page faults...


> diff --git a/configure.in b/configure.in
> index 11eb9c8acfc..47452bbac43 100644
> --- a/configure.in
> +++ b/configure.in
> @@ -1429,7 +1429,7 @@ PGAC_FUNC_WCSTOMBS_L
>  LIBS_including_readline="$LIBS"
>  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
>  
> -AC_CHECK_FUNCS([cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat
pthread_is_threaded_npreadlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs
wcstombs_l])
> +AC_CHECK_FUNCS([cbrt clock_gettime dlopen fallocate fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove
pollpstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes
wcstombswcstombs_l])
 

Why are we going for fallocate rather than posix_fallocate()? There's
plenty use cases that can only be done with the former, but this doesn't
seem like one of them?

Currently this is a linux specific path - but I don't actually see any
reason to keep it that way? It seems far from unlikely that this is just
an issue with linux...

>
>  #ifdef USE_DSM_POSIX
>  /*
> + * Set the size of a virtual memory region associate with a file descriptor.
> + * On Linux, also ensure that virtual memory is actually allocated by the
> + * operating system to avoid nasty surprises later.
> + *
> + * Returns non-zero if either truncation or allocation fails, and sets errno.
> + */
> +static int
> +dsm_impl_posix_resize(int fd, off_t size)
> +{
> +        int rc;
> +
> +        /* Truncate (or extend) the file to the requested size. */
> +        rc = ftruncate(fd, size);
> +
> +#ifdef HAVE_FALLOCATE
> +#ifdef __linux__
> +        /*
> +         * On Linux, a shm_open fd is backed by a tmpfs file.  After resizing
> +         * with ftruncate it may contain a hole.  Accessing memory backed by a
> +         * hole causes tmpfs to allocate pages, which fails with SIGBUS if
> +         * there is no virtual memory available.  So we ask tmpfs to allocate
> +         * pages here, so we can fail gracefully with ENOSPC now rather than
> +         * risking SIGBUS later.
> +         */
> +        if (rc == 0)
> +        {
> +            do
> +            {
> +                rc = fallocate(fd, 0, 0, size);
> +            } while (rc == -1 && errno == EINTR);
> +            if (rc != 0 && errno == ENOSYS)
> +            {
> +                /* Kernel too old (< 2.6.23). */
> +                rc = 0;
> +            }
> +        }
> +#endif
> +#endif

I'd rather fall-back to just write() initializing the buffer, and do
either of those in all implementations rather than just linux.

> diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
> index 7a05c7e5b85..dcb9a846c7b 100644
> --- a/src/include/pg_config.h.in
> +++ b/src/include/pg_config.h.in
> @@ -167,6 +167,9 @@
>  /* Define to 1 if you have the <editline/readline.h> header file. */
>  #undef HAVE_EDITLINE_READLINE_H
>  
> +/* Define to 1 if you have the `fallocate' function. */
> +#undef HAVE_FALLOCATE
> +
>  /* Define to 1 if you have the `fdatasync' function. */
>  #undef HAVE_FDATASYNC

Needs pg_config.h.win32 adjustment...

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: [HACKERS] Race conditions with WAL sender PID lookups
Следующее
От: Thomas Munro
Дата:
Сообщение: Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying toaccess the memory created using dsm_create().