Thread: [HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing

[HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing

From: Andreas Seltenreich
Date:
Hi,

doing low-memory testing with REL_10_STABLE at 1f19550a87 also produced
a couple of parallel worker core dumps with the backtrace below.
Although most of the backtrace is inside the dynamic linker, it looks
like it was passed a pointer to gone-away shared memory.

regards,
Andreas

Core was generated by `postgres: bgworker: parallel worker for PID 24326                '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
#1  0x00007f5184852a36 in fillin_rpath (rpath=<optimized out>, rpath@entry=0x55b692f0d360 "/home/smith/postgres/inst/master/lib", result=result@entry=0x55b692f1b380, sep=sep@entry=0x7f5184868060 ":", check_trusted=check_trusted@entry=0, what=what@entry=0x7f51848683bd "RUNPATH", where=where@entry=0x55b692f2d2f0 "/home/smith/postgres/inst/master/lib/pgcrypto.so", l=0x55b692f2d330) at dl-load.c:444
#2  0x00007f5184852daf in decompose_rpath (sps=sps@entry=0x55b692f2d6d8, rpath=<optimized out>, l=l@entry=0x55b692f2d330, what=what@entry=0x7f51848683bd "RUNPATH") at dl-load.c:618
#3  0x00007f5184852ef7 in cache_rpath (l=l@entry=0x55b692f2d330, sp=sp@entry=0x55b692f2d6d8, tag=tag@entry=29, what=what@entry=0x7f51848683bd "RUNPATH") at dl-load.c:652
#4  0x00007f5184853c62 in cache_rpath (what=0x7f51848683bd "RUNPATH", tag=29, sp=0x55b692f2d6d8, l=0x55b692f2d330) at dl-load.c:2307
#5  _dl_map_object (loader=0x55b692f2d330, name=0x7f517f300cc3 "libz.so.1", type=2, trace_mode=0, mode=<optimized out>, nsid=<optimized out>) at dl-load.c:2314
#6  0x00007f5184857e70 in openaux (a=a@entry=0x7ffd4f686130) at dl-deps.c:63
#7  0x00007f518485a4f4 in _dl_catch_error (objname=objname@entry=0x7ffd4f686128, errstring=errstring@entry=0x7ffd4f686120, mallocedp=mallocedp@entry=0x7ffd4f68611f, operate=operate@entry=0x7f5184857e40 <openaux>, args=args@entry=0x7ffd4f686130) at dl-error.c:187
#8  0x00007f51848580df in _dl_map_object_deps (map=map@entry=0x55b692f2d330, preloads=preloads@entry=0x0, npreloads=npreloads@entry=0, trace_mode=trace_mode@entry=0, open_mode=open_mode@entry=-2147483648) at dl-deps.c:254
#9  0x00007f518485ea02 in dl_open_worker (a=a@entry=0x7ffd4f6863c0) at dl-open.c:280
#10 0x00007f518485a4f4 in _dl_catch_error (objname=objname@entry=0x7ffd4f6863b0, errstring=errstring@entry=0x7ffd4f6863b8, mallocedp=mallocedp@entry=0x7ffd4f6863af, operate=operate@entry=0x7f518485e8f0 <dl_open_worker>, args=args@entry=0x7ffd4f6863c0) at dl-error.c:187
#11 0x00007f518485e489 in _dl_open (file=0x55b692f2d2b0 "/home/smith/postgres/inst/master/lib/pgcrypto.so", mode=-2147483390, caller_dlopen=0x55b691cb4c7e <internal_load_library+286>, nsid=-2, argc=<optimized out>, argv=<optimized out>, env=0x55b692eef880) at dl-open.c:660
#12 0x00007f5184020ee9 in dlopen_doit (a=a@entry=0x7ffd4f6865f0) at dlopen.c:66
#13 0x00007f518485a4f4 in _dl_catch_error (objname=0x55b692eef6d0, errstring=0x55b692eef6d8, mallocedp=0x55b692eef6c8, operate=0x7f5184020e90 <dlopen_doit>, args=0x7ffd4f6865f0) at dl-error.c:187
#14 0x00007f5184021521 in _dlerror_run (operate=operate@entry=0x7f5184020e90 <dlopen_doit>, args=args@entry=0x7ffd4f6865f0) at dlerror.c:163
#15 0x00007f5184020f82 in __dlopen (file=<optimized out>, mode=mode@entry=258) at dlopen.c:87
#16 0x000055b691cb4c7e in internal_load_library (libname=libname@entry=0x7f51848be7f8 <error: Cannot access memory at address 0x7f51848be7f8>) at dfmgr.c:231
#17 0x000055b691cb5928 in RestoreLibraryState (start_address=0x7f51848be7f8 <error: Cannot access memory at address 0x7f51848be7f8>) at dfmgr.c:754
#18 0x000055b6919459d9 in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1030
#19 0x000055b691b23746 in StartBackgroundWorker () at bgworker.c:835
#20 0x000055b691b2faf5 in do_start_bgworker (rw=0x55b692f0e050) at postmaster.c:5680
#21 maybe_start_bgworkers () at postmaster.c:5884
#22 0x000055b691b305c8 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5073
#23 <signal handler called>
#24 0x00007f5183a5f273 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84
#25 0x000055b6918b8c0b in ServerLoop () at postmaster.c:1717
#26 0x000055b691b31c65 in PostmasterMain (argc=3, argv=0x55b692eea5f0) at postmaster.c:1361
#27 0x000055b6918bac4d in main (argc=3, argv=0x55b692eea5f0) at main.c:228


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing

From: Amit Kapila
Date:
On Tue, Oct 3, 2017 at 3:04 AM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
> Hi,
>
> doing low-memory testing with REL_10_STABLE at 1f19550a87 also produced
> a couple of parallel worker core dumps with the backtrace below.
> Although most of the backtrace is inside the dynamic linker, it looks
> like it was passed a pointer to gone-away shared memory.
>

It appears to be some dangling pointer, but not sure how it is
possible.  Can you provide some more details, like do you have any
other library which you want to get loaded in the backend (like by
using shared_preload_libraries or by some other way)?  I think without
that we shouldn't try to load anything in the parallel worker.  Also,
if you can get the failed query (check in server log), it would be
great.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing

From: Amit Kapila
Date:
On Tue, Oct 3, 2017 at 8:31 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Oct 3, 2017 at 3:04 AM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
>> Hi,
>>
>> doing low-memory testing with REL_10_STABLE at 1f19550a87 also produced
>> a couple of parallel worker core dumps with the backtrace below.
>> Although most of the backtrace is inside the dynamic linker, it looks
>> like it was passed a pointer to gone-away shared memory.
>>
>
> It appears to be some dangling pointer, but not sure how it is
> possible.  Can you provide some more details, like do you have any
> other library which you want to get loaded in the backend (like by
> using shared_preload_libraries or by some other way)?  I think without
> that we shouldn't try to load anything in the parallel worker.
>

Another possibility is that the memory for the library space has
been overwritten, either in the master backend or in the worker backend.  I
think that is possible in low-memory conditions if somewhere we write
to memory without ensuring that space has been allocated.  I have
browsed the nearby code and didn't find any such instance.  One idea
to narrow down the problem is to see whether the other members in the
worker backend are sane; e.g., can you try printing the value of
MyFixedParallelState, as we get that value from shared memory in the
same way as libraryspace?  From the call stack it seems that the memory
of libraryspace is corrupted, so we could move the
lookup/RestoreLibraryState call to immediately after we assign
MyFixedParallelState.  If the memory for libraryspace is still
corrupted after that change, then probably something bad has happened
in the master backend.

Any other ideas?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing

From: Tom Lane
Date:
Amit Kapila <amit.kapila16@gmail.com> writes:
> Any other ideas?

Given that the crash is so far down inside __dlopen(), and that there's
a clear reference to the string we presumably passed to that:

#11 0x00007f518485e489 in _dl_open (file=0x55b692f2d2b0 "/home/smith/postgres/inst/master/lib/pgcrypto.so",
mode=-2147483390, caller_dlopen=0x55b691cb4c7e <

I don't actually believe that this is Postgres' fault.  I suspect that
what we're looking at here is a low-memory bug in dlopen itself, probably
something strdup'ing an input string and forgetting to check for a null
result.

Presumably somebody could dig into the libc source code and prove or
disprove this, though it would sure help to know exactly what platform
and version Andreas is testing on.
        regards, tom lane



Re: [HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing

From: Andreas Seltenreich
Date:
Tom Lane writes:

> Presumably somebody could dig into the libc source code and prove or
> disprove this, though it would sure help to know exactly what platform
> and version Andreas is testing on.

This is the code in glibc-2.24 around the crash site:

,----[ glibc-2.24/elf/dl-load.c:442 ]
|       to_free = cp = expand_dynamic_string_token (l, cp, 1);
|
|       size_t len = strlen (cp);
`----

…while expand_dynamic_string_token will indeed return NULL on a failed
malloc.  Code in the most recent glibc looks the same, so I'll carry
this issue over to the glibc bugzilla then.

Sorry about the noise…
Andreas



Re: [HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing

From: Robert Haas
Date:
On Tue, Oct 3, 2017 at 3:04 AM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
> Tom Lane writes:
>> Presumably somebody could dig into the libc source code and prove or
>> disprove this, though it would sure help to know exactly what platform
>> and version Andreas is testing on.
>
> This is the code in glibc-2.24 around the crash site:
>
> ,----[ glibc-2.24/elf/dl-load.c:442 ]
> |       to_free = cp = expand_dynamic_string_token (l, cp, 1);
> |
> |       size_t len = strlen (cp);
> `----
>
> …while expand_dynamic_string_token will indeed return NULL on a failed
> malloc.  Code in the most recent glibc looks the same, so I'll carry
> this issue over to the glibc bugzilla then.

You know, I was pretty impressed with sqlsmith when it was only
finding bugs in PostgreSQL.  Finding bugs in glibc is even more
impressive.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

