Обсуждение: Storage Manager crash at mdwrite()
I'm implementing an extention to mdwrite() at backend/storage/smgr/md.c
When a block is written to the local storage using mdwrite(), I'm sending this block to an HDFS storage. 
 So far I don't need to read back the values I'm writing to HDFS. This approach is working fine in the initDB phase.
However, when I'm running postgres (bin/pg_ctl start), the first few write operations run successfully, and then suddenly (after writing exactly 3 files to HDFS), I get a 130 exit code with the following message showing the JVM thread dump of HDFS:
LOG:  background writer process (PID 29347) exited with exit code 130
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2012-03-15 13:27:52
Full thread dump OpenJDK Server VM (16.0-b13 mixed mode):
"IPC Client (47) connection to localhost/127.0.0.1:8020 from taljab1" daemon prio=10 tid=0x8994d400 nid=0x72e4 in Object.wait() [0x898ad000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <0x8a3c3050> (a org.apache.hadoop.ipc.Client$Connection)
 at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:403)
 - locked <0x8a3c3050> (a org.apache.hadoop.ipc.Client$Connection)
 at org.apache.hadoop.ipc.Client$Connection.run(Client.java:445)
"IPC Client (47) connection to localhost/127.0.0.1:8020 from taljab1" daemon prio=10 tid=0x89b87c00 nid=0x72e3 in Object.wait() [0x898fe000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <0x8a2ff268> (a org.apache.hadoop.ipc.Client$Connection)
 at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:403)
 - locked <0x8a2ff268> (a org.apache.hadoop.ipc.Client$Connection)
 at org.apache.hadoop.ipc.Client$Connection.run(Client.java:445)
"Low Memory Detector" daemon prio=10 tid=0x09daa400 nid=0x72cd runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE
"CompilerThread1" daemon prio=10 tid=0x09da8400 nid=0x72cc waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE
"CompilerThread0" daemon prio=10 tid=0x09da6000 nid=0x72cb waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x09da4800 nid=0x72ca waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=10 tid=0x09d94800 nid=0x72c9 in Object.wait() [0x89db4000]
   java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <0x8a7202b0> (a java.lang.ref.ReferenceQueue$Lock)
 at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
 - locked <0x8a7202b0> (a java.lang.ref.ReferenceQueue$Lock)
 at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
 at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)
 This shows the JVM thread dump of HDFS.
This shows the JVM thread dump of HDFS.
"Reference Handler" daemon prio=10 tid=0x09d8fc00 nid=0x72c8 in Object.wait() [0x89e05000]
   java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <0x8a720338> (a java.lang.ref.Reference$Lock)
 at java.lang.Object.wait(Object.java:502)
 at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
 - locked <0x8a720338> (a java.lang.ref.Reference$Lock)
"main" prio=10 tid=0x09d3bc00 nid=0x72c6 runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE
"VM Thread" prio=10 tid=0x09d8d000 nid=0x72c7 runnable 
"VM Periodic Task Thread" prio=10 tid=0x09dac800 nid=0x72ce waiting on condition 
JNI global references: 763
Heap
 def new generation   total 4800K, used 1844K [0x8a270000, 0x8a7a0000, 0x94870000)
  eden space 4288K,  34% used [0x8a270000, 0x8a3e0418, 0x8a6a0000)
  from space 512K,  72% used [0x8a720000, 0x8a77cdd8, 0x8a7a0000)
  to   space 512K,   0% used [0x8a6a0000, 0x8a6a0000, 0x8a720000)
 tenured generation   total 10624K, used 0K [0x94870000, 0x952d0000, 0xa9470000)
   the space 10624K,   0% used [0x94870000, 0x94870000, 0x94870200, 0x952d0000)
 compacting perm gen  total 16384K, used 5765K [0xa9470000, 0xaa470000, 0xb1470000)
   the space 16384K,  35% used [0xa9470000, 0xa9a11480, 0xa9a11600, 0xaa470000)
No shared spaces configured.
This seems like an HDFS issue, but the same code worked properly in initDB(). I replaced this HDFS write code with a code that writes always the same block (empty one) to HDFS regardless from the value received by mdwrite().. Kept getting the same issue after writing 3 files.
I copied this exact code to a separate C application and ran it there and it executed without any problems (I wrote/deleted 100 files). That's why I'm doubting that it's something related to postgreSQL.
Any ideas on what's going wrong?
			
		On Thu, 2012-03-15 at 13:49 -0400, Tareq Aljabban wrote: > I'm implementing an extention to mdwrite() at > backend/storage/smgr/md.c > When a block is written to the local storage using mdwrite(), I'm > sending this block to an HDFS storage. > So far I don't need to read back the values I'm writing to HDFS. This > approach is working fine in the initDB phase. > However, when I'm running postgres (bin/pg_ctl start), the first few > write operations run successfully, and then suddenly (after writing > exactly 3 files to HDFS), I get a 130 exit code with the following > message showing the JVM thread dump of HDFS: > > > LOG: background writer process (PID 29347) exited with exit code 130 > LOG: terminating any other active server processes ... > This seems like an HDFS issue, but the same code worked properly in > initDB(). I replaced this HDFS write code with a code that writes > always the same block (empty one) to HDFS regardless from the value > received by mdwrite().. Kept getting the same issue after writing 3 > files. > I copied this exact code to a separate C application and ran it there > and it executed without any problems (I wrote/deleted 100 files). > That's why I'm doubting that it's something related to postgreSQL. > > > Any ideas on what's going wrong? What code, exactly, did you change in md.c, and anywhere else in postgres? Are you linking in new libraries/code from somewhere into the postgres backend? Regards,Jeff Davis
When configuring postgreSQL, I'm adding the libraries needed to run HDFS C API (libhdfs).
 ./configure --prefix=/diskless/myUser/Workspace/EclipseWS1/pgsql --enable-depend --enable-cassert --enable-debug CFLAGS="$CFLAGS -I/diskless/myUser/Workspace/HDFS_Append/hdfs/src/c++/libhdfs -I/usr/lib/jvm/default-java/include" LDFLAGS="$LDFLAGS -L/diskless/myUser/Workspace/HDFS_Append/hdfs/src/c++/libhdfs -L/diskless/myUser/Workspace/HDFS_Append/build/c++/Linux-i386-32/lib -L/usr/lib/jvm/default-java/jre/lib/i386/server -ljvm -lhdfs"
I have done lots of changes so far on how the storage manager works. In fact, I changed smgr.c so instead of calling regular md.c functions, that it would call my functions .
For simplicity, you can say that whenever mdwrite() was supposed to be called, another function is also called beside it. so now what is being called is:
mdwrite(param1, param2, buffer, ....);
hdfswrite(param1, param2, buffer, ....);
where hdfswrite() is the function where I write the buffer to HDFS.
I changed hdfswrite() so it will always write only the same (dummy) buffer down to HDFS storage. Let's say "dummyBufferFile". After writing this file 3 times to HDFS, I'm getting the message that I've shown in my first email.
The same hdfswrite() code works without any issues when I run it in a separate application.
Hope it's clear now.
On Thu, Mar 15, 2012 at 5:28 PM, Jeff Davis <pgsql@j-davis.com> wrote:
On Thu, 2012-03-15 at 13:49 -0400, Tareq Aljabban wrote:...
> I'm implementing an extention to mdwrite() at
> backend/storage/smgr/md.c
> When a block is written to the local storage using mdwrite(), I'm
> sending this block to an HDFS storage.
> So far I don't need to read back the values I'm writing to HDFS. This
> approach is working fine in the initDB phase.
> However, when I'm running postgres (bin/pg_ctl start), the first few
> write operations run successfully, and then suddenly (after writing
> exactly 3 files to HDFS), I get a 130 exit code with the following
> message showing the JVM thread dump of HDFS:
>
>
> LOG: background writer process (PID 29347) exited with exit code 130
> LOG: terminating any other active server processesWhat code, exactly, did you change in md.c, and anywhere else in
> This seems like an HDFS issue, but the same code worked properly in
> initDB(). I replaced this HDFS write code with a code that writes
> always the same block (empty one) to HDFS regardless from the value
> received by mdwrite().. Kept getting the same issue after writing 3
> files.
> I copied this exact code to a separate C application and ran it there
> and it executed without any problems (I wrote/deleted 100 files).
> That's why I'm doubting that it's something related to postgreSQL.
>
>
> Any ideas on what's going wrong?
postgres? Are you linking in new libraries/code from somewhere into the
postgres backend?
Regards,
Jeff Davis
On Thu, 2012-03-15 at 19:36 -0400, Tareq Aljabban wrote: > When configuring postgreSQL, I'm adding the libraries needed to run > HDFS C API (libhdfs). > >From the information below, it looks like C++. > > ./configure --prefix=/diskless/myUser/Workspace/EclipseWS1/pgsql > --enable-depend --enable-cassert --enable-debug CFLAGS="$CFLAGS > -I/diskless/myUser/Workspace/HDFS_Append/hdfs/src/c++/libhdfs > -I/usr/lib/jvm/default-java/include" LDFLAGS="$LDFLAGS > -L/diskless/myUser/Workspace/HDFS_Append/hdfs/src/c++/libhdfs > -L/diskless/myUser/Workspace/HDFS_Append/build/c++/Linux-i386-32/lib > -L/usr/lib/jvm/default-java/jre/lib/i386/server -ljvm -lhdfs" > > > > > > I have done lots of changes so far on how the storage manager works. > In fact, I changed smgr.c so instead of calling regular md.c > functions, that it would call my functions . > For simplicity, you can say that whenever mdwrite() was supposed to be > called, another function is also called beside it. so now what is > being called is: > mdwrite(param1, param2, buffer, ....); > hdfswrite(param1, param2, buffer, ....); > > > where hdfswrite() is the function where I write the buffer to HDFS. > I changed hdfswrite() so it will always write only the same (dummy) > buffer down to HDFS storage. Let's say "dummyBufferFile". After > writing this file 3 times to HDFS, I'm getting the message that I've > shown in my first email. > The same hdfswrite() code works without any issues when I run it in a > separate application. > > > Hope it's clear now. Well, it's clear that there's a lot going on ;) In other words, is there a reason you think that these would all work together nicely? I don't know the specifics, but I understand there are numerous perils to linking complex C++ code into complex C code, particularly around exception handling. Look in the archives for previous discussions around C++. Regards,Jeff Davis
On Thu, Mar 15, 2012 at 7:58 PM, Jeff Davis <pgsql@j-davis.com> wrote:
 
			
		On Thu, 2012-03-15 at 19:36 -0400, Tareq Aljabban wrote:>From the information below, it looks like C++.
> When configuring postgreSQL, I'm adding the libraries needed to run
> HDFS C API (libhdfs).
>Well, it's clear that there's a lot going on ;)
>
> ./configure --prefix=/diskless/myUser/Workspace/EclipseWS1/pgsql
> --enable-depend --enable-cassert --enable-debug CFLAGS="$CFLAGS
> -I/diskless/myUser/Workspace/HDFS_Append/hdfs/src/c++/libhdfs
> -I/usr/lib/jvm/default-java/include" LDFLAGS="$LDFLAGS
> -L/diskless/myUser/Workspace/HDFS_Append/hdfs/src/c++/libhdfs
> -L/diskless/myUser/Workspace/HDFS_Append/build/c++/Linux-i386-32/lib
> -L/usr/lib/jvm/default-java/jre/lib/i386/server -ljvm -lhdfs"
>
>
>
>
>
> I have done lots of changes so far on how the storage manager works.
> In fact, I changed smgr.c so instead of calling regular md.c
> functions, that it would call my functions .
> For simplicity, you can say that whenever mdwrite() was supposed to be
> called, another function is also called beside it. so now what is
> being called is:
> mdwrite(param1, param2, buffer, ....);
> hdfswrite(param1, param2, buffer, ....);
>
>
> where hdfswrite() is the function where I write the buffer to HDFS.
> I changed hdfswrite() so it will always write only the same (dummy)
> buffer down to HDFS storage. Let's say "dummyBufferFile". After
> writing this file 3 times to HDFS, I'm getting the message that I've
> shown in my first email.
> The same hdfswrite() code works without any issues when I run it in a
> separate application.
>
>
> Hope it's clear now.
In other words, is there a reason you think that these would all work
together nicely?
I don't know the specifics, but I understand there are numerous perils
to linking complex C++ code into complex C code, particularly around
exception handling. Look in the archives for previous discussions around
C++.
Thanks for your response Jeff.. 
It's true that libhdfs code resides under the c++ folder, but in all of the documentation, libhdfs is referred to as the C interface of HDFS.
Now what you're saying makes sense.. that nothing guarantees this will work well.. but in the phase of initDB, the function calls are done nicely without any exceptions.. when starting the postmaster, something wrong happens after 3 calls to libhdfs.. that's what I'm confused about..
What it's the difference between the two processes (initDB, start postmaster), that might cause this crash to happen?
Regards 
On Fri, 2012-03-16 at 18:02 -0400, Tareq Aljabban wrote: > Thanks for your response Jeff.. > It's true that libhdfs code resides under the c++ folder, but in all > of the documentation, libhdfs is referred to as the C interface of > HDFS. > Now what you're saying makes sense.. that nothing guarantees this will > work well.. but in the phase of initDB, the function calls are done > nicely without any exceptions.. when starting the postmaster, > something wrong happens after 3 calls to libhdfs.. that's what I'm > confused about.. > What it's the difference between the two processes (initDB, start > postmaster), that might cause this crash to happen? There is a lot of difference between those two. In particular, it looks like the problem you are seeing is coming from the background writer, which is not running during initdb. I think the next step would be for you to compile in debug mode with -O0, and attach to the bgwriter process with gdb, and step through it. Then, you can see the exact path which causes the bgwriter to exit, and that will give you a better idea where the problem is. Regards,Jeff Davis
On Fri, Mar 16, 2012 at 11:29 PM, Jeff Davis <pgsql@j-davis.com> wrote: > There is a lot of difference between those two. In particular, it looks > like the problem you are seeing is coming from the background writer, > which is not running during initdb. The difference that comes to mind is that the postmaster forks. If the library opens any connections prior to forking and then uses them after forking that would work at first but it would get confused quickly once more than one backend tries to use the same connection. The data being sent would all be mixed together and they would see responses to requests other processes sent. You need to ensure that any network connections are opened up *after* the new processes are forked. There are other things going on that could cause problems. initdb probably doesn't deal with many errors so it might not be casing any longjmps that happen when Postgres handles errors. I suspect it doesn't create any temporary files, etc. There's also a long history of threads not playing well with Postgres. I think that's overblown and I believe they should work fine. Most of it was caused by a bug in an old version of libc. But you do have to ensure that the other threads don't call any Postgres functions because the Postgres code is very much not thread-aware. -- greg
On Fri, Mar 16, 2012 at 8:34 PM, Greg Stark <stark@mit.edu> wrote:
 
 
On Fri, Mar 16, 2012 at 11:29 PM, Jeff Davis <pgsql@j-davis.com> wrote:The difference that comes to mind is that the postmaster forks. If the
> There is a lot of difference between those two. In particular, it looks
> like the problem you are seeing is coming from the background writer,
> which is not running during initdb.
library opens any connections prior to forking and then uses them
after forking that would work at first but it would get confused
quickly once more than one backend tries to use the same connection.
The data being sent would all be mixed together and they would see
responses to requests other processes sent.
The opened connections thing was the first thing I thought of.. the HDFS documentation claims the C interface is thread-safe..
However, since I noticed that different processes (having different process IDs) are calling the mdwrite, then it might be a possibility.
I tried reconnecting to HDFS on every mdwrite request, but it didn't work out.
You need to ensure that any network connections are opened up *after*
the new processes are forked.
There are other things going on that could cause problems. initdb
probably doesn't deal with many errors so it might not be casing any
longjmps that happen when Postgres handles errors. I suspect it
doesn't create any temporary files, etc.
There's also a long history of threads not playing well with Postgres.
I think that's overblown and I believe they should work fine. Most of
it was caused by a bug in an old version of libc. But you do have to
ensure that the other threads don't call any Postgres functions
because the Postgres code is very much not thread-aware.
--
greg
I'm novice in PG, and if I got this one running then I'll have achieved what I wanted to do without further digging in PG. So I was thinking if there was a configuration option (or something similar) that will eliminate (or reduce) the role the background writer process.. I believe it can be one of the WAL options but I'm not sure.. Any suggestions?
On Fri, Mar 16, 2012 at 8:34 PM, Greg Stark <stark@mit.edu> wrote:
			
		On Fri, Mar 16, 2012 at 11:29 PM, Jeff Davis <pgsql@j-davis.com> wrote:The difference that comes to mind is that the postmaster forks. If the
> There is a lot of difference between those two. In particular, it looks
> like the problem you are seeing is coming from the background writer,
> which is not running during initdb.
library opens any connections prior to forking and then uses them
after forking that would work at first but it would get confused
quickly once more than one backend tries to use the same connection.
The data being sent would all be mixed together and they would see
responses to requests other processes sent.
You need to ensure that any network connections are opened up *after*
the new processes are forked.
It's true.. it turned out that the reason of the problem is that HDFS has problems when dealing with forked processes.. However, there's no clear suggestion on how to fix this.
I attached gdb to the writer process and got the following backtrace:
I attached gdb to the writer process and got the following backtrace:
#0  0xb76f0430 in __kernel_vsyscall ()
#1  0xb6b2893d in ___newselect_nocancel () at ../sysdeps/unix/syscall-template.S:82
#2  0x0840ab46 in pg_usleep (microsec=200000) at pgsleep.c:43
#3  0x0829ca9a in BgWriterNap () at bgwriter.c:642
#4  0x0829c882 in BackgroundWriterMain () at bgwriter.c:540
#5  0x0811b0ec in AuxiliaryProcessMain (argc=2, argv=0xbf982308) at bootstrap.c:417
#6  0x082a9af1 in StartChildProcess (type=BgWriterProcess) at postmaster.c:4427
#7  0x082a75de in reaper (postgres_signal_arg=17) at postmaster.c:2390
#8  <signal handler called>
 #9  0xb76f0430 in __kernel_vsyscall ()
#10 0xb6b2893d in ___newselect_nocancel () at ../sysdeps/unix/syscall-template.S:82
#11 0x082a5b62 in ServerLoop () at postmaster.c:1391
#12 0x082a53e2 in PostmasterMain (argc=3, argv=0xa525c28) at postmaster.c:1092
#13 0x0822dfa8 in main (argc=3, argv=0xa525c28) at main.c:188
Any ideas?