Discussion: errors on restoring postgresql binary dump to glusterfs
Hi there,

While trying to restore a ~700 GB binary dump with the command

    pg_restore -d dbdata < sampledbdata-20120327.pgdump

I repeatedly encountered the following errors:

    pg_restore: [archiver (db)] Error from TOC entry 2882463; 2613 10267347 BLOB 10267347 sdmcleod
    pg_restore: [archiver (db)] could not execute query: ERROR: unexpected data beyond EOF in block 500 of relation base/16386/11743
    HINT: This has been seen to occur with buggy kernels; consider updating your system.
    Command was: SELECT pg_catalog.lo_create('10267347');
    pg_restore: [archiver (db)] could not execute query: ERROR: large object 10267347 does not exist
    Command was: ALTER LARGE OBJECT 10267347 OWNER TO sdmcleod;
    pg_restore: [archiver (db)] Error from TOC entry 2882464; 2613 10267348 BLOB 10267348 sdmcleod
    pg_restore: [archiver (db)] could not execute query: ERROR: unexpected data beyond EOF in block 500 of relation base/16386/11743
    HINT: This has been seen to occur with buggy kernels; consider updating your system.
    Command was: SELECT pg_catalog.lo_create('10267348');
    pg_restore: [archiver (db)] could not execute query: ERROR: large object 10267348 does not exist
    Command was: ALTER LARGE OBJECT 10267348 OWNER TO sdmcleod;
    ......
    ......
    pg_restore: [archiver (db)] Error from TOC entry 53398; 0 16503 TABLE DATA l1aaux_sci sdmcleod
    pg_restore: [archiver (db)] COPY failed for table "l1aaux_sci": ERROR: unexpected data beyond EOF in block 9391 of relation base/16386/17043
    HINT: This has been seen to occur with buggy kernels; consider updating your system.
    CONTEXT: COPY l1aaux_sci, line 319329: "1854661 \N 1.05156717906094999 1378796678.44843268 2012-02-01 07:04:39.5+00 2012-02-01 07:04:38.4484..."
    pg_restore: [archiver (db)] Error from TOC entry 53399; 0 16528 TABLE DATA l1afts_dbl sdmcleod
    pg_restore: [archiver (db)] COPY failed for table "l1afts_dbl": ERROR: unexpected data beyond EOF in block 10097 of relation base/16386/17068
    HINT: This has been seen to occur with buggy kernels; consider updating your system.
    CONTEXT: COPY l1afts_dbl, line 454411: "459755 2012-03-23 05:31:02.185562+00 ace.sr45190 52867958 299 2591429 FTS 1.1.0 1376321941.75799..."

The server runs Ubuntu Server 10.04 LTS with PostgreSQL upgraded to version 9.1.3-1~lucid. The PostgreSQL data directory is located in a glusterfs-mounted directory backed by a replicated volume, vol-2:

    192.168.244.101:/vol-2 5731222400 3041313920 2398779136 56% /mnt/gluster-2

Here is the gluster info for vol-2:

    Volume Name: vol-2
    Type: Replicate
    Status: Started
    Number of Bricks: 2
    Transport-type: tcp
    Bricks:
    Brick1: 192.168.244.101:/data/glbrk-2
    Brick2: 192.168.244.102:/data/glbrk-2

The version of glusterfs is 3.2.6.

I think this may have something to do with glusterfs, because when I restore the same dump on an identical Ubuntu 10.04 server, with PostgreSQL upgraded to the same 9.1.3-1~lucid but its data directory on a local ext4 filesystem, pg_restore completes without a single error.

Has anyone seen something similar before?

Thank you in advance.

Liang Ma
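Since the same error repeats across many TOC entries, it may help to see which relation files are affected and how often. The following is a minimal sketch, not part of the original report: it assumes the pg_restore error output has been saved to a text file, and the function name summarize_eof_errors is made up for illustration.

```python
import re
import sys
from collections import Counter

# Matches the server error quoted in the report. Whitespace is matched
# loosely so that the pattern still works if the message was line-wrapped.
EOF_ERROR = re.compile(
    r"unexpected\s+data\s+beyond\s+EOF\s+in\s+block\s+(\d+)\s+of\s+relation\s+(\S+)"
)


def summarize_eof_errors(log_text):
    """Count 'unexpected data beyond EOF' errors per relation file path."""
    counts = Counter()
    for match in EOF_ERROR.finditer(log_text):
        counts[match.group(2)] += 1
    return counts


if __name__ == "__main__":
    # Read saved pg_restore error output from standard input and print
    # one line per relation file, most frequent first.
    for relation, count in summarize_eof_errors(sys.stdin.read()).most_common():
        print(f"{relation}\t{count}")
```

One possible way to use it: capture the errors with "pg_restore -d dbdata < sampledbdata-20120327.pgdump 2> restore.err" and feed restore.err to the script (the file name is hypothetical). The last component of each reported path, such as 11743 in base/16386/11743, is the relation's filenode, which can be looked up in pg_class.relfilenode.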
On Mon, Apr 30, 2012 at 8:34 PM, Liang Ma <ma.satops@gmail.com> wrote:
> Hi There,
>
> While trying to restore a ~700 GB binary dump by command
>
> pg_restore -d dbdata < sampledbdata-20120327.pgdump
>
> I encountered following errors repeatedly
>
> pg_restore: [archiver (db)] Error from TOC entry 2882463; 2613
> 10267347 BLOB 10267347 sdmcleod
> pg_restore: [archiver (db)] could not execute query: ERROR:
> unexpected data beyond EOF in block 500 of relation base/16386/11743
> HINT: This has been seen to occur with buggy kernels; consider
> updating your system.

Note the message right here...

There may be further indications in the server log about what's wrong.

> The server runs Ubuntu server 10.04 LTS with postgresql upgraded to
> version 9.1.3-1~lucid. The postgresql data directory is located in a
> glusterfs mounted directory to a replicated volume vol-2

I assume you don't have more than one node actually *accessing* the data directory at the same time, right?

Even with that said, I haven't heard of anybody running PostgreSQL on glusterfs, and I'm not sure it fulfills the basic requirements that PostgreSQL has on a filesystem. In particular, the messages above about a buggy kernel certainly indicate that there is a problem with the filesystem.

> I think this may have something to do with glusterfs, because when I
> restore the same dump to a same ubuntu 10.04 server with postgresql
> upgraded to the same 9.1.3-1~lucid located in a local ext4 filesystem,
> the pg_restore went well without a single error.

Yes, it certainly sounds like that. You probably need to bring it up with the glusterfs folks...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Hi Magnus,

Thank you for answering my post. Please see my comments on your answer below.

On Fri, May 4, 2012 at 3:58 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Note the message right here...
>
> There may be further indications in the server log about what's wrong.

The server's logs in the messages file were clean.

> I assume you don't have more than one node actually *accessing* the
> data directory at the same time, right?

Yes, you are right. I just set up this glusterfs and PostgreSQL server with two nodes for testing purposes. There was no other activity on the gluster filesystem at the time I tried to restore the PostgreSQL dump. Do you know whether PostgreSQL recommends any other cluster filesystem, or does it not like cluster filesystems at all?

> Even with that said, I haven't heard of anybody running PostgreSQL on
> glusterfs, and I'm not sure it fulfills the basic requirements that
> PostgreSQL has on a filesystem. In particular, the messages above
> about a buggy kernel certainly indicates that there is a problem with
> the filesystem.
>
> Yes, it certainly sounds like that. You probably need to bring it up
> with the glusterfs folks...

I posted to the glusterfs mailing list at the same time but haven't received any feedback yet. I think it is more likely related to glusterfs, but I would like to know whether any other PostgreSQL users have had a similar experience or have ideas.

Thanks.

Liang
On Mon, May 7, 2012 at 5:02 PM, Liang Ma <ma.satops@gmail.com> wrote:
> On Fri, May 4, 2012 at 3:58 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> There may be further indications in the server log about what's wrong.
>
> The server's logs in message file were clean.

Then your logging is incorrectly configured, because it should *at least* have the same message as the one that showed up in the client.

>> I assume you don't have more than one node actually *accessing* the
>> data directory at the same time, right?
>
> Yes, you are right. I just set up this glusterfs and postgresql server
> with two nodes for testing purpose. There was no other gluster
> filesystem access activity at the time I tried to restore the
> postgresql dump. Do you know if postgresql recommends any other
> cluster filesystem, or it may not like cluster filesystem at all?

Did you have PostgreSQL started on both nodes? That is *not* supported. If PostgreSQL only runs on one node at a time it should in theory work, provided the cluster filesystem provides all the services that a normal filesystem does, such as respecting fsync.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
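One of the "services of a normal filesystem" can be probed directly. My understanding, which is an assumption and is not stated anywhere in this thread, is that the server raises "unexpected data beyond EOF" when it goes to extend a relation file and finds data in a block it believed lay past end-of-file, as could happen if the filesystem reports a stale file size. The sketch below extends a scratch file one 8 kB block at a time and checks the size reported after every write. The function name probe_file_extension is made up, and the probe uses a single process and a single file descriptor, so a clean run does not show that the filesystem is safe for PostgreSQL.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # PostgreSQL's default block size


def probe_file_extension(directory, blocks=256):
    """Extend a scratch file block by block and check the reported size.

    Returns a list of (block_number, expected_size, lseek_size, fstat_size)
    tuples, one for every step at which the filesystem reported a size that
    differs from the number of bytes written so far. An empty list means no
    mismatch was observed.
    """
    mismatches = []
    fd, path = tempfile.mkstemp(dir=directory, prefix="eofprobe-")
    try:
        payload = b"\xab" * BLOCK_SIZE
        for block in range(blocks):
            # Write the next block at the position where the file should end.
            os.lseek(fd, block * BLOCK_SIZE, os.SEEK_SET)
            written = os.write(fd, payload)
            if written != BLOCK_SIZE:
                raise OSError(f"short write at block {block}: {written} bytes")
            expected = (block + 1) * BLOCK_SIZE
            # Ask the filesystem for the file size in two different ways.
            lseek_size = os.lseek(fd, 0, os.SEEK_END)
            fstat_size = os.fstat(fd).st_size
            if lseek_size != expected or fstat_size != expected:
                mismatches.append((block, expected, lseek_size, fstat_size))
        os.fsync(fd)
    finally:
        os.close(fd)
        os.unlink(path)
    return mismatches


if __name__ == "__main__":
    import sys
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    problems = probe_file_extension(target)
    print(f"{len(problems)} size mismatches in {target}")
```

Running it once against a directory on the glusterfs mount (for example somewhere under /mnt/gluster-2) and once against the local ext4 filesystem would give a like-for-like comparison of the two setups described above.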
On Mon, May 7, 2012 at 12:54 PM, Magnus Hagander <magnus@hagander.net> wrote:
> On Mon, May 7, 2012 at 5:02 PM, Liang Ma <ma.satops@gmail.com> wrote:
>> The server's logs in message file were clean.
>
> Then your logging is incorrectly configured, because it should *at
> least* have the same message as the one that showed up in the client.

Oh, yes, the same error messages were logged in the PostgreSQL log file, but with no further information. I thought you meant there might be some indication in the server's system logs, and I couldn't find any there.

> Did you have PostgreSQL started on both nodes? That is *not*
> supported. If PostgreSQL only runs on one node at a time it should in
> theory work, provided the cluster filesystem provides all the services
> that a normal filesystem does, such as respecting fsync.

PostgreSQL is installed on both nodes, but only one node's data directory points to the glusterfs filesystem. The other node's data directory is in its default location on the local ext4 filesystem. That is the node I used to show that the dump file restores without any problem when glusterfs is not involved.

According to its introduction and documentation, glusterfs is supposed to appear as a normal filesystem when mounted, although I don't know how well it respects things like fsync.

Liang
On Mon, May 7, 2012 at 7:34 PM, Liang Ma <ma.satops@gmail.com> wrote:
> Oh, yes, the same error messages were logged in the postgresql log
> file but no further information. I thought you implied that there may
> be some indication in server's system logs, which I couldn't find any.

Well, there might be, I wasn't sure :-) I guess there wasn't.

> Postgresql are installed in both nodes, but only one node's postgresql
> data directory points to glusterfs filesystem. Another one's data
> directory is in its default location in the local ext4 filesystem.
> This is the one I used to prove the dump file can be restored without
> any problem when glusterfs is not involved.

OK. That should in theory be safe. Having two active nodes against the filesystem is never safe.

> According to its introduction and document, glusterfs is supposed to
> appear as a normal filesystem when being mounted, although I don't
> know how well it respects things like fsync.

It certainly looks like it's failing at some point. So yeah, I'm pretty sure you need to get in touch with the glusterfs folks - hopefully you get a response from them soon.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
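When comparing the two nodes, it can be useful to confirm which filesystem type each data directory actually lives on. The sketch below is only an illustration: it parses text in the format of /proc/mounts on Linux and returns the type of the longest mount point that contains a given path. The helper names are made up, the data directory paths in the usage note are hypothetical, and the type string fuse.glusterfs for a FUSE-mounted gluster volume is my assumption rather than something shown in this thread. Mount points containing spaces (escaped in /proc/mounts) are not handled.

```python
import os


def parse_mounts(mounts_text):
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path.

    When two entries share the same mount point, the later one wins, since
    later mounts shadow earlier ones. Returns None if nothing matches.
    """
    path = os.path.abspath(path)
    best_point, best_type = "", None
    for mount_point, fs_type in parse_mounts(mounts_text):
        # Compare with a trailing slash so /mnt/a does not match /mnt/ab.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_point):
            best_point, best_type = mount_point, fs_type
    return best_type


if __name__ == "__main__":
    import sys
    with open("/proc/mounts") as handle:
        print(fs_type_for(sys.argv[1], handle.read()))
```

For example, passing a data directory under /mnt/gluster-2 on the gluster-backed node and the default data directory on the ext4 node should print two different filesystem types if the setup is as described.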
Thank you, Magnus, for all the input. If I get any comments from the gluster community, I will post an update here.

Liang