Thread: could not create lock file postmaster.pid: No such file or directory, but file does exist

could not create lock file postmaster.pid: No such file or directory, but file does exist

From
Rob Goethals / SNP
Date:

Hi,

This is my first post to this list, so I hope I am posting it to the correct lists. But I am really stuck and getting pretty desperate at the moment.

This weekend my database crashed while importing some OpenStreetMap data and I can't get it back to work again. It has happened before, and normally I would reset the WAL directory with the pg_resetxlog command. I would lose some data, but that would be all.

This time it is somehow different, because it doesn't recognize any of the important files anymore. For example, when I try to start PostgreSQL again with the command:

/usr/lib/postgresql/9.1/bin/pg_ctl -D OSM/ start

I get the following error:

FATAL:  could not create lock file "postmaster.pid": No such file or directory

But when I do an ls -l on the directory I can see the file exists.

drwx------ 0 postgres postgres     0 Jan 24 10:07 backup
drwx------ 0 postgres postgres     0 Feb 14 11:10 base
drwx------ 0 postgres postgres     0 Feb 17 09:46 global
drwx------ 0 postgres postgres     0 Oct 11 10:49 pg_clog
-rwxr-xr-x 0 postgres postgres  4476 Oct 11 10:49 pg_hba.conf
-rwxr-xr-x 0 postgres postgres  1636 Oct 11 10:49 pg_ident.conf
drwx------ 0 postgres postgres     0 Feb 17 11:29 pg_log
drwx------ 0 postgres postgres     0 Oct 11 10:49 pg_multixact
drwx------ 0 postgres postgres     0 Feb 17 08:58 pg_notify
drwx------ 0 postgres postgres     0 Oct 11 10:49 pg_serial
drwx------ 0 postgres postgres     0 Feb 12 09:58 pg_stat_tmp
drwx------ 0 postgres postgres     0 Feb 14 09:01 pg_subtrans
drwx------ 0 postgres postgres     0 Oct 11 10:49 pg_tblspc
drwx------ 0 postgres postgres     0 Oct 11 10:49 pg_twophase
-rwxr-xr-x 0 postgres postgres     4 Oct 11 10:49 PG_VERSION
drwx------ 0 postgres postgres     0 Feb 14 13:37 pg_xlog
-rwxr-xr-x 0 postgres postgres 19168 Oct 11 11:41 postgresql.conf
-rwxr-xr-x 0 postgres postgres   121 Feb 17 08:57 postmaster.opts
-rwxr-xr-x 0 postgres postgres    88 Feb 17 08:58 postmaster.pid

I cannot perform any action on the postmaster.pid file. I tried cp, mv and rm, but nothing works. Is there anything I can do to make the system recognize this file again and get my database up and running? Or is all hopelessly lost?

I have PostgreSQL 9.1 installed on Ubuntu 12.04.

Kind regards,

Rob.

Re: could not create lock file postmaster.pid: No such file or directory, but file does exist

From
Albe Laurenz
Date:
Rob Goethals wrote:
> This is my first post to this list, so I hope I am posting it to the correct lists. But I am really
> stuck and getting pretty desperate at the moment.

You should not post to more than one list.

> This weekend my database crashed while importing some OpenStreetMap data and I can't get it back to
> work again. It has happened before, and normally I would reset the WAL directory with the pg_resetxlog command. I
> would lose some data, but that would be all.

That is not a good idea.  PostgreSQL should recover from a crash automatically.
If you run pg_resetxlog your database cluster is damaged, and all you should
do is pg_dump all the data you can, run initdb and import the data.
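
A minimal sketch of the sequence described above (dump whatever can still be read, create a fresh cluster, import the dump). The function only prints the commands and does not run them. The directory names and the dump file name are placeholders, not values from this thread, and the sketch assumes the PostgreSQL client and server binaries are on the PATH and that the damaged cluster can still be started.

```shell
# Print (not run) a salvage sequence for a cluster on which pg_resetxlog was used.
# $1 = damaged data directory, $2 = new data directory, $3 = dump file
# (all three are placeholder names chosen for illustration).
print_salvage_plan() {
    old_data=$1
    new_data=$2
    dump_file=$3
    cat <<EOF
pg_ctl -D "$old_data" -w start
pg_dumpall -f "$dump_file"
pg_ctl -D "$old_data" -w stop
initdb -D "$new_data"
pg_ctl -D "$new_data" -w start
psql -f "$dump_file" postgres
EOF
}

# Example: print_salvage_plan OSM OSM_new all_databases.sql
```

If pg_dumpall fails partway through on a damaged cluster, dumping individual databases or tables with pg_dump may still recover part of the data.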

> This time it is somehow different, because it doesn't recognize any of the important files anymore. For
> example, when I try to start PostgreSQL again with the command:
> 
> /usr/lib/postgresql/9.1/bin/pg_ctl -D OSM/ start
> 
> I get the following error:
> 
> FATAL:  could not create lock file "postmaster.pid": No such file or directory
> 
> But when I do an ls -l on the directory I can see the file exists.
[...]
> -rwxr-xr-x 0 postgres postgres    88 Feb 17 08:58 postmaster.pid
> 
> I cannot perform any action on the postmaster.pid file. I tried cp, mv and rm, but nothing works. Is
> there anything I can do to make the system recognize this file again? And get my database up and
> running? Or is all hopelessly lost?
> 
> I have Postgresql 9.1 installed on Ubuntu 12.04.

What is the error message you get for cp, mv or rm?

Can you describe the crash of your machine in greater detail?
What was the cause?

One wild guess: could it be that the OS automatically remounted the file system
read-only because it encountered a problem?  Check your /var/log/messages (I hope
the location is the same on Ubuntu and on RHEL).
In that case unmount, fsck and remount should solve the problem.
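
A quick way to test the read-only guess, as a sketch: on Linux, /proc/mounts lists every mount with its options in the fourth field, and a file system that was remounted read-only shows "ro" there. The helper name list_readonly_mounts is made up for illustration.

```shell
# Print the mount point of every entry whose option list contains "ro".
# Expects /proc/mounts-format lines on stdin:
#   device mountpoint fstype options dump pass
list_readonly_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2; break }
    }'
}

# Usage on a live Linux system:
#   list_readonly_mounts < /proc/mounts
```

If the mount point that holds the data directory shows up in the output, writes such as creating postmaster.pid will fail until the file system is repaired and remounted read-write.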

Yours,
Laurenz Albe

Re: could not create lock file postmaster.pid: No such file or directory, but file does exist

From
Rob Goethals / SNP
Date:

> -----Original message-----
> From: Albe Laurenz [mailto:laurenz.albe@wien.gv.at]
> Sent: Monday, 17 February 2014 14:22
> To: Rob Goethals
> Subject: RE: could not create lock file postmaster.pid: No such file or
> directory, but file does exist
> 
> Dear Rob,
> 
> you should send your reply to the list.
> This way
> a) people know that your problem is solved and won't spend their time trying
> to help you.
> b) others can benefit from the information.

OK, clear. I hereby send this reply also to the list.

> 
> >>> This weekend my database crashed while importing some
> >>> OpenStreetMap data and I can't get it back to work again. It has happened
> >>> before, and normally I would reset the WAL directory with the pg_resetxlog
> >>> command. I would lose some data, but that would be all.
> >>
> >> That is not a good idea.  PostgreSQL should recover from a crash
> >> automatically.
> >> If you run pg_resetxlog your database cluster is damaged, and all you
> >> should do is pg_dump all the data you can, run initdb and import the data.
> >
> > But what if Postgresql doesn't recover automatically? When my database
> > crashed and I try to restart it, I most of the time get a message like:
> > LOG:  could not open file "pg_xlog/0000000100000114000000D2" (log file
> > 276, segment 210): No such file or directory
> > LOG:  invalid primary checkpoint record
> > LOG:  invalid secondary checkpoint link in control file
> > PANIC:  could not locate a valid checkpoint record
> > LOG:  startup process (PID 3604) was terminated by signal 6: Aborted
> > LOG:  aborting startup due to startup process failure
> 
> Interesting.
> How did you get PostgreSQL into this state?  Did you set fsync=off or similar?
> Which storage did you put pg_xlog on?
> 

I am adding OSM change files to my database with the command:
osm2pgsql --append --database $database --username $user --slim --cache 3000 --number-processes 6 --style /usr/share/osm2pgsql/default.style --extra-attributes changes.osc.gz

At the moment of the crash the postgresql-log says:
2014-02-15 00:49:04 CET  LOG:  WAL writer process (PID 1127) was terminated by signal 6: Aborted
2014-02-15 00:49:04 CET  LOG:  terminating any other active server processes
2014-02-15 00:49:04 CET [unknown] WARNING:  terminating connection because of crash of another server process
2014-02-15 00:49:04 CET [unknown] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

So I don't know what exactly is happening.

When it is trying to startup again this is the logfile output:
2014-02-15 00:49:08 CET  LOG:  could not open temporary statistics file "global/pgstat.tmp": Input/output error
2014-02-15 00:49:14 CET  LOG:  all server processes terminated; reinitializing
2014-02-15 00:49:17 CET  LOG:  database system was interrupted; last known up at 2014-02-15 00:32:01 CET
2014-02-15 00:49:33 CET [unknown] [unknown]LOG:  connection received: host=[local]
2014-02-15 00:49:33 CET [unknown] FATAL:  the database system is in recovery mode
2014-02-15 00:49:56 CET  LOG:  database system was not properly shut down; automatic recovery in progress
2014-02-15 00:49:57 CET [unknown] [unknown]LOG:  connection received: host=[local]
2014-02-15 00:49:57 CET [unknown] FATAL:  the database system is in recovery mode
2014-02-15 00:50:01 CET  LOG:  redo starts at 114/C8B27330
2014-02-15 00:50:02 CET  LOG:  could not open file "pg_xlog/0000000100000114000000CB" (log file 276, segment 203): No such file or directory
2014-02-15 00:50:02 CET  LOG:  redo done at 114/CAFFFF80
2014-02-15 00:50:02 CET  LOG:  checkpoint starting: end-of-recovery immediate
2014-02-15 00:50:05 CET  PANIC:  could not create file "pg_xlog/xlogtemp.5390": Input/output error
2014-02-15 00:50:22 CET [unknown] [unknown]LOG:  connection received: host=[local]
2014-02-15 00:50:22 CET [unknown] FATAL:  the database system is in recovery mode
2014-02-15 00:50:23 CET  LOG:  startup process (PID 5390) was terminated by signal 6: Aborted
2014-02-15 00:50:23 CET  LOG:  aborting startup due to startup process failure

Furthermore, I checked my conf file and my fsync is indeed set to off.
I mounted a directory on an NTFS network disk (because of the available size, and considering that the amount of OSM data is pretty big). This is where I put all my database data, including pg_xlog.

> > Is there a better procedure to follow when something like this
> > happens? I am fairly new at the whole Postgresql thing so I am very
> > willing to learn all about it anyway I can from experienced users. I
> > am googling all my way round the internet to try and solve all the
> > questions I have, but as with many things there's most of the time more
> > than 1 answer to a problem and for me it is very hard to figure out what is the
> > best solution.
> 
> No, in that case I would restore from a backup.
> 
> >> One wild guess: could it be that the OS automatically remounted the
> >> file system read-only because it encountered a problem?  Check your
> >> /var/log/messages (I hope the location is the same on Ubuntu and on
> >> RHEL).
> >> In that case unmount, fsck and remount should solve the problem.
> >
> > I am impressed. Your wild guess exactly did the trick. Manually
> > unmounting, checking and remounting was all it needed. Thank you very
> > much!!
> 
> That would suggest that you have a hardware problem with your storage.
> It may be that your file system is corrupted.  Did you fsck it?

The fsck didn't work as it was mounted as cifs. So I guess I should let Windows do the checking.

> 
> Yours,
> Laurenz Albe

Re: Re: could not create lock file postmaster.pid: No such file or directory, but file does exist

From
Alban Hertroys
Date:
On 17 February 2014 14:42, Rob Goethals / SNP <Rob.Goethals@snp.nl> wrote:
> 2014-02-15 00:49:04 CET  LOG:  WAL writer process (PID 1127) was terminated by signal 6: Aborted

Signal 6 is usually caused by hardware issues.

Then again, you also say:

> I mounted a directory on a NTFS network-disk (because of the available size and considering the
> amount of OSM-data is pretty big). This is where I put all my database data, so also the pg_xlog.

That will cause problems as well. SMBFS does not support all the
necessary file flags, locks and such that the database needs to
operate on those files in a safe way. That's probably worse than
running with sciss... ehr... fsync=off

Alban Hertroys.
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


Re: Re: could not create lock file postmaster.pid: No such file or directory, but file does exist

From
Tom Lane
Date:
Rob Goethals / SNP <Rob.Goethals@snp.nl> writes:
> When it is trying to startup again this is the logfile output:
> ...
> 2014-02-15 00:50:05 CET  PANIC:  could not create file "pg_xlog/xlogtemp.5390": Input/output error

The above PANIC is the reason for the abort that happens immediately
thereafter.
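
For reference, signal 6 on Linux is SIGABRT, the signal a process raises against itself by calling abort(). PostgreSQL calls abort() when it hits a PANIC, which fits the explanation above. A small sketch for looking up a signal name from the shell (the helper name signal_name is made up for illustration):

```shell
# Print the symbolic name (without the SIG prefix) for a signal number.
signal_name() {
    kill -l "$1"
}

# Usage:
#   signal_name 6
```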

On local storage I'd think this meant disk hardware problems, but since
you say you've got the database on an NTFS volume, what it more likely
means is that there's a bug in the kernel's NTFS support.  Anyway, it's
fruitless to try to get Postgres going again until you have a stable
filesystem underneath it.

Generally speaking, longtime Postgres users are very suspicious of running
Postgres atop any kind of networked filesystem.  We find that network
filesystems are invariably less stable than local ones.  NTFS seems likely
to be a particularly unfortunate choice from this standpoint, as you get
to benefit from Windows' bugs along with Linux's.

            regards, tom lane


Re: could not create lock file postmaster.pid: No such file or directory, but file does exist

From
Albe Laurenz
Date:
Rob Goethals wrote:
> OK, clear. I hereby send this reply also to the list.

Cool.

>> Interesting.
>> How did you get PostgreSQL into this state?  Did you set fsync=off or similar?
>> Which storage did you put pg_xlog on?

> 2014-02-15 00:49:04 CET  LOG:  WAL writer process (PID 1127) was terminated by signal 6: Aborted

Ouch.

> Furthermore I checked my conf-file and my fsync is indeed set to off.

Well, that is one reason why crash recovery is not working.

> I mounted a directory on a NTFS network-disk (because of the available size and considering the amount
> of OSM-data is pretty big). This is where I put all my database data, so also the pg_xlog.

Double ouch.
CIFS is not a supported file system.

At least that explains your problems.
Try with a local file system or NFS with hard foreground mount.
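
As a sketch, the file system type under a data directory can be checked before the server is ever started on it. The helper below only classifies a type name; the names in the pattern are assumptions about what GNU stat reports for SMB/CIFS mounts, and PGDATA in the usage comment is a placeholder path.

```shell
# Return 0 (true) if the given file system type name looks like an SMB/CIFS variant.
# The list of names is an assumption, not an exhaustive or authoritative set.
is_smb_fs_type() {
    case "$1" in
        cifs|smb|smb2|smbfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a live system (requires GNU coreutils stat):
#   fstype=$(stat -f -c %T "$PGDATA")
#   if is_smb_fs_type "$fstype"; then
#       echo "warning: $PGDATA is on $fstype" >&2
#   fi
```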

Yours,
Laurenz Albe

Re: could not create lock file postmaster.pid: No such file or directory, but file does exist

From
Rob Goethals / SNP
Date:
OK, it is clear to me that I didn't make the best choices setting up this database. :(
I am happy I found this list because I am learning a lot in a very short period of time. :) Thank you all for your tips and comments.

I will definitely move the database to a Linux system and set fsync to on. I hope this will give me a more stable environment. Furthermore, I'll dive into the whole database backup subject so next time I'll have something to restore if things go wrong.
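
A rough sketch for checking what fsync is set to in a postgresql.conf without starting the server. It reads the file on stdin and prints the value of the last uncommented fsync line, since a later setting overrides an earlier one. The helper name effective_fsync is made up for illustration, and the sketch does not follow include directives.

```shell
# Print the value of the last uncommented "fsync" setting found on stdin.
# Prints nothing if the file does not set fsync (the server default is on).
effective_fsync() {
    awk -v q="'" '
        { sub(/#.*/, "") }                                   # drop comments
        match($0, /^[ \t]*fsync([ \t]*=[ \t]*|[ \t]+)/) {
            value = substr($0, RLENGTH + 1)
            gsub("[ \t" q "]", "", value)                    # strip blanks and single quotes
            last = value
        }
        END { if (last != "") print last }
    '
}

# Usage (the path is a placeholder):
#   effective_fsync < /path/to/data/postgresql.conf
```

On a running server, `SHOW fsync;` in psql reports the value actually in effect.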
 

Rob Goethals.

Re: [ADMIN] could not create lock file postmaster.pid: No such file or directory, but file does exist

From
Cliff Pratt
Date:
You don't give a lot of information, but try "sudo rm postmaster.pid" or "sudo -u postgres rm postmaster.pid" if you are sure that postgres is not running.

Cheers,

Cliff

