Обсуждение: is it cool to restart servers as preventive maintenance?

Поиск
Список
Период
Сортировка

is it cool to restart servers as preventive maintenance?

От
Kiriakos Georgiou
Дата:
Hello,

Is it accepted practice to restart a production database on a regular basis (e.g.: once a month) ?

In the last 12 months I have noticed 3-4 instances of database flakiness that is cured by restarting.
I’ve been using PostgreSQL since 2007 and I haven’t seen such issues requiring a reboot, but on my current project we
dosome rather heavy duty PostGIS analysis that apparently stresses the system enough to occasionally cause this (that’s
mytheory anyway.)  I’m beginning to seriously consider restarting servers on a monthly basis. 

regards,
Kiriakos Georgiou

Re: is it cool to restart servers as preventive maintenance?

От
Jerry Sievers
Дата:
Kiriakos Georgiou <kg.postgresql@olympiakos.com> writes:

> Hello,
>
> Is it accepted practice to restart a production database on a regular basis (e.g.: once a month) ?
>
> In the last 12 months I have noticed 3-4 instances of database
> flakiness that is cured by restarting.  I’ve been using
> PostgreSQL since 2007 and I haven’t seen such issues
> requiring a reboot, but on my current project we do some rather heavy
> duty PostGIS analysis that apparently stresses the system enough to
> occasionally cause this (that’s my theory anyway.)
> I’m beginning to seriously consider restarting servers on a
> monthly basis.

I once took a new DBA position and found them doing a weekly full
restart and clustering several large table.

This went on for a short while until I brought to their attention that
the only problem was  long-open transactions :-)

We put in a fix for that problem...

Thus rather than taking production down every Friday night, we went out
partying instead.

Unsophisticated/brute force approach and nobody wins.

YMMV

> regards,
> Kiriakos Georgiou

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800


Re: is it cool to restart servers as preventive maintenance?

От
Tom Lane
Дата:
Kiriakos Georgiou <kg.postgresql@olympiakos.com> writes:
> In the last 12 months I have noticed 3-4 instances of database flakiness that is cured by restarting.
> I’ve been using PostgreSQL since 2007 and I haven’t seen such issues requiring a reboot, but on my current project we
dosome rather heavy duty PostGIS analysis that apparently stresses the system enough to occasionally cause this (that’s
mytheory anyway.)  I’m beginning to seriously consider restarting servers on a monthly basis. 

What sort of "database flakiness"?

It's possible you're encountering some kind of bug (memory leak?) in
PostGIS, but that would be a bug you ought to get them to fix, not a
reason why periodic restarts are a good idea.

            regards, tom lane


Re: is it cool to restart servers as preventive maintenance?

От
Kiriakos Georgiou
Дата:
> On Feb 10, 2016, at 3:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Kiriakos Georgiou <kg.postgresql@olympiakos.com> writes:
>> In the last 12 months I have noticed 3-4 instances of database flakiness that is cured by restarting.
>> I’ve been using PostgreSQL since 2007 and I haven’t seen such issues requiring a reboot, but on my current project
wedo some rather heavy duty PostGIS analysis that apparently stresses the system enough to occasionally cause this
(that’smy theory anyway.)  I’m beginning to seriously consider restarting servers on a monthly basis. 
>
> What sort of "database flakiness"?
>
> It's possible you're encountering some kind of bug (memory leak?) in
> PostGIS, but that would be a bug you ought to get them to fix, not a
> reason why periodic restarts are a good idea.
>
>             regards, tom lane
>



Flaky = the database appears to be running OK (I can run queries via psql) but our app is down for no apparent reason.
Restartingthe app servers multiple times did not help.  Although the database seemed to respond fine to queries via
psql,I decided to restart it.  That was a good move, our app worked fine after the database restart. 

There is more to it: about 8 hours earlier our warm standby postgresql filled up the volume it puts the server logs by
repeatingthe following two lines in the server log, millions of times: 

WARNING:   out of shared memory
CONTEXT:   xlog redo AccessExclusive locks: xid 2002212 db 16384 rel 1079879

I had seen this on our primary about a year ago and I kept doubling max_locks_per_transaction all the way to 1024, at
whichpoint the problem did not reoccur (on the primary.) 
I still occasionally (once every 3-4 months) see the “out of shared memory” message on the standby although
max_locks_per_transactionhas the same 1024 value as the primary.  When this happens I have to rebuilt it from the
primaryvia pg_basebackup.  When the “out of shared memory” happens on the standby, it’s a coin toss whether the primary
willbehave flaky or not.  This time it did, and the restart fixed it. 

Writing this email made me realize the likely cause of our problem.  It’s the “out of shared memory” issue.  We do have
aplpgsql function that calls 30+ other plpgsql functions, some of which create temp tables.  Some of the calls are
withinloops.  So depending on data inputs we can get hundreds of locks to temp tables within a single transaction.  The
oddthing is that at max_locks_per_transaction = 1024 we no longer get any “out of shared memory” on the primary, but we
doon the stand by.  Any ideas about that?  Should I increase max_locks_per_transaction yet again? 

thanks,
Kiriakos Georgiou

Re: is it cool to restart servers as preventive maintenance?

От
Scott Whitney
Дата:
I would absolutely recommend experimenting with higher max_lock, but I would also HIGHLY recommend checking all relevant logs when this is happening to you.

If restarting the app server is failing to connect, SOMEWHERE, SOMEONE should have logged SOMETHING.


-------- Original message --------
From: Kiriakos Georgiou <kg.postgresql@olympiakos.com>
Date: 02/10/2016 3:48 PM (GMT-06:00)
To: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] is it cool to restart servers as preventive maintenance?


> On Feb 10, 2016, at 3:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Kiriakos Georgiou <kg.postgresql@olympiakos.com> writes:
>> In the last 12 months I have noticed 3-4 instances of database flakiness that is cured by restarting.
>> I’ve been using PostgreSQL since 2007 and I haven’t seen such issues requiring a reboot, but on my current project we do some rather heavy duty PostGIS analysis that apparently stresses the system enough to occasionally cause this (that’s my theory anyway.)  I’m beginning to seriously consider restarting servers on a monthly basis.
>
> What sort of "database flakiness"?
>
> It's possible you're encountering some kind of bug (memory leak?) in
> PostGIS, but that would be a bug you ought to get them to fix, not a
> reason why periodic restarts are a good idea.
>
>                        regards, tom lane
>



Flaky = the database appears to be running OK (I can run queries via psql) but our app is down for no apparent reason.  Restarting the app servers multiple times did not help.  Although the database seemed to respond fine to queries via psql, I decided to restart it.  That was a good move, our app worked fine after the database restart.

There is more to it: about 8 hours earlier our warm standby postgresql filled up the volume it puts the server logs by repeating the following two lines in the server log, millions of times:

WARNING:   out of shared memory
CONTEXT:   xlog redo AccessExclusive locks: xid 2002212 db 16384 rel 1079879

I had seen this on our primary about a year ago and I kept doubling max_locks_per_transaction all the way to 1024, at which point the problem did not reoccur (on the primary.)
I still occasionally (once every 3-4 months) see the “out of shared memory” message on the standby although max_locks_per_transaction has the same 1024 value as the primary.  When this happens I have to rebuilt it from the primary via pg_basebackup.  When the “out of shared memory” happens on the standby, it’s a coin toss whether the primary will behave flaky or not.  This time it did, and the restart fixed it.

Writing this email made me realize the likely cause of our problem.  It’s the “out of shared memory” issue.  We do have a plpgsql function that calls 30+ other plpgsql functions, some of which create temp tables.  Some of the calls are within loops.  So depending on data inputs we can get hundreds of locks to temp tables within a single transaction.  The odd thing is that at max_locks_per_transaction = 1024 we no longer get any “out of shared memory” on the primary, but we do on the stand by.  Any ideas about that?  Should I increase max_locks_per_transaction yet again?

thanks,
Kiriakos Georgiou

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


Journyx, Inc.
7600 Burnet Road #300
Austin, TX 78757
www.journyx.com

p 512.834.8888 
f 512-834-8858 

Do you receive our promotional emails? You can subscribe or unsubscribe to those emails at http://go.journyx.com/emailPreference/e/4932/714/