Обсуждение: Database size stays constant but disk space keeps shrinking -- postgres 9.1

Поиск
Список
Период
Сортировка

Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Greg Williamson
Дата:
Dear list,

I have a postgres database, 9.1.3, which shows a fairly constant amount of space used by postgres, but total disk space
keesshrinking. 
If I restart postgres the space on my file system returns.

This cluster is replicated to another; that custer does not show this problem.

There are two main databases on this cluster, both using logging. One is used to accumulate some stats on our systen.

The other is a slimmed-down version of our production database, which gets recreated hourly by a shell script which
pullsdata from remote servers, does a pg_dump of the resulting 3 gig database, and then drops it. 

I posted a message about ths a few days and got no responses. I am hoping for better luck this time as this is a
seriousissue,not relayed t vacuuming or routine admin tasks which may or may not be happening. 

Thanks for any help / advice / things to look for.

Greg

a few details:
db11:5432=# select version();
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.1.3 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit

Says one of our sysads:
"These are single disk systems. single no
n-raid sata. Using xfs as the filesystem like the rest of the dbs."


Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Steve Crawford
Дата:
On 09/27/2012 03:05 PM, Greg Williamson wrote:
> Dear list,
>
> I have a postgres database, 9.1.3, which shows a fairly constant amount of space used by postgres, but total disk
spacekees shrinking. 
> If I restart postgres the space on my file system returns.
>
> This cluster is replicated to another; that custer does not show this problem.
>
> There are two main databases on this cluster, both using logging. One is used to accumulate some stats on our systen.
>
> The other is a slimmed-down version of our production database, which gets recreated hourly by a shell script which
pullsdata from remote servers, does a pg_dump of the resulting 3 gig database, and then drops it. 
>
> I posted a message about ths a few days and got no responses. I am hoping for better luck this time as this is a
seriousissue,not relayed t vacuuming or routine admin tasks which may or may not be happening. 
>
>
No answer but some questions.

How are you determining space used by PostgreSQL? From the OS (i.e. du
on PostgreSQL's directories or similar) or from PostgreSQL itself (pg_
system tables, \dt+, etc.)?

What method of replication are you using?

Do you have any indication if the issue is only related to which
database is causing the issue? Can you stop activity on one of them and
see if the problem persists?

Have you checked to see if there are any processes that have open
handles to deleted files (lsof -X | grep deleted). Deleted files won't
show up in du but won't release their disk space until the process
exits. Perhaps a script or scripts, even one of your hourly ones, that
terminate when the server restarts? You could save the output of lsof
and ps immediately before and after a restart and compare them.

Let us know what you find.

Cheers,
Steve




Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Micky Gough
Дата:
Hi Greg,

On Fri, Sep 28, 2012 at 8:05 AM, Greg Williamson <gwilliamson39@yahoo.com> wrote:

I have a postgres database, 9.1.3, which shows a fairly constant amount of space used by postgres, but total disk space kees shrinking.
If I restart postgres the space on my file system returns.

It sounds like (maybe?) Postgres is writing to a file that has been deleted. You won't be able to see the file using ls, so it's hard to track. Restarting postgres will release the lock on the inode and the filesystem flushes it.

You can check for deleted files that are being held open using lsof |grep deleted.

Micky

Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Dinesh Bhandary
Дата:
You might also want to check to see if temp files are growing out of bounds.

Thanks.
Dinesh
On 9/27/2012 3:05 PM, Greg Williamson wrote:
> Dear list,
>
> I have a postgres database, 9.1.3, which shows a fairly constant amount of space used by postgres, but total disk
spacekees shrinking. 
> If I restart postgres the space on my file system returns.
>
> This cluster is replicated to another; that custer does not show this problem.
>
> There are two main databases on this cluster, both using logging. One is used to accumulate some stats on our systen.
>
> The other is a slimmed-down version of our production database, which gets recreated hourly by a shell script which
pullsdata from remote servers, does a pg_dump of the resulting 3 gig database, and then drops it. 
>
> I posted a message about ths a few days and got no responses. I am hoping for better luck this time as this is a
seriousissue,not relayed t vacuuming or routine admin tasks which may or may not be happening. 
>
> Thanks for any help / advice / things to look for.
>
> Greg
>
> a few details:
> db11:5432=# select version();
>                                                     version
> --------------------------------------------------------------------------------------------------------------
>   PostgreSQL 9.1.3 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
>
> Says one of our sysads:
> "These are single disk systems. single no
> n-raid sata. Using xfs as the filesystem like the rest of the dbs."
>
>



Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Greg Williamson
Дата:
Steve (and others who replied):


----- Original Message -----
> From: Steve Crawford <scrawford@pinpointresearch.com>
> To: Greg Williamson <gwilliamson39@yahoo.com>
> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
> Sent: Thursday, September 27, 2012 3:48 PM
> Subject: Re: [ADMIN] Database size stays constant but disk space keeps shrinking -- postgres 9.1
>
> On 09/27/2012 03:05 PM, Greg Williamson wrote:
>>  Dear list,
>>
>>  I have a postgres database, 9.1.3, which shows a fairly constant amount of
> space used by postgres, but total disk space kees shrinking.
>>  If I restart postgres the space on my file system returns.
>>
>>  This cluster is replicated to another; that custer does not show this
> problem.
>>
>>  There are two main databases on this cluster, both using logging. One is
> used to accumulate some stats on our systen.
>>
>>  The other is a slimmed-down version of our production database, which gets
> recreated hourly by a shell script which pulls data from remote servers, does a
> pg_dump of the resulting 3 gig database, and then drops it.
>>
>>  I posted a message about ths a few days and got no responses. I am hoping
> for better luck this time as this is a serious issue,not relayed t vacuuming or
> routine admin tasks which may or may not be happening.
>>
>>
> No answer but some questions.
>
> How are you determining space used by PostgreSQL? From the OS (i.e. du on
> PostgreSQL's directories or similar) or from PostgreSQL itself (pg_ system
> tables, \dt+, etc.)?
> 

df -h and the postgres internal functions that show size of relations.

> What method of replication are you using?


streaming replication w/ synchronous commit

>
> Do you have any indication if the issue is only related to which database is
> causing the issue? Can you stop activity on one of them and see if the problem
> persists?


If I suspend the build / drop database process the shrinking of available disk space slows dramatically.

>
> Have you checked to see if there are any processes that have open handles to
> deleted files (lsof -X | grep deleted). Deleted files won't show up in du
> but won't release their disk space until the process exits. Perhaps a script
> or scripts, even one of your hourly ones, that terminate when the server
> restarts? You could save the output of lsof and ps immediately before and after
> a restart and compare them.
> 

lsof -X | grep deleted | wc -l

shows: 835 such files.

A couple:
postgres   2540 postgres   50u      REG                8,3     409600      93429 /var/lib/postgresql/9.1/main/base/2789
200/11816 (deleted)
postgres   2540 postgres   51u      REG                8,3   18112512   49694570 /var/lib/postgresql/9.1/main/base/2789
200/2791679 (deleted)
<...>

while ls shows:
root@db11:~# ls /var/lib/postgresql/9.1/main/base/ | more
1
11938
11946
1418400
16387
16392
16402
16424
16449
2047839
pgsql_tmp

We've a planned restart scheduled soon which will let me find any scripts that might be keeping things open, and I'll
reviewmy script that creates / populates / drops the database hourly. 
 

This does not seem to be related to temp tables. Something in the script, perhaps ...

Further suggestions and  / or questions welcome.

Thanks !

Greg W.



Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Tom Lane
Дата:
Greg Williamson <gwilliamson39@yahoo.com> writes:
>> Have you checked to see if there are any processes that have open handles to
>> deleted files (lsof -X | grep deleted).

> lsof -X | grep deleted | wc -l

> shows: 835 such files.

> A couple:
> postgres   2540 postgres   50u      REG                8,3     409600      93429
/var/lib/postgresql/9.1/main/base/2789
> 200/11816 (deleted)
> postgres   2540 postgres   51u      REG                8,3   18112512   49694570
/var/lib/postgresql/9.1/main/base/2789
> 200/2791679 (deleted)
> <...>

So, which processes are holding these open, and what are they doing
exactly?  Let's see output from ps and pg_stat_activity, maybe even
attach to them with gdb and get stack traces.

> We've a planned restart scheduled soon which will let me find any
> scripts that might be keeping things open,

A restart will destroy all the evidence, so let's not be in a hurry
to do that before we've identified what's happening.

            regards, tom lane


Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Greg Williamson
Дата:
Tom --

----- Original Message -----
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: Greg Williamson <gwilliamson39@yahoo.com>
> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
> Sent: Thursday, September 27, 2012 7:14 PM
> Subject: Re: [ADMIN] Database size stays constant but disk space keeps shrinking -- postgres 9.1
>
>G reg Williamson <gwilliamson39@yahoo.com> writes:
>>>  Have you checked to see if there are any processes that have open
> handles to
>>>  deleted files (lsof -X | grep deleted).
>
>>  lsof -X | grep deleted | wc -l
>
>>  shows: 835 such files.
>
>>  A couple:
>>  postgres   2540 postgres   50u      REG                8,3     409600     
> 93429 /var/lib/postgresql/9.1/main/base/2789
>>  200/11816 (deleted)
>>  postgres   2540 postgres   51u      REG                8,3   18112512 
> 49694570 /var/lib/postgresql/9.1/main/base/2789
>>  200/2791679 (deleted)
>>  <...>
>
> So, which processes are holding these open, and what are they doing
> exactly?  Let's see output from ps and pg_stat_activity, maybe even
> attach to them with gdb and get stack traces.
>
>>  We've a planned restart scheduled soon which will let me find any
>>  scripts that might be keeping things open,
>
> A restart will destroy all the evidence, so let's not be in a hurry
> to do that before we've identified what's happening.
>
>             regards, tom lane
>

Thanks for the suggestions -- I'll post back when I have more info. Many of these do not seem to have a link to any
identifiableprocess that is still running, but some do and they have pointed me away from the hourly drop / rebuild, at
leastfor now. Looks like the stats database may be the issue. 

Greg W.



Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Tom Lane
Дата:
Greg Williamson <gwilliamson39@yahoo.com> writes:
>>> postgres   2540 postgres   50u      REG                8,3     409600      93429
/var/lib/postgresql/9.1/main/base/2789200/11816(deleted) 
>>> postgres   2540 postgres   51u      REG                8,3   18112512  49694570
/var/lib/postgresql/9.1/main/base/2789200/2791679(deleted) 

> Thanks for the suggestions -- I'll post back when I have more info. Many of these do not seem to have a link to any
identifiableprocess that is still running, but some do and they have pointed me away from the hourly drop / rebuild, at
leastfor now. Looks like the stats database may be the issue. 

BTW, looking at that again --- the filenames appear to be ordinary
tables in database 2789200, but there is something mighty odd about the
first one: 11816 is an OID that should only be handed out during initdb.
And in 9.1 what it would be handed out to is pg_shdescription.  Now it's
not impossible that pg_shdescription's original table file would get
deleted: a VACUUM FULL or CLUSTER on that catalog would do it.  But
AFAICS there is no situation in which that relfilenode number would
appear in a regular database --- it should be under the global/
subdirectory of $PGDATA.  So unless you miscopied that filename, there
is something odd going on here above and beyond the problem of open
file descriptors not getting closed.  Do you have any nonstandard
maintenance practices in this installation, such as doing database-wide
VACUUM FULL every so often?

            regards, tom lane


Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Greg Williamson
Дата:
Tom --



----- Original Message -----
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: Greg Williamson <gwilliamson39@yahoo.com>
> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
> Sent: Thursday, September 27, 2012 7:55 PM
> Subject: Re: [ADMIN] Database size stays constant but disk space keeps shrinking -- postgres 9.1
>
>G reg Williamson <gwilliamson39@yahoo.com> writes:
>>>>  postgres   2540 postgres   50u      REG                8,3   
> 409600      93429 /var/lib/postgresql/9.1/main/base/2789200/11816 (deleted)
>>>>  postgres   2540 postgres   51u      REG                8,3 
> 18112512  49694570 /var/lib/postgresql/9.1/main/base/2789200/2791679 (deleted)
>
>>  Thanks for the suggestions -- I'll post back when I have more info.
> Many of these do not seem to have a link to any identifiable process that is
> still running, but some do and they have pointed me away from the hourly drop /
> rebuild, at least for now. Looks like the stats database may be the issue.
>
> BTW, looking at that again --- the filenames appear to be ordinary
> tables in database 2789200, but there is something mighty odd about the
> first one: 11816 is an OID that should only be handed out during initdb.
> And in 9.1 what it would be handed out to is pg_shdescription.  Now it's
> not impossible that pg_shdescription's original table file would get
> deleted: a VACUUM FULL or CLUSTER on that catalog would do it.  But
> AFAICS there is no situation in which that relfilenode number would
> appear in a regular database --- it should be under the global/
> subdirectory of $PGDATA.  So unless you miscopied that filename, there
> is something odd going on here above and beyond the problem of open
> file descriptors not getting closed.

Nope -- a sim-ple copy that seems to have gotten all of the data output.
>                                                          Do you have any nonstandard
> maintenance practices in this installation, such as doing database-wide
> VACUUM FULL every so often?

None that I know of -- logs don't show any. and there are none on cronjobs. I'm asking the developers behind this app
--they may be doing something strange.  I'll post back as they answer. 

Thanks hugely,

Greg W.



Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Steve Crawford
Дата:
On 09/27/2012 07:01 PM, Greg Williamson wrote:
> Steve (and others who replied):
> ...
>   The other is a slimmed-down version of our production database, which gets
>> recreated hourly by a shell script which pulls data from remote servers, does a
>> pg_dump of the resulting 3 gig database, and then drops it.
>> ...

Could you explain this process in more detail? Are you creating a new
database, reading in data, dumping then dropping the whole database or
just manipulating tables within an existing database?

>> Have you checked to see if there are any processes that have open handles to
>> deleted files (lsof -X | grep deleted). Deleted files won't show up in du
>> but won't release their disk space until the process exits. Perhaps a script
>> or scripts, even one of your hourly ones, that terminate when the server
>> restarts? You could save the output of lsof and ps immediately before and after
>> a restart and compare them.
>>
> lsof -X | grep deleted | wc -l
>
> shows: 835 such files.
>
> A couple:
> postgres   2540 postgres   50u      REG                8,3     409600      93429
/var/lib/postgresql/9.1/main/base/2789
> 200/11816 (deleted)
> postgres   2540 postgres   51u      REG                8,3   18112512   49694570
/var/lib/postgresql/9.1/main/base/2789
> 200/2791679 (deleted)
> <...>...
I'll leave it to you and Tom to puzzle over the the postgres-related
open files. Meanwhile, I'm a bit curious about the other 800+ and
whether they are associated with scripts or processes that are connected
to PostgreSQL.

First, what is the output of "select * from pg_stat_activity;"? Are
there connections you don't expect to see? If you force any of them
closed (after checking with anyone who may be impacted), do you see any
file handles released or disk-space freed?

Second, do any of the processes associated with the other
open-but-deleted files relate to programs or scripts that connect to
PostgreSQL? Next time you do a restart, do any of the processes exit or
do any of the deleted files get closed?

I'm wondering if you have processes that connect to PostgreSQL which
terminate and release their file-handles when PG is restarted.

Cheers,
Steve



Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Dinesh Bhandary
Дата:
We had a situation where external sort was creating a humungous temp
file and the space was reclaimed when the process was completed.

Thanks.
Dinesh

On 9/28/2012 8:59 AM, Steve Crawford wrote:
> On 09/27/2012 07:01 PM, Greg Williamson wrote:
>> Steve (and others who replied):
>> ...
>>   The other is a slimmed-down version of our production database,
>> which gets
>>> recreated hourly by a shell script which pulls data from remote
>>> servers, does a
>>> pg_dump of the resulting 3 gig database, and then drops it.
>>> ...
>
> Could you explain this process in more detail? Are you creating a new
> database, reading in data, dumping then dropping the whole database or
> just manipulating tables within an existing database?
>
>>> Have you checked to see if there are any processes that have open
>>> handles to
>>> deleted files (lsof -X | grep deleted). Deleted files won't show up
>>> in du
>>> but won't release their disk space until the process exits. Perhaps
>>> a script
>>> or scripts, even one of your hourly ones, that terminate when the
>>> server
>>> restarts? You could save the output of lsof and ps immediately
>>> before and after
>>> a restart and compare them.
>> lsof -X | grep deleted | wc -l
>>
>> shows: 835 such files.
>>
>> A couple:
>> postgres   2540 postgres   50u      REG                8,3
>> 409600      93429 /var/lib/postgresql/9.1/main/base/2789
>> 200/11816 (deleted)
>> postgres   2540 postgres   51u      REG                8,3 18112512
>> 49694570 /var/lib/postgresql/9.1/main/base/2789
>> 200/2791679 (deleted)
>> <...>...
> I'll leave it to you and Tom to puzzle over the the postgres-related
> open files. Meanwhile, I'm a bit curious about the other 800+ and
> whether they are associated with scripts or processes that are
> connected to PostgreSQL.
>
> First, what is the output of "select * from pg_stat_activity;"? Are
> there connections you don't expect to see? If you force any of them
> closed (after checking with anyone who may be impacted), do you see
> any file handles released or disk-space freed?
>
> Second, do any of the processes associated with the other
> open-but-deleted files relate to programs or scripts that connect to
> PostgreSQL? Next time you do a restart, do any of the processes exit
> or do any of the deleted files get closed?
>
> I'm wondering if you have processes that connect to PostgreSQL which
> terminate and release their file-handles when PG is restarted.
>
> Cheers,
> Steve
>
>
>



Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Greg Williamson
Дата:
Steve --

----- Original Message -----
> From: Steve Crawford <scrawford@pinpointresearch.com>
> To: Greg Williamson <gwilliamson39@yahoo.com>
> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
> Sent: Friday, September 28, 2012 8:59 AM
> Subject: Re: [ADMIN] Database size stays constant but disk space keeps shrinking -- postgres 9.1
>
> On 09/27/2012 07:01 PM, Greg Williamson wrote:
>>  Steve (and others who replied):
>>  ...
>>    The other is a slimmed-down version of our production database, which
> gets
>>>  recreated hourly by a shell script which pulls data from remote
> servers, does a
>>>  pg_dump of the resulting 3 gig database, and then drops it.
>>>  ...
>
> Could you explain this process in more detail? Are you creating a new database,
> reading in data, dumping then dropping the whole database or just manipulating
> tables within an existing database?
> 

We create the database, and then use shell scripts to get a pg_dump -s of a production server and use that to create
tables,indexes, etc. 

Then a sequence of psql commands retrieves data from production, usually a subset of whatever table is being grabbed
basedon a slimmed down set of userids of interest, recency, etc. Mostly these get loaded as is i nto the tables in the
newdatabase; occasionally we build a temp table and do some simple joins to get the final results. 

When all these commands are done, we pg_dump the new database, compress it and make a tar file, and finally issue a
DROPDATABASE command. 

Takes about 20 minutes and runs once an hour.

>>>  Have you checked to see if there are any processes that have open
> handles to
>>>  deleted files (lsof -X | grep deleted). Deleted files won't show up
> in du
>>>  but won't release their disk space until the process exits. Perhaps
> a script
>>>  or scripts, even one of your hourly ones, that terminate when the
> server
>>>  restarts? You could save the output of lsof and ps immediately before
> and after
>>>  a restart and compare them.
>>>  
>>  lsof -X | grep deleted | wc -l
>>
>>  shows: 835 such files.
>>
>>  A couple:
>>  postgres   2540 postgres   50u      REG                8,3     409600     
> 93429 /var/lib/postgresql/9.1/main/base/2789
>>  200/11816 (deleted)
>>  postgres   2540 postgres   51u      REG                8,3   18112512 
> 49694570 /var/lib/postgresql/9.1/main/base/2789
>>  200/2791679 (deleted)
>>  <...>...
> I'll leave it to you and Tom to puzzle over the the postgres-related open
> files. Meanwhile, I'm a bit curious about the other 800+ and whether they
> are associated with scripts or processes that are connected to PostgreSQL.
> 

These all seem to be from two places -- repmgr (transient ) and this stats application.

> First, what is the output of "select * from pg_stat_activity;"? Are
> there connections you don't expect to see? If you force any of them closed
> (after checking with anyone who may be impacted), do you see any file handles
> released or disk-space freed?
>


Nothing unexpected, lots of IDLE connections (20-30, depending). We just tried a round of closing half of the stat
applicationconnections and it didn't seem to make a big difference. 


> Second, do any of the processes associated with the other open-but-deleted files
> relate to programs or scripts that connect to PostgreSQL? Next time you do a
> restart, do any of the processes exit or do any of the deleted files get closed?
> 

repmgr is now also a suspect, although from what I can see it keeps things in that state for nly a short while. But we
areinvestigating further. 

Everythng else connects via pgbouncer, so we are also wondering if the tcp_keepalive we added might be hurting us.

> I'm wondering if you have processes that connect to PostgreSQL which
> terminate and release their file-handles when PG is restarted.
>
> Cheers,
> Steve
>


Thanks for questions and the time -- still digging into this.

Greg


Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Tom Lane
Дата:
Greg Williamson <gwilliamson39@yahoo.com> writes:
>> I'll leave it to you and Tom to puzzle over the the postgres-related open
>> files. Meanwhile, I'm a bit curious about the other 800+ and whether they
>> are associated with scripts or processes that are connected to PostgreSQL.

> These all seem to be from two places -- repmgr (transient ) and this stats application.

Do you have any characterization yet of which deleted files are being
held open by which processes?  In particular I'm wondering if the
held-open deleted files are in a recently-dropped database, and whether
they are being held open by regular backends or one of the background
processes such as bgwriter, and if the former what are those backends
doing exactly.

It's entirely expected that recently-deleted files might be held open
for a little while, but there are mechanisms that are supposed to
prevent them from being held open indefinitely.  I'm guessing that your
usage pattern might be tripping over some gap in those mechanisms, but
we don't have enough info yet to speculate about what.

BTW, I now think that my question about the unexpected OID value shown
for one deleted file may have been a red herring --- it seems not
implausible that lsof was just lying to you.  It has to do some
guesswork to reconstruct file paths for deleted files, and I don't think
it's always right about that.

            regards, tom lane


Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Dinesh Bhandary
Дата:
I'd also keep track of tmp directory while building. I hope it is not
creating a huge external sort file. By building you meant to say
pg_restore, I assume.
How big is your maintenance_work_mem? Index creation pools memory from
this buffer. 

n Sep 27, 2012 07:01 PM, Greg Williamson <gwilliamson39@yahoo.com>
wrote:

>Steve (and others who replied):
>
>
>----- Original Message -----
>>From: Steve Crawford <scrawford@pinpointresearch.com>
>>To: Greg Williamson <gwilliamson39@yahoo.com>
>>Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
>>Sent: Thursday, September 27, 2012 3:48 PM
>>Subject: Re: [ADMIN] Database size stays constant but disk space keeps
>>shrinking -- postgres 9.1
>>
>>On 09/27/2012 03:05 PM, Greg Williamson wrote:
>>>Dear list,
>>>
>>>I have a postgres database, 9.1.3, which shows a fairly constant
>>>amount of
>>space used by postgres, but total disk space kees shrinking.
>>>If I restart postgres the space on my file system returns.
>>>
>>>This cluster is replicated to another; that custer does not show this
>>problem.
>>>
>>>There are two main databases on this cluster, both using logging. One
>>>is
>>used to accumulate some stats on our systen.
>>>
>>>The other is a slimmed-down version of our production database, which
>>>gets
>>recreated hourly by a shell script which pulls data from remote
>>servers, does a
>>pg_dump of the resulting 3 gig database, and then drops it.
>>>
>>>I posted a message about ths a few days and got no responses. I am
>>>hoping
>>for better luck this time as this is a serious issue,not relayed t
>>vacuuming or
>>routine admin tasks which may or may not be happening.
>>>
>>>
>>No answer but some questions.
>>
>>How are you determining space used by PostgreSQL? From the OS (i.e. du
>>on
>>PostgreSQL's directories or similar) or from PostgreSQL itself (pg_
>>system
>>tables, \dt+, etc.)?
>> 
>
>df -h and the postgres internal functions that show size of relations.
>
>>What method of replication are you using?
>
>
>streaming replication w/ synchronous commit
>
>>
>>Do you have any indication if the issue is only related to which
>>database is
>>causing the issue? Can you stop activity on one of them and see if the
>>problem
>>persists?
>
>
>If I suspend the build / drop database process the shrinking of
>available disk space slows dramatically.
>
>>
>>Have you checked to see if there are any processes that have open
>>handles to
>>deleted files (lsof -X | grep deleted). Deleted files won't show up in
>>du
>>but won't release their disk space until the process exits. Perhaps a
>>script
>>or scripts, even one of your hourly ones, that terminate when the
>>server
>>restarts? You could save the output of lsof and ps immediately before
>>and after
>>a restart and compare them.
>> 
>
>lsof -X | grep deleted | wc -l
>
>shows: 835 such files.
>
>A couple:
>postgres   2540 postgres   50u      REG                8,3     409600  
>   93429 /var/lib/postgresql/9.1/main/base/2789
>200/11816 (deleted)
>postgres   2540 postgres   51u      REG                8,3   18112512  
>49694570 /var/lib/postgresql/9.1/main/base/2789
>200/2791679 (deleted)
><...>
>
>while ls shows:
>root@db11:~# ls /var/lib/postgresql/9.1/main/base/ | more
>1
>11938
>11946
>1418400
> 16387
>16392
>16402
>16424
>16449
>2047839
> pgsql_tmp
>
>We've a planned restart scheduled soon which will let me find any
>scripts that might be keeping things open, and I'll review my script
>that creates / populates / drops the database hourly.
> 
>
>This does not seem to be related to temp tables. Something in the
>script, perhaps ...
>
>Further suggestions and  / or questions welcome.
>
>Thanks !
>
>Greg W.
>
>
>
>--
>Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-admin
>
>



Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Greg Williamson
Дата:
Dinesh --


----- Original Message -----
> From: Dinesh Bhandary <bhandary@iii.com>
> To: Greg Williamson <gwilliamson39@yahoo.com>
> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
> Sent: Thursday, September 27, 2012 7:19 PM
> Subject: Re: [ADMIN] Database size stays constant but disk space keeps shrinking -- postgres 9.1
>
>
> I'd also keep track of tmp directory while building. I hope it is not
> creating a huge external sort file. By building you meant to say
> pg_restore, I assume.

Actually everything is done a single SQL commands; the DDL from the source gets run on the new database, and then we do
aseries of SQL commands. The tmp space doesn't seem to get hit as there are only a few sorts being run as we reduce a
listof "users from" and "users to" to a single list of unique user ids. 

> How big is your maintenance_work_mem? Index creation pools memory from
> this buffer. 


maintenance_work_mem = 1500MB# min 1MB

Which seems to be enough for this task set.

postgres@db11:~/9.1/main$ ls -lrt base/pgsql_tmp
total 0

Something malign in one of our applications, perhaps interacting with this hourly drop. Still working with the
developersto see what they are doing "under the hood." 

Thanks for the suggestions!

Greg



Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Greg Williamson
Дата:
I've done some more testing and the problem seems to be repmgr itself.

A few details below...


----- Original Message -----
> From: Greg Williamson <gwilliamson39@yahoo.com>
> To: Tom Lane <tgl@sss.pgh.pa.us>
> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
> Sent: Thursday, September 27, 2012 7:23 PM
> Subject: Re: [ADMIN] Database size stays constant but disk space keeps shrinking -- postgres 9.1
>
>T om --
>
> ----- Original Message -----
>>  From: Tom Lane <tgl@sss.pgh.pa.us>
>>  To: Greg Williamson <gwilliamson39@yahoo.com>
>>  Cc: "pgsql-admin@postgresql.org"
> <pgsql-admin@postgresql.org>
>>  Sent: Thursday, September 27, 2012 7:14 PM
>>  Subject: Re: [ADMIN] Database size stays constant but disk space keeps
> shrinking -- postgres 9.1
>>
>> G reg Williamson <gwilliamson39@yahoo.com> writes:
>>>>   Have you checked to see if there are any processes that have open
>>  handles to
>>>>   deleted files (lsof -X | grep deleted).
>>
>>>   lsof -X | grep deleted | wc -l
>>
>>>   shows: 835 such files.
>>
>>>   A couple:
>>>   postgres   2540 postgres   50u      REG                8,3     409600 
>    
>>  93429 /var/lib/postgresql/9.1/main/base/2789
>>>   200/11816 (deleted)
>>>   postgres   2540 postgres   51u      REG                8,3   18112512 
>
>>  49694570 /var/lib/postgresql/9.1/main/base/2789
>>>   200/2791679 (deleted)
>>>   <...>
>>
>>  So, which processes are holding these open, and what are they doing
>>  exactly?  Let's see output from ps and pg_stat_activity, maybe even
>>  attach to them with gdb and get stack traces.
>>
>>>   We've a planned restart scheduled soon which will let me find any
>>>   scripts that might be keeping things open,
>>
>>  A restart will destroy all the evidence, so let's not be in a hurry
>>  to do that before we've identified what's happening.
>>
>>              regards, tom lane
>>
>
> Thanks for the suggestions -- I'll post back when I have more info. Many of
> these do not seem to have a link to any identifiable process that is still
> running, but some do and they have pointed me away from the hourly drop /
> rebuild, at least for now. Looks like the stats database may be the issue.
>
> Greg W.

I turned off the cronjob that did the hourly database create / drop and am still leaking disk space, but a but slower
--only lost 2 gigs overnight. 

While running this process I see these data directories:
postgres@db11:~$ ls -lrt 9.1/main/base
total 200
drwx------ 2 postgres postgres     6 2012-09-21 16:36 pgsql_tmp
drwx------ 2 postgres postgres  8192 2012-10-01 00:26 16387
drwx------ 2 postgres postgres 16384 2012-10-01 00:26 1418400
drwx------ 2 postgres postgres  8192 2012-10-01 00:26 2047839
drwx------ 2 postgres postgres  8192 2012-10-01 00:26 11946
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 16449
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 16392
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 16402
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 11938
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 1
drwx------ 2 postgres postgres  8192 2012-10-01 08:17 16424
drwx------ 2 postgres postgres 32768 2012-10-01 19:20 3171846

When it is done (note the last directory is now gone):
postgres@db11:~$ ls -lrt 9.1/main/base
total 140
drwx------ 2 postgres postgres     6 2012-09-21 16:36 pgsql_tmp
drwx------ 2 postgres postgres  8192 2012-10-01 00:26 16387
drwx------ 2 postgres postgres 16384 2012-10-01 00:26 1418400
drwx------ 2 postgres postgres  8192 2012-10-01 00:26 2047839
drwx------ 2 postgres postgres  8192 2012-10-01 00:26 11946
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 16449
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 16392
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 16402
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 11938
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 1
drwx------ 2 postgres postgres  8192 2012-10-01 08:17 16424

When I run lsof -X and grep for deleted files I see these 4 new entries added since the last database create/drop:
ase/3167420/3169915 (deleted)
postgres  21116 postgres   66u      REG                8,3   19709952  136501576
/var/lib/postgresql/9.1/main/base/3171846/3174279(deleted) 
postgres  21116 postgres   67u      REG                8,3   15450112  136501574
/var/lib/postgresql/9.1/main/base/3171846/3174278(deleted) 
postgres  21116 postgres   68u      REG                8,3   28344320  136410873
/var/lib/postgresql/9.1/main/base/3171846/3172541(deleted) 
postgres  21116 postgres   69u      REG                8,3   82452480  144333458
/var/lib/postgresql/9.1/main/base/3171846/3174341(deleted) 
root@db11:~# 
root@db11:~# ps auxww | grep 21116
postgres 21116  0.0  0.1 100416 32332 ?        Ss   00:26   0:16 postgres: repmgr repmgr 199.9.xxx.yyy(45239) idle    
          
root     25755  0.0  0.0   6440   840 pts/2    S+   19:38   0:00 grep --color=auto 21116

======

With the database create/drop suspended we still see a steady accumulation of dead file descriptors, but at a slower
rate.
< /dev/sda3              67G   28G   39G  42% /
---
> /dev/sda3              67G   29G   38G  44% /

Other than abandoning repmgr I don't see a solution. I've posted this to the repmgr discussion group but have had zero
responses(and, frankly, am not holding my breath). 

If anyone has any suggestions I'm all ears.

Thanks for the bandwidth!

Greg W.


Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Gabriele Bartolini
Дата:
 Hi Greg,

 On Tue, 2 Oct 2012 15:02:23 -0700 (PDT), Greg Williamson
 <gwilliamson39@yahoo.com> wrote:

> Other than abandoning repmgr I don't see a solution. I've posted this
> to the repmgr discussion group but have had zero responses (and,
> frankly, am not holding my breath).

 If you are 100% sure it is repmgr ... :)

 I am not 100% sure it is ... Under normal circumstances (once the
 standby has been cloned), repmgr simply controls the status of a standby
 server communicating with a master through the streaming replication
 protocol. As any other standby would do.

 Cheers,
 Gabriele
--
  Gabriele Bartolini - 2ndQuadrant Italia
  PostgreSQL Training, Services and Support
  Gabriele.Bartolini@2ndQuadrant.it - www.2ndQuadrant.it


Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Jaime Casanova
Дата:
On Wed, Oct 3, 2012 at 6:41 AM, Gabriele Bartolini
<Gabriele.Bartolini@2ndquadrant.it> wrote:
> Hi Greg,
>
>
> On Tue, 2 Oct 2012 15:02:23 -0700 (PDT), Greg Williamson
> <gwilliamson39@yahoo.com> wrote:
>
>> Other than abandoning repmgr I don't see a solution. I've posted this
>> to the repmgr discussion group but have had zero responses (and,
>> frankly, am not holding my breath).
>
>
> If you are 100% sure it is repmgr ... :)
>
> I am not 100% sure it is ... Under normal circumstances (once the standby
> has been cloned), repmgr simply controls the status of a standby server
> communicating with a master through the streaming replication protocol. As
> any other standby would do.
>

more to the point...
if this is happening on master (where you can create and drop
databases), then repmgrd has no reason to be running there... and very
limited subset of commands (master register, cluster show, cluster
cleanup) can be run on master

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566         Cell: +593 987171157


Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Greg Williamson
Дата:
Gabriele --


----- Original Message -----
> From: Gabriele Bartolini <Gabriele.Bartolini@2ndQuadrant.it>
> To: Greg Williamson <gwilliamson39@yahoo.com>
> Cc: pgsql-admin@postgresql.org
> Sent: Wednesday, October 3, 2012 4:41 AM
> Subject: Re: [ADMIN] Database size stays constant but disk space keeps shrinking -- postgres 9.1
>
> Hi Greg,
>
> On Tue, 2 Oct 2012 15:02:23 -0700 (PDT), Greg Williamson
> <gwilliamson39@yahoo.com> wrote:
>
>>  Other than abandoning repmgr I don't see a solution. I've posted
> this
>>  to the repmgr discussion group but have had zero responses (and,
>>  frankly, am not holding my breath).
>
> If you are 100% sure it is repmgr ... :)
>
> I am not 100% sure it is ... Under normal circumstances (once the standby has
> been cloned), repmgr simply controls the status of a standby server
> communicating with a master through the streaming replication protocol. As any
> other standby would do.
>


Indeed -- stopping repmgr has lef to a continued accumulation of the dead but not gone file handles, but almost all are
nowowned by the application, with only a few held by WAL shipping and the like. So repmgr was just getting the blame,
unfairly.

Am working on the code itself now.

Thanks for the response!

Greg W.


Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Jaime Casanova
Дата:
On Wed, Oct 3, 2012 at 4:53 PM, Greg Williamson <gwilliamson39@yahoo.com> wrote:
>
> Indeed -- stopping repmgr has lef to a continued accumulation of the dead but not gone file handles, but almost all
arenow owned by the application, with only a few held by WAL shipping and the like. So repmgr was just getting the
blame,unfairly. 
>
> Am working on the code itself now.
>

Just in case, it seems like
http://archives.postgresql.org/pgsql-committers/2012-10/msg00194.php
is a fix to your problem. It was commited a few minutes ago by Tom
Lane so you can patch your instalation or wait until the next
postgresql minor version to be released

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566         Cell: +593 987171157


Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1

От
Greg Williamson
Дата:
fantastic !
thanks.
greg


From: Jaime Casanova <jaime@2ndquadrant.com>
To: Greg Williamson <gwilliamson39@yahoo.com>
Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
Sent: Wednesday, October 17, 2012 10:05 AM
Subject: Re: [ADMIN] Database size stays constant but disk space keeps shrinking -- postgres 9.1

On Wed, Oct 3, 2012 at 4:53 PM, Greg Williamson <gwilliamson39@yahoo.com> wrote:
>
> Indeed -- stopping repmgr has lef to a continued accumulation of the dead but not gone file handles, but almost all are now owned by the application, with only a few held by WAL shipping and the like. So repmgr was just getting the blame, unfairly.
>
> Am working on the code itself now.
>

Just in case, it seems like
http://archives.postgresql.org/pgsql-committers/2012-10/msg00194.php
is a fix to your problem. It was commited a few minutes ago by Tom
Lane so you can patch your instalation or wait until the next
postgresql minor version to be released

--
Jaime Casanova        www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566        Cell: +593 987171157