Discussion: postgresql-server-8.4.4-1PGDG.el5 - ERROR: could not write block 503414 of temporary file: No space left on device

System info: Lancelot 1874-T dual Intel Xeon(Nehalem-DP) SATAII 24GB
RAM, CentOS release 5.4 (Final), postgresql-server-8.4.4-1PGDG

hi,

We installed a fresh copy of postgresql-server-8.4.4-1 using RPMs
from PGDG. We are seeing the following errors whenever the system tries to do
anything like an update, insert, etc.:

> 2010-07-26 11:06:59 MST  LOG:  database system was shut down at 2010-07-26 10:51:41 MST
> 2010-07-26 11:06:59 MST  LOG:  autovacuum launcher started
> 2010-07-26 11:06:59 MST  LOG:  database system is ready to accept connections
> 2010-07-26 11:08:18 MST science_admin metadataFATAL:  password authentication failed for user "science_admin"
> 2010-07-26 11:09:33 MST postgres postgresFATAL:  password authentication failed for user "postgres"
> 2010-07-26 11:09:55 MST arcsoft arcsoftFATAL:  password authentication failed for user "arcsoft"
> 2010-07-26 11:10:00 MST arcsoft arcsoftFATAL:  database "arcsoft" does not exist
> 2010-07-26 11:10:43 MST arcsoft postgresLOG:  statement: ALTER USER postgres WITH PASSWORD 'nsa_core';
> 2010-07-26 11:11:06 MST postgres postgresLOG:  statement: ALTER USER operations WITH PASSWORD 'Mosaic_DHS';
> 2010-07-26 11:11:15 MST postgres postgresLOG:  statement: ALTER USER pipeline WITH PASSWORD 'Mosaic_DHS';
> 2010-07-26 11:11:23 MST postgres postgresLOG:  statement: ALTER USER science_admin WITH PASSWORD 'nsa_devel';
> 2010-07-26 11:11:31 MST postgres postgresLOG:  statement: ALTER USER security_admin WITH PASSWORD 'nsa_devel';
> 2010-07-26 11:11:48 MST postgres postgresLOG:  statement: ALTER USER voi WITH PASSWORD 'nsa_devel';
> 2010-07-26 11:15:24 MST system_admin metadataFATAL:  password authentication failed for user "system_admin"
> 2010-07-26 11:19:36 MST postgres postgresLOG:  statement: ALTER USER system_admin WITH PASSWORD 'nsa_core';
> 2010-07-26 11:21:03 MST postgres postgresLOG:  statement: ALTER USER system_admin WITH PASSWORD 'nsa_devel';
> 2010-07-26 11:21:19 MST system_admin system_adminFATAL:  database "system_admin" does not exist
> 2010-07-26 11:33:33 MST system_admin metadataERROR:  could not write block 503414 of temporary file: No space left on device
> 2010-07-26 11:33:33 MST system_admin metadataHINT:  Perhaps out of disk space?

PostgreSQL is installed under /var/lib/pgsql, and the disks appear to have plenty of space:

> -bash-3.2$ df -h
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda3              16G   13G  1.7G  89% /
> /dev/sda8             635G  979M  601G   1% /data0
> /dev/sda7             2.0G  241M  1.7G  13% /home
> /dev/sda5             7.8G  1.5G  5.9G  20% /u1
> /dev/sda2              16G  184M   15G   2% /var/log
> /dev/sda1             487M   22M  440M   5% /boot
> tmpfs                  12G     0   12G   0% /dev/shm
> /dev/md0              2.7T  202M  2.6T   1% /data1


The pg_stat_tmp directory is hardly growing:

> -bash-3.2$ date
> Mon Jul 26 14:37:38 MST 2010
> -bash-3.2$ pwd
> /var/lib/pgsql/data
> -bash-3.2$ du -ks pg_stat_tmp/*
> 56    pg_stat_tmp/pgstat.stat
> -bash-3.2$ du -ks pg_stat_tmp/*
> 56    pg_stat_tmp/pgstat.stat

In summary:

1) installed a new version of postgresql-server-8.4.4-1.
2) restored a binary dump of a database from another system running the same
version (see the sketch just after this list). The restore succeeded without error.
3) started using the system.
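
For reference, the restore in step 2 would have been along these lines; the
actual command wasn't shown, and the dump file and database name below are
placeholders:

    # Placeholders: substitute the real dump file and target database.
    createdb -U postgres metadata
    pg_restore -U postgres -d metadata --verbose metadata.dump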

We saw this same symptom on a CentOS 5.2 machine, but we thought it had
to do with the version of CentOS, so we repeated the steps on a system
with more resources running CentOS 5.4.

We have postgresql-server-8.2.14-1PGDG.rhel5 on other CentOS 5.3 systems
and we do not see this error there.

I read this post:

> http://archives.postgresql.org/pgsql-performance/2010-04/msg00192.php

I understand that autovacuum is enabled by default; I have not modified
any of these settings:

> -bash-3.2$ grep vacuum postgresql.conf
> #vacuum_cost_delay = 0ms        # 0-100 milliseconds
> #vacuum_cost_page_hit = 1        # 0-10000 credits
> #vacuum_cost_page_miss = 10        # 0-10000 credits
> #vacuum_cost_page_dirty = 20        # 0-10000 credits
> #vacuum_cost_limit = 200        # 1-10000 credits
> #autovacuum = on            # Enable autovacuum subprocess?  'on'
> #log_autovacuum_min_duration = -1    # -1 disables, 0 logs all actions and
> #autovacuum_max_workers = 3        # max number of autovacuum subprocesses
> #autovacuum_naptime = 1min        # time between autovacuum runs
> #autovacuum_vacuum_threshold = 50    # min number of row updates before
>                     # vacuum
> #autovacuum_analyze_threshold = 50    # min number of row updates before
> #autovacuum_vacuum_scale_factor = 0.2    # fraction of table size before vacuum
> #autovacuum_analyze_scale_factor = 0.1    # fraction of table size before analyze
> #autovacuum_freeze_max_age = 200000000    # maximum XID age before forced vacuum
> #autovacuum_vacuum_cost_delay = 20ms    # default vacuum cost delay for
>                     # autovacuum, in milliseconds;
>                     # -1 means use vacuum_cost_delay
> #autovacuum_vacuum_cost_limit = -1    # default vacuum cost limit for
>                     # autovacuum, -1 means use
>                     # vacuum_cost_limit
> #vacuum_freeze_min_age = 50000000
> #vacuum_freeze_table_age = 150000000
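
A quick way to confirm the effective values, rather than grepping the
commented-out defaults, is to query pg_settings; a minimal sketch:

    psql -U postgres -c \
      "SELECT name, setting FROM pg_settings
        WHERE name LIKE 'autovacuum%' OR name = 'track_counts';"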


Your comments/suggestions are appreciated.
--irene
---------------------------------------------------------------------
Irene Barg                    Email:  ibarg@noao.edu
NOAO/AURA Inc.                 http://www.noao.edu/noao/staff/ibarg
950 N. Cherry Ave.            Voice:  520-318-8273
Tucson, AZ  85726 USA           FAX:  520-318-8360
---------------------------------------------------------------------

On Mon, Jul 26, 2010 at 16:01, Irene Barg <ibarg@noao.edu> wrote:
>> 2010-07-26 11:33:33 MST system_admin metadataERROR:  could not write block
>> 503414 of temporary file: No space le
>> ft on device
>> 2010-07-26 11:33:33 MST system_admin metadataHINT:  Perhaps out of disk
>> space?

>> -bash-3.2$ df -h
>> Filesystem            Size  Used Avail Use% Mounted on
>> /dev/sda3              16G   13G  1.7G  89% /
>> /dev/sda8             635G  979M  601G   1% /data0
>> /dev/sda7             2.0G  241M  1.7G  13% /home
>> /dev/sda5             7.8G  1.5G  5.9G  20% /u1
>> /dev/sda2              16G  184M   15G   2% /var/log
>> /dev/sda1             487M   22M  440M   5% /boot
>> tmpfs                  12G     0   12G   0% /dev/shm
>> /dev/md0              2.7T  202M  2.6T   1% /data1
>
>
> The pg_stat_tmp is hardly growing::
>
>> -bash-3.2$ date
>> Mon Jul 26 14:37:38 MST 2010
>> -bash-3.2$ pwd
>> /var/lib/pgsql/data

Ok, so it looks like this is on your / (sda3) partition, which only has
1.7G free.  It's entirely possible that you have some query that needs
to write more than that in temp files.

>> -bash-3.2$ du -ks pg_stat_tmp/*

No surprise that is not growing; that's not for temp files, it's for the
stats collector.

I would try turning on log_temp_files; that might help pinpoint the
problem query.
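
For example (paths as in the setup described above; a sketch, not verified
on this particular system):

    # In postgresql.conf (here /var/lib/pgsql/data/postgresql.conf):
    #   log_temp_files = 0      # log every temporary file created, with its size
    # then reload the configuration without a restart:
    psql -U postgres -c "SELECT pg_reload_conf();"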

Alex Hunsaker <badalex@gmail.com> writes:
> On Mon, Jul 26, 2010 at 16:01, Irene Barg <ibarg@noao.edu> wrote:
> 2010-07-26 11:33:33 MST system_admin metadataERROR:  could not write block
> 503414 of temporary file: No space le
> ft on device
> 2010-07-26 11:33:33 MST system_admin metadataHINT:  Perhaps out of disk
> space?

> Ok, so it looks like this is on your / (sda3) partition, which only has
> 1.7G free.  It's entirely possible that you have some query that needs
> to write more than that in temp files.

"Block 503414" is a red flag here --- that means that this temp file
alone required 503414 * 8K = nearly 4Gb.  I suspect /var/lib/pgsql has
been symlinked someplace else, as the root device has clearly not got
anything close to that much free space.  It's quite possible that the
query involved multiple temp files, though, so maybe one of the other
partitions is getting filled.
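
Spelling out the arithmetic (8 kB is the default block size):

    # 503414 blocks x 8 kB per block:
    echo $((503414 * 8)) kB     # => 4027312 kB, i.e. roughly 3.8 GB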

            regards, tom lane

Alex,

It's funny how one cannot see what is right in front of them. I did NOT
notice that only '/var/log' was mounted separately, not '/var', so postgres
was actually installed on the root '/' filesystem. So I moved pgsql to /data1
and created a symlink at /var/lib/pgsql pointing to /data1/pgsql.
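
For the record, the move amounted to something like the following; the exact
commands are reconstructed here, and the init script name may differ with the
PGDG packages:

    # Run as root; the service may be named 'postgresql' or 'postgresql-8.4'
    # depending on the package.
    /sbin/service postgresql stop
    mv /var/lib/pgsql /data1/pgsql
    ln -s /data1/pgsql /var/lib/pgsql
    chown -h postgres:postgres /var/lib/pgsql
    /sbin/service postgresql start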

Now I have a pgsql_tmp directory that has grown to 1.9TB!

> -bash-3.2$ du -ks *
> 5556    1
> 5364    11563
> 5436    11564
> 5529348    28523
> 1962946784    pgsql_tmp

I'm afraid I don't understand how turning on 'log_temp_files' will help.
Right now it is set to '-1', which should mean it's disabled, yet it is
creating 1GB files and NOT cleaning them up:

> -bash-3.2$ du -hs  pgsql_tmp/*
> 15M    pgsql_tmp/pgsql_tmp22192.3070
> 1.1G    pgsql_tmp/pgsql_tmp22192.4916
> 168M    pgsql_tmp/pgsql_tmp22192.4918
> 1.1G    pgsql_tmp/pgsql_tmp22192.4919
....

Thank you.
--irene


On Tue, Jul 27, 2010 at 15:58, Irene Barg <ibarg@noao.edu> wrote:
> I'm afraid I don't understand how turning on 'log_temp_files' will help?

Well, postgres is probably doing exactly what it's told.  You just need
to figure out what that is and why. (And if it is some kind of bug,
we won't be able to do anything about it unless you give us a test case
:-) )

The idea behind log_temp_files is that you should be able to see what's
causing the creation of all those temp files (and so potentially 'fix'
the query/problem).  The main problem with log_temp_files is that it only
logs when the command is done executing.  Of course, if it really is
creating tons of files, it should take long enough that you should be
able to run select * from pg_stat_activity; and pinpoint the problem query
that way.  If all else fails you might have better luck turning on
log_statement and then looking through the logs when you start to
get lots of temp files.
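
On 8.4, a query along these lines (procpid and current_query are the 8.4
column names; the database name is taken from the log excerpt above) shows
what each backend is running:

    psql -U postgres -d metadata -c "
      SELECT procpid, usename, now() - query_start AS runtime, current_query
        FROM pg_stat_activity
       WHERE current_query <> '<IDLE>'
       ORDER BY query_start;"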

> Right now it is set to '-1' which should mean it's disabled, yet it is
> creating 1GB files and NOT cleaning them up:

It won't clean them up until it's done using them.  If a backend
crashes, it looks like it might orphan some temp files, at least until
postgres is restarted.
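
If you do end up with orphans, something like the following cautious sketch
(using the pgsql_tmp path visible in the du output above) identifies temp
files whose owning backend is gone:

    # Temp files are named pgsql_tmp<PID>.<N>.  Run as the postgres user so
    # kill -0 can see its own backends.  Remove orphans only with the server
    # stopped, or let a restart clean them up.
    cd /var/lib/pgsql/data/base/pgsql_tmp || exit
    for f in pgsql_tmp*; do
        pid=${f#pgsql_tmp}
        pid=${pid%%.*}
        kill -0 "$pid" 2>/dev/null || echo "orphan: $f"
    done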

Alex,

Thanks for the explanation. That helps.

After looking at some of the pgsql_tmp files, it was clear we needed to
increase work_mem. So I switched back to the pgtune suggestions, with one
change:

> -bash-3.2$ diff postgresql.conf postgresql.conf.local | grep "< "
> < log_temp_files = 0        # log temporary files equal or larger
> < default_statistics_target = 50 # pgtune wizard 2010-07-26
> < maintenance_work_mem = 1GB # pgtune wizard 2010-07-26
> < constraint_exclusion = on # pgtune wizard 2010-07-26
> < checkpoint_completion_target = 0.9 # pgtune wizard 2010-07-26
> < effective_cache_size = 16GB # pgtune wizard 2010-07-26
> < work_mem = 144MB # pgtune wizard 2010-07-26
> < wal_buffers = 8MB # pgtune wizard 2010-07-26
> < checkpoint_segments = 16 # pgtune wizard 2010-07-26
> < #shared_buffers = 5632MB # pgtune wizard 2010-07-26
> < shared_buffers = 2048MB
> < max_connections = 80 # pgtune wizard 2010-07-26
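
Note that work_mem can also be raised just for the session that runs a known
heavy sort or join, rather than globally; a hedged sketch (the database name
comes from the logs above, and the table is a placeholder):

    # Substitute the real heavy query for the placeholder count(*).
    psql -U postgres -d metadata -c \
      "SET work_mem = '512MB'; SELECT count(*) FROM some_big_table;"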

The postmaster wouldn't start with shared_buffers = 5632MB, so we
reduced it to 2GB. The system's shared memory limit (shmmax) is:

> [dppops@pipen18 ~]$ sudo /sbin/sysctl -a | grep kernel | grep shmmax
> kernel.shmmax = 4182863872

We have 24GB RAM total.

The system is behaving normally now; it does create intermittent
'pgsql_tmp' files, but removes them after the SQL statement completes.
We will monitor the logs and try to make informed tweaks to the
postgresql settings as needed.
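
With log_temp_files = 0 in place, the spills can be watched with something
like the following; the log location is an assumption based on the stock
RPM layout and a running logging collector:

    # Entries look like: LOG:  temporary file: path "...", size NNN
    grep "temporary file" /var/lib/pgsql/data/pg_log/postgresql-*.log | tail -20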

Thanks to you and Tom Lane for your responses. This group is one of the
best!

--irene barg


On Wed, Jul 28, 2010 at 15:43, Irene Barg <ibarg@noao.edu> wrote:
> Alex,
> The postmaster wouldn't start with shared_buffers = 5632MB, so we reduced it
> to 2GB. The system's shared memory is:
>
>> [dppops@pipen18 ~]$ sudo /sbin/sysctl -a | grep kernel | grep shmmax
>> kernel.shmmax = 4182863872

You can bump that up so it works with the larger setting (assuming you
are running a 64-bit OS):
http://developer.postgresql.org/pgdocs/postgres/kernel-resources.html
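
For example (the values below are illustrative only, sized here at 8 GB;
pick values that suit your workload):

    /sbin/sysctl -w kernel.shmmax=8589934592
    /sbin/sysctl -w kernel.shmall=2097152     # in 4 kB pages, also 8 GB
    # To persist across reboots, add the same two settings to /etc/sysctl.conf.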

> The system is behaving normal now, it does create intermittant 'pgsql_tmp'
> files, but removes them after the SQL statement completes. We will monitor
> the logs and try to make informed tweaks to the postgresql settings as
> needed.

If it's doing a sort or join, a btree index on the appropriate column(s) may help.
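
For instance (table and column names are hypothetical, since the actual
queries weren't shown in the thread):

    psql -U postgres -d metadata -c \
      "CREATE INDEX exposures_obs_date_idx ON exposures (obs_date); ANALYZE exposures;"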