Thread: "Too many open files in system" error

"Too many open files in system" error

From
Jonatan Evald Buus
Date:
Greetings,
We're seeing numerous "LOG:  out of file descriptors: Too many open files in system; release and retry" entries, as well as quite a few "LOG:  could not open temporary statistics file "global/pgstat.tmp": Too many open files in system" entries.
Much more alarming, however, we're seeing errors such as:
FATAL:  could not open file "global/pg_database": Too many open files in system
ERROR:  could not open relation 1663/2219053/2601: Too many open files in system

kern.maxfiles is currently set to 12328, and "sysctl -a | grep kern.openfiles" showed over 10,000 open files prior to a reboot.
After the reboot, kern.openfiles quickly increased from less than 200 to over 4000 and appears to be still increasing.
"lsof | grep postgres | wc -l" showed approximately 4400 just after the reboot, and the count is increasing at a steady pace (it has risen to around 4800 in 30 minutes).

The system is running FreeBSD 7.1 with 2GB RAM and has been running without issues for over 6 months with PostgreSQL 8.3.6.
We upgraded to PostgreSQL 8.3.7 around a month ago; not sure if this is related?
The server is a dedicated file server running PostgreSQL and Subversion. Subversion has very few users, so it shouldn't be an issue.

We've tried changing max_files_per_process to 500 from the default 1000 but this doesn't appear to have had any effect.
Other changes from the default configuration include:
max_connections = 256
shared_buffers = 768MB
temp_buffers = 96MB
max_prepared_transactions = 50
work_mem = 48MB
maintenance_work_mem = 192MB
max_stack_depth = 8MB

Is it normal for PostgreSQL to have close to 5000 file handles open while running?
Or is there some other circumstance which could cause file descriptors to leak?

Appreciate any input

Cheers
Jona

Re: "Too many open files in system" error

From
Emanuel Calvo Franco
Date:
2009/8/12 Jonatan Evald Buus <jonatan.buus@cellpointmobile.com>:
> Greetings,
> We're seeing numerous "LOG:  out of file descriptors: Too many open files in
> system; release and retry" entries as well as quite a few "LOG:  could not
> open temporary statistics file "global/pgstat.tmp": Too many open files in
> system"
> Much more alarming however, we're seeing errors such as:
> FATAL:  could not open file "global/pg_database": Too many open files in
> system
> ERROR:  could not open relation 1663/2219053/2601: Too many open files in
> system
>
> kern.maxfiles is currently set to 12328 and "sysctl -a | grep
> kern.openfiles" showed over 10.000 open files prior to a reboot.
> After the reboot kern.openfiles quickly increased from less than 200 to over

Have you looked into kern.maxfiles?


--
              Emanuel Calvo Franco
             Database consultant at:
                    www.siu.edu.ar
        www.emanuelcalvofranco.com.ar

Re: "Too many open files in system" error

From
jinson abraham
Date:
Since the message is "out of file descriptors: Too many open files in system; release and retry", I think you should also check the system ulimit.

Try "ulimit -a" to list the current resource limits, including the limit on open files. Also try "ulimit -Hn" to get "The maximum number of open file descriptors (most systems do not allow this value to be set)"; see the ulimit man pages for more details.

You could also increase the ulimit by making changes in "/etc/security/limits.conf".

However, I hope you have already checked whether some other process might be continuously opening files and not closing them properly. Linux has a fixed number of fds that can be used, and if a process doesn't release them it could lead to such a problem.

Hope this helps.

Thanks,
 - Jinson.



Re: "Too many open files in system" error

From
Jonatan Evald Buus
Date:
"ulimit -Hn" shows 11095 which is the same as kern.maxfilesperproc.

"ulimit -a" shows the following:
socket buffer size       (bytes, -b) unlimited
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) 524288
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 11095 <-------------- Same as kern.maxfilesperproc
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 65536
cpu time               (seconds, -t) unlimited
max user processes              (-u) 5547
virtual memory          (kbytes, -v) unlimited

"sysctl -a | grep kern.maxfile" gives the following kernel limits:
kern.maxfiles: 12328
kern.maxfilesperproc: 11095

PostgreSQL seems to have settled at around 4800 open file descriptors, as counted by "lsof | grep postgres | wc -l", with the system-wide number of open files at around 3500, as indicated by "sysctl -a | grep kern.openfile".
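
To put those figures side by side, here is a minimal sketch that parses "name: value" lines of the kind sysctl prints and reports how much of the system-wide limit is in use. The helper name parse_sysctl and the kern.openfiles value of 3500 (the approximate figure quoted above) are assumptions made for illustration; on a live system the text would come from running sysctl itself.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines, as printed by sysctl, into a dict of ints."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            values[name.strip()] = int(value.strip())
    return values


# Figures quoted in this message; kern.openfiles is the approximate value reported.
sample = """kern.maxfiles: 12328
kern.maxfilesperproc: 11095
kern.openfiles: 3500"""

limits = parse_sysctl(sample)
in_use = limits["kern.openfiles"] / limits["kern.maxfiles"]
headroom = limits["kern.maxfiles"] - limits["kern.openfiles"]

print(f"system-wide usage: {in_use:.0%}")
print(f"headroom: {headroom} descriptors")
```

Sampling this periodically would show whether usage keeps climbing toward kern.maxfiles or levels off.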

The only thing that's running on the machine is PostgreSQL, so suspicion naturally falls first on the database.
It just seems odd that it would start behaving like this after having run with no problems for 6 months.

Any insight or suggestions as to where to start digging would be greatly appreciated.

Cheers
Jona






Re: "Too many open files in system" error

From
"Kevin Grittner"
Date:
Jonatan Evald Buus <jonatan.buus@cellpointmobile.com> wrote:

> [too many files open in system]

> Any insight or suggestions as to where to start digging would be
> greatly appreciated

You didn't start truncating temporary tables inside a loop somewhere,
did you?

http://archives.postgresql.org/pgsql-hackers/2009-08/msg00444.php

-Kevin

Re: "Too many open files in system" error

From
Jonatan Evald Buus
Date:
Nope, though that sounds fun!! ;-)

The application hasn't really changed much over the past 6 months either, and definitely not its use of the database, which is mainly INSERT queries for logging plus a few SELECTs for sharing data between the clustered application nodes. All in all, it is pretty simple and straightforward.

/Jona


Re: "Too many open files in system" error

From
Tom Lane
Date:
Jonatan Evald Buus <jonatan.buus@cellpointmobile.com> writes:
> Is it normal for PostGreSQL to have close to 5000 file handles open while
> running?

It can be, if you have enough active backends and enough tables that
they are touching.  You have not provided nearly enough information to
gauge what the expected number of actual open files might be in your
installation, but I'll just point out that at max_connections = 256 and
max_files_per_process = 500 you have no grounds to complain if Postgres
tries to use 128000 open files.  If that value of max_connections
actually is representative of what you need to do, I'd recommend cutting
max_files_per_process to a couple hundred and upping the kernel limit
to somewhere north of 50000.

            regards, tom lane
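
The arithmetic behind the 128000 figure can be sketched as follows. This is a rough upper bound that assumes every allowed backend opens its full per-process quota; the function name worst_case_open_files is illustrative, and the bound ignores descriptors held by the postmaster, auxiliary processes, and anything else on the machine.

```python
def worst_case_open_files(max_connections, max_files_per_process):
    """Upper bound on descriptors if every backend uses its full quota."""
    return max_connections * max_files_per_process


KERN_MAXFILES = 12328  # system-wide limit reported earlier in the thread

for per_process in (1000, 500, 200):
    demand = worst_case_open_files(256, per_process)
    verdict = "fits" if demand <= KERN_MAXFILES else "exceeds kern.maxfiles"
    print(f"max_files_per_process={per_process}: up to {demand} files ({verdict})")
```

Even at 200 files per process the bound is 51200, which is why the kernel limit would need to go "somewhere north of 50000".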

Re: "Too many open files in system" error

From
Jonatan Evald Buus
Date:
Cheers for the insight, Tom.
We generally have between 60 and 100 active connections to Postgres under normal load, but this may increase at peak times, hence max_connections = 256.
There are several databases in the Postgres cluster, so an estimate would be approximately 200 tables in total.
At least 2 of the databases contain 50 - 100 tables each (most of the tables are static), spread across 5 - 10 schemas within each database.
Is this enough to gauge the expected number of open files?
If not, what additional information would be required?

Also, what would a reasonable setting for "max_files_per_process" be on a machine with 2GB RAM running FreeBSD 7.1?
The comments mention that "max_files_per_process" may be set as low as 25, but what would the implications of this restriction be?
Based on your suggestion, 50,000 / 256 ≈ 195, so setting "max_files_per_process" to around 200 seems reasonable, in addition to increasing the kern.maxfiles limit?
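
The same division can be written as a small sketch that works backwards from a kernel limit to a per-backend cap. The function name per_backend_budget and the reserved parameter (an allowance for descriptors used by non-Postgres processes) are hypothetical and not part of any PostgreSQL or FreeBSD interface; integer division of 50,000 by 256 gives 195.

```python
def per_backend_budget(kernel_limit, max_connections, reserved=0):
    """Largest per-backend cap that keeps the worst case within kernel_limit.

    reserved is a hypothetical allowance for descriptors used by other
    processes on the same machine.
    """
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return (kernel_limit - reserved) // max_connections


# Proposed kernel limit of 50,000 with max_connections = 256.
print(per_backend_budget(50_000, 256))

# Current kern.maxfiles of 12328: the worst case only fits with a very low cap.
print(per_backend_budget(12_328, 256))
```

With the current kern.maxfiles the budget comes out below 50 files per backend, which suggests that raising the kernel limit matters more than lowering the per-process setting alone.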

By the way, the only kernel log entries we've been able to find that appear related to the problem are the following:
Aug  8 06:30:50 node5 kernel: kern.maxfiles limit exceeded by uid 70, please see tuning(7).
Aug  8 06:30:57 node5 kernel: kern.maxfiles limit exceeded by uid 70, please see tuning(7).
Aug  8 06:32:57 node5 last message repeated 5 times
Aug  8 06:36:57 node5 last message repeated 9 times
"uid 70" is the Postgres user, but Postgres logged the "too many files open" error several times in the days following the 8th.
Also, the system never appeared to be unresponsive while the errors occurred, although unresponsiveness appears to be a common symptom of this type of event based on previous discussions.
We've previously run stress tests on the clusters where the database server reached a very high load (30 or so) without any errors being logged (by Postgres or otherwise), so it's quite curious that this problem would "suddenly" appear.

Appreciate the input

Cheers
Jona


Re: "Too many open files in system" error

From
Tom Lane
Date:
Jonatan Evald Buus <jonatan.buus@cellpointmobile.com> writes:
> Also, what would a reasonable setting for "max_files_per_process" based on a
> machine with 2GB RAM running FreeBSD 7.1 be?
> The comments mention that "max_files_per_process may" be set as low as 25
> but what would the implications of this restriction be?

At some point you're going to start losing performance because of too
many close() and open() calls that have to be issued to switch a small
number of file descriptors around to different tables.  AFAIK no one
has really tried to measure the issue.  Personally I'd be pretty
concerned about setting it to less than 100 or so.

            regards, tom lane
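
The close()/open() churn described above can be illustrated with a toy model. The sketch below assumes a backend that keeps at most max_open files open and closes the least recently used one when it needs a new descriptor; it is a simplified simulation with a made-up workload, not PostgreSQL's actual file-handling code, and the names count_opens and rel_N are illustrative.

```python
from collections import OrderedDict


def count_opens(accesses, max_open):
    """Count open() calls needed when at most max_open files stay open (LRU)."""
    pool = OrderedDict()  # file name -> placeholder, ordered by recency of use
    opens = 0
    for name in accesses:
        if name in pool:
            pool.move_to_end(name)      # already open: just mark as recently used
            continue
        if len(pool) >= max_open:
            pool.popitem(last=False)    # "close" the least recently used file
        pool[name] = True               # "open" the requested file
        opens += 1
    return opens


# Hypothetical workload: one backend cycling over 200 relation files, 10 times.
workload = [f"rel_{i}" for i in range(200)] * 10

for cap in (25, 100, 200):
    print(f"max_open={cap}: {count_opens(workload, cap)} open() calls")
```

In this deliberately unfavourable cyclic pattern, any cap below the number of files touched forces a reopen on every access, while a cap that covers the working set opens each file once. Real workloads are usually less extreme, so this shows the shape of the trade-off rather than a measurement.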