Discussion: open file counts in 8.1.2?
We're trying to make sense of the number of open files on an HP-UX 11.23 system that's getting several new 8.1.2 clusters, and in particular why the numbers appear to be significantly larger than our 7.4 clusters on similar hardware. Would there be anything particular to 8.1.2 over 7.4 that would lead to a larger number of open files? Ed
"Ed L." <pgsql@bluepolka.net> writes: > We're trying to make sense of the number of open files on an > HP-UX 11.23 system that's getting several new 8.1.2 clusters, > and in particular why the numbers appear to be significantly > larger than our 7.4 clusters on similar hardware. Would there > be anything particular to 8.1.2 over 7.4 that would lead to a > larger number of open files? This is much too handwavy to provide an intelligent comment on. Get a copy of "lsof" and find out exactly which processes have how many files open, then we'll have some idea what's going on... regards, tom lane
On Tuesday March 14 2006 10:25 am, Tom Lane wrote: > "Ed L." <pgsql@bluepolka.net> writes: > > We're trying to make sense of the number of open files on an > > HP-UX 11.23 system that's getting several new 8.1.2 > > clusters, and in particular why the numbers appear to be > > significantly larger than our 7.4 clusters on similar > > hardware. Would there be anything particular to 8.1.2 over > > 7.4 that would lead to a larger number of open files? > > This is much too handwavy to provide an intelligent comment > on. Get a copy of "lsof" and find out exactly which processes > have how many files open, then we'll have some idea what's > going on... We have 3 clusters with 24K, 34K, and 47K open files according to lsof. These same clusters have 164, 179, and 210 active connections, respectively. Their schemas, counting the number of user and system entries in pg_class as a generously rough measure of potential open files, contain roughly 2000 entries each. Those open files seem pretty plausible, they're just much higher than what we see on the older systems. Ed
On Tuesday March 14 2006 10:31 am, Ed L. wrote: > On Tuesday March 14 2006 10:25 am, Tom Lane wrote: > > "Ed L." <pgsql@bluepolka.net> writes: > > > We're trying to make sense of the number of open files on > > > an HP-UX 11.23 system that's getting several new 8.1.2 > > > clusters, and in particular why the numbers appear to be > > > significantly larger than our 7.4 clusters on similar > > > hardware. Would there be anything particular to 8.1.2 > > > over 7.4 that would lead to a larger number of open files? > > > > This is much too handwavy to provide an intelligent comment > > on. Get a copy of "lsof" and find out exactly which > > processes have how many files open, then we'll have some > > idea what's going on... > > We have 3 clusters with 24K, 34K, and 47K open files according > to lsof. These same clusters have 164, 179, and 210 active > connections, respectively. Their schemas, counting the number > of user and system entries in pg_class as a generously rough > measure of potential open files, contain roughly 2000 entries > each. Those open files seem pretty plausible, they're just > much higher than what we see on the older systems. 
One lsof curiosity is that one cluster seems to have its partition directory listing open about 10K times, including many times by the same backend process:

COMMAND   PID   USER    FD    TYPE  DEVICE      SIZE/OFF  NODE  NAME
postgres  4023  db1dba  49u   REG   64,0x10001  16384     7435  /db1 (/dev/vgdb1/lvol1)
postgres  4023  db1dba  62u   REG   64,0x10001  8192      7673  /db1 (/dev/vgdb1/lvol1)
postgres  4023  db1dba  68u   REG   64,0x10001  16384     7601  /db1 (/dev/vgdb1/lvol1)
postgres  4023  db1dba  78u   REG   64,0x10001  16384     7379  /db1 (/dev/vgdb1/lvol1)
postgres  4023  db1dba  79u   REG   64,0x10001  16384     7380  /db1 (/dev/vgdb1/lvol1)
postgres  4023  db1dba  135u  REG   64,0x10001  352256    7305  /db1 (/dev/vgdb1/lvol1)
postgres  4023  db1dba  136u  REG   64,0x10001  262144    7640  /db1 (/dev/vgdb1/lvol1)
postgres  4023  db1dba  137u  REG   64,0x10001  262144    7642  /db1 (/dev/vgdb1/lvol1)
postgres  4023  db1dba  138u  REG   64,0x10001  262144    7643  /db1 (/dev/vgdb1/lvol1)

Ed
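Editor's sketch: one way to get the per-process breakdown requested earlier in the thread is to tally lsof rows by PID. The Python below is a minimal illustration, not something posted in the thread. The function name `count_open_files` is hypothetical, and it assumes the default lsof column layout shown above (COMMAND first, PID second). Note that lsof also lists entries such as the current directory and program text, so a row count is only an approximation of the descriptor count.

```python
from collections import Counter


def count_open_files(lsof_text):
    """Return a Counter mapping PID -> number of lsof rows for that PID.

    Assumes the default lsof layout: COMMAND in column 1, PID in column 2.
    """
    counts = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 2 or fields[0] == "COMMAND":
            continue
        pid = fields[1]
        if pid.isdigit():
            counts[int(pid)] += 1
    return counts


# Three rows copied from the lsof excerpt in this thread.
sample = """\
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
postgres 4023 db1dba 49u REG 64,0x10001 16384 7435 /db1 (/dev/vgdb1/lvol1)
postgres 4023 db1dba 62u REG 64,0x10001 8192 7673 /db1 (/dev/vgdb1/lvol1)
postgres 4023 db1dba 68u REG 64,0x10001 16384 7601 /db1 (/dev/vgdb1/lvol1)
"""

per_pid = count_open_files(sample)
for pid, rows in per_pid.most_common():
    print(pid, rows)
```

Sorting with `most_common()` puts the busiest backends first, which is usually the quickest way to see whether a few processes or all of them account for the total.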
"Ed L." <pgsql@bluepolka.net> writes: > We have 3 clusters with 24K, 34K, and 47K open files according to > lsof. These same clusters have 164, 179, and 210 active > connections, respectively. Their schemas, counting the number > of user and system entries in pg_class as a generously rough > measure of potential open files, contain roughly 2000 entries > each. Those open files seem pretty plausible, they're just much > higher than what we see on the older systems. Hm. AFAICT from the CVS logs, 7.4.2 and later should have about the same behavior as 8.1.* in this regard. What version is the older installation exactly? You can always reduce max_files_per_process if you want more conservative behavior. regards, tom lane
"Ed L." <pgsql@bluepolka.net> writes: > One lsof curiosity is that one cluster seems to have it's > partition directory listing open about 10K times, including > many times by the same backend process: Nah, that's just an lsof aberration on HPUX --- it doesn't always tell the truth about files' names. Notice the NODEs are all different, so these really are different files. You could use ls -i if you want to determine what they actually are. regards, tom lane
On Tuesday March 14 2006 10:46 am, Tom Lane wrote: > "Ed L." <pgsql@bluepolka.net> writes: > > We have 3 clusters with 24K, 34K, and 47K open files > > according to lsof. These same clusters have 164, 179, and > > 210 active connections, respectively. Their schemas, > > counting the number of user and system entries in pg_class > > as a generously rough measure of potential open files, > > contain roughly 2000 entries each. Those open files seem > > pretty plausible, they're just much higher than what we see > > on the older systems. > > Hm. AFAICT from the CVS logs, 7.4.2 and later should have > about the same behavior as 8.1.* in this regard. What version > is the older installation exactly? They are machines each with a mix of 7.3.4, 7.4.6, and 7.4.8. I'm working on lsof comparison to find specific diffs. It would seem the factors driving number of open files are current connections, # of relations, indices, etc. Am I correct about that? > You can always reduce max_files_per_process if you want more > conservative behavior. Ah, thanks. I'm not particularly worried about this since the numbers on the new system somewhat make sense to me. But others here are concerned, so I'm trying to explain/justify/understand better. If we want to handle 16 clusters on this one box, each with 300 max_connections and 2000 relations, would it be ball-park reasonable to say that worst case we might have 300 backends with ~2000 open file descriptors each (300 * 2000 = 600K open files per cluster, 600K * 16 clusters = 10M open files)? Increasing the kernel parameter 'nfiles' (max total open files on system) to something like 10M seems to make some of the ITRC HP gurus gasp. (I suspect we'll hit I/O limits long before open files become an issue.) Ed
"Ed L." <pgsql@bluepolka.net> writes: > If we want to handle 16 clusters on this one box, each > with 300 max_connections and 2000 relations, would it be > ball-park reasonable to say that worst case we might have 300 > backends with ~2000 open file descriptors each (300 * 2000 = > 600K open files per cluster, 600K * 16 clusters = 10M open > files)? No, an individual backend should never exceed max_files_per_process open files (1000 by default). It will feel free to go up that high, though, if it has reason to touch that many database files over its lifetime. 1000 is probably much higher than you really need for reasonable performance; I'd be inclined to cut it to a couple hundred at most if you need to sustain large numbers of backends. I dunno what sort of penalties the kernel might have for millions of open files but there probably are some ... regards, tom lane
I try to build 8.1.3 with:

    ./configure --prefix=/usr/local/pgsql8.1.3 --with-openssl --with-pam --enable-thread-safety

It fails the openssl test, saying openssl/ssl.h is unavailable. Digging deeper, I find that it is because the test program with

    #include <openssl/ssl.h>

is failing because it can't include krb5.h.

Based on another post, I tried adding "--with-krb5". That explicitly aborted with it unable to find krb5.h. I then tried:

    ./configure --prefix=/usr/local/pgsql8.1.3 --with-openssl --with-pam --enable-thread-safety --with-krb5 --with-includes=/usr/kerberos/include

Now it gets past both the openssl and kerberos, but bites the dust with:

configure: error:
*** Thread test program failed. Your platform is not thread-safe.
*** Check the file 'config.log'for the exact reason.
***
*** You can use the configure option --enable-thread-safety-force
*** to force threads to be enabled. However, you must then run
*** the program in src/tools/thread and add locking function calls
*** to your applications to guarantee thread safety.

If I remove the --with-krb5, it works. Why does enabling Kerberos break threads?

I haven't been able to find any issues in the archives with krb5 and threads. Am I missing something here?

Wes
Wes,

Did you try running ./configure without "--enable-thread-safety"? I recently compiled PostgreSQL 8.0.1 on Solaris and _needed_ --enable-thread-safety strictly for building Slony-I against PostgreSQL with that feature enabled.

What is the reason you are compiling this _with_ the feature? If it's necessary, then you may need to pass --with-includes= and/or --with-libs= with additional directories, such as /usr/include:/usr/include/sys, or wherever the thread .h files are for your OS.

This configure attempt could be failing because it can't locate the correct thread headers and/or libraries.

Wes wrote:
>I try to build 8.1.3 with:
>
>    ./configure --prefix=/usr/local/pgsql8.1.3 --with-openssl --with-pam
>--enable-thread-safety
>
>It fails the openssl test, saying openssl/ssl.h is unavailable. Digging
>deeper, I find that it is because the test program with
>
>    #include <openssl/ssl.h>
>
>is failing because it can't include krb5.h.
>
>Based on another post, I tried adding "--with-krb5". That explicitly
>aborted with it unable to find krb5.h. I then tried:
>
>./configure --prefix=/usr/local/pgsql8.1.3 --with-openssl --with-pam
>--enable-thread-safety --with-krb5 --with-includes=/usr/kerberos/include
>
>Now it gets past both the openssl and kerberos, but bites the dust with:
>
>configure: error:
>*** Thread test program failed. Your platform is not thread-safe.
>*** Check the file 'config.log'for the exact reason.
>***
>*** You can use the configure option --enable-thread-safety-force
>*** to force threads to be enabled. However, you must then run
>*** the program in src/tools/thread and add locking function calls
>*** to your applications to guarantee thread safety.
>
>If I remove the --with-krb5, it works. Why does enabling Kerberos break
>threads?
>
>I haven't been able to find any issues in the archives with krb5 and
>threads. Am I missing something here?
>
>Wes
On 3/14/06 2:55 PM, "Louis Gonzales" <louis.gonzales@linuxlouis.net> wrote: > Did you try to ./configure w/out "--enable-thread-safety?" I recently > compiled postgreSQL 8.0.1 on Solaris and _needed_ --enable-thread-safety > strictly for building Slony-I against postgresql with that feature enabled. > > What is the reason you are compiling this _with_ the feature? > If it's necessary, then you may need to --with-includes= and/or --with-libs= > with additional include directories, such as /usr/include:/usr/include/sys > where-ever the thread .h files are for your OS. > > This configure attempt could be failing, because it can't locate the > correct thread headers and/or libraries Why would I not want to specify enable-thread-safety? I want to be able to write threaded programs. --enable-thread-safety works fine until I enable --with-krb5, so it is finding the thread libraries. Wes