Discussion: Industrial-Strength Logging
It's rude, it's crude, but it's finally here!
The mods for the logging subsystem have been posted to pgsql-patches
for your amusement and edification.
Or whatever.
At the moment it does nothing but start itself up, channel a message
every time a backend starts, and leave some slop hanging around the
system when you shut down, but I hope that its potential will shine through anyway.
regards,
Tim Holloway
------- Blind-Carbon-Copy

To: pgsql-hackers@postgresql.org
Subject: Re: Industrial-Strength Logging
In-reply-to: <Pine.LNX.4.21.0006021547090.406-100000@localhost.localdomain>
Date: Sat, 03 Jun 2000 22:59:34 +1000
Message-ID: <16919.960037174@nemeton.com.au>
From: Giles Lean <giles@nemeton.com.au>

On Sat, 3 Jun 2000 01:48:33 +0200 (CEST) Peter Eisentraut wrote:

> Yeah, let's have another logging discussion... :)

Mmm, seems popular. There was a mention on -ports and -general a couple
of weeks ago, and here we are (were) on -patches. I'm moving this
discussion to -hackers (hope that's a good choice), since that is where
Tim Holloway's proposals were discussed late last year. A starting point
I found in the archives for Tim's proposal is:

http://www.postgresql.org/mhonarc/pgsql-hackers/1999-10/msg00747.html

I'm not proposing anything that advanced. In particular, I'm not
discussing the -content- of log messages at all. For now it would be
nice to see the logging mechanism improved; changing or improving the
content can be another project.

I don't discuss the current logging implementation except to note that
a backend postgres process's logging depends on whether the process is
running under postmaster or not, whether it has a controlling terminal,
whether a -o option was provided, and whether postgres was compiled to
use syslog. Maybe that functionality can be simplified a bit ... ;-)

One more thing I don't discuss is how the debug log level is set.
Certainly something more sophisticated and dynamically variable than
the current command line method would be nice, but that too can be a
discussion for another day; it isn't much related to -how- the error
messages are tucked away.

Typical logging methods
=======================

(a)(i) write to standard error, with redirection to a file

Pro:
- what the code (mostly) does currently
- very easy to set up, just redirect standard error at startup
- efficient and low overhead

Con:
- can't rotate log files
- problematic when there is an I/O error or the filesystem the log
  file is on fills up

(a)(ii) write to standard error, with standard error piped to another process

Pro:
- administrator chooses between (i) and (ii) and can change this via
  shutdown and restart, no recompilation needed
- no code changes to backend programs
- clean separation of functionality
- choice of backend logging programs
  o Bernstein's logtools
  o Apache's rotatelogs
  o swatch
  o logsurfer
  o ...

Con:
- backend can block if the logging process is not reading log messages
  fast enough (can the backends generate enough data for this to be a
  problem in practice?)
- reliability of message logging is dependent on the log process
- log messages can be lost if the log process aborts, or is not started
  (solution: make postmaster responsible for starting and restarting
  the log process; a rough sketch appears after this message)

(b) write to named log file(s)

One way to allow rotation of log files is for the backend processes to
know what log files they write to, and to have them open those files
directly without shell redirection. There is some support for this with
the postgres -o option, but no support for rotating these files that I
have seen so far.

In the simplest case, the backend processes open the log file when they
start and close it when they exit. This allows rotation of the log file
by moving it and waiting for all the currently running backend
processes to finish.
Pro:
- relatively simple code change
- still efficient and low overhead

Con:
- backend processes can run for some time, and postmaster runs
  indefinitely, so at least postmaster needs to know about log file
  rotation
- doesn't help much for I/O errors or a full filesystem

To address these limitations some applications open their log file for
each message and then close it afterward:

Pro:
- nothing holds the log file open for long
- still efficient and low overhead for actually writing the log file

Con:
- all error logging has to go via a log routine. This would be elog(),
  but there is some use of fprintf(stderr, ...) around the place that
  would want to be changed
- there will be some efficiency hit for the open() and close() calls.
  This won't be -too- bad since the operating system's inode cache (or
  local equivalent) should contain an entry for the log file, but it is
  still two more system calls.

Another way to handle rotation with long running processes is to signal
them to re-open their log files, like syslogd is managed:

Pro:
- it's a solution

Con:
- more code in the backend processes
- more communication with the backend processes
- more complication

(c) log via some logging facility such as syslogd

Pro:
- people know syslogd
- syslogd "looks like" the right answer

Con:
- messages are typically sent to it via UDP, so it's easy for them to
  get lost
- syslogd is not coded robustly, and can hang, stop listening to input
  sources, stop writing to output files, and put wrong timestamps on
  messages
- using non-UDP transport to syslogd is non-portable (some systems
  allow Unix domain sockets, some named pipes, some neither)
- syslogd can be hard to secure (e.g. to stop it listening on
  particular network interfaces)
- do you know what your syslog(3) routine does when it gets a write(2)
  error?
- it is not supported by all (any?) vendors to have syslogd write to a
  pipe, so on-the-fly processing of error messages is not possible;
  they have to be written to the filesystem
- Unix specific(?)

(d) log into the database

Pro:
- we've got a nice tool, let's use it

Con:
- chicken-and-egg problem of logging serious/fatal errors
- infinite recursion problem of logging messages that cause more errors
- error messages are predominantly text, very susceptible to processing
  with perl, grep and friends. It isn't crystal clear that having them
  inside the database helps a lot
- using the database is friendly to a DBA, but quite possibly not to
  the system administrator, who very possibly knows no SQL but who (on
  a Unix system, at least) has many text log files to look after
- postmaster doesn't access tables, according to the previous
  discussion

Recommendations
===============

From here on it's definitely all IMHO! Your mileage may vary.

To restate the methods I discussed:

(a)(i)  standard error to file
   (ii) standard error piped to a process
(b) named log file(s)
(c) syslogd
(d) database

I would recommend (a)(ii), with (a)(i) available for anyone who wants
it. (Someone who has high load 9-5 but who can shut down daily might
be happy writing directly to a log file, for example.)

Using the database is too complex/fragile, using syslogd fails too
often, and having every process manage log files as in (b) seems like
more work than is necessary unless/until logging via a single process
turns out to be a bottleneck.

There has been a suggestion to use the Apache "rotatelogs" program to
do the logging.
That program is perhaps a little -too- simplistic:

- no timestamp option
- no way to make ad-hoc requests for log file rotation
- write errors cause it to exit (... oops!)
- no fsync option

On the other hand the program is only 100 lines. Writing a robust
version isn't rocket science, and if postmaster is taught to start it,
then pg_ctl looks like a natural place to control log file rotation and
possibly one day log levels as well.

Regards,

Giles

P.S. Yes, I've some coding time to offer. I'll wait and see what is
liked before I do anything though. ;-)

------- End of Blind-Carbon-Copy
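[To make option (a)(ii) concrete, here is a rough sketch -- not code
from the tree; start_log_writer() and the "pg_logger" program are
invented names -- of how the postmaster could start the log writer and
point its own stderr, and therefore every backend's, at a pipe read by
that process.]

    /*
     * Hypothetical sketch of option (a)(ii): the postmaster starts a
     * log-writer program and redirects its own stderr -- which all
     * forked backends inherit -- into a pipe read by that process.
     */
    #include <sys/types.h>
    #include <unistd.h>

    static pid_t
    start_log_writer(const char *logger_path, const char *logfile)
    {
        int     pipefd[2];
        pid_t   pid;

        if (pipe(pipefd) < 0)
            return -1;

        pid = fork();
        if (pid < 0)
            return -1;

        if (pid == 0)
        {
            /* child: read log lines from the pipe, write them to the file */
            dup2(pipefd[0], STDIN_FILENO);
            close(pipefd[0]);
            close(pipefd[1]);
            execlp(logger_path, logger_path, logfile, (char *) NULL);
            _exit(1);           /* exec failed */
        }

        /* parent (postmaster): stderr now feeds the pipe */
        dup2(pipefd[1], STDERR_FILENO);
        close(pipefd[0]);
        close(pipefd[1]);
        return pid;             /* postmaster can wait() and restart it on exit */
    }

If the child exits, the postmaster would notice via SIGCHLD/wait() and
simply run start_log_writer() again, which is the "starting and
restarting" duty mentioned above.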
Giles Lean <giles@nemeton.com.au> writes:
>> Yeah, let's have another logging discussion... :)
>
> [ good summary of different approaches: ]
> (a)(i) standard error to file
> (ii) standard error piped to a process
> (b) named log file(s)
> (c) syslogd
> (d) database
> I would recommend (a)(ii), with (a)(i) available for anyone who wants
> it. (Someone who has high load 9-5 but who can shut down daily might
> be happy writing directly to a log file, for example.)
You mentioned the issue of trying to deal with out-of-disk-space errors
for the log file, but there is another kind of resource exhaustion
problem that should also be taken into account. Namely, inability to
open the log file due to EMFILE (no kernel filetable slots left) errors.
This is fresh in my mind because I just finished making some fixes to
make Postgres more robust in the full-filetable scenario. It's quite
easy for a Postgres installation to run the kernel out of filetable
slots if the admin has set a large MaxBackends limit without increasing
the kernel's NFILE parameter enough to cope. So this isn't a very
farfetched scenario, and we ought to take care that our logging
mechanism doesn't break down when it happens.
You mentioned that case (b) has a popular variant of opening and closing
the logfile for each message. I think this would be the most prone to
EMFILE failures, since the backends wouldn't normally be holding the
logfile open. In the other cases the logfile or log pipe is held open
continually by each backend so there's no risk at that point. Of
course, the downstream logging daemon in cases (a)(ii) and (c) might
suffer EMFILE at the time that it's trying to rotate to a new logfile.
I doubt we can expect that syslogd has a good strategy for coping with
this :-(. If the daemon is of our own making, the first thought that
comes to mind is to hold the previous logfile open until after we
successfully open the new one. If we get a failure on opening the new
file, we just keep logging into the old one, while periodically trying
to rotate again.
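[In code, that rule might look something like the following minimal
sketch; the names are invented for illustration, this is not a patch.]

    /*
     * Keep writing to the old descriptor until the new log file has
     * actually been opened; only then let go of the old one.
     */
    #include <fcntl.h>
    #include <unistd.h>

    static int  log_fd = -1;    /* currently open log file */

    static void
    try_rotate(const char *new_path)
    {
        int     newfd = open(new_path, O_WRONLY | O_CREAT | O_APPEND, 0600);

        if (newfd < 0)
            return;             /* EMFILE/ENFILE/etc: keep the old file, retry later */

        if (log_fd >= 0)
            close(log_fd);      /* release the old file only after the new open worked */
        log_fd = newfd;
    }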
The recovery strategy for individual backends faced with EMFILE failures
is to close inessential files until the open() request succeeds. (There
are normally plenty of inessential open files, since most backend I/O
goes through VFDs managed by fd.c, and any of those that are physically
open can be closed at need.) If we use case (b) then a backend that
finds itself unable to open a log file could try to recover that way.
However there are two problems with it: one, we might be unable to log
startup failures under EMFILE conditions (since there might well be no
open VFDs in a newly-started backend, especially if the system is in
filetable trouble), and two, there's some risk of circularity problems
if fd.c is itself trying to write a log message and has to be called
back by elog.c.
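[For illustration, the retry loop just described might look roughly
like this; release_one_vfd() is a placeholder standing in for fd.c's
ability to close a physically-open virtual file descriptor, not a real
function.]

    /*
     * Illustrative only: retry an open(), closing one "inessential"
     * file per failure, in the spirit of the recovery strategy
     * described above.
     */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdbool.h>
    #include <sys/types.h>

    /* placeholder: close one physically-open VFD; false if none left */
    static bool release_one_vfd(void) { return false; }

    static int
    open_with_retry(const char *path, int flags, mode_t mode)
    {
        for (;;)
        {
            int     fd = open(path, flags, mode);

            if (fd >= 0 || (errno != EMFILE && errno != ENFILE))
                return fd;
            if (!release_one_vfd())
                return -1;      /* nothing left to close; give up */
        }
    }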
Case (d), logging to a database table, would be OK in the face of EMFILE
during normal operation, but again I worry about the prospect of being
unable to log startup failures. (Actually, there's a more serious
problem with it for startup failures: a backend cannot be expected to do
database writes until it's pretty fully up to speed. Between that and
the fact that the postmaster can't write to tables either, I think we can
reject case (d) for our purposes.)
So from this point of view, it again seems that case (a)(i) or (a)(ii)
is the best alternative, so long as the logging daemon is coded not to
give up its handle for an old log file until it's successfully acquired
a new one.
Seems like the next step should be for someone to take a close look at
the several available log-daemon packages and see which of them looks
like the best bet for our purposes. (I assume there's no good reason
to roll our own from scratch...)
regards, tom lane
On Sat, 03 Jun 2000 12:36:58 -0400 Tom Lane wrote:
> You mentioned the issue of trying to deal with out-of-disk-space errors
> for the log file, but there is another kind of resource exhaustion
> problem that should also be taken into account. Namely, inability to
> open the log file due to EMFILE (no kernel filetable slots left)
> errors.
s/EMFILE/ENFILE/, but yes.
> I doubt we can expect that syslogd has a good strategy for coping with
> this :-(.
    if ((f->f_file = open(p, O_WRONLY|O_APPEND, 0)) < 0) {
            f->f_type = F_UNUSED;
            logerror(p);
    }
'nuff said.
> If the daemon is of our own making, the first thought that
> comes to mind is to hold the previous logfile open until after we
> successfully open the new one. If we get a failure on opening the new
> file, we just keep logging into the old one, while periodically trying
> to rotate again.
Costs an extra file descriptor, but it's only one and it's temporary.
I can't see anything better to do.
> Seems like the next step should be for someone to take a close look at
> the several available log-daemon packages and see which of them looks
> like the best bet for our purposes. (I assume there's no good reason
> to roll our own from scratch...)
Suggestions anyone? I like Bernstein's tools, but they're not freely
distributable enough to be integrated into Postgres. The Apache
program would be an OK starting place ... if 100 lines of not quite
right code is better than a blank page.
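[For the sake of discussion, an untested outline of the kind of log
writer that might replace it: read lines on stdin, timestamp them,
append them to a file, and keep going on write errors. Rotation and an
fsync option are left out; nothing here is an existing tool.]

    /*
     * Minimal log-writer sketch: stdin -> timestamped lines -> file,
     * surviving write errors instead of exiting.
     */
    #include <stdio.h>
    #include <time.h>

    int
    main(int argc, char *argv[])
    {
        const char *path = (argc > 1) ? argv[1] : "postgresql.log";
        FILE       *out = fopen(path, "a");
        char        line[8192];

        if (out == NULL)
            return 1;

        while (fgets(line, sizeof(line), stdin) != NULL)
        {
            char    stamp[64];
            time_t  now = time(NULL);

            strftime(stamp, sizeof(stamp), "%Y-%m-%d %H:%M:%S",
                     localtime(&now));
            if (fprintf(out, "%s %s", stamp, line) < 0 || fflush(out) == EOF)
            {
                /*
                 * Write error: drop the line and carry on; a real tool
                 * would try to reopen or rotate here rather than exit.
                 */
                clearerr(out);
            }
        }
        return 0;
    }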
The other tools I mentioned -- swatch and logsurfer -- are really
analysis programs, and while someone might use them for on-the-fly
data reduction they're not really what we want as a default. Anyone?
It wouldn't hurt to discuss the requirements a little bit more here,
too. There is a compile time option to postgres to add timestamps.
Should the log program do that instead? Should the log writing
program have any responsibility for filtering?
Bernstein's "multilog" can do this sort of thing, although I don't
expect the interface to be to many peoples' liking:
http://cr.yp.to/daemontools/multilog.html
Regards,
Giles
P.S. I sent two copies of the message that Tom was replying to to
-hackers; I saw neither come back. Very peculiar, since one was sent
from an address subscribed to pgsql-loophole and the other from an
address actually subscribed to the list.
I realized that 7.0.2 still doesn't have documentation about 'database
recovery'. Searching the PostgreSQL web server, I found worthwhile
comments. What are the results of this discussion?

My simple idea is to create a text file built from the debugging
messages (via syslog and pg_options) and then, after a crash, resend
all the queries from that text file. I have no idea how to send the
original timestamps to the database if a recovery is needed (but this
is only the first problem; I'm sure there will be many).

What solution do you suggest? Does anybody plan to write the
documentation about this?

TIA,
Zoltan