Re: Patch to implement pg_current_logfile() function
От | Karl O. Pinc |
---|---|
Тема | Re: Patch to implement pg_current_logfile() function |
Дата | |
Msg-id | 20161026222513.74cd3def@slate.meme.com обсуждение исходный текст |
Ответ на | Re: Patch to implement pg_current_logfile() function (Gilles Darold <gilles.darold@dalibo.com>) |
Ответы |
Re: Patch to implement pg_current_logfile() function
(Christoph Berg <myon@debian.org>)
Re: Patch to implement pg_current_logfile() function (Robert Haas <robertmhaas@gmail.com>) |
Список | pgsql-hackers |
On Thu, 27 Oct 2016 00:31:56 +0200 Gilles Darold <gilles.darold@dalibo.com> wrote: > Thanks a lot for the documentation fixes, I've also patched some of > your changes, see v7 of the patch and explanations bellow. Thanks. Sorry if I've not kept up on your latest decisions. > > Put pg_log_file in alphabetical order in the file name listing > > and section body. > > This file is now named current_logfile, I have changed the named in > the documentation, especially in storage.sgml. Since it now can contain multiple pathnames perhaps the name of the file should be "current_logfiles"? Seems more descriptive. > One other change in documentation is the explanation of values stored > in this file. This is a path: log_directory/log_filename, and no more > the log file name only. This will help to get full path of the log at > system command level. This is also the value returned by function the > pg_current_logfile() to be able to read file directly with a simple: > > SELECT pg_read_file(pg_current_logfile()); Sounds good. > > I also have a couple of preliminary comments. > > Since pg_log_file may contain only one line, and that > > line may be either the filename of the csv log file or > > the file name of the stderr file name it's impossible > > to tell whether that single file is in csv or stderr > > format. I suppose it might be possible based on file > > name suffix, but Unix expressly ignores file name > > extensions and it seems unwise to force dependence > > on them. Perhaps each line could begin with > > the "type" of the file, either 'csv' or 'stderr' > > followed by a space and the file name? > > The current_logfile may contain 2 lines, one for csvlog and an other > for stderr when they are both defined in log_destination. As said > above, the csvlog file will always have the .csv suffix, I guess it > is enough to now the format but I agree that it will not be true in > all cases. To keep things simple I prefer to only register the path > to the log file, external tools can easily detect the format or can > ask for the path to a specific log format using SELECT > pg_current_logfile('stderr'); for example. This is my point of view, > but if there's a majority to add the log format into the > current_logfile why not. Let me explain my reasoning: The current_logfile file's location is not typically going to change. The cluster gets created and current_logfile then has a known location. Having a known location in the filesystem for current_logfile is sort of the point of current_logfile. A script knows where to find it, to look at it's content and find out where to get the log file it needs. Execing a psql process or otherwise creating a database connection, just in order to execute "SELECT pg_current_logfile('stderr');" to discover where the current logs are, is _way_ more complicated and error prone that reading a file's content. But what if current_logfile contains only a single line? What sort of file format does the logfile have? If you don't know you can't process the logfile content. When there's multiple lines in current_logfile your script might be looking for a logfile in a particular format. How is the script supposed to know the file format of each logfile listed? Relying on a '.csv' file name extension is adequate, but only under a number of assumptions. What if PG adds another logfile format that's different from stderr; how would this new format be distinguished from stderr? What if MS Excel changes the CSV format (yet again) or a non MS Excel CSV format is supported by PG, so that PG supports multiple CSV formats; how would the different CSV file formats be distinguished from each other. Without going on and on, there's good reasons why Unix does not rely on filename extensions to determine the format of file content. Sure, scripts could always do "something", like exec the "file" program, to discover, or at least guess at, the format of the log file. Or we could have PG be forever dependent upon using filename extensions when introducing new logfile formats. There's always kludges which will handle new circumstances. But why go there? PG knows the format of the files it's writing into current_logfile. My argument is strongly related to the "Explicit is better than implicit" philosophy of design. It's not hard to cleanly expose the logfile format to the reader of current_logfile; guessing would never be required no matter the future of PG. "current_logfile" must then be parsed, but if you _are_ going to expose the file format then putting it into the same file as the logfile pathnames is the way to go. "format <space> path" is as easy to parse as looking for a ".csv" suffix, and a lot more clear and future proof. (Even works with spaces in the "path", in shell, using the "read" builtin.) It does mean that every user of current_logfile _must_ parse. If you don't put an explicit format into the file content readers who already know what file format they're going to get don't have to parse. Making these readers parse does not seem like too high a price. What it comes down to is I don't buy the adequacy of the ".csv" suffix test and think that "keeping things simple" now is a recipe for future breakage, or at least significant future complication and confusion when it come to processing logfile content. Regards the data structure to use to expose the file format I can't vouch that "format path" is most future-proof. It's what I came up with on the spur of the moment. Something like: "format <format>: path <path>", where ":" is the field separator and each data element is tagged, would still be parseable by the shell "read" built-in so long as the path comes last. I don't really care about the exact data structure but I do think the file format meta-information should be included. > I have copy/paste here your other comments to limit the number of > messages: Thanks. I was not very disciplined and wrote a lot of emails. > > You're writing Unix eol characters into pg_log_file. (I think.) > > Does this matter on MS Windows? (I'm not up on MS Windows, > > and haven't put any thought into this at all. But didn't > > want to forget about the potential issue.) > > This doesn't matter as the file is opened in text mode (see > logfile_open() in syslogger.c) and use of LF in fprintf() will be > automatically converted to CRLF on MS Windows. This I should have recalled. Thanks for the explanation. > > While you're at it, it wouldn't hurt to provide another > > function that tells you the format of the file returned > > by pg_current_logfile(), since pg_current_logfile() > > called without arguments could return either a stderr > > formatted file or a csvlog formatted file. > > The log format can be check using something like: > > postgres=# SELECT pg_current_logfile(), pg_current_logfile('csvlog'); > pg_current_logfile | > pg_current_logfile > -----------------------------------------+----------------------------------------- > pg_log/postgresql-2016-10-26_231700.log | > pg_log/postgresql-2016-10-26_231700.csv > (1 row) > > postgres=# SELECT CASE WHEN (pg_current_logfile() = > pg_current_logfile('stderr')) THEN 'stderr' ELSE 'csvlog' END AS > log_format; log_format > ------------ > stderr > (1 row) > > but if we want we can have an other internal function with a call > like: > > SELECT pg_log_format(pg_current_logfile()); > > that will return stderr or csvlog but I'm not sure this is necessary. Why not just: SELECT pg_log_format(); since you only ever need to know what log format is returned by pg_current_logfile() when called without arguments? Otherwise you already know the log format because you asked for something specific. My thoughts are as follows: Either you know the log format because you configured the cluster or you don't. If you don't know the log format having the log file is halfway useless. You can do something like back it up, but you can't ever look at it's contents (in some sense) without knowing what data structure you're looking at. Therefore pg_current_logfile() without any arguments is, in the sense of any sort of automated processing of the logfile content, useless. You must know the format of the returned logfile. Anybody who's doing automated log file processing, on a cluster which they did not configure, must write a "pg_log_format()" equivalent function. (Probably, as you say, by using the pg_current_logfile(text) function.) Not everybody's going to be doing this sort of logfile processing but I'd think it'd be common enough that a pg_log_format() function would be in some demand. In a lot of ways this argument is related to the one above. I see a need for automated processing of logfiles in arbitrarily configured PG clusters. > Thanks for your review, let me know if there's any thing to adapt. I'll look over your v7 patch very soon, I hope. If I have documentation edits I'll probably send these first. Knowing what the patch is supposed to do helps me when reading the code. :) Regards, Karl <kop@meme.com> Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
В списке pgsql-hackers по дате отправления:
Следующее
От: Michael PaquierДата:
Сообщение: Re: [BUG] pg_basebackup from disconnected standby fails