Обсуждение: several questions about pg_dumpall, pg_start_backup, pg_basebackup and WAL
Hi all,
I have some questions about backup on PostgreSQL:
1) pg_dumpall
I am sure that this is asked in somewhere: why the pg_dumpall does not support custom or tar format like pg_dump.
I have heard that the custom format could be faster and may generate a smaller dump file.
Is the feature enhancement being developed?
2) pg_start_backup in postgresql 8.4
After issuing pg_start_backup, we can take the file based backup (by tar/snapshot) on the postgresql data directory.
If the use of file system snapshot is not possible.
It is possible that there is file changes (added or file size changed) between the pg_start_backup and pg_stop_backup.
Is the backup consistent?
3) extension of 2): If the postgresql version is 9.2
Is pg_basebackup better than pg_start_backup + file based backup + pg_stop_backup?
4) For the WAL backup in postgresql 8.4
After the archive mode is on and WAL is backup, how do I remove the old WAL files?
How about newer version (for example 9.2?)
Thanks and regards,
Patrick
I have some questions about backup on PostgreSQL:
1) pg_dumpall
I am sure that this is asked in somewhere: why the pg_dumpall does not support custom or tar format like pg_dump.
I have heard that the custom format could be faster and may generate a smaller dump file.
Is the feature enhancement being developed?
2) pg_start_backup in postgresql 8.4
After issuing pg_start_backup, we can take the file based backup (by tar/snapshot) on the postgresql data directory.
If the use of file system snapshot is not possible.
It is possible that there is file changes (added or file size changed) between the pg_start_backup and pg_stop_backup.
Is the backup consistent?
3) extension of 2): If the postgresql version is 9.2
Is pg_basebackup better than pg_start_backup + file based backup + pg_stop_backup?
4) For the WAL backup in postgresql 8.4
After the archive mode is on and WAL is backup, how do I remove the old WAL files?
How about newer version (for example 9.2?)
Thanks and regards,
Patrick
Re: several questions about pg_dumpall, pg_start_backup, pg_basebackup and WAL
От
Albe Laurenz
Дата:
Patrick Dung wrote: > I have some questions about backup on PostgreSQL: > > 1) pg_dumpall > I am sure that this is asked in somewhere: why the pg_dumpall does not support custom or tar format > like pg_dump. > I have heard that the custom format could be faster and may generate a smaller dump file. > Is the feature enhancement being developed? I suspect that the reason is that pg_dumpall just calls pg_dump and concatenates the output, and that probably only works in text format. > 2) pg_start_backup in postgresql 8.4 > After issuing pg_start_backup, we can take the file based backup (by tar/snapshot) on the postgresql > data directory. > If the use of file system snapshot is not possible. I don't understand this sentence. > It is possible that there is file changes (added or file size changed) between the pg_start_backup and > pg_stop_backup. Yes. > Is the backup consistent? The tar or snapshot itself will not be consistent, it will have to be recovered at least until the end on the online backup. > 3) extension of 2): If the postgresql version is 9.2 > Is pg_basebackup better than pg_start_backup + file based backup + pg_stop_backup? Technically they are equivalent. pg_basebackup may be easier to use, the other option may provide more flexibility. One difference is that there can be many pg_basebackups running at the same time. > 4) For the WAL backup in postgresql 8.4 > After the archive mode is on and WAL is backup, how do I remove the old WAL files? > How about newer version (for example 9.2?) You never touch the files in pg_xlog on a running server. You remove old WAL archives when you don't need them any more. Usually that is when there is no older base backup. Yours, Laurenz Albe
Re: several questions about pg_dumpall, pg_start_backup, pg_basebackup and WAL
От
Patrick Dung
Дата:
Hi Able,
Please see the reply in-line:
From: Albe Laurenz <laurenz.albe@wien.gv.at>
To: Patrick Dung <patrick_dkt@yahoo.com.hk>; "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
Sent: Friday, August 23, 2013 4:14 PM
Subject: Re: [ADMIN] several questions about pg_dumpall, pg_start_backup, pg_basebackup and WAL
>Patrick Dung wrote:
>> I have some questions about backup on PostgreSQL:
>>
>> 1) pg_dumpall
>> I am sure that this is asked in somewhere: why the pg_dumpall does not support custom or tar format
>> like pg_dump.
>> I have heard that the custom format could be faster and may generate a smaller dump file.
>> Is the feature enhancement being developed?
>
>I suspect that the reason is that pg_dumpall just calls pg_dump
>and concatenates the output, and that probably only works
>in text format.
I would like to see the pg_dumpall support the custom/tar format.
>> 2) pg_start_backup in postgresql 8.4
>> After issuing pg_start_backup, we can take the file based backup (by tar/snapshot) on the postgresql
>> data directory.
>> If the use of file system snapshot is not possible.
>
>I don't understand this sentence.
After calling pg_start_backup, we can make backup by tar on PGDATA root directory.
Or we can take file system snapshot on PGDATA partition and make the backup on the snapshot.
I suppose the latter one is more consistent than the first one.
>> It is possible that there is file changes (added or file size changed) between the pg_start_backup and
>> pg_stop_backup.
>
>Yes.
>
>> Is the backup consistent?
>
>The tar or snapshot itself will not be consistent, it will have to be recovered
>at least until the end on the online backup.
I should ask: is the backup crash consistent?
Which means it is consistent at the time that the pg_start_backup is run.
>> 4) For the WAL backup in postgresql 8.4
>> After the archive mode is on and WAL is backup, how do I remove the old WAL files?
>> How about newer version (for example 9.2?)
>
>You never touch the files in pg_xlog on a running server.
>
>You remove old WAL archives when you don't need them any more.
>Usually that is when there is no older base backup.
>
Let me ask the question in this way:
4) For the WAL backup in postgresql 8.4
After setting the archive mode to on.
And the WAL files is copied to another directory eg. PGDATA\wal.
How do we remove the old WAL files in PGDATA\wal? We are not taking about PGDATA\pg_xlog.
How about newer version (for example 9.2?)
>Yours,
>Laurenz Albe
>
>--
>Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-admin
>
Please see the reply in-line:
Thanks,
Patrick Dung
From: Albe Laurenz <laurenz.albe@wien.gv.at>
To: Patrick Dung <patrick_dkt@yahoo.com.hk>; "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
Sent: Friday, August 23, 2013 4:14 PM
Subject: Re: [ADMIN] several questions about pg_dumpall, pg_start_backup, pg_basebackup and WAL
>Patrick Dung wrote:
>> I have some questions about backup on PostgreSQL:
>>
>> 1) pg_dumpall
>> I am sure that this is asked in somewhere: why the pg_dumpall does not support custom or tar format
>> like pg_dump.
>> I have heard that the custom format could be faster and may generate a smaller dump file.
>> Is the feature enhancement being developed?
>
>I suspect that the reason is that pg_dumpall just calls pg_dump
>and concatenates the output, and that probably only works
>in text format.
I would like to see the pg_dumpall support the custom/tar format.
>> 2) pg_start_backup in postgresql 8.4
>> After issuing pg_start_backup, we can take the file based backup (by tar/snapshot) on the postgresql
>> data directory.
>> If the use of file system snapshot is not possible.
>
>I don't understand this sentence.
After calling pg_start_backup, we can make backup by tar on PGDATA root directory.
Or we can take file system snapshot on PGDATA partition and make the backup on the snapshot.
I suppose the latter one is more consistent than the first one.
>> It is possible that there is file changes (added or file size changed) between the pg_start_backup and
>> pg_stop_backup.
>
>Yes.
>
>> Is the backup consistent?
>
>The tar or snapshot itself will not be consistent, it will have to be recovered
>at least until the end on the online backup.
I should ask: is the backup crash consistent?
Which means it is consistent at the time that the pg_start_backup is run.
>> 4) For the WAL backup in postgresql 8.4
>> After the archive mode is on and WAL is backup, how do I remove the old WAL files?
>> How about newer version (for example 9.2?)
>
>You never touch the files in pg_xlog on a running server.
>
>You remove old WAL archives when you don't need them any more.
>Usually that is when there is no older base backup.
>
Let me ask the question in this way:
4) For the WAL backup in postgresql 8.4
After setting the archive mode to on.
And the WAL files is copied to another directory eg. PGDATA\wal.
How do we remove the old WAL files in PGDATA\wal? We are not taking about PGDATA\pg_xlog.
How about newer version (for example 9.2?)
>Yours,
>Laurenz Albe
>
>--
>Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-admin
>
Re: several questions about pg_dumpall, pg_start_backup, pg_basebackup and WAL
От
Albe Laurenz
Дата:
> Patrick Dung wrote: >>> I have some questions about backup on PostgreSQL: >>> >>> 1) pg_dumpall >>> I am sure that this is asked in somewhere: why the pg_dumpall does not support custom or tar format >>> like pg_dump. >>> I have heard that the custom format could be faster and may generate a smaller dump file. >>> Is the feature enhancement being developed? >> >> I suspect that the reason is that pg_dumpall just calls pg_dump >> and concatenates the output, and that probably only works >> in text format. > > I would like to see the pg_dumpall support the custom/tar format. I think that will not happen for the reason stated above. I concur that it would be nice. You can do it yourself by using pg_dumpall to just dump the global data (switch -g) and then call "pgdump -Fc" for all the databases in the cluster. >>> Is the backup consistent? >> >> The tar or snapshot itself will not be consistent, it will have to be recovered >> at least until the end on the online backup. > > I should ask: is the backup crash consistent? > Which means it is consistent at the time that the pg_start_backup is run. I am not sure that I understand you right, but let me explain: You cannot perform crash recovery with a backup unless it was taken with a truly atomic snapshot of everything at once. In all other cases, the individual files in the backup are not consistent in ways that crash recovery cannot repair, and even if you manage to get the backup running using crash recovery (by removing the "backup_label" file), you will end up with a corrupt database that will eat and destroy your data. On the other hand, the cluster from which the backup is being taken is fully operational and consistent. Does that answer your question? >>> 4) For the WAL backup in postgresql 8.4 >>> After the archive mode is on and WAL is backup, how do I remove the old WAL files? >>> How about newer version (for example 9.2?) >> >> You never touch the files in pg_xlog on a running server. >> >> You remove old WAL archives when you don't need them any more. >> Usually that is when there is no older base backup. >> > > Let me ask the question in this way: > > 4) For the WAL backup in postgresql 8.4 > After setting the archive mode to on. > And the WAL files is copied to another directory eg. PGDATA\wal. > How do we remove the old WAL files in PGDATA\wal? We are not taking about PGDATA\pg_xlog. > > How about newer version (for example 9.2?) That is up to you in any version of PostgreSQL. PostgreSQL does not know how long you need to keep your archived WAL files, so it will never delete them. You can use cron jobs or similar things to remove WAL archives older than a certain time, or you can use tools like Barman (http://www.pgbarman.org/) that manage your backups for you. Yours, Laurenz Albe
Re: several questions about pg_dumpall, pg_start_backup, pg_basebackup and WAL
От
Kevin Grittner
Дата:
Patrick Dung <patrick_dkt@yahoo.com.hk> wrote: > Albe Laurenz <laurenz.albe@wien.gv.at> >> Patrick Dung wrote: >>> It is possible that there is file changes (added or file size >>> changed) between the pg_start_backup and pg_stop_backup. >> >> Yes. >> >>> Is the backup consistent? >> >> The tar or snapshot itself will not be consistent, it will have >> to be recovered at least until the end on the online backup. > > I should ask: is the backup crash consistent? PITR restore procedures will use the crash recovery mechanism to make the database consistent, if that's what you mean. > Which means it is consistent at the time that the pg_start_backup > is run. No. It will be consistent with the time that pg_stop_backup was run, or any later point in time that you choose, as long as you have WAL to that point in time. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: several questions about pg_dumpall, pg_start_backup, pg_basebackup and WAL
От
Patrick Dung
Дата:
'Crash consistent' is a term I heard in some oracle white papers..
If I am correct, it means the database would be consistent after the crash recovery is done.
>> Which means it is consistent at the time that the pg_start_backup
>> is run.
>No. It will be consistent with the time that pg_stop_backup was
>run, or any later point in time that you choose, as long as you
>have WAL to that point in time.
1) pg_start_backup
2) backup by tar
3) pg_stop_backup
Thanks and regards,
Patrick Dung
From: Kevin Grittner <kgrittn@ymail.com>
To: Patrick Dung <patrick_dkt@yahoo.com.hk>; Albe Laurenz <laurenz.albe@wien.gv.at>; "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
Sent: Wednesday, August 28, 2013 3:31 AM
Subject: Re: [ADMIN] several questions about pg_dumpall, pg_start_backup, pg_basebackup and WAL
Patrick Dung <patrick_dkt@yahoo.com.hk> wrote:
> Albe Laurenz <laurenz.albe@wien.gv.at>
>> Patrick Dung wrote:
>>> It is possible that there is file changes (added or file size
>>> changed) between the pg_start_backup and pg_stop_backup.
>>
>> Yes.
>>
>>> Is the backup consistent?
>>
>> The tar or snapshot itself will not be consistent, it will have
>> to be recovered at least until the end on the online backup.
>
> I should ask: is the backup crash consistent?
PITR restore procedures will use the crash recovery mechanism to
make the database consistent, if that's what you mean.
> Which means it is consistent at the time that the pg_start_backup
> is run.
No. It will be consistent with the time that pg_stop_backup was
run, or any later point in time that you choose, as long as you
have WAL to that point in time.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Now that there is a setting to give a cluster a "name", it would be nice to have an escape sequence in the log_line_prefix setting that could reference the cluster_name. Comments? Can we get this on the "to do list"? Evan.
Hi, (x-posting to -hackers, more relevant audience) On 2016-01-29 22:19:45 -0800, Evan Rempel wrote: > Now that there is a setting to give a cluster a "name", it would be nice to > have an escape sequence in the log_line_prefix setting that could reference > the cluster_name. I've argued[1][2] for this when cluster_name was introduced, but back then I seemed to have been the only one arguing for it. Josh later jumped that train. Given that we now had a number of people wishing for this, can we maybe reconsider? Greetings, Andres Freund [1] http://archives.postgresql.org/message-id/CADLWmXUm%3D7Y3UORZnGMdSC6p1eymO0k0JkH4NKr4KstdWk0P7g%40mail.gmail.com [2] http://archives.postgresql.org/message-id/20140505101007.GU12715@awork2.anarazel.de
On 02/08/2016 06:24 AM, Andres Freund wrote: > On 2016-01-29 22:19:45 -0800, Evan Rempel wrote: >> Now that there is a setting to give a cluster a "name", it would be nice to >> have an escape sequence in the log_line_prefix setting that could reference >> the cluster_name. > > I've argued[1][2] for this when cluster_name was introduced, but back > then I seemed to have been the only one arguing for it. Josh later > jumped that train. > > Given that we now had a number of people wishing for this, can we maybe > reconsider? Seems like a reasonable idea to me. But if we add a log_line_prefix setting, shouldn't we also add it to csvlog output too? Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
On Tue, Feb 9, 2016 at 5:30 AM, Joe Conway <mail@joeconway.com> wrote: > On 02/08/2016 06:24 AM, Andres Freund wrote: >> On 2016-01-29 22:19:45 -0800, Evan Rempel wrote: >>> Now that there is a setting to give a cluster a "name", it would be nice to >>> have an escape sequence in the log_line_prefix setting that could reference >>> the cluster_name. >> >> I've argued[1][2] for this when cluster_name was introduced, but back >> then I seemed to have been the only one arguing for it. Josh later >> jumped that train. >> >> Given that we now had a number of people wishing for this, can we maybe >> reconsider? > > Seems like a reasonable idea to me. But if we add a log_line_prefix > setting, shouldn't we also add it to csvlog output too? Here's a tiny patch adding support for %C to log_line_prefix (this was part of the cluster_name patch that didn't go it). Given that csvlog's output format is hardcoded in write_csvlog, how is it supposed to evolve without upsetting consumers of this data? Wouldn't we first need to add a GUC that lets you control the columns it outputs? -- Thomas Munro http://www.enterprisedb.com
Вложения
Thomas, * Thomas Munro (thomas.munro@enterprisedb.com) wrote: > On Tue, Feb 9, 2016 at 5:30 AM, Joe Conway <mail@joeconway.com> wrote: > > On 02/08/2016 06:24 AM, Andres Freund wrote: > >> On 2016-01-29 22:19:45 -0800, Evan Rempel wrote: > >>> Now that there is a setting to give a cluster a "name", it would be nice to > >>> have an escape sequence in the log_line_prefix setting that could reference > >>> the cluster_name. > >> > >> I've argued[1][2] for this when cluster_name was introduced, but back > >> then I seemed to have been the only one arguing for it. Josh later > >> jumped that train. > >> > >> Given that we now had a number of people wishing for this, can we maybe > >> reconsider? > > > > Seems like a reasonable idea to me. But if we add a log_line_prefix > > setting, shouldn't we also add it to csvlog output too? > > Here's a tiny patch adding support for %C to log_line_prefix (this was > part of the cluster_name patch that didn't go it). > > Given that csvlog's output format is hardcoded in write_csvlog, how is > it supposed to evolve without upsetting consumers of this data? > Wouldn't we first need to add a GUC that lets you control the columns > it outputs? Not sure if you really want to go there, but I do agree with you and there's a thread from a few years back about something similar: http://www.postgresql.org/message-id/flat/20110112142345.GA4933@tamriel.snowman.net Included in that thread is a patch, which likely requires some dusting off, to add exactly that ability. Thanks! Stephen
Вложения
On Fri, Jan 29, 2016 at 10:19:45PM -0800, Evan Rempel wrote: > > Now that there is a setting to give a cluster a "name", it would be > nice to have an escape sequence in the log_line_prefix setting that > could reference the cluster_name. > > Comments? > > Can we get this on the "to do list"? Uh, wouldn't the cluster name be the same on every line? Is that useful? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Roman grave inscription +
On 02/18/2016 01:42 PM, Bruce Momjian wrote: > On Fri, Jan 29, 2016 at 10:19:45PM -0800, Evan Rempel wrote: >> Now that there is a setting to give a cluster a "name", it would be >> nice to have an escape sequence in the log_line_prefix setting that >> could reference the cluster_name. >> >> Comments? >> >> Can we get this on the "to do list"? > Uh, wouldn't the cluster name be the same on every line? Is that > useful? > That is correct. It does not make sense when looking into local log files, but you would not use the log prefix in that environment. When this feature would be useful is when you are sending the logs to syslog from many postgresql clusters, and on a centralserver, You want to collect logs together that use the same cluster name. It just provides more flexibility to addcontextual information to the log lines. -- Evan Rempel
Bruce Momjian <bruce@momjian.us> writes: > On Fri, Jan 29, 2016 at 10:19:45PM -0800, Evan Rempel wrote: >> Now that there is a setting to give a cluster a "name", it would be >> nice to have an escape sequence in the log_line_prefix setting that >> could reference the cluster_name. > Uh, wouldn't the cluster name be the same on every line? Is that > useful? I guess if you were running multiple clusters and having them all log to the same syslog daemon, there might be multiple clusters' output in the same log ... but we already have ways to disambiguate that case, ie, syslog_ident. Likewise for Windows' event log. I don't think we should consider it supported for multiple clusters to be logging into the same plain files. So yeah, the use case for this seems unclear. regards, tom lane
On 02/18/2016 01:53 PM, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: >> On Fri, Jan 29, 2016 at 10:19:45PM -0800, Evan Rempel wrote: >>> Now that there is a setting to give a cluster a "name", it would be >>> nice to have an escape sequence in the log_line_prefix setting that >>> could reference the cluster_name. >> Uh, wouldn't the cluster name be the same on every line? Is that >> useful? > I guess if you were running multiple clusters and having them all log > to the same syslog daemon, there might be multiple clusters' output > in the same log ... but we already have ways to disambiguate that case, > ie, syslog_ident. Likewise for Windows' event log. I don't think we > should consider it supported for multiple clusters to be logging into > the same plain files. So yeah, the use case for this seems unclear. > > regards, tom lane The use case is for logging to syslog. Using the syslog_ident effectively changes the "program name" to something other than postgres. This would break any typeof SIEM system that would be looking for events from the program postgres. This means that if you want to have a clusterspecific contextual identifier, it needs to be in the log message content, which for the purposes of postgresql meansplacing it into the log_line_prefix. It could be hard coded as part of the log_line_prefix such as log_line_prefix = 'cluster-db1:%d:%u:%h ' That would work, but that means that my cluster name needs to be in the configuration file twice and that opens up the errorscenario of not making them exactly the same. Allowing for the cluster_name to be referenced by an escape sequence would make manual and automated configuration managementeasier. Thanks for listening. -- Evan Rempel
On Fri, Feb 19, 2016 at 7:05 AM, Evan Rempel <erempel@uvic.ca> wrote: > On 02/18/2016 01:53 PM, Tom Lane wrote: >> >> Bruce Momjian <bruce@momjian.us> writes: >>> >>> On Fri, Jan 29, 2016 at 10:19:45PM -0800, Evan Rempel wrote: >>>> >>>> Now that there is a setting to give a cluster a "name", it would be >>>> nice to have an escape sequence in the log_line_prefix setting that >>>> could reference the cluster_name. >>> >>> Uh, wouldn't the cluster name be the same on every line? Is that >>> useful? >> >> I guess if you were running multiple clusters and having them all log >> to the same syslog daemon, there might be multiple clusters' output >> in the same log ... but we already have ways to disambiguate that case, >> ie, syslog_ident. Likewise for Windows' event log. I don't think we >> should consider it supported for multiple clusters to be logging into >> the same plain files. So yeah, the use case for this seems unclear. >> >> regards, tom lane > > > The use case is for logging to syslog. > > Using the syslog_ident effectively changes the "program name" to something > other than postgres. This would break any type of SIEM system that would be > looking for events from the program postgres. This means that if you want to > have a cluster specific contextual identifier, it needs to be in the log > message content, which for the purposes of postgresql means placing it into > the log_line_prefix. > > It could be hard coded as part of the log_line_prefix such as > > log_line_prefix = 'cluster-db1:%d:%u:%h ' > > That would work, but that means that my cluster name needs to be in the > configuration file twice and that opens up the error scenario of not making > them exactly the same. > > Allowing for the cluster_name to be referenced by an escape sequence would > make manual and automated configuration management easier. What if log_destination is set to csvlog? The same cluster_name is output in every lines, and there is no way to disable it because something like csv_line_prefix has not been supported yet? Regards, -- Fujii Masao
On 02/19/2016 05:56 AM, Fujii Masao wrote: > On Fri, Feb 19, 2016 at 7:05 AM, Evan Rempel <erempel@uvic.ca> wrote: >> On 02/18/2016 01:53 PM, Tom Lane wrote: >>> Bruce Momjian <bruce@momjian.us> writes: >>>> On Fri, Jan 29, 2016 at 10:19:45PM -0800, Evan Rempel wrote: >>>>> Now that there is a setting to give a cluster a "name", it would be >>>>> nice to have an escape sequence in the log_line_prefix setting that >>>>> could reference the cluster_name. >>>> Uh, wouldn't the cluster name be the same on every line? Is that >>>> useful? >>> I guess if you were running multiple clusters and having them all log >>> to the same syslog daemon, there might be multiple clusters' output >>> in the same log ... but we already have ways to disambiguate that case, >>> ie, syslog_ident. Likewise for Windows' event log. I don't think we >>> should consider it supported for multiple clusters to be logging into >>> the same plain files. So yeah, the use case for this seems unclear. >>> >>> regards, tom lane >> >> The use case is for logging to syslog. >> >> Using the syslog_ident effectively changes the "program name" to something >> other than postgres. This would break any type of SIEM system that would be >> looking for events from the program postgres. This means that if you want to >> have a cluster specific contextual identifier, it needs to be in the log >> message content, which for the purposes of postgresql means placing it into >> the log_line_prefix. >> >> It could be hard coded as part of the log_line_prefix such as >> >> log_line_prefix = 'cluster-db1:%d:%u:%h ' >> >> That would work, but that means that my cluster name needs to be in the >> configuration file twice and that opens up the error scenario of not making >> them exactly the same. >> >> Allowing for the cluster_name to be referenced by an escape sequence would >> make manual and automated configuration management easier. > What if log_destination is set to csvlog? The same cluster_name is > output in every lines, and there is no way to disable it because > something like csv_line_prefix has not been supported yet? > > Regards, > I would think that in the case of using a csvlog that you would configure the log_line_prefix to not include the escape sequenceof the cluster_name. The administrator would tune all of the logging components to work together, or am I missingsomething here? -- Evan Rempel
On Sat, Feb 20, 2016 at 12:52 AM, Evan Rempel <erempel@uvic.ca> wrote: > On 02/19/2016 05:56 AM, Fujii Masao wrote: >> >> On Fri, Feb 19, 2016 at 7:05 AM, Evan Rempel <erempel@uvic.ca> wrote: >>> >>> On 02/18/2016 01:53 PM, Tom Lane wrote: >>>> >>>> Bruce Momjian <bruce@momjian.us> writes: >>>>> >>>>> On Fri, Jan 29, 2016 at 10:19:45PM -0800, Evan Rempel wrote: >>>>>> >>>>>> Now that there is a setting to give a cluster a "name", it would be >>>>>> nice to have an escape sequence in the log_line_prefix setting that >>>>>> could reference the cluster_name. >>>>> >>>>> Uh, wouldn't the cluster name be the same on every line? Is that >>>>> useful? >>>> >>>> I guess if you were running multiple clusters and having them all log >>>> to the same syslog daemon, there might be multiple clusters' output >>>> in the same log ... but we already have ways to disambiguate that case, >>>> ie, syslog_ident. Likewise for Windows' event log. I don't think we >>>> should consider it supported for multiple clusters to be logging into >>>> the same plain files. So yeah, the use case for this seems unclear. >>>> >>>> regards, tom lane >>> >>> >>> The use case is for logging to syslog. >>> >>> Using the syslog_ident effectively changes the "program name" to >>> something >>> other than postgres. This would break any type of SIEM system that would >>> be >>> looking for events from the program postgres. This means that if you want >>> to >>> have a cluster specific contextual identifier, it needs to be in the log >>> message content, which for the purposes of postgresql means placing it >>> into >>> the log_line_prefix. >>> >>> It could be hard coded as part of the log_line_prefix such as >>> >>> log_line_prefix = 'cluster-db1:%d:%u:%h ' >>> >>> That would work, but that means that my cluster name needs to be in the >>> configuration file twice and that opens up the error scenario of not >>> making >>> them exactly the same. >>> >>> Allowing for the cluster_name to be referenced by an escape sequence >>> would >>> make manual and automated configuration management easier. >> >> What if log_destination is set to csvlog? The same cluster_name is >> output in every lines, and there is no way to disable it because >> something like csv_line_prefix has not been supported yet? >> >> Regards, >> > > I would think that in the case of using a csvlog that you would configure > the log_line_prefix to not include the escape sequence of the cluster_name. > The administrator would tune all of the logging components to work together, > or am I missing something here? log_line_prefix is only for stderr and syslog, not for csvlog. It cannot control what to output in csvlog, for now. Regards, -- Fujii Masao
On Fri, Feb 19, 2016 at 07:52:25AM -0800, Evan Rempel wrote: > >>Allowing for the cluster_name to be referenced by an escape sequence > >>would make manual and automated configuration management easier. > >What if log_destination is set to csvlog? The same cluster_name is > >output in every lines, and there is no way to disable it because > >something like csv_line_prefix has not been supported yet? > > > >Regards, > > > > I would think that in the case of using a csvlog that you would > configure the log_line_prefix to not include the escape sequence of > the cluster_name. The administrator would tune all of the logging > components to work together, or am I missing something here? csvlog isn't affected by log_line_prefix --- it outputs all fields every time: http://www.postgresql.org/docs/9.5/static/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-CSVLOG -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Roman grave inscription +
On 02/19/2016 08:10 AM, Fujii Masao wrote: > On Sat, Feb 20, 2016 at 12:52 AM, Evan Rempel <erempel@uvic.ca> wrote: >> On 02/19/2016 05:56 AM, Fujii Masao wrote: >>> On Fri, Feb 19, 2016 at 7:05 AM, Evan Rempel <erempel@uvic.ca> wrote: >>>> On 02/18/2016 01:53 PM, Tom Lane wrote: >>>>> Bruce Momjian <bruce@momjian.us> writes: >>>>>> On Fri, Jan 29, 2016 at 10:19:45PM -0800, Evan Rempel wrote: >>>>>>> Now that there is a setting to give a cluster a "name", it would be >>>>>>> nice to have an escape sequence in the log_line_prefix setting that >>>>>>> could reference the cluster_name. >>>>>> Uh, wouldn't the cluster name be the same on every line? Is that >>>>>> useful? >>>>> I guess if you were running multiple clusters and having them all log >>>>> to the same syslog daemon, there might be multiple clusters' output >>>>> in the same log ... but we already have ways to disambiguate that case, >>>>> ie, syslog_ident. Likewise for Windows' event log. I don't think we >>>>> should consider it supported for multiple clusters to be logging into >>>>> the same plain files. So yeah, the use case for this seems unclear. >>>>> >>>>> regards, tom lane >>>> >>>> The use case is for logging to syslog. >>>> >>>> Using the syslog_ident effectively changes the "program name" to >>>> something >>>> other than postgres. This would break any type of SIEM system that would >>>> be >>>> looking for events from the program postgres. This means that if you want >>>> to >>>> have a cluster specific contextual identifier, it needs to be in the log >>>> message content, which for the purposes of postgresql means placing it >>>> into >>>> the log_line_prefix. >>>> >>>> It could be hard coded as part of the log_line_prefix such as >>>> >>>> log_line_prefix = 'cluster-db1:%d:%u:%h ' >>>> >>>> That would work, but that means that my cluster name needs to be in the >>>> configuration file twice and that opens up the error scenario of not >>>> making >>>> them exactly the same. >>>> >>>> Allowing for the cluster_name to be referenced by an escape sequence >>>> would >>>> make manual and automated configuration management easier. >>> What if log_destination is set to csvlog? The same cluster_name is >>> output in every lines, and there is no way to disable it because >>> something like csv_line_prefix has not been supported yet? >>> >>> Regards, >>> >> I would think that in the case of using a csvlog that you would configure >> the log_line_prefix to not include the escape sequence of the cluster_name. >> The administrator would tune all of the logging components to work together, >> or am I missing something here? > log_line_prefix is only for stderr and syslog, not for csvlog. > It cannot control what to output in csvlog, for now. > > Regards, In that case, my request for a new escape sequence for the log_line_prefix will have no impact on csvlog output because itwould only affect uses cases where log_line_prefix is used AND where the administrator has added the cluster name into the log_line_prefix. -- Evan
On Sat, Feb 20, 2016 at 1:19 AM, Evan Rempel <erempel@uvic.ca> wrote: > On 02/19/2016 08:10 AM, Fujii Masao wrote: >> >> On Sat, Feb 20, 2016 at 12:52 AM, Evan Rempel <erempel@uvic.ca> wrote: >>> >>> On 02/19/2016 05:56 AM, Fujii Masao wrote: >>>> >>>> On Fri, Feb 19, 2016 at 7:05 AM, Evan Rempel <erempel@uvic.ca> wrote: >>>>> >>>>> On 02/18/2016 01:53 PM, Tom Lane wrote: >>>>>> >>>>>> Bruce Momjian <bruce@momjian.us> writes: >>>>>>> >>>>>>> On Fri, Jan 29, 2016 at 10:19:45PM -0800, Evan Rempel wrote: >>>>>>>> >>>>>>>> Now that there is a setting to give a cluster a "name", it would be >>>>>>>> nice to have an escape sequence in the log_line_prefix setting that >>>>>>>> could reference the cluster_name. >>>>>>> >>>>>>> Uh, wouldn't the cluster name be the same on every line? Is that >>>>>>> useful? >>>>>> >>>>>> I guess if you were running multiple clusters and having them all log >>>>>> to the same syslog daemon, there might be multiple clusters' output >>>>>> in the same log ... but we already have ways to disambiguate that >>>>>> case, >>>>>> ie, syslog_ident. Likewise for Windows' event log. I don't think we >>>>>> should consider it supported for multiple clusters to be logging into >>>>>> the same plain files. So yeah, the use case for this seems unclear. >>>>>> >>>>>> regards, tom lane >>>>> >>>>> >>>>> The use case is for logging to syslog. >>>>> >>>>> Using the syslog_ident effectively changes the "program name" to >>>>> something >>>>> other than postgres. This would break any type of SIEM system that >>>>> would >>>>> be >>>>> looking for events from the program postgres. This means that if you >>>>> want >>>>> to >>>>> have a cluster specific contextual identifier, it needs to be in the >>>>> log >>>>> message content, which for the purposes of postgresql means placing it >>>>> into >>>>> the log_line_prefix. >>>>> >>>>> It could be hard coded as part of the log_line_prefix such as >>>>> >>>>> log_line_prefix = 'cluster-db1:%d:%u:%h ' >>>>> >>>>> That would work, but that means that my cluster name needs to be in the >>>>> configuration file twice and that opens up the error scenario of not >>>>> making >>>>> them exactly the same. >>>>> >>>>> Allowing for the cluster_name to be referenced by an escape sequence >>>>> would >>>>> make manual and automated configuration management easier. >>>> >>>> What if log_destination is set to csvlog? The same cluster_name is >>>> output in every lines, and there is no way to disable it because >>>> something like csv_line_prefix has not been supported yet? >>>> >>>> Regards, >>>> >>> I would think that in the case of using a csvlog that you would configure >>> the log_line_prefix to not include the escape sequence of the >>> cluster_name. >>> The administrator would tune all of the logging components to work >>> together, >>> or am I missing something here? >> >> log_line_prefix is only for stderr and syslog, not for csvlog. >> It cannot control what to output in csvlog, for now. >> >> Regards, > > > In that case, my request for a new escape sequence for the log_line_prefix > will have no impact on csvlog output because it would > only affect uses cases where log_line_prefix is used AND where the > administrator has added the cluster name into the log_line_prefix. So cluster_name will never appear in csvlog. Maybe I'm OK with that. OTOH, there are many "escape sequences" available in log_line_prefix, and their corresponding information is basically output in csvlog, but only cluster_name is not. That design looks a bit odd. Regards, -- Fujii Masao
On Tue, Feb 9, 2016 at 3:24 AM, Andres Freund <andres@anarazel.de> wrote: > Hi, > > (x-posting to -hackers, more relevant audience) > > On 2016-01-29 22:19:45 -0800, Evan Rempel wrote: >> Now that there is a setting to give a cluster a "name", it would be nice to >> have an escape sequence in the log_line_prefix setting that could reference >> the cluster_name. > > I've argued[1][2] for this when cluster_name was introduced, but back > then I seemed to have been the only one arguing for it. Josh later > jumped that train. > > Given that we now had a number of people wishing for this, can we maybe > reconsider? +1 I missed the part of this conversation that took place on -admin. I agree with Evan Rempel's post over there[1] that it's useful for syslog users, and that it's not ideal to have to hijack syslog_ident or explicitly copy the name into log_line_prefix. [1] https://www.postgresql.org/message-id/56C64017.7050303%40uvic.ca -- Thomas Munro http://www.enterprisedb.com
* Thomas Munro (thomas.munro@enterprisedb.com) wrote: > On Tue, Feb 9, 2016 at 3:24 AM, Andres Freund <andres@anarazel.de> wrote: > > (x-posting to -hackers, more relevant audience) > > > > On 2016-01-29 22:19:45 -0800, Evan Rempel wrote: > >> Now that there is a setting to give a cluster a "name", it would be nice to > >> have an escape sequence in the log_line_prefix setting that could reference > >> the cluster_name. > > > > I've argued[1][2] for this when cluster_name was introduced, but back > > then I seemed to have been the only one arguing for it. Josh later > > jumped that train. > > > > Given that we now had a number of people wishing for this, can we maybe > > reconsider? > > +1 > > I missed the part of this conversation that took place on -admin. I > agree with Evan Rempel's post over there[1] that it's useful for > syslog users, and that it's not ideal to have to hijack syslog_ident > or explicitly copy the name into log_line_prefix. > > [1] https://www.postgresql.org/message-id/56C64017.7050303%40uvic.ca I'm with Thomas on this and I disagree that the "csvlog bloat" argument has merit. If we're worried about bloat in csv then we should provide a way for users to control what goes into the csvlog, not argue that something which is clearly useful be excluded. There's already a patch out there for adding a way to control what goes into csvlog and it probably wouldn't be too hard to update it. I know, because I wrote it. I continue to feel that it would be useful to add and that tools which need to deal with csvlogs should be updated to handle being told what's in it, just like pgBadger does for our much more complicated and painful regular log files. That would remove these bars to moving forward with more flexibility in log_line_prefix and csvlog output. Thanks! Stephen
Вложения
On Mon, Oct 17, 2016 at 6:24 PM, Stephen Frost <sfrost@snowman.net> wrote: > I'm with Thomas on this and I disagree that the "csvlog bloat" argument > has merit. If we're worried about bloat in csv then we should provide a > way for users to control what goes into the csvlog, not argue that > something which is clearly useful be excluded. I agree, but I think if we do that it would be highly desirable do something to make the format discoverable. For example, at the beginning of each file and whenever the format changes, we write some kind of unmistakable header line into the file identifying what fields will be present on each line thereafter. It's undesirable for log analysis tools to need to contain logic that tries (imperfectly, no doubt) to reverse-engineer the field list. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > On Mon, Oct 17, 2016 at 6:24 PM, Stephen Frost <sfrost@snowman.net> wrote: > > I'm with Thomas on this and I disagree that the "csvlog bloat" argument > > has merit. If we're worried about bloat in csv then we should provide a > > way for users to control what goes into the csvlog, not argue that > > something which is clearly useful be excluded. > > I agree, but I think if we do that it would be highly desirable do > something to make the format discoverable. For example, at the > beginning of each file and whenever the format changes, we write some > kind of unmistakable header line into the file identifying what fields > will be present on each line thereafter. It's undesirable for log > analysis tools to need to contain logic that tries (imperfectly, no > doubt) to reverse-engineer the field list. Hmm, I wouldn't use the same file for different formats. If the format is changed, a different file should be used. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Oct 17, 2016 at 6:24 PM, Stephen Frost <sfrost@snowman.net> wrote: >> I'm with Thomas on this and I disagree that the "csvlog bloat" argument >> has merit. If we're worried about bloat in csv then we should provide a >> way for users to control what goes into the csvlog, not argue that >> something which is clearly useful be excluded. > I agree, but I think if we do that it would be highly desirable do > something to make the format discoverable. For example, at the > beginning of each file and whenever the format changes, we write some > kind of unmistakable header line into the file identifying what fields > will be present on each line thereafter. It's undesirable for log > analysis tools to need to contain logic that tries (imperfectly, no > doubt) to reverse-engineer the field list. Automatic "csv header" at the start of each logfile, perhaps? And force a logfile switch if it changes. regards, tom lane