Обсуждение: several questions about pg_dumpall, pg_start_backup, pg_basebackup and WAL

Поиск
Список
Период
Сортировка

several questions about pg_dumpall, pg_start_backup, pg_basebackup and WAL

От
Patrick Dung
Дата:
Hi all,

I have some questions about backup on PostgreSQL:

1) pg_dumpall
I am sure that this is asked in somewhere: why the pg_dumpall does not support custom or tar format like pg_dump.
I have heard that the custom format could be faster and may generate a smaller dump file.
Is the feature enhancement being developed?

2) pg_start_backup in postgresql 8.4
After issuing pg_start_backup, we can take the file based backup (by tar/snapshot) on the postgresql data directory.
If the use of file system snapshot is not possible.
It is possible that there is file changes (added or file size changed) between the pg_start_backup and pg_stop_backup.
Is the backup consistent?

3) extension of 2): If the postgresql version is 9.2
Is pg_basebackup better than pg_start_backup + file based backup + pg_stop_backup?

4) For the WAL backup in postgresql 8.4
After the archive mode is on and WAL is backup, how do I remove the old WAL files?
How about newer version (for example 9.2?)

Thanks and regards,
Patrick


Re: several questions about pg_dumpall, pg_start_backup, pg_basebackup and WAL

От
Albe Laurenz
Дата:
Patrick Dung wrote:
> I have some questions about backup on PostgreSQL:
> 
> 1) pg_dumpall
> I am sure that this is asked in somewhere: why the pg_dumpall does not support custom or tar format
> like pg_dump.
> I have heard that the custom format could be faster and may generate a smaller dump file.
> Is the feature enhancement being developed?

I suspect that the reason is that pg_dumpall just calls pg_dump
and concatenates the output, and that probably only works
in text format.

> 2) pg_start_backup in postgresql 8.4
> After issuing pg_start_backup, we can take the file based backup (by tar/snapshot) on the postgresql
> data directory.
> If the use of file system snapshot is not possible.

I don't understand this sentence.

> It is possible that there is file changes (added or file size changed) between the pg_start_backup and
> pg_stop_backup.

Yes.

> Is the backup consistent?

The tar or snapshot itself will not be consistent, it will have to be recovered
at least until the end on the online backup.

> 3) extension of 2): If the postgresql version is 9.2
> Is pg_basebackup better than pg_start_backup + file based backup + pg_stop_backup?

Technically they are equivalent.
pg_basebackup may be easier to use, the other option may provide
more flexibility.

One difference is that there can be many pg_basebackups running at the
same time.

> 4) For the WAL backup in postgresql 8.4
> After the archive mode is on and WAL is backup, how do I remove the old WAL files?
> How about newer version (for example 9.2?)

You never touch the files in pg_xlog on a running server.

You remove old WAL archives when you don't need them any more.
Usually that is when there is no older base backup.

Yours,
Laurenz Albe

Re: several questions about pg_dumpall, pg_start_backup, pg_basebackup and WAL

От
Patrick Dung
Дата:
Hi Able,

Please see the reply in-line:

Thanks,
Patrick Dung


From: Albe Laurenz <laurenz.albe@wien.gv.at>
To: Patrick Dung <patrick_dkt@yahoo.com.hk>; "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
Sent: Friday, August 23, 2013 4:14 PM
Subject: Re: [ADMIN] several questions about pg_dumpall, pg_start_backup, pg_basebackup and WAL

>Patrick Dung wrote:
>> I have some questions about backup on PostgreSQL:
>>
>> 1) pg_dumpall
>> I am sure that this is asked in somewhere: why the pg_dumpall does not support custom or tar format
>> like pg_dump.
>> I have heard that the custom format could be faster and may generate a smaller dump file.
>> Is the feature enhancement being developed?
>
>I suspect that the reason is that pg_dumpall just calls pg_dump
>and concatenates the output, and that probably only works
>in text format.

I would like to see the pg_dumpall support the custom/tar format.

>> 2) pg_start_backup in postgresql 8.4
>> After issuing pg_start_backup, we can take the file based backup (by tar/snapshot) on the postgresql
>> data directory.
>> If the use of file system snapshot is not possible.
>
>I don't understand this sentence.

After calling pg_start_backup, we can make backup by tar on PGDATA root directory.
Or we can take file system snapshot on PGDATA partition and make the backup on the snapshot.
I suppose the latter one is more consistent than the first one.


>> It is possible that there is file changes (added or file size changed) between the pg_start_backup and
>> pg_stop_backup.
>
>Yes.
>
>> Is the backup consistent?
>
>The tar or snapshot itself will not be consistent, it will have to be recovered
>at least until the end on the online backup.

I should ask: is the backup crash consistent?
Which means it is consistent at the time that the pg_start_backup is run.


>> 4) For the WAL backup in postgresql 8.4
>> After the archive mode is on and WAL is backup, how do I remove the old WAL files?
>> How about newer version (for example 9.2?)
>
>You never touch the files in pg_xlog on a running server.
>
>You remove old WAL archives when you don't need them any more.
>Usually that is when there is no older base backup.
>

Let me ask the question in this way:

4) For the WAL backup in postgresql 8.4
After setting the archive mode to on.
And the WAL files is copied to another directory eg. PGDATA\wal.
How do we remove the old WAL files in PGDATA\wal? We are not taking about PGDATA\pg_xlog.

How about newer version (for example 9.2?)


>Yours,
>Laurenz Albe
>
>--
>Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-admin
>



Re: several questions about pg_dumpall, pg_start_backup, pg_basebackup and WAL

От
Albe Laurenz
Дата:
> Patrick Dung wrote:
>>> I have some questions about backup on PostgreSQL:
>>>
>>> 1) pg_dumpall
>>> I am sure that this is asked in somewhere: why the pg_dumpall does not support custom or tar format
>>> like pg_dump.
>>> I have heard that the custom format could be faster and may generate a smaller dump file.
>>> Is the feature enhancement being developed?
>>
>> I suspect that the reason is that pg_dumpall just calls pg_dump
>> and concatenates the output, and that probably only works
>> in text format.
> 
> I would like to see the pg_dumpall support the custom/tar format.

I think that will not happen for the reason stated above.
I concur that it would be nice.

You can do it yourself by using pg_dumpall to just dump the
global data (switch -g) and then call "pgdump -Fc" for all
the databases in the cluster.

>>> Is the backup consistent?
>>
>> The tar or snapshot itself will not be consistent, it will have to be recovered
>> at least until the end on the online backup.
> 
> I should ask: is the backup crash consistent?
> Which means it is consistent at the time that the pg_start_backup is run.

I am not sure that I understand you right, but let me explain:

You cannot perform crash recovery with a backup unless it was taken
with a truly atomic snapshot of everything at once.
In all other cases, the individual files in the backup are not
consistent in ways that crash recovery cannot repair, and even if
you manage to get the backup running using crash recovery
(by removing the "backup_label" file), you will end up
with a corrupt database that will eat and destroy your data.

On the other hand, the cluster from which the backup is being
taken is fully operational and consistent.

Does that answer your question?

>>> 4) For the WAL backup in postgresql 8.4
>>> After the archive mode is on and WAL is backup, how do I remove the old WAL files?
>>> How about newer version (for example 9.2?)
>>
>> You never touch the files in pg_xlog on a running server.
>>
>> You remove old WAL archives when you don't need them any more.
>> Usually that is when there is no older base backup.
>>
> 
> Let me ask the question in this way:
> 
> 4) For the WAL backup in postgresql 8.4
> After setting the archive mode to on.
> And the WAL files is copied to another directory eg. PGDATA\wal.
> How do we remove the old WAL files in PGDATA\wal? We are not taking about PGDATA\pg_xlog.
> 
> How about newer version (for example 9.2?)

That is up to you in any version of PostgreSQL.
PostgreSQL does not know how long you need to keep your archived
WAL files, so it will never delete them.

You can use cron jobs or similar things to remove WAL archives
older than a certain time, or you can use tools like Barman
(http://www.pgbarman.org/) that manage your backups for you.

Yours,
Laurenz Albe

Re: several questions about pg_dumpall, pg_start_backup, pg_basebackup and WAL

От
Kevin Grittner
Дата:
Patrick Dung <patrick_dkt@yahoo.com.hk> wrote:
> Albe Laurenz <laurenz.albe@wien.gv.at>
>> Patrick Dung wrote:

>>> It is possible that there is file changes (added or file size
>>> changed) between the pg_start_backup and pg_stop_backup.
>>
>> Yes.
>>
>>> Is the backup consistent?
>>
>> The tar or snapshot itself will not be consistent, it will have
>> to be recovered at least until the end on the online backup.
>
> I should ask: is the backup crash consistent?

PITR restore procedures will use the crash recovery mechanism to
make the database consistent, if that's what you mean.

> Which means it is consistent at the time that the pg_start_backup
> is run.

No.  It will be consistent with the time that pg_stop_backup was
run, or any later point in time that you choose, as long as you
have WAL to that point in time.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: several questions about pg_dumpall, pg_start_backup, pg_basebackup and WAL

От
Patrick Dung
Дата:
'Crash consistent' is a term I heard in some oracle white papers..
If I am correct, it means the database would be consistent after the crash recovery is done.

>> Which means it is consistent at the time that the pg_start_backup
>> is run.

>No.  It will be consistent with the time that pg_stop_backup was
>run, or any later point in time that you choose, as long as you
>have WAL to that point in time.

In the manual, we where told to do three steps:
1) pg_start_backup
2) backup by tar
3) pg_stop_backup

If the backup is consistent when pg_stop_backup is called, I think it worth to mention it in the manual.

Thanks and regards,
Patrick Dung


From: Kevin Grittner <kgrittn@ymail.com>
To: Patrick Dung <patrick_dkt@yahoo.com.hk>; Albe Laurenz <laurenz.albe@wien.gv.at>; "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
Sent: Wednesday, August 28, 2013 3:31 AM
Subject: Re: [ADMIN] several questions about pg_dumpall, pg_start_backup, pg_basebackup and WAL

Patrick Dung <patrick_dkt@yahoo.com.hk> wrote:
> Albe Laurenz <laurenz.albe@wien.gv.at>
>> Patrick Dung wrote:

>>> It is possible that there is file changes (added or file size
>>> changed) between the pg_start_backup and pg_stop_backup.
>>
>> Yes.
>>
>>> Is the backup consistent?
>>
>> The tar or snapshot itself will not be consistent, it will have
>> to be recovered at least until the end on the online backup.
>
> I should ask: is the backup crash consistent?

PITR restore procedures will use the crash recovery mechanism to
make the database consistent, if that's what you mean.

> Which means it is consistent at the time that the pg_start_backup
> is run.

No.  It will be consistent with the time that pg_stop_backup was
run, or any later point in time that you choose, as long as you
have WAL to that point in time.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

9.5 new setting "cluster name" and logging

От
Evan Rempel
Дата:
Now that there is a setting to give a cluster a "name", it would be nice
to have an escape sequence in the log_line_prefix setting that could
reference the cluster_name.

Comments?

Can we get this on the "to do list"?

Evan.


Re: 9.5 new setting "cluster name" and logging

От
Andres Freund
Дата:
Hi,

(x-posting to -hackers, more relevant audience)

On 2016-01-29 22:19:45 -0800, Evan Rempel wrote:
> Now that there is a setting to give a cluster a "name", it would be nice to
> have an escape sequence in the log_line_prefix setting that could reference
> the cluster_name.

I've argued[1][2] for this when cluster_name was introduced, but back
then I seemed to have been the only one arguing for it. Josh later
jumped that train.

Given that we now had a number of people wishing for this, can we maybe
reconsider?

Greetings,

Andres Freund

[1] http://archives.postgresql.org/message-id/CADLWmXUm%3D7Y3UORZnGMdSC6p1eymO0k0JkH4NKr4KstdWk0P7g%40mail.gmail.com
[2] http://archives.postgresql.org/message-id/20140505101007.GU12715@awork2.anarazel.de


Re: [HACKERS] 9.5 new setting "cluster name" and logging

От
Joe Conway
Дата:
On 02/08/2016 06:24 AM, Andres Freund wrote:
> On 2016-01-29 22:19:45 -0800, Evan Rempel wrote:
>> Now that there is a setting to give a cluster a "name", it would be nice to
>> have an escape sequence in the log_line_prefix setting that could reference
>> the cluster_name.
>
> I've argued[1][2] for this when cluster_name was introduced, but back
> then I seemed to have been the only one arguing for it. Josh later
> jumped that train.
>
> Given that we now had a number of people wishing for this, can we maybe
> reconsider?

Seems like a reasonable idea to me. But if we add a log_line_prefix
setting, shouldn't we also add it to csvlog output too?

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Re: [HACKERS] 9.5 new setting "cluster name" and logging

От
Thomas Munro
Дата:
On Tue, Feb 9, 2016 at 5:30 AM, Joe Conway <mail@joeconway.com> wrote:
> On 02/08/2016 06:24 AM, Andres Freund wrote:
>> On 2016-01-29 22:19:45 -0800, Evan Rempel wrote:
>>> Now that there is a setting to give a cluster a "name", it would be nice to
>>> have an escape sequence in the log_line_prefix setting that could reference
>>> the cluster_name.
>>
>> I've argued[1][2] for this when cluster_name was introduced, but back
>> then I seemed to have been the only one arguing for it. Josh later
>> jumped that train.
>>
>> Given that we now had a number of people wishing for this, can we maybe
>> reconsider?
>
> Seems like a reasonable idea to me. But if we add a log_line_prefix
> setting, shouldn't we also add it to csvlog output too?

Here's a tiny patch adding support for %C to log_line_prefix (this was
part of the cluster_name patch that didn't go it).

Given that csvlog's output format is hardcoded in write_csvlog, how is
it supposed to evolve without upsetting consumers of this data?
Wouldn't we first need to add a GUC that lets you control the columns
it outputs?

--
Thomas Munro
http://www.enterprisedb.com

Вложения

Re: [HACKERS] 9.5 new setting "cluster name" and logging

От
Stephen Frost
Дата:
Thomas,

* Thomas Munro (thomas.munro@enterprisedb.com) wrote:
> On Tue, Feb 9, 2016 at 5:30 AM, Joe Conway <mail@joeconway.com> wrote:
> > On 02/08/2016 06:24 AM, Andres Freund wrote:
> >> On 2016-01-29 22:19:45 -0800, Evan Rempel wrote:
> >>> Now that there is a setting to give a cluster a "name", it would be nice to
> >>> have an escape sequence in the log_line_prefix setting that could reference
> >>> the cluster_name.
> >>
> >> I've argued[1][2] for this when cluster_name was introduced, but back
> >> then I seemed to have been the only one arguing for it. Josh later
> >> jumped that train.
> >>
> >> Given that we now had a number of people wishing for this, can we maybe
> >> reconsider?
> >
> > Seems like a reasonable idea to me. But if we add a log_line_prefix
> > setting, shouldn't we also add it to csvlog output too?
>
> Here's a tiny patch adding support for %C to log_line_prefix (this was
> part of the cluster_name patch that didn't go it).
>
> Given that csvlog's output format is hardcoded in write_csvlog, how is
> it supposed to evolve without upsetting consumers of this data?
> Wouldn't we first need to add a GUC that lets you control the columns
> it outputs?

Not sure if you really want to go there, but I do agree with you and
there's a thread from a few years back about something similar:

http://www.postgresql.org/message-id/flat/20110112142345.GA4933@tamriel.snowman.net

Included in that thread is a patch, which likely requires some dusting
off, to add exactly that ability.

Thanks!

Stephen

Вложения

Re: 9.5 new setting "cluster name" and logging

От
Bruce Momjian
Дата:
On Fri, Jan 29, 2016 at 10:19:45PM -0800, Evan Rempel wrote:
>
> Now that there is a setting to give a cluster a "name", it would be
> nice to have an escape sequence in the log_line_prefix setting that
> could reference the cluster_name.
>
> Comments?
>
> Can we get this on the "to do list"?

Uh, wouldn't the cluster name be the same on every line?  Is that
useful?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +


Re: 9.5 new setting "cluster name" and logging

От
Evan Rempel
Дата:
On 02/18/2016 01:42 PM, Bruce Momjian wrote:
> On Fri, Jan 29, 2016 at 10:19:45PM -0800, Evan Rempel wrote:
>> Now that there is a setting to give a cluster a "name", it would be
>> nice to have an escape sequence in the log_line_prefix setting that
>> could reference the cluster_name.
>>
>> Comments?
>>
>> Can we get this on the "to do list"?
> Uh, wouldn't the cluster name be the same on every line?  Is that
> useful?
>
That is correct. It does not make sense when looking into local log files, but you would not use the log prefix in that
environment.

When this feature would be useful is when you are sending the logs to syslog from many postgresql clusters, and on a
centralserver, You want to collect logs together that use the same cluster name. It just provides more flexibility to
addcontextual information to the log lines. 


--
Evan Rempel


Re: 9.5 new setting "cluster name" and logging

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> On Fri, Jan 29, 2016 at 10:19:45PM -0800, Evan Rempel wrote:
>> Now that there is a setting to give a cluster a "name", it would be
>> nice to have an escape sequence in the log_line_prefix setting that
>> could reference the cluster_name.

> Uh, wouldn't the cluster name be the same on every line?  Is that
> useful?

I guess if you were running multiple clusters and having them all log
to the same syslog daemon, there might be multiple clusters' output
in the same log ... but we already have ways to disambiguate that case,
ie, syslog_ident.  Likewise for Windows' event log.  I don't think we
should consider it supported for multiple clusters to be logging into
the same plain files.  So yeah, the use case for this seems unclear.

            regards, tom lane


Re: 9.5 new setting "cluster name" and logging

От
Evan Rempel
Дата:
On 02/18/2016 01:53 PM, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> On Fri, Jan 29, 2016 at 10:19:45PM -0800, Evan Rempel wrote:
>>> Now that there is a setting to give a cluster a "name", it would be
>>> nice to have an escape sequence in the log_line_prefix setting that
>>> could reference the cluster_name.
>> Uh, wouldn't the cluster name be the same on every line?  Is that
>> useful?
> I guess if you were running multiple clusters and having them all log
> to the same syslog daemon, there might be multiple clusters' output
> in the same log ... but we already have ways to disambiguate that case,
> ie, syslog_ident.  Likewise for Windows' event log.  I don't think we
> should consider it supported for multiple clusters to be logging into
> the same plain files.  So yeah, the use case for this seems unclear.
>
>             regards, tom lane

The use case is for logging to syslog.

Using the syslog_ident effectively changes the "program name" to something other than postgres. This would break any
typeof SIEM system that would be looking for events from the program postgres. This means that if you want to have a
clusterspecific contextual identifier, it needs to be in the log message content, which for the purposes of postgresql
meansplacing it into the log_line_prefix. 

It could be hard coded as part of the log_line_prefix such as

log_line_prefix = 'cluster-db1:%d:%u:%h '

That would work, but that means that my cluster name needs to be in the configuration file twice and that opens up the
errorscenario of not making them exactly the same. 

Allowing for the cluster_name to be referenced by an escape sequence would make manual and automated configuration
managementeasier. 

Thanks for listening.

--
Evan Rempel



Re: 9.5 new setting "cluster name" and logging

От
Fujii Masao
Дата:
On Fri, Feb 19, 2016 at 7:05 AM, Evan Rempel <erempel@uvic.ca> wrote:
> On 02/18/2016 01:53 PM, Tom Lane wrote:
>>
>> Bruce Momjian <bruce@momjian.us> writes:
>>>
>>> On Fri, Jan 29, 2016 at 10:19:45PM -0800, Evan Rempel wrote:
>>>>
>>>> Now that there is a setting to give a cluster a "name", it would be
>>>> nice to have an escape sequence in the log_line_prefix setting that
>>>> could reference the cluster_name.
>>>
>>> Uh, wouldn't the cluster name be the same on every line?  Is that
>>> useful?
>>
>> I guess if you were running multiple clusters and having them all log
>> to the same syslog daemon, there might be multiple clusters' output
>> in the same log ... but we already have ways to disambiguate that case,
>> ie, syslog_ident.  Likewise for Windows' event log.  I don't think we
>> should consider it supported for multiple clusters to be logging into
>> the same plain files.  So yeah, the use case for this seems unclear.
>>
>>                         regards, tom lane
>
>
> The use case is for logging to syslog.
>
> Using the syslog_ident effectively changes the "program name" to something
> other than postgres. This would break any type of SIEM system that would be
> looking for events from the program postgres. This means that if you want to
> have a cluster specific contextual identifier, it needs to be in the log
> message content, which for the purposes of postgresql means placing it into
> the log_line_prefix.
>
> It could be hard coded as part of the log_line_prefix such as
>
> log_line_prefix = 'cluster-db1:%d:%u:%h '
>
> That would work, but that means that my cluster name needs to be in the
> configuration file twice and that opens up the error scenario of not making
> them exactly the same.
>
> Allowing for the cluster_name to be referenced by an escape sequence would
> make manual and automated configuration management easier.

What if log_destination is set to csvlog? The same cluster_name is
output in every lines, and there is no way to disable it because
something like csv_line_prefix has not been supported yet?

Regards,

--
Fujii Masao


Re: 9.5 new setting "cluster name" and logging

От
Evan Rempel
Дата:
On 02/19/2016 05:56 AM, Fujii Masao wrote:
> On Fri, Feb 19, 2016 at 7:05 AM, Evan Rempel <erempel@uvic.ca> wrote:
>> On 02/18/2016 01:53 PM, Tom Lane wrote:
>>> Bruce Momjian <bruce@momjian.us> writes:
>>>> On Fri, Jan 29, 2016 at 10:19:45PM -0800, Evan Rempel wrote:
>>>>> Now that there is a setting to give a cluster a "name", it would be
>>>>> nice to have an escape sequence in the log_line_prefix setting that
>>>>> could reference the cluster_name.
>>>> Uh, wouldn't the cluster name be the same on every line?  Is that
>>>> useful?
>>> I guess if you were running multiple clusters and having them all log
>>> to the same syslog daemon, there might be multiple clusters' output
>>> in the same log ... but we already have ways to disambiguate that case,
>>> ie, syslog_ident.  Likewise for Windows' event log.  I don't think we
>>> should consider it supported for multiple clusters to be logging into
>>> the same plain files.  So yeah, the use case for this seems unclear.
>>>
>>>                          regards, tom lane
>>
>> The use case is for logging to syslog.
>>
>> Using the syslog_ident effectively changes the "program name" to something
>> other than postgres. This would break any type of SIEM system that would be
>> looking for events from the program postgres. This means that if you want to
>> have a cluster specific contextual identifier, it needs to be in the log
>> message content, which for the purposes of postgresql means placing it into
>> the log_line_prefix.
>>
>> It could be hard coded as part of the log_line_prefix such as
>>
>> log_line_prefix = 'cluster-db1:%d:%u:%h '
>>
>> That would work, but that means that my cluster name needs to be in the
>> configuration file twice and that opens up the error scenario of not making
>> them exactly the same.
>>
>> Allowing for the cluster_name to be referenced by an escape sequence would
>> make manual and automated configuration management easier.
> What if log_destination is set to csvlog? The same cluster_name is
> output in every lines, and there is no way to disable it because
> something like csv_line_prefix has not been supported yet?
>
> Regards,
>

I would think that in the case of using a csvlog that you would configure the log_line_prefix to not include the escape
sequenceof the cluster_name. The administrator would tune all of the logging components to work together, or am I
missingsomething here? 

--
Evan Rempel



Re: 9.5 new setting "cluster name" and logging

От
Fujii Masao
Дата:
On Sat, Feb 20, 2016 at 12:52 AM, Evan Rempel <erempel@uvic.ca> wrote:
> On 02/19/2016 05:56 AM, Fujii Masao wrote:
>>
>> On Fri, Feb 19, 2016 at 7:05 AM, Evan Rempel <erempel@uvic.ca> wrote:
>>>
>>> On 02/18/2016 01:53 PM, Tom Lane wrote:
>>>>
>>>> Bruce Momjian <bruce@momjian.us> writes:
>>>>>
>>>>> On Fri, Jan 29, 2016 at 10:19:45PM -0800, Evan Rempel wrote:
>>>>>>
>>>>>> Now that there is a setting to give a cluster a "name", it would be
>>>>>> nice to have an escape sequence in the log_line_prefix setting that
>>>>>> could reference the cluster_name.
>>>>>
>>>>> Uh, wouldn't the cluster name be the same on every line?  Is that
>>>>> useful?
>>>>
>>>> I guess if you were running multiple clusters and having them all log
>>>> to the same syslog daemon, there might be multiple clusters' output
>>>> in the same log ... but we already have ways to disambiguate that case,
>>>> ie, syslog_ident.  Likewise for Windows' event log.  I don't think we
>>>> should consider it supported for multiple clusters to be logging into
>>>> the same plain files.  So yeah, the use case for this seems unclear.
>>>>
>>>>                          regards, tom lane
>>>
>>>
>>> The use case is for logging to syslog.
>>>
>>> Using the syslog_ident effectively changes the "program name" to
>>> something
>>> other than postgres. This would break any type of SIEM system that would
>>> be
>>> looking for events from the program postgres. This means that if you want
>>> to
>>> have a cluster specific contextual identifier, it needs to be in the log
>>> message content, which for the purposes of postgresql means placing it
>>> into
>>> the log_line_prefix.
>>>
>>> It could be hard coded as part of the log_line_prefix such as
>>>
>>> log_line_prefix = 'cluster-db1:%d:%u:%h '
>>>
>>> That would work, but that means that my cluster name needs to be in the
>>> configuration file twice and that opens up the error scenario of not
>>> making
>>> them exactly the same.
>>>
>>> Allowing for the cluster_name to be referenced by an escape sequence
>>> would
>>> make manual and automated configuration management easier.
>>
>> What if log_destination is set to csvlog? The same cluster_name is
>> output in every lines, and there is no way to disable it because
>> something like csv_line_prefix has not been supported yet?
>>
>> Regards,
>>
>
> I would think that in the case of using a csvlog that you would configure
> the log_line_prefix to not include the escape sequence of the cluster_name.
> The administrator would tune all of the logging components to work together,
> or am I missing something here?

log_line_prefix is only for stderr and syslog, not for csvlog.
It cannot control what to output in csvlog, for now.

Regards,

--
Fujii Masao


Re: 9.5 new setting "cluster name" and logging

От
Bruce Momjian
Дата:
On Fri, Feb 19, 2016 at 07:52:25AM -0800, Evan Rempel wrote:
> >>Allowing for the cluster_name to be referenced by an escape sequence
> >>would make manual and automated configuration management easier.
> >What if log_destination is set to csvlog? The same cluster_name is
> >output in every lines, and there is no way to disable it because
> >something like csv_line_prefix has not been supported yet?
> >
> >Regards,
> >
>
> I would think that in the case of using a csvlog that you would
> configure the log_line_prefix to not include the escape sequence of
> the cluster_name. The administrator would tune all of the logging
> components to work together, or am I missing something here?

csvlog isn't affected by log_line_prefix --- it outputs all fields every
time:

    http://www.postgresql.org/docs/9.5/static/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-CSVLOG

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +


Re: 9.5 new setting "cluster name" and logging

От
Evan Rempel
Дата:
On 02/19/2016 08:10 AM, Fujii Masao wrote:
> On Sat, Feb 20, 2016 at 12:52 AM, Evan Rempel <erempel@uvic.ca> wrote:
>> On 02/19/2016 05:56 AM, Fujii Masao wrote:
>>> On Fri, Feb 19, 2016 at 7:05 AM, Evan Rempel <erempel@uvic.ca> wrote:
>>>> On 02/18/2016 01:53 PM, Tom Lane wrote:
>>>>> Bruce Momjian <bruce@momjian.us> writes:
>>>>>> On Fri, Jan 29, 2016 at 10:19:45PM -0800, Evan Rempel wrote:
>>>>>>> Now that there is a setting to give a cluster a "name", it would be
>>>>>>> nice to have an escape sequence in the log_line_prefix setting that
>>>>>>> could reference the cluster_name.
>>>>>> Uh, wouldn't the cluster name be the same on every line?  Is that
>>>>>> useful?
>>>>> I guess if you were running multiple clusters and having them all log
>>>>> to the same syslog daemon, there might be multiple clusters' output
>>>>> in the same log ... but we already have ways to disambiguate that case,
>>>>> ie, syslog_ident.  Likewise for Windows' event log.  I don't think we
>>>>> should consider it supported for multiple clusters to be logging into
>>>>> the same plain files.  So yeah, the use case for this seems unclear.
>>>>>
>>>>>                           regards, tom lane
>>>>
>>>> The use case is for logging to syslog.
>>>>
>>>> Using the syslog_ident effectively changes the "program name" to
>>>> something
>>>> other than postgres. This would break any type of SIEM system that would
>>>> be
>>>> looking for events from the program postgres. This means that if you want
>>>> to
>>>> have a cluster specific contextual identifier, it needs to be in the log
>>>> message content, which for the purposes of postgresql means placing it
>>>> into
>>>> the log_line_prefix.
>>>>
>>>> It could be hard coded as part of the log_line_prefix such as
>>>>
>>>> log_line_prefix = 'cluster-db1:%d:%u:%h '
>>>>
>>>> That would work, but that means that my cluster name needs to be in the
>>>> configuration file twice and that opens up the error scenario of not
>>>> making
>>>> them exactly the same.
>>>>
>>>> Allowing for the cluster_name to be referenced by an escape sequence
>>>> would
>>>> make manual and automated configuration management easier.
>>> What if log_destination is set to csvlog? The same cluster_name is
>>> output in every lines, and there is no way to disable it because
>>> something like csv_line_prefix has not been supported yet?
>>>
>>> Regards,
>>>
>> I would think that in the case of using a csvlog that you would configure
>> the log_line_prefix to not include the escape sequence of the cluster_name.
>> The administrator would tune all of the logging components to work together,
>> or am I missing something here?
> log_line_prefix is only for stderr and syslog, not for csvlog.
> It cannot control what to output in csvlog, for now.
>
> Regards,

In that case, my request for a new escape sequence for the log_line_prefix will have no impact on csvlog output because
itwould 
only affect uses cases where log_line_prefix is used AND where the administrator has added the cluster name into the
log_line_prefix.

--
Evan



Re: 9.5 new setting "cluster name" and logging

От
Fujii Masao
Дата:
On Sat, Feb 20, 2016 at 1:19 AM, Evan Rempel <erempel@uvic.ca> wrote:
> On 02/19/2016 08:10 AM, Fujii Masao wrote:
>>
>> On Sat, Feb 20, 2016 at 12:52 AM, Evan Rempel <erempel@uvic.ca> wrote:
>>>
>>> On 02/19/2016 05:56 AM, Fujii Masao wrote:
>>>>
>>>> On Fri, Feb 19, 2016 at 7:05 AM, Evan Rempel <erempel@uvic.ca> wrote:
>>>>>
>>>>> On 02/18/2016 01:53 PM, Tom Lane wrote:
>>>>>>
>>>>>> Bruce Momjian <bruce@momjian.us> writes:
>>>>>>>
>>>>>>> On Fri, Jan 29, 2016 at 10:19:45PM -0800, Evan Rempel wrote:
>>>>>>>>
>>>>>>>> Now that there is a setting to give a cluster a "name", it would be
>>>>>>>> nice to have an escape sequence in the log_line_prefix setting that
>>>>>>>> could reference the cluster_name.
>>>>>>>
>>>>>>> Uh, wouldn't the cluster name be the same on every line?  Is that
>>>>>>> useful?
>>>>>>
>>>>>> I guess if you were running multiple clusters and having them all log
>>>>>> to the same syslog daemon, there might be multiple clusters' output
>>>>>> in the same log ... but we already have ways to disambiguate that
>>>>>> case,
>>>>>> ie, syslog_ident.  Likewise for Windows' event log.  I don't think we
>>>>>> should consider it supported for multiple clusters to be logging into
>>>>>> the same plain files.  So yeah, the use case for this seems unclear.
>>>>>>
>>>>>>                           regards, tom lane
>>>>>
>>>>>
>>>>> The use case is for logging to syslog.
>>>>>
>>>>> Using the syslog_ident effectively changes the "program name" to
>>>>> something
>>>>> other than postgres. This would break any type of SIEM system that
>>>>> would
>>>>> be
>>>>> looking for events from the program postgres. This means that if you
>>>>> want
>>>>> to
>>>>> have a cluster specific contextual identifier, it needs to be in the
>>>>> log
>>>>> message content, which for the purposes of postgresql means placing it
>>>>> into
>>>>> the log_line_prefix.
>>>>>
>>>>> It could be hard coded as part of the log_line_prefix such as
>>>>>
>>>>> log_line_prefix = 'cluster-db1:%d:%u:%h '
>>>>>
>>>>> That would work, but that means that my cluster name needs to be in the
>>>>> configuration file twice and that opens up the error scenario of not
>>>>> making
>>>>> them exactly the same.
>>>>>
>>>>> Allowing for the cluster_name to be referenced by an escape sequence
>>>>> would
>>>>> make manual and automated configuration management easier.
>>>>
>>>> What if log_destination is set to csvlog? The same cluster_name is
>>>> output in every lines, and there is no way to disable it because
>>>> something like csv_line_prefix has not been supported yet?
>>>>
>>>> Regards,
>>>>
>>> I would think that in the case of using a csvlog that you would configure
>>> the log_line_prefix to not include the escape sequence of the
>>> cluster_name.
>>> The administrator would tune all of the logging components to work
>>> together,
>>> or am I missing something here?
>>
>> log_line_prefix is only for stderr and syslog, not for csvlog.
>> It cannot control what to output in csvlog, for now.
>>
>> Regards,
>
>
> In that case, my request for a new escape sequence for the log_line_prefix
> will have no impact on csvlog output because it would
> only affect uses cases where log_line_prefix is used AND where the
> administrator has added the cluster name into the log_line_prefix.

So cluster_name will never appear in csvlog. Maybe I'm OK with that.

OTOH, there are many "escape sequences" available in log_line_prefix,
and their corresponding information is basically output in csvlog,
but only cluster_name is not. That design looks a bit odd.

Regards,

--
Fujii Masao


Re: [HACKERS] 9.5 new setting "cluster name" and logging

От
Thomas Munro
Дата:
On Tue, Feb 9, 2016 at 3:24 AM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> (x-posting to -hackers, more relevant audience)
>
> On 2016-01-29 22:19:45 -0800, Evan Rempel wrote:
>> Now that there is a setting to give a cluster a "name", it would be nice to
>> have an escape sequence in the log_line_prefix setting that could reference
>> the cluster_name.
>
> I've argued[1][2] for this when cluster_name was introduced, but back
> then I seemed to have been the only one arguing for it. Josh later
> jumped that train.
>
> Given that we now had a number of people wishing for this, can we maybe
> reconsider?

+1

I missed the part of this conversation that took place on -admin.  I
agree with Evan Rempel's post over there[1] that it's useful for
syslog users, and that it's not ideal to have to hijack syslog_ident
or explicitly copy the name into log_line_prefix.

[1] https://www.postgresql.org/message-id/56C64017.7050303%40uvic.ca

--
Thomas Munro
http://www.enterprisedb.com


Re: [HACKERS] 9.5 new setting "cluster name" and logging

От
Stephen Frost
Дата:
* Thomas Munro (thomas.munro@enterprisedb.com) wrote:
> On Tue, Feb 9, 2016 at 3:24 AM, Andres Freund <andres@anarazel.de> wrote:
> > (x-posting to -hackers, more relevant audience)
> >
> > On 2016-01-29 22:19:45 -0800, Evan Rempel wrote:
> >> Now that there is a setting to give a cluster a "name", it would be nice to
> >> have an escape sequence in the log_line_prefix setting that could reference
> >> the cluster_name.
> >
> > I've argued[1][2] for this when cluster_name was introduced, but back
> > then I seemed to have been the only one arguing for it. Josh later
> > jumped that train.
> >
> > Given that we now had a number of people wishing for this, can we maybe
> > reconsider?
>
> +1
>
> I missed the part of this conversation that took place on -admin.  I
> agree with Evan Rempel's post over there[1] that it's useful for
> syslog users, and that it's not ideal to have to hijack syslog_ident
> or explicitly copy the name into log_line_prefix.
>
> [1] https://www.postgresql.org/message-id/56C64017.7050303%40uvic.ca

I'm with Thomas on this and I disagree that the "csvlog bloat" argument
has merit.  If we're worried about bloat in csv then we should provide a
way for users to control what goes into the csvlog, not argue that
something which is clearly useful be excluded.

There's already a patch out there for adding a way to control what goes
into csvlog and it probably wouldn't be too hard to update it.  I know,
because I wrote it.  I continue to feel that it would be useful to add
and that tools which need to deal with csvlogs should be updated to
handle being told what's in it, just like pgBadger does for our much
more complicated and painful regular log files.  That would remove these
bars to moving forward with more flexibility in log_line_prefix and
csvlog output.

Thanks!

Stephen

Вложения

Re: [HACKERS] 9.5 new setting "cluster name" and logging

От
Robert Haas
Дата:
On Mon, Oct 17, 2016 at 6:24 PM, Stephen Frost <sfrost@snowman.net> wrote:
> I'm with Thomas on this and I disagree that the "csvlog bloat" argument
> has merit.  If we're worried about bloat in csv then we should provide a
> way for users to control what goes into the csvlog, not argue that
> something which is clearly useful be excluded.

I agree, but I think if we do that it would be highly desirable do
something to make the format discoverable.  For example, at the
beginning of each file and whenever the format changes, we write some
kind of unmistakable header line into the file identifying what fields
will be present on each line thereafter.  It's undesirable for log
analysis tools to need to contain logic that tries (imperfectly, no
doubt) to reverse-engineer the field list.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] 9.5 new setting "cluster name" and logging

От
Alvaro Herrera
Дата:
Robert Haas wrote:
> On Mon, Oct 17, 2016 at 6:24 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > I'm with Thomas on this and I disagree that the "csvlog bloat" argument
> > has merit.  If we're worried about bloat in csv then we should provide a
> > way for users to control what goes into the csvlog, not argue that
> > something which is clearly useful be excluded.
>
> I agree, but I think if we do that it would be highly desirable do
> something to make the format discoverable.  For example, at the
> beginning of each file and whenever the format changes, we write some
> kind of unmistakable header line into the file identifying what fields
> will be present on each line thereafter.  It's undesirable for log
> analysis tools to need to contain logic that tries (imperfectly, no
> doubt) to reverse-engineer the field list.

Hmm, I wouldn't use the same file for different formats.  If the format
is changed, a different file should be used.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] 9.5 new setting "cluster name" and logging

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Oct 17, 2016 at 6:24 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> I'm with Thomas on this and I disagree that the "csvlog bloat" argument
>> has merit.  If we're worried about bloat in csv then we should provide a
>> way for users to control what goes into the csvlog, not argue that
>> something which is clearly useful be excluded.

> I agree, but I think if we do that it would be highly desirable do
> something to make the format discoverable.  For example, at the
> beginning of each file and whenever the format changes, we write some
> kind of unmistakable header line into the file identifying what fields
> will be present on each line thereafter.  It's undesirable for log
> analysis tools to need to contain logic that tries (imperfectly, no
> doubt) to reverse-engineer the field list.

Automatic "csv header" at the start of each logfile, perhaps?  And
force a logfile switch if it changes.

            regards, tom lane