Discussion: Augment every test postgresql.conf

Augment every test postgresql.conf

From: Noah Misch
"env EXTRA_REGRESS_OPTS=--temp-config=SOMEFILE make check" appends the
contents of SOMEFILE to the test cluster's postgresql.conf.  I want a similar
feature for TAP suites and other non-pg_regress suites.  (My immediate use
case is to raise authentication_timeout and wal_sender_timeout on my buildfarm
animals, which sometimes fail at the defaults.)  I'm thinking to do this by
recognizing the PG_TEST_TEMP_CONFIG environment variable as a
whitespace-separated list of file names for appending to postgresql.conf.
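
For illustration, here is a minimal sketch, in Perl, of how a TAP driver could
consume such a whitespace-separated list.  PG_TEST_TEMP_CONFIG is only the name
proposed above (it was never adopted; the follow-up below uses the existing
TEMP_CONFIG instead), and append_extra_config is an invented helper:

  use strict;
  use warnings;

  # Append the contents of every file named in PG_TEST_TEMP_CONFIG to the
  # given postgresql.conf.
  sub append_extra_config
  {
      my ($conf_path) = @_;
      return unless defined $ENV{PG_TEST_TEMP_CONFIG};

      open my $conf, '>>', $conf_path or die "could not open $conf_path: $!";
      foreach my $file (split /\s+/, $ENV{PG_TEST_TEMP_CONFIG})
      {
          next if $file eq '';
          open my $in, '<', $file or die "could not open $file: $!";
          print $conf $_ while <$in>;
          close $in;
      }
      close $conf;
      return;
  }

  # e.g. append_extra_config("$pgdata/postgresql.conf");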


Re: Augment every test postgresql.conf

From: Noah Misch
On Fri, Dec 28, 2018 at 06:19:50PM -0800, Noah Misch wrote:
> "env EXTRA_REGRESS_OPTS=--temp-config=SOMEFILE make check" appends the
> contents of SOMEFILE to the test cluster's postgresql.conf.  I want a similar
> feature for TAP suites and other non-pg_regress suites.  (My immediate use
> case is to raise authentication_timeout and wal_sender_timeout on my buildfarm
> animals, which sometimes fail at the defaults.)  I'm thinking to do this by
> recognizing the PG_TEST_TEMP_CONFIG environment variable as a
> whitespace-separated list of file names for appending to postgresql.conf.

Looking more closely, we already have the TEMP_CONFIG variable and apply it to
everything except TAP suites.  Closing that gap, as attached, is enough.  The
buildfarm client uses TEMP_CONFIG to implement its extra_config setting, so
this will cause extra_config to start applying to TAP suites.
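
As a rough sketch only (this is not the actual buildfarm client code, and the
@extra_config values are made up), the mapping from an extra_config-style list
of settings to TEMP_CONFIG can look like this:

  use strict;
  use warnings;
  use File::Temp ();

  # Write the extra settings to a temporary file and point TEMP_CONFIG at it
  # before running the tests.
  my @extra_config = ('log_checkpoints = on', 'fsync = off');

  my ($fh, $filename) = File::Temp::tempfile('temp_config_XXXX', TMPDIR => 1);
  print $fh "$_\n" foreach @extra_config;
  close $fh;

  $ENV{TEMP_CONFIG} = $filename;
  system('make', 'check-world') == 0 or die "tests failed";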

Attachments

Re: Augment every test postgresql.conf

From: Tom Lane
Noah Misch <noah@leadboat.com> writes:
> Looking more closely, we already have the TEMP_CONFIG variable and apply it to
> everything except TAP suites.  Closing that gap, as attached, is enough.  The
> buildfarm client uses TEMP_CONFIG to implement its extra_config setting, so
> this will cause extra_config to start applying to TAP suites.

Seems reasonable, but it might be a good idea to warn the buildfarm-owners
list before committing.  (Although I guess it wouldn't be hard to check
the buildfarm database to see if anyone is putting anything interesting
into their critters' TEMP_CONFIG.)

Also, if we're to do this, it seems like applying it to back branches
would be helpful --- but will it work in all the back branches?

            regards, tom lane


Re: Augment every test postgresql.conf

From: Noah Misch
On Sat, Dec 29, 2018 at 10:46:31PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > Looking more closely, we already have the TEMP_CONFIG variable and apply it to
> > everything except TAP suites.  Closing that gap, as attached, is enough.  The
> > buildfarm client uses TEMP_CONFIG to implement its extra_config setting, so
> > this will cause extra_config to start applying to TAP suites.
> 
> Seems reasonable, but it might be a good idea to warn the buildfarm-owners
> list before committing.  (Although I guess it wouldn't be hard to check
> the buildfarm database to see if anyone is putting anything interesting
> into their critters' TEMP_CONFIG.)

Good idea.  Here are the extra_config entries seen since 2018-12-01:

 archive_mode = off
 force_parallel_mode = regress
 fsync = off
 fsync = on
 jit=1
 jit = 1
 jit_above_cost=0
 jit = on
 jit_optimize_above_cost=1000
 log_checkpoints = 'true'
 log_connections = 'true'
 log_disconnections = 'true'
 log_line_prefix = '[%c:%l] '
 log_line_prefix = '%m [%c:%l] '
 log_line_prefix = '%m [%c:%l] %q%a '
 log_line_prefix = '%m [%p:%l] '
 log_line_prefix = '%m [%p:%l] %q%a '
 log_line_prefix = '%m [%s %p:%l] %q%a '
 log_statement = 'all'
 max_parallel_workers_per_gather = 2
 max_parallel_workers_per_gather = 5
 max_wal_senders = 0
 shared_buffers = 10MB
 stats_temp_directory = '/cygdrive/w/lorikeet/HEAD'
 stats_temp_directory = '/cygdrive/w/lorikeet/REL_10_STABLE'
 stats_temp_directory = '/cygdrive/w/lorikeet/REL_11_STABLE'
 stats_temp_directory = '/cygdrive/w/lorikeet/REL9_4_STABLE'
 stats_temp_directory = '/cygdrive/w/lorikeet/REL9_5_STABLE'
 stats_temp_directory = '/cygdrive/w/lorikeet/REL9_6_STABLE'
 stats_temp_directory= '/home/buildfarm/data/stats_temp'
 wal_level = 'minimal'

Problems:

1. max_wal_senders=0 and wal_level=minimal break a number of suites,
   e.g. pg_basebackup.
2. stats_temp_directory is incompatible with TAP suites that start more than
   one node simultaneously.

Problem (1) goes away if I inject the TEMP_CONFIG settings earlier in the
file, which seems defensible:

--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -447,10 +447,13 @@ sub init
     print $conf "log_statement = all\n";
     print $conf "log_replication_commands = on\n";
     print $conf "wal_retrieve_retry_interval = '500ms'\n";
     print $conf "port = $port\n";
 
+    print $conf TestLib::slurp_file($ENV{TEMP_CONFIG})
+      if defined $ENV{TEMP_CONFIG};
+
     if ($params{allows_streaming})
     {
         if ($params{allows_streaming} eq "logical")
         {
             print $conf "wal_level = logical\n";

Problem (2) remains.  It's already a problem for "make -j check-world".  I'll
give that one more thought.

> Also, if we're to do this, it seems like applying it to back branches
> would be helpful --- but will it work in all the back branches?

Yes.  TEMP_CONFIG exists in all supported branches, and the back-patch to 9.4
is no more complex.  Before 9.6 (commit 87cc6b5), TEMP_CONFIG affected "make
check" and the pg_upgrade test suite, but it did not affect other pg_regress
suites like contrib/* and src/pl/*.  We could back-patch commit 87cc6b5, if
there's demand.  I don't personally need it, because the tests I want to
influence are all TAP tests.


Re: Augment every test postgresql.conf

From: Andrew Dunstan
On 12/30/18 12:53 AM, Noah Misch wrote:
> On Sat, Dec 29, 2018 at 10:46:31PM -0500, Tom Lane wrote:
>> Noah Misch <noah@leadboat.com> writes:
>>> Looking more closely, we already have the TEMP_CONFIG variable and apply it to
>>> everything except TAP suites.  Closing that gap, as attached, is enough.  The
>>> buildfarm client uses TEMP_CONFIG to implement its extra_config setting, so
>>> this will cause extra_config to start applying to TAP suites.
>> Seems reasonable, but it might be a good idea to warn the buildfarm-owners
>> list before committing.  (Although I guess it wouldn't be hard to check
>> the buildfarm database to see if anyone is putting anything interesting
>> into their critters' TEMP_CONFIG.)
> Good idea.  Here are the extra_config entries seen since 2018-12-01:
>
>  archive_mode = off
>  force_parallel_mode = regress
>  fsync = off
>  fsync = on
>  jit=1
>  jit = 1
>  jit_above_cost=0
>  jit = on
>  jit_optimize_above_cost=1000
>  log_checkpoints = 'true'
>  log_connections = 'true'
>  log_disconnections = 'true'
>  log_line_prefix = '[%c:%l] '
>  log_line_prefix = '%m [%c:%l] '
>  log_line_prefix = '%m [%c:%l] %q%a '
>  log_line_prefix = '%m [%p:%l] '
>  log_line_prefix = '%m [%p:%l] %q%a '
>  log_line_prefix = '%m [%s %p:%l] %q%a '
>  log_statement = 'all'
>  max_parallel_workers_per_gather = 2
>  max_parallel_workers_per_gather = 5
>  max_wal_senders = 0
>  shared_buffers = 10MB
>  stats_temp_directory = '/cygdrive/w/lorikeet/HEAD'
>  stats_temp_directory = '/cygdrive/w/lorikeet/REL_10_STABLE'
>  stats_temp_directory = '/cygdrive/w/lorikeet/REL_11_STABLE'
>  stats_temp_directory = '/cygdrive/w/lorikeet/REL9_4_STABLE'
>  stats_temp_directory = '/cygdrive/w/lorikeet/REL9_5_STABLE'
>  stats_temp_directory = '/cygdrive/w/lorikeet/REL9_6_STABLE'
>  stats_temp_directory= '/home/buildfarm/data/stats_temp'
>  wal_level = 'minimal'
>
> Problems:
>
> 1. max_wal_senders=0 and wal_level=minimal break a number of suites,
>    e.g. pg_basebackup.
> 2. stats_temp_directory is incompatible with TAP suites that start more than
>    one node simultaneously.
>
> Problem (1) goes away if I inject the TEMP_CONFIG settings earlier in the
> file, which seems defensible:
>
> --- a/src/test/perl/PostgresNode.pm
> +++ b/src/test/perl/PostgresNode.pm
> @@ -447,10 +447,13 @@ sub init
>      print $conf "log_statement = all\n";
>      print $conf "log_replication_commands = on\n";
>      print $conf "wal_retrieve_retry_interval = '500ms'\n";
>      print $conf "port = $port\n";
>  
> +    print $conf TestLib::slurp_file($ENV{TEMP_CONFIG})
> +      if defined $ENV{TEMP_CONFIG};
> +
>      if ($params{allows_streaming})
>      {
>          if ($params{allows_streaming} eq "logical")
>          {
>              print $conf "wal_level = logical\n";
>
> Problem (2) remains.  It's already a problem for "make -j check-world".  I'll
> give that one more thought.
>


lorikeet is putting the stats_temp directory on a ramdisk. This is worth
testing in any case, but in lorikeet's case it was done to help speed up
the tests. When I had a Raspberry Pi instance I did something similar,
for the same reason.


The obvious quick fix would be to have PostgresNode.pm set this to the
default after inserting the TEMP_CONFIG file.


There are a couple of problems here that bear further consideration.
First, that the stats_temp_directory has to exist, and second that there
is no convenient way to make it unique. It would be nice if a) the
directory could be created if it didn't exist and b) some place-holder
in the name could be replaced by a unique identifier such as the node
id. If there is interest I'll work on these. One problem I foresee is
that it might lead to a plethora of stats temp directories being left
around. Still thinking about how we should deal with that. In the
buildfarm client I'd be tempted to create a directory to hold all the
run's stats_temp_directories and then clean it up at the end of the run.
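
A hypothetical sketch of (a) and (b) above, in Perl; nothing like this exists
yet, and the "__NODE__" placeholder name is invented:

  use strict;
  use warnings;
  use File::Path qw(make_path);

  # Expand a placeholder in a configured stats_temp_directory template with
  # the node's name and create the directory if it is missing.
  sub resolve_stats_temp_directory
  {
      my ($template, $node_name) = @_;
      (my $dir = $template) =~ s/__NODE__/$node_name/g;
      make_path($dir) unless -d $dir;
      return $dir;
  }

  # e.g. resolve_stats_temp_directory('/ramdisk/stats/__NODE__', 'standby_1')
  # creates and returns /ramdisk/stats/standby_1.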


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



stats_temp_directory conflicts

From: Noah Misch
On Sun, Dec 30, 2018 at 10:32:31AM -0500, Andrew Dunstan wrote:
> On 12/30/18 12:53 AM, Noah Misch wrote:
> >  stats_temp_directory = '/cygdrive/w/lorikeet/HEAD'

> > 2. stats_temp_directory is incompatible with TAP suites that start more than
> >    one node simultaneously.

> > It's already a problem for "make -j check-world".

> lorikeet is putting the stats_temp directory on a ramdisk. This is worth
> testing in any case, but in lorikeet's case it was done to help speed up
> the tests. When I had a Raspberry Pi instance I did something similar,
> for the same reason.

I, too, value the ability to override stats_temp_directory for test runs.  (I
get stats.sql failures at high load, even on a high-performance machine.
Using stats_temp_directory may fix that.)

> The obvious quick fix would be to have PostgresNode.pm set this to the
> default after inserting the TEMP_CONFIG file.

True.

> There are a couple of problems here that bear further consideration.
> First, that the stats_temp_directory has to exist, and second that there
> is no convenient way to make it unique. It would be nice if a) the
> directory could be created if it didn't exist and b) some place-holder
> in the name could be replaced by a unique identifier such as the node
> id. If there is interest I'll work on these. One problem I foresee is
> that it might lead to a plethora of stats temp directories being left
> around. Still thinking about how we should deal with that. In the
> buildfarm client I'd be tempted to create a directory to hold all the
> run's stats_temp_directories and then clean it up at the end of the run.

I'm thinking the server should manage this; during startup, create
$stats_temp_directory/PostgreSQL.$postmaster_pid and store each stats file
therein.  Just before creating that directory, scan $stats_temp_directory and
delete subdirectories that no longer correspond to live PIDs.  Subdirectories
would not build up over time, even if one deletes a test data directory while
its subdirectory of stats_temp_directory still exists.  For non-test
applications, this makes stats_temp_directory safer to use.  Today, we don't
detect two clusters using the same stats_temp_directory.  We don't even
document that it's unacceptable.  This makes it acceptable.
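
The real implementation would be in the server, but the cleanup logic can be
sketched in Perl (claim_stats_subdir is an invented name for illustration):

  use strict;
  use warnings;
  use File::Path qw(make_path remove_tree);

  # Scan $stats_temp_directory, remove PostgreSQL.<pid> subdirectories whose
  # PID no longer corresponds to a live process we could signal, then create
  # (or recreate) our own subdirectory.
  sub claim_stats_subdir
  {
      my ($stats_temp_directory, $my_pid) = @_;

      opendir my $dh, $stats_temp_directory or die "opendir: $!";
      foreach my $entry (readdir $dh)
      {
          next unless $entry =~ /^PostgreSQL\.(\d+)$/;
          my $pid = $1;
          # kill(0, $pid) checks liveness without sending a signal.
          next if $pid != $my_pid && kill(0, $pid);
          remove_tree("$stats_temp_directory/$entry");
      }
      closedir $dh;

      my $mine = "$stats_temp_directory/PostgreSQL.$my_pid";
      make_path($mine);
      return $mine;
  }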

Thanks,
nm


Re: stats_temp_directory conflicts

From: Tom Lane
Noah Misch <noah@leadboat.com> writes:
> I'm thinking the server should manage this; during startup, create
> $stats_temp_directory/PostgreSQL.$postmaster_pid and store each stats file
> therein.

+1

> Just before creating that directory, scan $stats_temp_directory and
> delete subdirectories that no longer correspond to live PIDs.

Hm, seems potentially racy, if multiple postmasters are starting
at the same time.  The only one you can be *sure* is dead is one
with your own PID.

            regards, tom lane


Re: stats_temp_directory conflicts

From: Noah Misch
On Sun, Dec 30, 2018 at 07:47:05PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > I'm thinking the server should manage this; during startup, create
> > $stats_temp_directory/PostgreSQL.$postmaster_pid and store each stats file
> > therein.
> 
> +1
> 
> > Just before creating that directory, scan $stats_temp_directory and
> > delete subdirectories that no longer correspond to live PIDs.
> 
> Hm, seems potentially racy, if multiple postmasters are starting
> at the same time.  The only one you can be *sure* is dead is one
> with your own PID.

True; if a PID=123 postmaster launches and completes startup after a
slightly-older PID=122 postmaster issues kill(123, 0) and before PID=122
issues unlink()s, PID=122 unlinks files wrongly.  I think I would fix that
with fcntl(F_SETLKW)/Windows-equivalent on some file in stats_temp_directory.
One would acquire the lock before deleting a subdirectory, except for a
postmaster deleting the directory it created.  (If a postmaster finds a stale
directory for its PID, delete that directory and create a new one.  Like any
deletion, one must hold the lock.)

(Alternately, one could just accept that risk.)

Another problem comes to mind; if postmasters of different UIDs share a
stats_temp_directory, then a PID=1234 postmaster may find itself unable to
delete the stale PostgreSQL.1234 subdirectory owned by some other UID.  To fix
that, the name pattern probably should be
$stats_temp_directory/PostgreSQL.$euid.$postmaster_pid.
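
A Perl sketch of that locking protocol (a C implementation would use
fcntl(F_SETLKW); flock stands in for it here, and the "cleanup.lock" file name
is invented):

  use strict;
  use warnings;
  use Fcntl qw(:flock O_CREAT O_RDWR);

  # Hold an exclusive lock on a well-known file in stats_temp_directory while
  # deleting other postmasters' stale subdirectories, so that a concurrently
  # starting postmaster cannot have its new directory removed.
  sub with_cleanup_lock
  {
      my ($stats_temp_directory, $code) = @_;

      sysopen my $lockfh, "$stats_temp_directory/cleanup.lock",
        O_RDWR | O_CREAT, 0600
        or die "could not open lock file: $!";
      flock($lockfh, LOCK_EX) or die "flock: $!";
      $code->();
      flock($lockfh, LOCK_UN);
      close $lockfh;
      return;
  }

  # e.g. with_cleanup_lock($dir, sub { ... delete stale subdirectories ... });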


Re: stats_temp_directory conflicts

From: Andrew Dunstan
On 12/30/18 11:06 PM, Noah Misch wrote:
> On Sun, Dec 30, 2018 at 07:47:05PM -0500, Tom Lane wrote:
>> Noah Misch <noah@leadboat.com> writes:
>>> I'm thinking the server should manage this; during startup, create
>>> $stats_temp_directory/PostgreSQL.$postmaster_pid and store each stats file
>>> therein.
>> +1
>>
>>> Just before creating that directory, scan $stats_temp_directory and
>>> delete subdirectories that no longer correspond to live PIDs.
>> Hm, seems potentially racy, if multiple postmasters are starting
>> at the same time.  The only one you can be *sure* is dead is one
>> with your own PID.
> True; if a PID=123 postmaster launches and completes startup after a
> slightly-older PID=122 postmaster issues kill(123, 0) and before PID=122
> issues unlink()s, PID=122 unlinks files wrongly.  I think I would fix that
> with fcntl(F_SETLKW)/Windows-equivalent on some file in stats_temp_directory.
> One would acquire the lock before deleting a subdirectory, except for a
> postmaster deleting the directory it created.  (If a postmaster finds a stale
> directory for its PID, delete that directory and create a new one.  Like any
> deletion, one must hold the lock.)
>
> (Alternately, one could just accept that risk.)
>
> Another problem comes to mind; if postmasters of different UIDs share a
> stats_temp_directory, then a PID=1234 postmaster may find itself unable to
> delete the stale PostgreSQL.1234 subdirectory owned by some other UID.  To fix
> that, the name pattern probably should be
> $stats_temp_directory/PostgreSQL.$euid.$postmaster_pid.



I like this scheme. It will certainly make using a RAMdisk simpler for
buildfarm members.


+1 for locking rather than running the risk of incorrect deletions.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Augment every test postgresql.conf

From: Noah Misch
On Sun, Dec 30, 2018 at 10:32:31AM -0500, Andrew Dunstan wrote:
> On 12/30/18 12:53 AM, Noah Misch wrote:
> > 2. stats_temp_directory is incompatible with TAP suites that start more than
> >    one node simultaneously.

> The obvious quick fix would be to have PostgresNode.pm set this to the
> default after inserting the TEMP_CONFIG file.

I'd like to get $SUBJECT in place for variables other than
stats_temp_directory, using your quick fix idea.  Attached.  When its time
comes, your stats_temp_directory work can delete that section.
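
Roughly, the quick fix amounts to the following addition in PostgresNode.pm's
init(), just after the TEMP_CONFIG line from the earlier diff (a sketch; the
actual attachment is not reproduced here):

  print $conf TestLib::slurp_file($ENV{TEMP_CONFIG})
    if defined $ENV{TEMP_CONFIG};

  # TEMP_CONFIG may set stats_temp_directory; two nodes sharing one directory
  # would clobber each other's stats files, so force the default back.
  print $conf "stats_temp_directory = 'pg_stat_tmp'\n";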

Attachments

Re: Augment every test postgresql.conf

From: Andrew Dunstan
On Sun, Apr 7, 2019 at 2:41 AM Noah Misch <noah@leadboat.com> wrote:
>
> On Sun, Dec 30, 2018 at 10:32:31AM -0500, Andrew Dunstan wrote:
> > On 12/30/18 12:53 AM, Noah Misch wrote:
> > > 2. stats_temp_directory is incompatible with TAP suites that start more than
> > >    one node simultaneously.
>
> > The obvious quick fix would be to have PostgresNode.pm set this to the
> > default after inserting the TEMP_CONFIG file.
>
> I'd like to get $SUBJECT in place for variables other than
> stats_temp_directory, using your quick fix idea.  Attached.  When its time
> comes, your stats_temp_directory work can delete that section.

Looks good.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Augment every test postgresql.conf

From: Noah Misch
On Sun, Apr 07, 2019 at 07:56:02AM -0400, Andrew Dunstan wrote:
> On Sun, Apr 7, 2019 at 2:41 AM Noah Misch <noah@leadboat.com> wrote:
> >
> > On Sun, Dec 30, 2018 at 10:32:31AM -0500, Andrew Dunstan wrote:
> > > On 12/30/18 12:53 AM, Noah Misch wrote:
> > > > 2. stats_temp_directory is incompatible with TAP suites that start more than
> > > >    one node simultaneously.
> >
> > > The obvious quick fix would be to have PostgresNode.pm set this to the
> > > default after inserting the TEMP_CONFIG file.
> >
> > I'd like to get $SUBJECT in place for variables other than
> > stats_temp_directory, using your quick fix idea.  Attached.  When its time
> > comes, your stats_temp_directory work can delete that section.
> 
> Looks good.

Pushed.  This broke 010_dump_connstr.pl on bowerbird, introducing 'invalid
byte sequence for encoding "UTF8"' errors.  That's because log_connections
renders this 010_dump_connstr.pl solution insufficient:

  # In a SQL_ASCII database, pgwin32_message_to_UTF16() needs to
  # interpret everything as UTF8.  We're going to use byte sequences
  # that aren't valid UTF-8 strings, so that would fail.  Use LATIN1,
  # which accepts any byte and has a conversion from each byte to UTF-8.
  $ENV{LC_ALL}           = 'C';
  $ENV{PGCLIENTENCODING} = 'LATIN1';

The log_connections message prints before CheckMyDatabase() calls
pg_perm_setlocale() to activate that LATIN1 database encoding.  Since
bowerbird does a non-NLS build, GetMessageEncoding()==PG_SQL_ASCII at that
time.  Some options:

1. Make this one test explicitly set log_connections = off.  This workaround
   restores what we had a day ago.

2. Move the log_connections message after CheckMyDatabase() calls
   pg_perm_setlocale(), so it gets regular post-startup encoding treatment.
   That fixes this particular test.  It's still wrong when a database's name
   is not valid in that database's encoding.

3. If GetMessageEncoding()==PG_SQL_ASCII, make pgwin32_message_to_UTF16()
   assume the text is already UTF8, like it does when not in a transaction.
   If UTF8->UTF16 conversion fails, the caller will send untranslated bytes to
   write() or ReportEventA().

4. If GetMessageEncoding()==PG_SQL_ASCII, make pgwin32_message_to_UTF16()
   return NULL.  The caller will always send untranslated bytes to write() or
   ReportEventA().  This seems consistent with the SQL_ASCII concept and with
   pg_do_encoding_conversion()'s interpretation of SQL_ASCII.

5. When including a datname or rolname value in a message, hex-escape
   non-ASCII bytes.  They are byte sequences, not text of known encoding.
   This preserves the most information, but it's overkill and ugly in the
   probably-common case of one encoding across all databases of a cluster.

I'm inclined to do (1) in back branches and (4) in HEAD only.  (If starting
fresh today, I would store the encoding of each rolname and dbname or just use
UTF8 for those particular fields.)  Other preferences?
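
For reference, workaround (1) is a one-line change in 010_dump_connstr.pl's
node setup (a sketch; $node stands for whichever node object that script
configures at this point):

  # Suppress the connection log line that is emitted before CheckMyDatabase()
  # switches to the database encoding, since the database name is not valid
  # UTF8 there.
  $node->append_conf('postgresql.conf', 'log_connections = off');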

Thanks,
nm



Re: Augment every test postgresql.conf

From: Tom Lane
Noah Misch <noah@leadboat.com> writes:
> Pushed.  This broke 010_dump_connstr.pl on bowerbird, introducing 'invalid
> byte sequence for encoding "UTF8"' errors.  That's because log_connections
> renders this 010_dump_connstr.pl solution insufficient:

Ugh.

> 4. If GetMessageEncoding()==PG_SQL_ASCII, make pgwin32_message_to_UTF16()
>    return NULL.  The caller will always send untranslated bytes to write() or
>    ReportEventA().  This seems consistent with the SQL_ASCII concept and with
>    pg_do_encoding_conversion()'s interpretation of SQL_ASCII.

> 5. When including a datname or rolname value in a message, hex-escape
>    non-ASCII bytes.  They are byte sequences, not text of known encoding.
>    This preserves the most information, but it's overkill and ugly in the
>    probably-common case of one encoding across all databases of a cluster.

> I'm inclined to do (1) in back branches and (4) in HEAD only.  (If starting
> fresh today, I would store the encoding of each rolname and dbname or just use
> UTF8 for those particular fields.)  Other preferences?

I agree that (4) is a fairly reasonable thing to do, and wouldn't mind
back-patching that.  Taking a wider view, this seems closely related
to something I've been thinking about in connection with the recent
pg_stat_activity contretemps: that mechanism is also shoving strings
across database boundaries without a lot of worry about encodings.
Maybe we should try to develop a common solution.

One difference from the datname/rolname situation is that for
pg_stat_activity we can know the source encoding --- we aren't storing
it now, but we easily could.  If we're thinking of a future solution
only, adding a "name encoding" field to relevant shared catalogs makes
sense perhaps.  Alternatively, requiring names in shared catalogs to be
UTF8 might be a reasonable answer too.

In all these cases, throwing an error when we can't translate a character
into the destination encoding is not very pleasant.  For pg_stat_activity,
I was imagining that translating such characters to '?' might be the best
answer.  I don't know if we can get away with that for the datname/rolname
case --- at the very least, it opens problems with apparent duplication of
names that should be unique.  I don't much like your hex-encoding answer,
though; that has its own uniqueness-violation hazards, plus it's ugly.

I don't have a strong feeling about what's best.

            regards, tom lane



Re: Augment every test postgresql.conf

From: Noah Misch
On Sat, May 11, 2019 at 10:43:59PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > Pushed.  This broke 010_dump_connstr.pl on bowerbird, introducing 'invalid
> > byte sequence for encoding "UTF8"' errors.  That's because log_connections
> > renders this 010_dump_connstr.pl solution insufficient:
> 
> Ugh.
> 
> > 4. If GetMessageEncoding()==PG_SQL_ASCII, make pgwin32_message_to_UTF16()
> >    return NULL.  The caller will always send untranslated bytes to write() or
> >    ReportEventA().  This seems consistent with the SQL_ASCII concept and with
> >    pg_do_encoding_conversion()'s interpretation of SQL_ASCII.
> 
> > 5. When including a datname or rolname value in a message, hex-escape
> >    non-ASCII bytes.  They are byte sequences, not text of known encoding.
> >    This preserves the most information, but it's overkill and ugly in the
> >    probably-common case of one encoding across all databases of a cluster.
> 
> > I'm inclined to do (1) in back branches and (4) in HEAD only.  (If starting
> > fresh today, I would store the encoding of each rolname and dbname or just use
> > UTF8 for those particular fields.)  Other preferences?
> 
> I agree that (4) is a fairly reasonable thing to do, and wouldn't mind
> back-patching that.

Okay.  Absent objections, I'll just do it that way.

> Taking a wider view, this seems closely related
> to something I've been thinking about in connection with the recent
> pg_stat_activity contretemps: that mechanism is also shoving strings
> across database boundaries without a lot of worry about encodings.
> Maybe we should try to develop a common solution.
> 
> One difference from the datname/rolname situation is that for
> pg_stat_activity we can know the source encoding --- we aren't storing
> it now, but we easily could.  If we're thinking of a future solution
> only, adding a "name encoding" field to relevant shared catalogs makes
> sense perhaps.  Alternatively, requiring names in shared catalogs to be
> UTF8 might be a reasonable answer too.
> 
> In all these cases, throwing an error when we can't translate a character
> into the destination encoding is not very pleasant.  For pg_stat_activity,
> I was imagining that translating such characters to '?' might be the best
> answer.  I don't know if we can get away with that for the datname/rolname
> case --- at the very least, it opens problems with apparent duplication of
> names that should be unique.  I don't much like your hex-encoding answer,
> though; that has its own uniqueness-violation hazards, plus it's ugly.

Another case of byte sequence masquerading as text is pg_settings.setting.

In most contexts, it's important to convey exact values.  Error messages can
use '?'.  I wouldn't let dump/reload of a rolname corrupt it that way, and I
wouldn't recognize the '?' version for authentication.  While
pg_stat_activity.query could use '?', I'd encourage adding bytea and encoding
columns for exact transmission.  pg_stat_activity can't standardize on UTF8
without shrinking the set of valid queries or inaccurately reporting some,
neither of which is attractive.

datname/rolname could afford to be more prescriptive, since non-ASCII names
are full of bugs today.  A useful consequence of UTF8 datname/rolname would be
today's "pg_dumpall --globals" remaining simple.  If we were to support
arbitrary encodings with a "name encoding" field, the general-case equivalent
of "pg_dumpall --globals" would connect to several databases of different
encodings in order to dump all objects, perhaps even creating a temporary
database if no suitable-encoding database existed.

MULE_INTERNAL presents trouble since we don't have a UTF8<->MULE_INTERNAL
conversion.  If we standardized cross-database strings on UTF8, it would be
impossible to read such strings, create roles, etc. from a MULE_INTERNAL
database.  I suppose we'd either add the conversion or deprecate
MULE_INTERNAL, forbidding its use as the initdb encoding.