Discussion: Augment every test postgresql.conf
"env EXTRA_REGRESS_OPTS=--temp-config=SOMEFILE make check" appends the contents of SOMEFILE to the test cluster's postgresql.conf. I want a similar feature for TAP suites and other non-pg_regress suites. (My immediate use case is to raise authentication_timeout and wal_sender_timeout on my buildfarm animals, which sometimes fail at the defaults.) I'm thinking to do this by recognizing the PG_TEST_TEMP_CONFIG environment variable as a whitespace-separated list of file names for appending to postgresql.conf.
On Fri, Dec 28, 2018 at 06:19:50PM -0800, Noah Misch wrote:
> "env EXTRA_REGRESS_OPTS=--temp-config=SOMEFILE make check" appends the
> contents of SOMEFILE to the test cluster's postgresql.conf.  I want a similar
> feature for TAP suites and other non-pg_regress suites.  (My immediate use
> case is to raise authentication_timeout and wal_sender_timeout on my buildfarm
> animals, which sometimes fail at the defaults.)  I'm thinking to do this by
> recognizing the PG_TEST_TEMP_CONFIG environment variable as a
> whitespace-separated list of file names for appending to postgresql.conf.

Looking more closely, we already have the TEMP_CONFIG variable and apply it
to everything except TAP suites.  Closing that gap, as attached, is enough.
The buildfarm client uses TEMP_CONFIG to implement its extra_config setting,
so this will cause extra_config to start applying to TAP suites.
Attachments
Noah Misch <noah@leadboat.com> writes:
> Looking more closely, we already have the TEMP_CONFIG variable and apply it to
> everything except TAP suites.  Closing that gap, as attached, is enough.  The
> buildfarm client uses TEMP_CONFIG to implement its extra_config setting, so
> this will cause extra_config to start applying to TAP suites.

Seems reasonable, but it might be a good idea to warn the buildfarm-owners
list before committing.  (Although I guess it wouldn't be hard to check
the buildfarm database to see if anyone is putting anything interesting
into their critters' TEMP_CONFIG.)

Also, if we're to do this, it seems like applying it to back branches
would be helpful --- but will it work in all the back branches?

			regards, tom lane
On Sat, Dec 29, 2018 at 10:46:31PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > Looking more closely, we already have the TEMP_CONFIG variable and apply it to
> > everything except TAP suites.  Closing that gap, as attached, is enough.  The
> > buildfarm client uses TEMP_CONFIG to implement its extra_config setting, so
> > this will cause extra_config to start applying to TAP suites.
>
> Seems reasonable, but it might be a good idea to warn the buildfarm-owners
> list before committing.  (Although I guess it wouldn't be hard to check
> the buildfarm database to see if anyone is putting anything interesting
> into their critters' TEMP_CONFIG.)

Good idea.  Here are the extra_config entries seen since 2018-12-01:

archive_mode = off
force_parallel_mode = regress
fsync = off
fsync = on
jit=1
jit = 1
jit_above_cost=0
jit = on
jit_optimize_above_cost=1000
log_checkpoints = 'true'
log_connections = 'true'
log_disconnections = 'true'
log_line_prefix = '[%c:%l] '
log_line_prefix = '%m [%c:%l] '
log_line_prefix = '%m [%c:%l] %q%a '
log_line_prefix = '%m [%p:%l] '
log_line_prefix = '%m [%p:%l] %q%a '
log_line_prefix = '%m [%s %p:%l] %q%a '
log_statement = 'all'
max_parallel_workers_per_gather = 2
max_parallel_workers_per_gather = 5
max_wal_senders = 0
shared_buffers = 10MB
stats_temp_directory = '/cygdrive/w/lorikeet/HEAD'
stats_temp_directory = '/cygdrive/w/lorikeet/REL_10_STABLE'
stats_temp_directory = '/cygdrive/w/lorikeet/REL_11_STABLE'
stats_temp_directory = '/cygdrive/w/lorikeet/REL9_4_STABLE'
stats_temp_directory = '/cygdrive/w/lorikeet/REL9_5_STABLE'
stats_temp_directory = '/cygdrive/w/lorikeet/REL9_6_STABLE'
stats_temp_directory= '/home/buildfarm/data/stats_temp'
wal_level = 'minimal'

Problems:

1. max_wal_senders=0 and wal_level=minimal break a number of suites,
   e.g. pg_basebackup.
2. stats_temp_directory is incompatible with TAP suites that start more than
   one node simultaneously.
Problem (1) goes away if I inject the TEMP_CONFIG settings earlier in the
file, which seems defensible:

--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -447,10 +447,13 @@ sub init
 	print $conf "log_statement = all\n";
 	print $conf "log_replication_commands = on\n";
 	print $conf "wal_retrieve_retry_interval = '500ms'\n";
 	print $conf "port = $port\n";
+
+	print $conf TestLib::slurp_file($ENV{TEMP_CONFIG})
+	  if defined $ENV{TEMP_CONFIG};
+
 	if ($params{allows_streaming})
 	{
 		if ($params{allows_streaming} eq "logical")
 		{
 			print $conf "wal_level = logical\n";

Problem (2) remains.  It's already a problem for "make -j check-world".  I'll
give that one more thought.

> Also, if we're to do this, it seems like applying it to back branches
> would be helpful --- but will it work in all the back branches?

Yes.  TEMP_CONFIG exists in all supported branches, and the back-patch to 9.4
is no more complex.  Before 9.6 (commit 87cc6b5), TEMP_CONFIG affected "make
check" and the pg_upgrade test suite, but it did not affect other pg_regress
suites like contrib/* and src/pl/*.  We could back-patch commit 87cc6b5, if
there's demand.  I don't personally need it, because the tests I want to
influence are all TAP tests.
On 12/30/18 12:53 AM, Noah Misch wrote:
> On Sat, Dec 29, 2018 at 10:46:31PM -0500, Tom Lane wrote:
>> Noah Misch <noah@leadboat.com> writes:
>>> Looking more closely, we already have the TEMP_CONFIG variable and apply it to
>>> everything except TAP suites.  Closing that gap, as attached, is enough.  The
>>> buildfarm client uses TEMP_CONFIG to implement its extra_config setting, so
>>> this will cause extra_config to start applying to TAP suites.
>> Seems reasonable, but it might be a good idea to warn the buildfarm-owners
>> list before committing.  (Although I guess it wouldn't be hard to check
>> the buildfarm database to see if anyone is putting anything interesting
>> into their critters' TEMP_CONFIG.)
> Good idea.  Here are the extra_config entries seen since 2018-12-01:
>
> archive_mode = off
> force_parallel_mode = regress
> fsync = off
> fsync = on
> jit=1
> jit = 1
> jit_above_cost=0
> jit = on
> jit_optimize_above_cost=1000
> log_checkpoints = 'true'
> log_connections = 'true'
> log_disconnections = 'true'
> log_line_prefix = '[%c:%l] '
> log_line_prefix = '%m [%c:%l] '
> log_line_prefix = '%m [%c:%l] %q%a '
> log_line_prefix = '%m [%p:%l] '
> log_line_prefix = '%m [%p:%l] %q%a '
> log_line_prefix = '%m [%s %p:%l] %q%a '
> log_statement = 'all'
> max_parallel_workers_per_gather = 2
> max_parallel_workers_per_gather = 5
> max_wal_senders = 0
> shared_buffers = 10MB
> stats_temp_directory = '/cygdrive/w/lorikeet/HEAD'
> stats_temp_directory = '/cygdrive/w/lorikeet/REL_10_STABLE'
> stats_temp_directory = '/cygdrive/w/lorikeet/REL_11_STABLE'
> stats_temp_directory = '/cygdrive/w/lorikeet/REL9_4_STABLE'
> stats_temp_directory = '/cygdrive/w/lorikeet/REL9_5_STABLE'
> stats_temp_directory = '/cygdrive/w/lorikeet/REL9_6_STABLE'
> stats_temp_directory= '/home/buildfarm/data/stats_temp'
> wal_level = 'minimal'
>
> Problems:
>
> 1. max_wal_senders=0 and wal_level=minimal break a number of suites,
>    e.g. pg_basebackup.
> 2. stats_temp_directory is incompatible with TAP suites that start more than
>    one node simultaneously.
>
> Problem (1) goes away if I inject the TEMP_CONFIG settings earlier in the
> file, which seems defensible:
>
> --- a/src/test/perl/PostgresNode.pm
> +++ b/src/test/perl/PostgresNode.pm
> @@ -447,10 +447,13 @@ sub init
>  	print $conf "log_statement = all\n";
>  	print $conf "log_replication_commands = on\n";
>  	print $conf "wal_retrieve_retry_interval = '500ms'\n";
>  	print $conf "port = $port\n";
> +
> +	print $conf TestLib::slurp_file($ENV{TEMP_CONFIG})
> +	  if defined $ENV{TEMP_CONFIG};
> +
>  	if ($params{allows_streaming})
>  	{
>  		if ($params{allows_streaming} eq "logical")
>  		{
>  			print $conf "wal_level = logical\n";
>
> Problem (2) remains.  It's already a problem for "make -j check-world".  I'll
> give that one more thought.

lorikeet is putting the stats_temp directory on a ramdisk.  This is worth
testing in any case, but in lorikeet's case was done to help speed up the
tests.  When I had a Raspberry Pi instance I did something similar, for the
same reason.

The obvious quick fix would be to have PostgresNode.pm set this to the
default after inserting the TEMP_CONFIG file.

There are a couple of problems here that bear further consideration.  First,
that the stats_temp_directory has to exist, and second that there is no
convenient way to make it unique.  It would be nice if a) the directory could
be created if it didn't exist and b) some place-holder in the name could be
replaced by a unique identifier such as the node id.  If there is interest
I'll work on these.  One problem I foresee is that it might lead to a
plethora of stats temp directories being left around.  Still thinking about
how we should deal with that.  In the buildfarm client I'd be tempted to
create a directory to hold all the run's stats_temp_directories and then
clean it up at the end of the run.
cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Dec 30, 2018 at 10:32:31AM -0500, Andrew Dunstan wrote:
> On 12/30/18 12:53 AM, Noah Misch wrote:
> > stats_temp_directory = '/cygdrive/w/lorikeet/HEAD'

> > 2. stats_temp_directory is incompatible with TAP suites that start more than
> >    one node simultaneously.

> > It's already a problem for "make -j check-world".

> lorikeet is putting the stats_temp directory on a ramdisk.  This is worth
> testing in any case, but in lorikeet's case was done to help speed up
> the tests.  When I had a Raspberry Pi instance I did something similar,
> for the same reason.

I, too, value the ability to override stats_temp_directory for test runs.
(I get stats.sql failures at high load, even on a high-performance machine.
Using stats_temp_directory may fix that.)

> The obvious quick fix would be to have PostgresNode.pm set this to the
> default after inserting the TEMP_CONFIG file.

True.

> There are a couple of problems here that bear further consideration.
> First, that the stats_temp_directory has to exist, and second that there
> is no convenient way to make it unique.  It would be nice if a) the
> directory could be created if it didn't exist and b) some place-holder
> in the name could be replaced by a unique identifier such as the node
> id.  If there is interest I'll work on these.  One problem I foresee is
> that it might lead to a plethora of stats temp directories being left
> around.  Still thinking about how we should deal with that.  In the
> buildfarm client I'd be tempted to create a directory to hold all the
> run's stats_temp_directories and then clean it up at the end of the run.

I'm thinking the server should manage this; during startup, create
$stats_temp_directory/PostgreSQL.$postmaster_pid and store each stats file
therein.  Just before creating that directory, scan $stats_temp_directory and
delete subdirectories that no longer correspond to live PIDs.  Subdirectories
would not build up over time, even if one deletes a test data directory while
its subdirectory of stats_temp_directory still exists.

For non-test applications, this makes stats_temp_directory safer to use.
Today, we don't detect two clusters using the same stats_temp_directory.  We
don't even document that it's unacceptable.  This makes it acceptable.

Thanks,
nm
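The startup-time scan described here can be sketched as follows (in Python rather than the backend's C, and `claim_stats_dir` / `pid_is_live` are made-up names; the real logic would live in the server's startup path):

```python
import os
import shutil

def pid_is_live(pid: int) -> bool:
    """kill(pid, 0) probes for process existence without sending a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True

def claim_stats_dir(stats_temp_directory: str, postmaster_pid: int) -> str:
    # Delete subdirectories whose creating postmaster is gone.  Note this
    # probe-then-delete sequence is racy if another postmaster is starting
    # concurrently; it needs external serialization to be safe.
    for name in os.listdir(stats_temp_directory):
        if not name.startswith("PostgreSQL."):
            continue
        try:
            pid = int(name.rsplit(".", 1)[1])
        except ValueError:
            continue
        if not pid_is_live(pid):
            shutil.rmtree(os.path.join(stats_temp_directory, name),
                          ignore_errors=True)
    # Then create this postmaster's own per-PID subdirectory.
    mine = os.path.join(stats_temp_directory,
                        "PostgreSQL.%d" % postmaster_pid)
    os.makedirs(mine, exist_ok=True)
    return mine
```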
Noah Misch <noah@leadboat.com> writes:
> I'm thinking the server should manage this; during startup, create
> $stats_temp_directory/PostgreSQL.$postmaster_pid and store each stats file
> therein.

+1

> Just before creating that directory, scan $stats_temp_directory and
> delete subdirectories that no longer correspond to live PIDs.

Hm, seems potentially racy, if multiple postmasters are starting
at the same time.  The only one you can be *sure* is dead is one
with your own PID.

			regards, tom lane
On Sun, Dec 30, 2018 at 07:47:05PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > I'm thinking the server should manage this; during startup, create
> > $stats_temp_directory/PostgreSQL.$postmaster_pid and store each stats file
> > therein.
>
> +1
>
> > Just before creating that directory, scan $stats_temp_directory and
> > delete subdirectories that no longer correspond to live PIDs.
>
> Hm, seems potentially racy, if multiple postmasters are starting
> at the same time.  The only one you can be *sure* is dead is one
> with your own PID.

True; if a PID=123 postmaster launches and completes startup after a
slightly-older PID=122 postmaster issues kill(123, 0) and before PID=122
issues unlink()s, PID=122 unlinks files wrongly.  I think I would fix that
with fcntl(F_SETLKW)/Windows-equivalent on some file in stats_temp_directory.
One would acquire the lock before deleting a subdirectory, except for a
postmaster deleting the directory it created.  (If a postmaster finds a stale
directory for its PID, delete that directory and create a new one.  Like any
deletion, one must hold the lock.)

(Alternately, one could just accept that risk.)

Another problem comes to mind; if postmasters of different UIDs share a
stats_temp_directory, then a PID=1234 postmaster may find itself unable to
delete the stale PostgreSQL.1234 subdirectory owned by some other UID.  To
fix that, the name pattern probably should be
$stats_temp_directory/PostgreSQL.$euid.$postmaster_pid.
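A minimal sketch of that serialization, using Python's `fcntl` wrapper (`lockf` uses the same POSIX record locks as `fcntl(F_SETLKW)`; the `.cleanup.lock` file name and `locked_cleanup` helper are invented for the example):

```python
import fcntl
import os

def locked_cleanup(stats_temp_directory, cleanup):
    """Run cleanup() while holding an exclusive lock on a file inside
    stats_temp_directory, so two starting postmasters cannot both be
    deleting "stale" subdirectories at the same time."""
    lock_path = os.path.join(stats_temp_directory, ".cleanup.lock")
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks, like F_SETLKW
        cleanup()
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)
```

A real implementation would also need the Windows equivalent mentioned above; this only illustrates the POSIX side.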
On 12/30/18 11:06 PM, Noah Misch wrote:
> On Sun, Dec 30, 2018 at 07:47:05PM -0500, Tom Lane wrote:
>> Noah Misch <noah@leadboat.com> writes:
>>> I'm thinking the server should manage this; during startup, create
>>> $stats_temp_directory/PostgreSQL.$postmaster_pid and store each stats file
>>> therein.
>> +1
>>
>>> Just before creating that directory, scan $stats_temp_directory and
>>> delete subdirectories that no longer correspond to live PIDs.
>> Hm, seems potentially racy, if multiple postmasters are starting
>> at the same time.  The only one you can be *sure* is dead is one
>> with your own PID.
> True; if a PID=123 postmaster launches and completes startup after a
> slightly-older PID=122 postmaster issues kill(123, 0) and before PID=122
> issues unlink()s, PID=122 unlinks files wrongly.  I think I would fix that
> with fcntl(F_SETLKW)/Windows-equivalent on some file in stats_temp_directory.
> One would acquire the lock before deleting a subdirectory, except for a
> postmaster deleting the directory it created.  (If a postmaster finds a stale
> directory for its PID, delete that directory and create a new one.  Like any
> deletion, one must hold the lock.)
>
> (Alternately, one could just accept that risk.)
>
> Another problem comes to mind; if postmasters of different UIDs share a
> stats_temp_directory, then a PID=1234 postmaster may find itself unable to
> delete the stale PostgreSQL.1234 subdirectory owned by some other UID.  To fix
> that, the name pattern probably should be
> $stats_temp_directory/PostgreSQL.$euid.$postmaster_pid.

I like this scheme.  It will certainly make using a RAMdisk simpler for
buildfarm members.

+1 for locking rather than running the risk of incorrect deletions.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Dec 30, 2018 at 10:32:31AM -0500, Andrew Dunstan wrote:
> On 12/30/18 12:53 AM, Noah Misch wrote:
> > 2. stats_temp_directory is incompatible with TAP suites that start more than
> >    one node simultaneously.
>
> The obvious quick fix would be to have PostgresNode.pm set this to the
> default after inserting the TEMP_CONFIG file.

I'd like to get $SUBJECT in place for variables other than
stats_temp_directory, using your quick fix idea.  Attached.  When its time
comes, your stats_temp_directory work can delete that section.
Attachments
On Sun, Apr 7, 2019 at 2:41 AM Noah Misch <noah@leadboat.com> wrote:
>
> On Sun, Dec 30, 2018 at 10:32:31AM -0500, Andrew Dunstan wrote:
> > On 12/30/18 12:53 AM, Noah Misch wrote:
> > > 2. stats_temp_directory is incompatible with TAP suites that start more than
> > > one node simultaneously.
> >
> > The obvious quick fix would be to have PostgresNode.pm set this to the
> > default after inserting the TEMP_CONFIG file.
>
> I'd like to get $SUBJECT in place for variables other than
> stats_temp_directory, using your quick fix idea.  Attached.  When its time
> comes, your stats_temp_directory work can delete that section.

Looks good.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Apr 07, 2019 at 07:56:02AM -0400, Andrew Dunstan wrote:
> On Sun, Apr 7, 2019 at 2:41 AM Noah Misch <noah@leadboat.com> wrote:
> >
> > On Sun, Dec 30, 2018 at 10:32:31AM -0500, Andrew Dunstan wrote:
> > > On 12/30/18 12:53 AM, Noah Misch wrote:
> > > > 2. stats_temp_directory is incompatible with TAP suites that start more than
> > > > one node simultaneously.
> > >
> > > The obvious quick fix would be to have PostgresNode.pm set this to the
> > > default after inserting the TEMP_CONFIG file.
> >
> > I'd like to get $SUBJECT in place for variables other than
> > stats_temp_directory, using your quick fix idea.  Attached.  When its time
> > comes, your stats_temp_directory work can delete that section.
>
> Looks good.

Pushed.  This broke 010_dump_connstr.pl on bowerbird, introducing 'invalid
byte sequence for encoding "UTF8"' errors.  That's because log_connections
renders this 010_dump_connstr.pl solution insufficient:

  # In a SQL_ASCII database, pgwin32_message_to_UTF16() needs to
  # interpret everything as UTF8.  We're going to use byte sequences
  # that aren't valid UTF-8 strings, so that would fail.  Use LATIN1,
  # which accepts any byte and has a conversion from each byte to UTF-8.
  $ENV{LC_ALL}           = 'C';
  $ENV{PGCLIENTENCODING} = 'LATIN1';

The log_connections message prints before CheckMyDatabase() calls
pg_perm_setlocale() to activate that LATIN1 database encoding.  Since
bowerbird does a non-NLS build, GetMessageEncoding()==PG_SQL_ASCII at that
time.  Some options:

1. Make this one test explicitly set log_connections = off.  This workaround
   restores what we had a day ago.

2. Move the log_connections message after CheckMyDatabase() calls
   pg_perm_setlocale(), so it gets regular post-startup encoding treatment.
   That fixes this particular test.  It's still wrong when a database's name
   is not valid in that database's encoding.

3. If GetMessageEncoding()==PG_SQL_ASCII, make pgwin32_message_to_UTF16()
   assume the text is already UTF8, like it does when not in a transaction.
   If UTF8->UTF16 conversion fails, the caller will send untranslated bytes
   to write() or ReportEventA().

4. If GetMessageEncoding()==PG_SQL_ASCII, make pgwin32_message_to_UTF16()
   return NULL.  The caller will always send untranslated bytes to write()
   or ReportEventA().  This seems consistent with the SQL_ASCII concept and
   with pg_do_encoding_conversion()'s interpretation of SQL_ASCII.

5. When including a datname or rolname value in a message, hex-escape
   non-ASCII bytes.  They are byte sequences, not text of known encoding.
   This preserves the most information, but it's overkill and ugly in the
   probably-common case of one encoding across all databases of a cluster.

I'm inclined to do (1) in back branches and (4) in HEAD only.  (If starting
fresh today, I would store the encoding of each rolname and dbname or just
use UTF8 for those particular fields.)  Other preferences?

Thanks,
nm
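For reference, option (5) amounts to something like the following (a sketch only; the `\xNN` escape spelling and the `hex_escape_non_ascii` name are illustrative, not a proposed server behavior):

```python
def hex_escape_non_ascii(raw: bytes) -> str:
    """Render a name's raw bytes as ASCII, escaping bytes >= 0x80,
    since they are byte sequences of unknown encoding."""
    out = []
    for b in raw:
        if b < 0x80:
            out.append(chr(b))       # ASCII passes through unchanged
        else:
            out.append("\\x%02x" % b)  # e.g. 0xc3 -> \xc3
    return "".join(out)
```

As noted, two distinct names could render identically if one literally contains a backslash-x sequence, which is one of the uniqueness hazards with this option.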
Noah Misch <noah@leadboat.com> writes:
> Pushed.  This broke 010_dump_connstr.pl on bowerbird, introducing 'invalid
> byte sequence for encoding "UTF8"' errors.  That's because log_connections
> renders this 010_dump_connstr.pl solution insufficient:

Ugh.

> 4. If GetMessageEncoding()==PG_SQL_ASCII, make pgwin32_message_to_UTF16()
>    return NULL.  The caller will always send untranslated bytes to write()
>    or ReportEventA().  This seems consistent with the SQL_ASCII concept and
>    with pg_do_encoding_conversion()'s interpretation of SQL_ASCII.

> 5. When including a datname or rolname value in a message, hex-escape
>    non-ASCII bytes.  They are byte sequences, not text of known encoding.
>    This preserves the most information, but it's overkill and ugly in the
>    probably-common case of one encoding across all databases of a cluster.

> I'm inclined to do (1) in back branches and (4) in HEAD only.  (If starting
> fresh today, I would store the encoding of each rolname and dbname or just
> use UTF8 for those particular fields.)  Other preferences?

I agree that (4) is a fairly reasonable thing to do, and wouldn't mind
back-patching that.

Taking a wider view, this seems closely related to something I've been
thinking about in connection with the recent pg_stat_activity contretemps:
that mechanism is also shoving strings across database boundaries without
a lot of worry about encodings.  Maybe we should try to develop a common
solution.

One difference from the datname/rolname situation is that for
pg_stat_activity we can know the source encoding --- we aren't storing it
now, but we easily could.  If we're thinking of a future solution only,
adding a "name encoding" field to relevant shared catalogs makes sense
perhaps.  Alternatively, requiring names in shared catalogs to be UTF8
might be a reasonable answer too.

In all these cases, throwing an error when we can't translate a character
into the destination encoding is not very pleasant.  For pg_stat_activity,
I was imagining that translating such characters to '?' might be the best
answer.  I don't know if we can get away with that for the datname/rolname
case --- at the very least, it opens problems with apparent duplication of
names that should be unique.  I don't much like your hex-encoding answer,
though; that has its own uniqueness-violation hazards, plus it's ugly.

I don't have a strong feeling about what's best.

			regards, tom lane
On Sat, May 11, 2019 at 10:43:59PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > Pushed.  This broke 010_dump_connstr.pl on bowerbird, introducing 'invalid
> > byte sequence for encoding "UTF8"' errors.  That's because log_connections
> > renders this 010_dump_connstr.pl solution insufficient:
>
> Ugh.
>
> > I'm inclined to do (1) in back branches and (4) in HEAD only.  (If starting
> > fresh today, I would store the encoding of each rolname and dbname or just
> > use UTF8 for those particular fields.)  Other preferences?
>
> I agree that (4) is a fairly reasonable thing to do, and wouldn't mind
> back-patching that.

Okay.  Absent objections, I'll just do it that way.

> Taking a wider view, this seems closely related to something I've been
> thinking about in connection with the recent pg_stat_activity contretemps:
> that mechanism is also shoving strings across database boundaries without
> a lot of worry about encodings.  Maybe we should try to develop a common
> solution.
>
> One difference from the datname/rolname situation is that for
> pg_stat_activity we can know the source encoding --- we aren't storing it
> now, but we easily could.  If we're thinking of a future solution only,
> adding a "name encoding" field to relevant shared catalogs makes sense
> perhaps.  Alternatively, requiring names in shared catalogs to be UTF8
> might be a reasonable answer too.
>
> In all these cases, throwing an error when we can't translate a character
> into the destination encoding is not very pleasant.  For pg_stat_activity,
> I was imagining that translating such characters to '?' might be the best
> answer.  I don't know if we can get away with that for the datname/rolname
> case --- at the very least, it opens problems with apparent duplication of
> names that should be unique.  I don't much like your hex-encoding answer,
> though; that has its own uniqueness-violation hazards, plus it's ugly.

Another case of byte sequence masquerading as text is pg_settings.setting.
In most contexts, it's important to convey exact values.  Error messages can
use '?'.  I wouldn't let dump/reload of a rolname corrupt it that way, and I
wouldn't recognize the '?' version for authentication.  While
pg_stat_activity.query could use '?', I'd encourage adding bytea and encoding
columns for exact transmission.  pg_stat_activity can't standardize on UTF8
without shrinking the set of valid queries or inaccurately reporting some,
neither of which is attractive.  datname/rolname could afford to be more
prescriptive, since non-ASCII names are full of bugs today.

A useful consequence of UTF8 datname/rolname would be today's "pg_dumpall
--globals" remaining simple.  If we were to support arbitrary encodings with
a "name encoding" field, the general-case equivalent of "pg_dumpall
--globals" would connect to several databases of different encodings in order
to dump all objects, perhaps even creating a temporary database if no
suitable-encoding database existed.

MULE_INTERNAL presents trouble since we don't have a UTF8<->MULE_INTERNAL
conversion.  If we standardized cross-database strings on UTF8, it would be
impossible to read such strings, create roles, etc. from a MULE_INTERNAL
database.  I suppose we'd either add the conversion or deprecate
MULE_INTERNAL, forbidding its use as the initdb encoding.