Discussion: BUG #13143: Cannot stop and restart a streaming server with a replication slot

BUG #13143: Cannot stop and restart a streaming server with a replication slot

From: pdrolet@infodata.ca
Date:
The following bug has been logged on the website:

Bug reference:      13143
Logged by:          Patrice Drolet
Email address:      pdrolet@infodata.ca
PostgreSQL version: 9.4.1
Operating system:   Windows 2008r2
Description:

I have experienced it many times. The master streams to the slave for days
with no problem (using a replication slot). If I stop the master, it refuses
to restart and I get this error in the log:

2015-04-24 04:47:12 EDT LOG:  database system was shut down at 2015-04-24 04:44:37 EDT
2015-04-24 04:47:12 EDT PANIC:  could not fsync file "pg_replslot/node_win2012sec/state": Bad file descriptor
2015-04-24 04:47:12 EDT LOG:  startup process (PID 23180) exited with exit code 3
2015-04-24 04:47:12 EDT LOG:  aborting startup due to startup process failure

To restart the server, I have to manually delete the folder in pg_replslot.
But then I need to rebuild the slave. Not very practical for a multi-gigabyte
database.

Re: BUG #13143: Cannot stop and restart a streaming server with a replication slot

From: Andres Freund
Date:
Hi,

On 2015-04-24 10:10:06 +0000, pdrolet@infodata.ca wrote:
> The following bug has been logged on the website:
>
> Bug reference:      13143
> Logged by:          Patrice Drolet
> Email address:      pdrolet@infodata.ca
> PostgreSQL version: 9.4.1
> Operating system:   Windows 2008r2
> Description:
>
> I have experienced it many times. The master streams to the slave for days
> with no problem (using a replication slot). If I stop the master, it refuses
> to restart and I get this error in the log:
>
> 2015-04-24 04:47:12 EDT LOG:  database system was shut down at 2015-04-24 04:44:37 EDT
> 2015-04-24 04:47:12 EDT PANIC:  could not fsync file "pg_replslot/node_win2012sec/state": Bad file descriptor
> 2015-04-24 04:47:12 EDT LOG:  startup process (PID 23180) exited with exit code 3
> 2015-04-24 04:47:12 EDT LOG:  aborting startup due to startup process failure
>
> To restart the server, I have to manually delete the folder in pg_replslot.
> But then I need to rebuild the slave. Not very practical for a multi-gigabyte
> database.

Obviously that's not how it's supposed to be. Unfortunately, I don't have
access to a Windows system, much less a French one.

Could you:
1) describe your exact setup?
2) check that it's unrelated to any anti-virus software running?
3) configure 'log_error_verbosity = verbose'? Then we'll get line
   numbers, which will help narrow down what's happening.
4) try to debug it by installing Sysinternals' Sysmon and
   recording exactly what is done with that file?

Regards,

Andres

Re: BUG #13143: Cannot stop and restart a streaming server with a replication slot

From: Alvaro Herrera
Date:
Andres Freund wrote:

> On 2015-04-24 10:10:06 +0000, pdrolet@infodata.ca wrote:

> > 2015-04-24 04:47:12 EDT LOG:  database system was shut down at 2015-04-24 04:44:37 EDT
> > 2015-04-24 04:47:12 EDT PANIC:  could not fsync file "pg_replslot/node_win2012sec/state": Bad file descriptor
> > 2015-04-24 04:47:12 EDT LOG:  startup process (PID 23180) exited with exit code 3
> > 2015-04-24 04:47:12 EDT LOG:  aborting startup due to startup process failure
> >
> > To restart the server, I have to manually delete the folder in pg_replslot.
> > But then I need to rebuild the slave. Not very practical for a multi-gigabyte
> > database.
>
> Obviously that's not how it's supposed to be. Unfortunately, I don't have
> access to a Windows system, much less a French one.

I think this is failing in the fsync_fname() call in slot.c line 1045
(REL9_4_STABLE).  Notice it's in a critical section (hence PANIC) and
isdir=false.  This happens just after the rename() from tmppath to path;
maybe the file is "busy" and could not be renamed?  Anyway the rename
itself didn't fail, and the file (under the new name) could be opened by
fd.c, otherwise the error would say "could not open" instead of "could
not fsync".
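The write-to-temporary-file, rename, then fsync sequence described above can be sketched with plain POSIX calls. This is a minimal illustration under stated assumptions, not the actual slot.c code: the function and enum names are hypothetical, and PostgreSQL's fd.c wrappers and Windows-specific handling are not modeled. The separate result codes mirror the point that an open failure and an fsync failure are reported differently.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical result codes identifying which step failed. */
typedef enum {
    SAVE_OK = 0,
    SAVE_ERR_OPEN,   /* analogous to a "could not open file" error  */
    SAVE_ERR_WRITE,
    SAVE_ERR_FSYNC,  /* analogous to a "could not fsync file" error */
    SAVE_ERR_RENAME
} save_result;

/* Reopen an existing file and fsync it. It is opened read-write
 * because some platforms may refuse to fsync a read-only descriptor. */
save_result fsync_existing(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return SAVE_ERR_OPEN;
    if (fsync(fd) != 0) {
        int saved_errno = errno;
        close(fd);
        errno = saved_errno;  /* keep the fsync error for the caller */
        return SAVE_ERR_FSYNC;
    }
    close(fd);
    return SAVE_OK;
}

/* Write `len` bytes to `tmppath`, fsync it, rename it over `path`,
 * then fsync the file again under its new name. A complete
 * implementation would usually also fsync the parent directory so the
 * rename itself is durable; that step is omitted here. */
save_result save_state_durably(const char *tmppath, const char *path,
                               const void *buf, size_t len)
{
    assert(tmppath != NULL && path != NULL && buf != NULL);

    int fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return SAVE_ERR_OPEN;

    /* Treat a short write as a failure to keep the sketch simple. */
    if (write(fd, buf, len) != (ssize_t) len) {
        close(fd);
        return SAVE_ERR_WRITE;
    }
    if (fsync(fd) != 0) {
        close(fd);
        return SAVE_ERR_FSYNC;
    }
    close(fd);

    if (rename(tmppath, path) != 0)
        return SAVE_ERR_RENAME;

    /* If the reopen succeeds but fsync fails, the caller sees an
     * fsync error rather than an open error. */
    return fsync_existing(path);
}
```

In the code Alvaro points to, the equivalent fsync failure happens inside a critical section, which is why it surfaces as a PANIC rather than an ordinary ERROR.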

There are many other callers of rename() and none of them seem to have
special cases for WIN32 specifically; they all assume it works.  (Some
of them are in turn special cases related to link/unlink).

The vast majority of callers of fsync_fname() are related to logical
decoding, so it seems fair game to assume that that code is missing a
trick or two.

> 2) Check that it's unrelated to any anti-virus software running?

It seems likely that something like this is related.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: BUG #13143: Cannot stop and restart a streaming server with a replication slot

From: Andres Freund
Date:
On 2015-04-27 11:44:47 -0300, Alvaro Herrera wrote:
> I think this is failing in the fsync_fname() call in slot.c line 1045
> (REL9_4_STABLE).

Patrice has since replied with log_error_verbosity=verbose logs, but
that reply is probably still stuck in moderation:

> 2015-04-25 14:25:59 EDT LOG:  00000: database system was shut down at 2015-04-25 14:25:39 EDT
> 2015-04-25 14:25:59 EDT LOCATION:  StartupXLOG, src\backend\access\transam\xlog.c:6011
> 2015-04-25 14:25:59 EDT PANIC:  XX000: could not fsync file "pg_replslot/node_win2008sec/state": Bad file descriptor
> 2015-04-25 14:25:59 EDT LOCATION:  RestoreSlotFromDisk, src\backend\replication\slot.c:1115
> 2015-04-25 14:25:59 EDT LOG:  00000: startup process (PID 2696) was terminated by exception 0xC0000409
> 2015-04-25 14:25:59 EDT HINT:  See C include file "ntstatus.h" for a description of the hexadecimal value.
> 2015-04-25 14:25:59 EDT LOCATION:  LogChildExit, src\backend\postmaster\postmaster.c:3336
> 2015-04-25 14:25:59 EDT LOG:  00000: aborting startup due to startup process failure
> 2015-04-25 14:25:59 EDT LOCATION:  reaper, src\backend\postmaster\postmaster.c:2604

So it looks to me like it's a straight pg_fsync() failing. Given that
the open apparently succeeded, I'm unsure how that could be. The error
message appears to correspond to EBADF.

Hm. I wonder whether the problem is that the file is opened with
O_RDONLY. The OSs I have access to don't care (for good reason imo;
fsync isn't a write), but it's not inconceivable that Windows might. I
very dimly remember that this was a problem at some point before. Yep:
http://archives.postgresql.org/message-id/10494.1266903446%40sss.pgh.pa.us

So that's easy enough to fix.
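The O_RDONLY hypothesis can be probed with a small POSIX helper that opens a file with a chosen access mode and fsyncs it, returning the errno of whichever call failed. The helper names are hypothetical. On Linux, for example, fsync on an O_RDONLY descriptor succeeds; the suspicion here is that the Windows C runtime may instead reject it with EBADF ("Bad file descriptor"). That Windows behavior is the thread's hypothesis, not something this sketch demonstrates.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open `path` with `access_mode` (O_RDONLY or O_RDWR) and fsync it.
 * Returns 0 on success, otherwise the errno value of the failing call. */
int fsync_with_mode(const char *path, int access_mode)
{
    assert(path != NULL);
    assert(access_mode == O_RDONLY || access_mode == O_RDWR);

    int fd = open(path, access_mode);
    if (fd < 0)
        return errno;              /* e.g. ENOENT: the open step failed */

    int result = 0;
    if (fsync(fd) != 0)
        result = errno;            /* e.g. EBADF if the platform refuses */

    close(fd);
    return result;
}

/* The portable choice discussed above: always fsync through a
 * read-write descriptor. */
int fsync_portable(const char *path)
{
    return fsync_with_mode(path, O_RDWR);
}
```

Under this sketch, the fix would amount to making sure the state file is opened read-write before pg_fsync() is called on it.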

Greetings,

Andres Freund

Re: BUG #13143: Cannot stop and restart a streaming server with a replication slot

From: Patrice Drolet
Date:
Hi,

Here is the log with log_error_verbosity = verbose:

2015-04-25 14:25:59 EDT LOG:  00000: database system was shut down at 2015-04-25 14:25:39 EDT
2015-04-25 14:25:59 EDT LOCATION:  StartupXLOG, src\backend\access\transam\xlog.c:6011
2015-04-25 14:25:59 EDT PANIC:  XX000: could not fsync file "pg_replslot/node_win2008sec/state": Bad file descriptor
2015-04-25 14:25:59 EDT LOCATION:  RestoreSlotFromDisk, src\backend\replication\slot.c:1115
2015-04-25 14:25:59 EDT LOG:  00000: startup process (PID 2696) was terminated by exception 0xC0000409
2015-04-25 14:25:59 EDT HINT:  See C include file "ntstatus.h" for a description of the hexadecimal value.
2015-04-25 14:25:59 EDT LOCATION:  LogChildExit, src\backend\postmaster\postmaster.c:3336
2015-04-25 14:25:59 EDT LOG:  00000: aborting startup due to startup process failure
2015-04-25 14:25:59 EDT LOCATION:  reaper, src\backend\postmaster\postmaster.c:2604


As I said, this is streaming replication between two 64-bit Windows
machines running PostgreSQL 9.4.1.

Here is my postgresql.conf:
-----------------

wal_level = hot_standby
max_wal_senders = 3
checkpoint_segments = 16
wal_keep_segments = 32

#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------

# The default values of these variables are driven from the -D command-line
# option or PGDATA environment variable, represented here as ConfigDir.

#data_directory = 'ConfigDir'        # use data in another directory
                    # (change requires restart)
#hba_file = 'ConfigDir/pg_hba.conf'    # host-based authentication file
                    # (change requires restart)
#ident_file = 'ConfigDir/pg_ident.conf'    # ident configuration file
                    # (change requires restart)

# If external_pid_file is not explicitly set, no extra PID file is written.
#external_pid_file = ''            # write an extra PID file
                    # (change requires restart)


#------------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#------------------------------------------------------------------------------

# - Connection Settings -

listen_addresses = '*'        # what IP address(es) to listen on;
                    # comma-separated list of addresses;
                    # defaults to 'localhost'; use '*' for all
                    # (change requires restart)
port = 5434                # (change requires restart)
max_connections = 100            # (change requires restart)
# Note:  Increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction).
#superuser_reserved_connections = 3    # (change requires restart)
#unix_socket_directories = ''    # comma-separated list of directories
                    # (change requires restart)
#unix_socket_group = ''            # (change requires restart)
#unix_socket_permissions = 0777        # begin with 0 to use octal notation
                    # (change requires restart)
#bonjour = off                # advertise server via Bonjour
                    # (change requires restart)
#bonjour_name = ''            # defaults to the computer name
                    # (change requires restart)

# - Security and Authentication -

#authentication_timeout = 1min        # 1s-600s
#ssl = off                # (change requires restart)
#ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL' # allowed SSL ciphers
                    # (change requires restart)
#ssl_prefer_server_ciphers = on        # (change requires restart)
#ssl_ecdh_curve = 'prime256v1'        # (change requires restart)
#ssl_renegotiation_limit = 512MB    # amount of data between renegotiations
#ssl_cert_file = 'server.crt'        # (change requires restart)
#ssl_key_file = 'server.key'        # (change requires restart)
#ssl_ca_file = ''            # (change requires restart)
#ssl_crl_file = ''            # (change requires restart)
#password_encryption = on
#db_user_namespace = off

# GSSAPI using Kerberos
#krb_server_keyfile = ''
#krb_caseins_users = off

# - TCP Keepalives -
# see "man 7 tcp" for details

#tcp_keepalives_idle = 0        # TCP_KEEPIDLE, in seconds;
                    # 0 selects the system default
#tcp_keepalives_interval = 0        # TCP_KEEPINTVL, in seconds;
                    # 0 selects the system default
#tcp_keepalives_count = 0        # TCP_KEEPCNT;
                    # 0 selects the system default


#------------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#------------------------------------------------------------------------------

# - Memory -

shared_buffers = 3072MB            # min 128kB
                    # (change requires restart)
#huge_pages = try            # on, off, or try
                    # (change requires restart)
temp_buffers = 8MB            # min 800kB
#max_prepared_transactions = 0        # zero disables the feature
                    # (change requires restart)
# Note:  Increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
# It is not advisable to set max_prepared_transactions nonzero unless you
# actively intend to use prepared transactions.
work_mem = 256MB                # LIDI 4MB *** min 64kB
maintenance_work_mem = 256MB        # min 1MB
#autovacuum_work_mem = -1        # min 1MB, or -1 to use maintenance_work_mem
#max_stack_depth = 2MB            # min 100kB
dynamic_shared_memory_type = windows    # the default is the first option
                    # supported by the operating system:
                    #   posix
                    #   sysv
                    #   windows
                    #   mmap
                    # use none to disable dynamic shared memory

# - Disk -

#temp_file_limit = -1            # limits per-session temp file space
                    # in kB, or -1 for no limit

# - Kernel Resource Usage -

#max_files_per_process = 1000        # min 25
                    # (change requires restart)
#shared_preload_libraries = ''        # (change requires restart)

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0            # 0-100 milliseconds
#vacuum_cost_page_hit = 1        # 0-10000 credits
#vacuum_cost_page_miss = 10        # 0-10000 credits
#vacuum_cost_page_dirty = 20        # 0-10000 credits
#vacuum_cost_limit = 200        # 1-10000 credits

# - Background Writer -

#bgwriter_delay = 200ms            # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100        # 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0        # 0-10.0 multiplier on buffers scanned/round

# - Asynchronous Behavior -

#effective_io_concurrency = 1        # 1-1000; 0 disables prefetching
#max_worker_processes = 8


#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------

# - Settings -

#wal_level = minimal            # minimal, archive, hot_standby, or logical
                    # (change requires restart)
#fsync = on                # turns forced synchronization on or off
#synchronous_commit = on        # synchronization level;
                    # off, local, remote_write, or on
#wal_sync_method = fsync        # the default is the first option
                    # supported by the operating system:
                    #   open_datasync
                    #   fdatasync (default on Linux)
                    #   fsync
                    #   fsync_writethrough
                    #   open_sync
#full_page_writes = on            # recover from partial page writes
#wal_log_hints = off            # also do full page writes of non-critical updates
                    # (change requires restart)
#wal_buffers = -1            # min 32kB, -1 sets based on shared_buffers
                    # (change requires restart)
#wal_writer_delay = 200ms        # 1-10000 milliseconds

#commit_delay = 0            # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000

# - Checkpoints -

checkpoint_segments = 90        # in logfile segments, min 1, 16MB each
checkpoint_timeout = 5min        # range 30s-1h
checkpoint_completion_target = 0.8    # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s        # 0 disables

# - Archiving -

#archive_mode = off        # allows archiving to be done
                # (change requires restart)
#archive_command = ''        # command to use to archive a logfile segment
                # placeholders: %p = path of file to archive
                #               %f = file name only
                # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0        # force a logfile segment switch after this
                # number of seconds; 0 disables


#------------------------------------------------------------------------------
# REPLICATION
#------------------------------------------------------------------------------

# - Sending Server(s) -

# Set these on the master and on any standby that will send replication data.

#max_wal_senders = 0        # max number of walsender processes
                # (change requires restart)
#wal_keep_segments = 0        # in logfile segments, 16MB each; 0 disables
#wal_sender_timeout = 60s    # in milliseconds; 0 disables

max_replication_slots = 1    # max number of replication slots
                # (change requires restart)

# - Master Server -

# These settings are ignored on a standby server.

#synchronous_standby_names = ''    # standby servers that provide sync rep
                # comma-separated list of application_name
                # from standby(s); '*' = all
#vacuum_defer_cleanup_age = 0    # number of xacts by which cleanup is delayed

# - Standby Servers -

# These settings are ignored on a master server.

#hot_standby = off            # "on" allows queries during recovery
                    # (change requires restart)
#max_standby_archive_delay = 30s    # max delay before canceling queries
                    # when reading WAL from archive;
                    # -1 allows indefinite delay
#max_standby_streaming_delay = 30s    # max delay before canceling queries
                    # when reading streaming WAL;
                    # -1 allows indefinite delay
#wal_receiver_status_interval = 10s    # send replies at least this often
                    # 0 disables
#hot_standby_feedback = off        # send info from standby to prevent
                    # query conflicts
#wal_receiver_timeout = 60s        # time that receiver waits for
                    # communication from master
                    # in milliseconds; 0 disables


#------------------------------------------------------------------------------
# QUERY TUNING
#------------------------------------------------------------------------------

# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_indexonlyscan = on
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

#seq_page_cost = 1.0            # measured on an arbitrary scale
random_page_cost = 2.0            # same scale as above
#cpu_tuple_cost = 0.01            # same scale as above
#cpu_index_tuple_cost = 0.005        # same scale as above
#cpu_operator_cost = 0.0025        # same scale as above
effective_cache_size = 6GB

# - Genetic Query Optimizer -

#geqo = on
geqo_threshold = 16
geqo_effort = 2            # range 1-10
#geqo_pool_size = 0            # selects default based on effort
#geqo_generations = 0            # selects default based on effort
#geqo_selection_bias = 2.0        # range 1.5-2.0
#geqo_seed = 0.0            # range 0.0-1.0

# - Other Planner Options -

#default_statistics_target = 100    # range 1-10000
#constraint_exclusion = partition    # on, off, or partition
#cursor_tuple_fraction = 0.1        # range 0.0-1.0
#from_collapse_limit = 8
#join_collapse_limit = 8        # 1 disables collapsing of explicit
                    # JOIN clauses


#------------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#------------------------------------------------------------------------------

# - Where to Log -

log_destination = 'stderr'        # Valid values are combinations of
                    # stderr, csvlog, syslog, and eventlog,
                    # depending on platform.  csvlog
                    # requires logging_collector to be on.

# This is used when logging to stderr:
logging_collector = on        # Enable capturing of stderr and csvlog
                    # into log files. Required to be on for
                    # csvlogs.
                    # (change requires restart)

# These are only used if logging_collector is on:
#log_directory = 'pg_log'        # directory where log files are written,
                    # can be absolute or relative to PGDATA
#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'    # log file name pattern,
                    # can include strftime() escapes
#log_file_mode = 0600            # creation mode for log files,
                    # begin with 0 to use octal notation
#log_truncate_on_rotation = off        # If on, an existing log file with the
                    # same name as the new log file will be
                    # truncated rather than appended to.
                    # But such truncation only occurs on
                    # time-driven rotation, not on restarts
                    # or size-driven rotation.  Default is
                    # off, meaning append to existing files
                    # in all cases.
#log_rotation_age = 1d            # Automatic rotation of logfiles will
                    # happen after that time.  0 disables.
#log_rotation_size = 10MB        # Automatic rotation of logfiles will
                    # happen after that much log output.
                    # 0 disables.

# These are relevant when logging to syslog:
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'

# This is only relevant when logging to eventlog (win32):
#event_source = 'PostgreSQL'

# - When to Log -

#client_min_messages = notice        # values in order of decreasing detail:
                    #   debug5
                    #   debug4
                    #   debug3
                    #   debug2
                    #   debug1
                    #   log
                    #   notice
                    #   warning
                    #   error

#log_min_messages = warning        # values in order of decreasing detail:
                    #   debug5
                    #   debug4
                    #   debug3
                    #   debug2
                    #   debug1
                    #   info
                    #   notice
                    #   warning
                    #   error
                    #   log
                    #   fatal
                    #   panic

#log_min_error_statement = error    # values in order of decreasing detail:
                    #   debug5
                    #   debug4
                    #   debug3
                    #   debug2
                    #   debug1
                    #   info
                    #   notice
                    #   warning
                    #   error
                    #   log
                    #   fatal
                    #   panic (effectively off)

#log_min_duration_statement = 150    # -1 is disabled, 0 logs all statements
                    # and their durations, > 0 logs only
                    # statements running at least this number
                    # of milliseconds


# - What to Log -

#debug_print_parse = off
#debug_print_rewritten = off
#debug_print_plan = off
#debug_pretty_print = on
#log_checkpoints = off
log_connections = off
log_disconnections = off
log_duration = off
log_error_verbosity = verbose        # terse, default, or verbose messages
#log_hostname = off
log_line_prefix = '%t '            # special values:
                    #   %a = application name
                    #   %u = user name
                    #   %d = database name
                    #   %r = remote host and port
                    #   %h = remote host
                    #   %p = process ID
                    #   %t = timestamp without milliseconds
                    #   %m = timestamp with milliseconds
                    #   %i = command tag
                    #   %e = SQL state
                    #   %c = session ID
                    #   %l = session line number
                    #   %s = session start timestamp
                    #   %v = virtual transaction ID
                    #   %x = transaction ID (0 if none)
                    #   %q = stop here in non-session
                    #        processes
                    #   %% = '%'
                    # e.g. '<%u%%%d> '
#log_lock_waits = off            # log lock waits >= deadlock_timeout
#log_statement = 'none'            # none, ddl, mod, all
#log_temp_files = -1            # log temporary files equal or larger
                    # than the specified size in kilobytes;
                    # -1 disables, 0 logs all temp files
log_timezone = 'US/Eastern'


#------------------------------------------------------------------------------
# RUNTIME STATISTICS
#------------------------------------------------------------------------------

# - Query/Index Statistics Collector -

#track_activities = on
track_counts = on
#track_io_timing = off
#track_functions = none            # none, pl, all
#track_activity_query_size = 1024    # (change requires restart)
#update_process_title = on
#stats_temp_directory = 'pg_stat_tmp'


# - Statistics Monitoring -

#log_parser_stats = off
#log_planner_stats = off
#log_executor_stats = off
#log_statement_stats = off


#------------------------------------------------------------------------------
# AUTOVACUUM PARAMETERS
#------------------------------------------------------------------------------

autovacuum = on            # Enable autovacuum subprocess?  'on'
                    # requires track_counts to also be on.
#log_autovacuum_min_duration = -1    # -1 disables, 0 logs all actions and
                    # their durations, > 0 logs only
                    # actions running at least this number
                    # of milliseconds.
#autovacuum_max_workers = 3        # max number of autovacuum subprocesses
                    # (change requires restart)
#autovacuum_naptime = 1min        # time between autovacuum runs
#autovacuum_vacuum_threshold = 50    # min number of row updates before
                    # vacuum
#autovacuum_analyze_threshold = 50    # min number of row updates before
                    # analyze
#autovacuum_vacuum_scale_factor = 0.2    # fraction of table size before vacuum
#autovacuum_analyze_scale_factor = 0.1    # fraction of table size before analyze
#autovacuum_freeze_max_age = 200000000    # maximum XID age before forced vacuum
                    # (change requires restart)
#autovacuum_multixact_freeze_max_age = 400000000    # maximum multixact age
                    # before forced vacuum
                    # (change requires restart)
autovacuum_vacuum_cost_delay = 50ms    # default vacuum cost delay for
                    # autovacuum, in milliseconds;
                    # -1 means use vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1    # default vacuum cost limit for
                    # autovacuum, -1 means use
                    # vacuum_cost_limit


#------------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#------------------------------------------------------------------------------

# - Statement Behavior -

#search_path = '"$user",public'        # schema names
#default_tablespace = ''        # a tablespace name, '' uses the default
#temp_tablespaces = ''            # a list of tablespace names, '' uses
                    # only default tablespace
#check_function_bodies = on
#default_transaction_isolation = 'read committed'
#default_transaction_read_only = off
#default_transaction_deferrable = off
#session_replication_role = 'origin'
#statement_timeout = 0            # in milliseconds, 0 is disabled
#lock_timeout = 0            # in milliseconds, 0 is disabled
#vacuum_freeze_min_age = 50000000
#vacuum_freeze_table_age = 150000000
#vacuum_multixact_freeze_min_age = 5000000
#vacuum_multixact_freeze_table_age = 150000000
#bytea_output = 'hex'            # hex, escape
#xmlbinary = 'base64'
#xmloption = 'content'

# - Locale and Formatting -

datestyle = 'iso, ymd'
#intervalstyle = 'postgres'
timezone = 'US/Eastern'
#timezone_abbreviations = 'Default'     # Select the set of available time zone
                    # abbreviations.  Currently, there are
                    #   Default
                    #   Australia (historical usage)
                    #   India
                    # You can create your own file in
                    # share/timezonesets/.
#extra_float_digits = 0            # min -15, max 3
#client_encoding = sql_ascii        # actually, defaults to database
                    # encoding

# These settings are initialized by initdb, but they can be changed.
lc_messages = 'French_Canada.1252'            # locale for system error message
                    # strings
lc_monetary = 'French_Canada.1252'            # locale for monetary formatting
lc_numeric = 'French_Canada.1252'            # locale for number formatting
lc_time = 'French_Canada.1252'                # locale for time formatting

# default configuration for text search
default_text_search_config = 'pg_catalog.french'

# - Other Defaults -

#dynamic_library_path = '$libdir'
#local_preload_libraries = ''
#session_preload_libraries = ''


#------------------------------------------------------------------------------
# LOCK MANAGEMENT
#------------------------------------------------------------------------------

#deadlock_timeout = 1s
#max_locks_per_transaction = 64        # min 10
                    # (change requires restart)
# Note:  Each lock table slot uses ~270 bytes of shared memory, and there are
# max_locks_per_transaction * (max_connections + max_prepared_transactions)
# lock table slots.
#max_pred_locks_per_transaction = 64    # min 10
                    # (change requires restart)


#------------------------------------------------------------------------------
# VERSION/PLATFORM COMPATIBILITY
#------------------------------------------------------------------------------

# - Previous PostgreSQL Versions -

#array_nulls = on
#backslash_quote = safe_encoding    # on, off, or safe_encoding
#default_with_oids = off
#escape_string_warning = on
#lo_compat_privileges = off
#quote_all_identifiers = off
#sql_inheritance = on
#standard_conforming_strings = on
#synchronize_seqscans = on

# - Other Platforms and Clients -

#transform_null_equals = off


#------------------------------------------------------------------------------
# ERROR HANDLING
#------------------------------------------------------------------------------

#exit_on_error = off            # terminate session on any error?
#restart_after_crash = on        # reinitialize after backend crash?


#------------------------------------------------------------------------------
# CONFIG FILE INCLUDES
#------------------------------------------------------------------------------

# These options allow settings to be loaded from files other than the
# default postgresql.conf.

#include_dir = 'conf.d'            # include files ending in '.conf' from
                    # directory 'conf.d'
#include_if_exists = 'exists.conf'    # include file only if it exists
#include = 'special.conf'        # include file


#------------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#------------------------------------------------------------------------------

# Add settings for extensions here




> On 2015-04-25 at 08:33, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2015-04-24 10:10:06 +0000, pdrolet@infodata.ca wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference:      13143
>> Logged by:          Patrice Drolet
>> Email address:      pdrolet@infodata.ca
>> PostgreSQL version: 9.4.1
>> Operating system:   Windows 2008r2
>> Description:
>>
>> I have experienced it many times. The master streams to the slave for days
>> and no problem (using a replication slot). If I stop the master, it does not
>> want to restart and I have this error in the log:
>>
>> 2015-04-24 04:47:12 EDT LOG:  database system was shut down at
>> 2015-04-24 04:44:37 EDT
>> 2015-04-24 04:47:12 EDT PANIC:  could not fsync file
>> "pg_replslot/node_win2012sec/state": Bad file descriptor
>> 2015-04-24 04:47:12 EDT LOG:  startup process (PID 23180) exited with
>> exit code 3
>> 2015-04-24 04:47:12 EDT LOG:  aborting startup due to startup process
>> failure
>>
>> To restart the server, I have to manually delete the folder in pg_replslot.
>> But then I need to rebuild the slave. Not very practical for a multi
>> gigabyte database.
>
> Obviously that's not how it's supposed to be. Unfortunately, I don't have
> access to a Windows system, much less a French one.
>
> Could you:
> 1) describe your exact setup?
> 2) check that it's unrelated to any antivirus software that's running?
> 3) set 'log_error_verbosity = verbose'? Then we'll get line
>   numbers, which will help narrow down what's happening.
> 4) try to debug it by installing Sysinternals' sysmon and
>   recording exactly what is done with that file?
>
> Regards,
>
> Andres

Re: BUG #13143: Cannot stop and restart a streaming server with a replication slot

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2015-04-27 11:44:47 -0300, Alvaro Herrera wrote:
> > I think this is failing in the fsync_fname() call in slot.c line 1045
> > (REL9_4_STABLE).
>
> Patrice has since replied with log_error_verbosity=verbose logs, but
> that reply is probably still stuck in moderation:

Ah, sorry about that.  Approved.

> Hm. I wonder if it's maybe that the file is opened with O_RDONLY? The
> OSs I have access to don't care - for good reason imo, fsync isn't a
> write - but it's not inconceivable that Windows might.

Ah, fsync_fname() explicitly defends against this.

> So that's easy enough fixed.

Nice.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
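Alvaro's point is that a helper like fsync_fname() sidesteps the O_RDONLY problem by choosing the open mode itself. Below is a minimal POSIX C sketch of that pattern, assuming Andres's hypothesis that Windows rejects a flush through a read-only descriptor. fsync_path() is a hypothetical name: this is not PostgreSQL's code, and it leaves out Windows-specific details such as binary-mode open flags and how fsync() is mapped on that platform.

```c
/*
 * Hypothetical sketch (fsync_path is not a PostgreSQL function): flush a
 * file or directory to disk. Regular files are opened read-write because
 * some platforms (assumed here: Windows) refuse to flush a descriptor
 * opened O_RDONLY; directories are opened read-only because most
 * platforms refuse to open them for writing.
 */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Returns 0 on success; returns -1 and leaves errno set on failure. */
static int
fsync_path(const char *path, bool is_dir)
{
    int flags;
    int fd;

    assert(path != NULL);

    /* The key choice: never fsync a regular file through O_RDONLY. */
    flags = is_dir ? O_RDONLY : O_RDWR;
    fd = open(path, flags);

    if (fd < 0)
    {
        /*
         * Some platforms cannot open a directory at all; treat that as
         * "nothing to flush" rather than as an error.
         */
        if (is_dir && (errno == EISDIR || errno == EACCES))
            return 0;
        return -1;
    }

    if (fsync(fd) != 0)
    {
        int save_errno = errno;   /* close() may overwrite errno */

        (void) close(fd);
        errno = save_errno;
        return -1;
    }

    return close(fd);
}
```

With this pattern, code that has read a slot's state file through an O_RDONLY descriptor would not flush that same descriptor; it would call fsync_path() on the path, so the flush always goes through a read-write handle.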

Re: [SPAM] Re: BUG #13143: Cannot stop and restart a streaming server with a replication slot

From
Patrice Drolet
Date:
Hi,

On Windows, it is really easy to hit this problem: create a replication slot, then try to restart the server. The server will refuse to start.

I checked the file; it was not read-only. There was nothing I could do to the file or the directory to get pg to restart.

I set fsync=off and could start pg, but this is not a good permanent solution!

Sent from my iPad

> On 2015-04-27 at 12:00, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> Andres Freund wrote:
>> On 2015-04-27 11:44:47 -0300, Alvaro Herrera wrote:
>>> I think this is failing in the fsync_fname() call in slot.c line 1045
>>> (REL9_4_STABLE).
>>
>> Patrice has since replied with log_error_verbosity=verbose logs, but
>> that reply is probably still stuck in moderation:
>
> Ah, sorry about that.  Approved.
>
>> Hm. I wonder if it's maybe that the file is opened with O_RDONLY? The
>> OSs I have access to don't care - for good reason imo, fsync isn't a
>> write - but it's not inconceivable that Windows might.
>
> Ah, fsync_fname() explicitly defends against this.
>
>> So that's easy enough fixed.
>
> Nice.
>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [SPAM] Re: BUG #13143: Cannot stop and restart a streaming server with a replication slot

From
Andres Freund
Date:
Hi,

On 2015-04-27 12:50:32 -0400, Patrice Drolet wrote:
> On Windows, it is really easy to hit this problem: create a replication slot, then try to restart the server. The
> server will refuse to start.
>
> I checked the file; it was not read-only. There was nothing I could do to the file or the directory to get
> pg to restart.
>
> I set fsync=off and could start pg, but this is not a good permanent solution!

I've pushed a fix for this. It'll be included in the next 9.4 minor
release.

Thanks for the report!

Andres Freund

Re: [SPAM] BUG #13143: Cannot stop and restart a streaming server with a replication slot

From
Patrice Drolet
Date:
Hi,

When will that be?

Do you want me to test it in my environment?

Thanks,

Patrice Drolet

> On 2015-04-27 at 18:22, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2015-04-27 12:50:32 -0400, Patrice Drolet wrote:
>> On Windows, it is really easy to hit this problem: create a replication slot, then try to restart the server. The server will refuse to start.
>>
>> I checked the file; it was not read-only. There was nothing I could do to the file or the directory to get pg to restart.
>>
>> I set fsync=off and could start pg, but this is not a good permanent solution!
>
> I've pushed a fix for this. It'll be included in the next 9.4 minor
> release.
>
> Thanks for the report!
>
> Andres Freund

Re: [SPAM] BUG #13143: Cannot stop and restart a streaming server with a replication slot

From
Andres Freund
Date:
Hi,

On 2015-04-30 09:48:52 -0400, Patrice Drolet wrote:
> > On 2015-04-27 at 18:22, Andres Freund <andres@anarazel.de> wrote:
> > I've pushed a fix for this. It'll be included in the next 9.4 minor
> > release.

> When will that be?

I don't know exactly. My *guess* is that it's a couple of weeks away.

> Do you want me to test it in my environment?

That would be good, but it would require compiling postgres yourself.
Unfortunately, we don't have autogenerated installers except for
releases.

Greetings,

Andres Freund