Discussion: Re: Schema-only dump dumps no constraints, no triggers
Hi,
On 28 July 2012 at 1:10, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> What were the deleted files?
> WAL, Logs, other?
> What type of WAL replication are you doing?
> Streaming, log shipping, etc?
> What are your settings for the WAL replication?
> In particular wal_keep_segments ?
> Is the WAL replication actually working?
At this time, a couple of days after the restart, the clog hasn't re-formed yet. Thus, I am unable to tell you what files
they were; we didn't pay that much attention to it then. There were some WAL files, but I can't tell what the actual
structure was. I'll provide this information whenever possible.
The WAL replication is streaming replication with a hot standby server. The servers have a direct connection with one
another. Configuration is appended below.
The replication is working fine.
The primary server also has a single, mostly idle transaction from any client node, a trait of an "always online"
client application. Immediately after a primary server restart this does not add much overhead, though I can't tell
whether, in the long run, it might cause the problem.
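One way to check whether those always-online client connections are sitting in an open transaction is to query the pg_stat_activity view. This is only a sketch, written against the 9.x column names (procpid, current_query); later releases renamed them to pid and query:

```sql
-- Sessions with an open transaction, oldest first (PostgreSQL 9.0/9.1 column names).
SELECT procpid,
       usename,
       xact_start,
       now() - xact_start AS xact_age,
       current_query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;
```

Rows whose current_query shows '<IDLE> in transaction' and whose xact_age is large are the sessions that can keep old tuples from being removed.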
Configuration follows:
- primary server postgresql.conf - WAL section
- standby server postgresql.conf - WAL section
- standby server recovery.conf
The primary server WAL configuration is:
#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------
# - Settings -
wal_level = hot_standby # minimal, archive, or hot_standby
# (change requires restart)
#fsync = on # turns forced synchronization on or off
#synchronous_commit = on # immediate fsync at commit
#wal_sync_method = fsync # the default is the first option
# supported by the operating system:
# open_datasync
# fdatasync (default on Linux)
# fsync
# fsync_writethrough
# open_sync
#full_page_writes = on # recover from partial page writes
wal_buffers = 1MB # min 32kB
# (change requires restart)
#wal_writer_delay = 200ms # 1-10000 milliseconds
#commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5 # range 1-1000
# - Checkpoints -
#checkpoint_segments = 30 # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min # range 30s-1h
#checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s # 0 disables
# - Archiving -
archive_mode = on # allows archiving to be done
# (change requires restart)
archive_command = 'rsync %p <CUT>/%f' # command to use to archive a logfile segment
archive_timeout = 0 # force a logfile segment switch after this
# number of seconds; 0 disables
# - Streaming Replication -
max_wal_senders = 5 # max number of walsender processes
# (change requires restart)
#wal_sender_delay = 200ms # walsender cycle time, 1-10000 milliseconds
wal_keep_segments = 32 # in logfile segments, 16MB each; 0 disables
#vacuum_defer_cleanup_age = 0 # number of xacts by which cleanup is delayed
# - Standby Servers -
#hot_standby = off # "on" allows queries during recovery
# (change requires restart)
#max_standby_archive_delay = 30s # max delay before canceling queries
# when reading WAL from archive;
# -1 allows indefinite delay
#max_standby_streaming_delay = 30s # max delay before canceling queries
# when reading streaming WAL;
# -1 allows indefinite delay
The standby server WAL configuration is:
#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------
# - Settings -
#wal_level = minimal # minimal, archive, or hot_standby
# (change requires restart)
#fsync = on # turns forced synchronization on or off
#synchronous_commit = on # immediate fsync at commit
#wal_sync_method = fsync # the default is the first option
# supported by the operating system:
# open_datasync
# fdatasync (default on Linux)
# fsync
# fsync_writethrough
# open_sync
#full_page_writes = on # recover from partial page writes
#wal_buffers = 64kB # min 32kB
# (change requires restart)
#wal_writer_delay = 200ms # 1-10000 milliseconds
#commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5 # range 1-1000
# - Checkpoints -
#checkpoint_segments = 3 # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min # range 30s-1h
#checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s # 0 disables
# - Archiving -
#archive_mode = on # allows archiving to be done
# (change requires restart)
#archive_command = 'cp %p /backup/repl/%f' # command to use to archive a logfile segment
#archive_timeout = 0 # force a logfile segment switch after this
# number of seconds; 0 disables
# - Streaming Replication -
#max_wal_senders = 5 # max number of walsender processes
# (change requires restart)
#wal_sender_delay = 200ms # walsender cycle time, 1-10000 milliseconds
#wal_keep_segments = 32 # in logfile segments, 16MB each; 0 disables
#vacuum_defer_cleanup_age = 0 # number of xacts by which cleanup is delayed
# - Standby Servers -
hot_standby = on # "on" allows queries during recovery
# (change requires restart)
#max_standby_archive_delay = 30s # max delay before canceling queries
# when reading WAL from archive;
# -1 allows indefinite delay
#max_standby_streaming_delay = 30s # max delay before canceling queries
# when reading streaming WAL;
# -1 allows indefinite delay
The standby server recovery.conf file:
standby_mode = 'on'
primary_conninfo = 'host=<CUT> port=<CUT> user=<CUT>'
trigger_file = '<CUT>/repl_trigger'
restore_command = 'rsync <CUT>/%f "%p"'
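Given the streaming setup above, replication health can be spot-checked from SQL as well as from the logs. A sketch, using the view and functions as they exist in 9.1:

```sql
-- On the primary: one row per connected walsender.
SELECT client_addr, state, sent_location, replay_location
FROM pg_stat_replication;

-- On the standby: how far WAL receive and replay have progressed.
SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();
```

If the standby's replay location keeps advancing toward the primary's sent location, streaming is keeping up and wal_keep_segments is not being outrun.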
Re: Clogging problem (was: Schema-only dump dumps no constraints, no triggers)
From: Marek Kielar
Date:
Hi,

to complement information from the previous message:

On 29 July 2012 at 12:29, Marek Kielar <mkielar@go2.pl> wrote:
> Hi,
>
> On 28 July 2012 at 1:10, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> > What were the deleted files?
> > WAL, Logs, other?
>
> At this time, a couple of days after the restart, the clog hasn't re-formed yet. Thus, I am unable to tell you what files they were; we didn't pay that much attention to it then. There were some WAL files, but I can't tell what the actual structure was. I'll provide this information whenever possible.

The clog has somewhat re-formed. The full listing of lsof (filtered for unique files) for the postmaster(s) on the database mount is here:
http://BillionUploads.com/ya9kjv78t9es/postmaster_files_sorted.csv.html

Consecutive commands were issued in a matter of minutes and differ slightly.

Some totals / aggregates (bytes):

    df   – /data                     83 141 382 144
    du   – /data                     29 170 365 801
    lsof – /data                     75 348 037 632
    lsof – /data/base                74 975 969 280
    lsof – /data/base (deleted)      53 769 936 896
    lsof – /data/pg_xlog                369 098 752
    lsof – /data/pg_xlog (deleted)      201 326 592
    lsof – /data/global                   2 965 504

It is clear that the server processes are keeping most of the files from being actually deleted.
On 08/06/2012 05:08 AM, Marek Kielar wrote:
> Hi,
>
> to complement information from the previous message:
>
> The clog has somewhat re-formed. The full listing of lsof (filtered for unique files) for the postmaster(s) on the database mount is here:
> http://BillionUploads.com/ya9kjv78t9es/postmaster_files_sorted.csv.html

FYI, you might want to consider using some other site for uploads. The above is sort of scary and leads you down all sorts of false paths.

> Consecutive commands were issued in a matter of minutes and differ slightly.
>
> Some totals / aggregates:
> df   – /data                     83 141 382 144
> du   – /data                     29 170 365 801
> lsof – /data                     75 348 037 632
> lsof – /data/base                74 975 969 280
> lsof – /data/base (deleted)      53 769 936 896
> lsof – /data/pg_xlog                369 098 752
> lsof – /data/pg_xlog (deleted)      201 326 592
> lsof – /data/global                   2 965 504
>
> It is clear that the server processes are keeping most of the files from being actually deleted.

Well, the nature of database data files is that they expand and/or contract as needed. Unless you are getting rid of the actual object they refer to, they will not be deleted. The WAL files in pg_xlog are a different matter, but in the listing you sent they seem reasonable. There are a couple of things off the top of my head that can cause data files to expand unnecessarily:

1) Autovacuum is not aggressive enough.
2) There are open transactions keeping old tuples from being removed.
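To see whether autovacuum is keeping up, the live and dead tuple counts per table can be compared. A sketch using the pg_stat_user_tables statistics view:

```sql
-- Tables with the most dead tuples; a large n_dead_tup relative to n_live_tup
-- suggests vacuum is not keeping up, or old rows cannot yet be removed.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
```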
From previous posts, you mentioned a 'permanent' connection to the database. Are you sure it is not holding an open transaction? The pg_locks view would be a good place to start:
http://www.postgresql.org/docs/9.1/interactive/view-pg-locks.html

--
Adrian Klaver
adrian.klaver@gmail.com
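Joining pg_locks to pg_stat_activity makes it easier to tell whether a PID holding a lock corresponds to an idle-in-transaction session. A sketch with 9.x column names (procpid, current_query):

```sql
SELECT l.pid, l.locktype, l.mode, l.granted,
       a.xact_start, a.current_query
FROM pg_locks l
JOIN pg_stat_activity a ON a.procpid = l.pid
ORDER BY a.xact_start;
```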
On 6 August 2012 at 17:00, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> > The clog has somewhat re-formed. The full listing of lsof (filtered for unique files) for the postmaster(s) on the database mount is here:
> > http://BillionUploads.com/ya9kjv78t9es/postmaster_files_sorted.csv.html
>
> FYI, you might want to consider using some other site for uploads. The above is sort of scary and leads you down all sorts of false paths.

Sorry about that; it's the first time I had used a hosting service, and they didn't require creating an account (unlike those in a list on Wikipedia). I guess using NoScript spoils you by saving you from trouble, but it makes you come to wrong assumptions. I'll try to choose better next time.

> There are a couple of things off the top of my head that can cause data files to expand unnecessarily:
> 1) Autovacuum is not aggressive enough.
> 2) There are open transactions keeping old tuples from being removed.
>
> From previous posts, you mentioned a 'permanent' connection to the database. Are you sure it is not holding an open transaction?
> The pg_locks view would be a good place to start:
> http://www.postgresql.org/docs/9.1/interactive/view-pg-locks.html

1) Running, through pgAdmin3, an ordinary VACUUM FULL ANALYZE and REINDEX on all our databases (plus the "postgres" database) in the cluster didn't release a substantial amount of disk space; it might add up to maybe a few percent of the overall.

2) It doesn't seem there are any long-running transactions, even though the PIDs do repeat for some time (but since the connections are kept open, this seems reasonable):

postgres=# SELECT * FROM pg_locks ORDER BY pid;
locktype;database;relation;page;tuple;virtualxid;transactionid;classid;objid;objsubid;virtualtransaction;pid;mode;granted
relation;11874;10985;;;;;;;;85/101738;24367;AccessShareLock;t
virtualxid;;;;;85/101738;;;;;85/101738;24367;ExclusiveLock;t
virtualxid;;;;;20/788838;;;;;20/788838;24505;ExclusiveLock;t
virtualxid;;;;;14/923780;;;;;14/923780;24621;ExclusiveLock;t
virtualxid;;;;;76/139304;;;;;76/139304;24699;ExclusiveLock;t
virtualxid;;;;;55/199999;;;;;55/199999;24703;ExclusiveLock;t
virtualxid;;;;;59/363780;;;;;59/363780;24926;ExclusiveLock;t
(7 rows)

And after some time, with a different invocation of psql (to let go of the PID):

postgres=# SELECT * FROM pg_locks ORDER BY pid;
locktype;database;relation;page;tuple;virtualxid;transactionid;classid;objid;objsubid;virtualtransaction;pid;mode;granted
virtualxid;;;;;56/410614;;;;;56/410614;25105;ExclusiveLock;t
virtualxid;;;;;3/667499;;;;;3/667499;25145;ExclusiveLock;t
relation;11874;10985;;;;;;;;85/101817;25171;AccessShareLock;t
virtualxid;;;;;85/101817;;;;;85/101817;25171;ExclusiveLock;t
(4 rows)

We are again slowly approaching the point where a server restart will be needed. If / when this happens, I'll provide statistics again.

Best regards,
Marek Kielar