Thread: Count backend self-sync calls

Count backend self-sync calls

From:
Greg Smith
Date:
The attached patch adds a new field to pg_stat_bgwriter, counting the
number of times backends execute their own fsync calls.  Normally, when
a backend needs to fsync data, it passes a request to the background
writer, which then absorbs the call into its own queue of work to do.
However, under some types of heavy system load, the associated queue can
fill.  When this happens, backends are forced to do their own fsync
call.  This is potentially much worse than when they do a regular write.

The really nasty situation is when the background writer is busy because
it's executing a checkpoint.  In that case, it's possible for the
backend fsync calls to start competing with the ones the background
writer is trying to get done, causing the checkpoint sync phase to
execute slower than it should.  I've seen the sync phase take over 45
minutes on a really busy server once it got into this condition, where
hundreds of clients doing their own backend fsync calls were fighting
against the checkpoint fsync work.  With this patch, you can observe
that happening as an upward spike in
pg_stat_bgwriter.buffers_backend_sync, which, as documented, is a
subset of the total shown in buffers_backend.

While it takes a busier system than I can usefully simulate here to
show a really bad situation, I am able to see some of these
unabsorbed backend fsync calls when initializing a pgbench database,
which proves they happen in the lab.  The attached test program takes a
pgbench scale as its input.  It then creates a pgbench database
(deleting any existing pgbench database, so watch out for that) and
shows the values accumulated in pg_stat_bgwriter during that period.
Here's an example, using the script's default scale of 100 on a server
with 8GB of RAM and fake fsync (the hard drives are lying about it):

-[ RECORD 1 ]--------+-----------------------------
now                  | 2010-11-14 16:08:41.36421-05
...
Initializing pgbench
-[ RECORD 1 ]--------+------------------------------
now                  | 2010-11-14 16:09:46.713693-05
checkpoints_timed    | 0
checkpoints_req      | 0
buffers_checkpoint   | 0
buffers_clean        | 0
maxwritten_clean     | 0
buffers_backend      | 654716
buffers_backend_sync | 90
buffers_alloc        | 803

This is with default sizing for memory structures.  As you increase
shared_buffers, one of the queues involved here grows
proportionately, making it less likely you'll run into this problem.
That just changes it into the kind of problem I've only seen on a larger
system with a difficult-to-simulate workload.  The production system
(running a web application) whose struggles with this problem prompted
writing the patch had shared_buffers=4GB at the time.
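
If you want to estimate the queue capacity on a given system, a sketch
like the following works, assuming the queue is sized at one slot per
shared buffer (which is how max_requests gets set here) and the default
8kB block size:

-- One queue slot per shared buffer, 8kB per buffer by default:
SELECT setting::bigint AS approx_fsync_queue_slots,
       pg_size_pretty(setting::bigint * 8192) AS shared_buffers_size
FROM pg_settings
WHERE name = 'shared_buffers';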

The patch also adds some logging to the innards involved here, to help
with understanding problems in this area.  I don't think that should be
in the committed version as is.  We may want to drop the logging level
or disable it in regular builds, since it sits somewhere that generates
a lot of log data and adds overhead.  It is nice for now, though, as
it lets you get an idea how much fsync work *is* being absorbed by the
BGW, as well as showing which relation is suffering from this issue.
Here's an example of both of those things, with the default config for
everything except log_checkpoints (on) and log_min_messages (debug1):

DEBUG:  Absorbing 4096 fsync requests
DEBUG:  Absorbing 150 fsync requests
DEBUG:  Unable to forward fsync request, executing directly
CONTEXT:  writing block 158638 of relation base/16385/16398

Here 4096 is the most entries the BGW will ever absorb at once, and all
90 of the missed sync calls are logged so you can see what files they
came from.

As a high-level commentary about this patch, I'm not sure what most end
users will ever do with this data.  At the same time, I wasn't sure what
a typical end user would do with anything else in pg_stat_bgwriter
either when it was added, and it turns out the answer is "wait for
people who know the internals to write things that monitor it".  For
example, Magnus has updated recent versions of the Munin plug-in for
PostgreSQL to usefully graph pg_stat_bgwriter data over time.  As I have
some data to suggest checkpoint problems on Linux in particular are
getting worse as total system memory increases, I expect that having a
way to easily instrument for this particular problem will be
correspondingly more important in the future too.
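
For anyone who wants to track this without one of those tools, a rough
sketch of the usual approach: snapshot the view into a table on a
schedule and diff adjacent samples (the table name here is made up):

CREATE TABLE bgwriter_snapshot AS
    SELECT now() AS ts, * FROM pg_stat_bgwriter LIMIT 0;

-- Run periodically, e.g. from cron:
INSERT INTO bgwriter_snapshot SELECT now(), * FROM pg_stat_bgwriter;

-- Backend fsync calls during each sampling interval:
SELECT ts,
       buffers_backend_sync
           - lag(buffers_backend_sync) OVER (ORDER BY ts)
           AS backend_fsyncs_this_interval
FROM bgwriter_snapshot
ORDER BY ts;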

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us


diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dbf966b..f701b8f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -264,8 +264,10 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
       and the number of times the background writer stopped a cleaning scan
       because it had written too many buffers.  Also includes
       statistics about the shared buffer pool, including buffers written
-      by backends (that is, not by the background writer) and total buffers
-      allocated.
+      by backends (that is, not by the background writer), how many times
+      those backends had to execute their own filesystem synchronize calls
+      (normally the background writer handles those even when the backend
+      writes itself), and total buffers allocated.
      </entry>
      </row>

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 09574c3..8766ccc 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -537,6 +537,7 @@ CREATE VIEW pg_stat_bgwriter AS
         pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean,
         pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean,
         pg_stat_get_buf_written_backend() AS buffers_backend,
+        pg_stat_get_buf_sync_backend() AS buffers_backend_sync,
         pg_stat_get_buf_alloc() AS buffers_alloc;

 CREATE VIEW pg_user_mappings AS
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 0690ab5..43a149e 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -102,13 +102,15 @@
  *
  * num_backend_writes is used to count the number of buffer writes performed
  * by non-bgwriter processes.  This counter should be wide enough that it
- * can't overflow during a single bgwriter cycle.
+ * can't overflow during a single bgwriter cycle.  num_backend_fsync
+ * counts the subset of those writes that also had to do their own fsync,
+ * because the background writer failed to absorb their request.
  *
  * The requests array holds fsync requests sent by backends and not yet
  * absorbed by the bgwriter.
  *
- * Unlike the checkpoint fields, num_backend_writes and the requests
- * fields are protected by BgWriterCommLock.
+ * Unlike the checkpoint fields, num_backend_writes, num_backend_fsync, and
+ * the requests fields are protected by BgWriterCommLock.
  *----------
  */
 typedef struct
@@ -132,6 +134,7 @@ typedef struct
     int            ckpt_flags;        /* checkpoint flags, as defined in xlog.h */

     uint32        num_backend_writes;        /* counts non-bgwriter buffer writes */
+    uint32        num_backend_fsync;        /* counts non-bgwriter fsync calls */

     int            num_requests;    /* current # of requests */
     int            max_requests;    /* allocated array size */
@@ -1084,12 +1087,14 @@ ForwardFsyncRequest(RelFileNodeBackend rnode, ForkNumber forknum,

     LWLockAcquire(BgWriterCommLock, LW_EXCLUSIVE);

-    /* we count non-bgwriter writes even when the request queue overflows */
+    /* Count all backend writes, regardless of whether they fit in the queue */
     BgWriterShmem->num_backend_writes++;

     if (BgWriterShmem->bgwriter_pid == 0 ||
         BgWriterShmem->num_requests >= BgWriterShmem->max_requests)
     {
+        /* Also count the subset where backends have to do their own fsync */
+        BgWriterShmem->num_backend_fsync++;
         LWLockRelease(BgWriterCommLock);
         return false;
     }
@@ -1137,7 +1142,10 @@ AbsorbFsyncRequests(void)

     /* Transfer write count into pending pgstats message */
     BgWriterStats.m_buf_written_backend += BgWriterShmem->num_backend_writes;
+    BgWriterStats.m_buf_fsync_backend += BgWriterShmem->num_backend_fsync;
+
     BgWriterShmem->num_backend_writes = 0;
+    BgWriterShmem->num_backend_fsync = 0;

     n = BgWriterShmem->num_requests;
     if (n > 0)
@@ -1149,6 +1157,9 @@ AbsorbFsyncRequests(void)

     LWLockRelease(BgWriterCommLock);

+    if (n > 0)
+        elog(DEBUG1,"Absorbing %d fsync requests",n);
+
     for (request = requests; n > 0; request++, n--)
         RememberFsyncRequest(request->rnode, request->forknum, request->segno);

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index a1ad342..c3c136a 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4188,6 +4188,7 @@ pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
     globalStats.buf_written_clean += msg->m_buf_written_clean;
     globalStats.maxwritten_clean += msg->m_maxwritten_clean;
     globalStats.buf_written_backend += msg->m_buf_written_backend;
+    globalStats.buf_fsync_backend += msg->m_buf_fsync_backend;
     globalStats.buf_alloc += msg->m_buf_alloc;
 }

diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index c72da72..7140b94 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -1269,6 +1269,8 @@ register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
         if (ForwardFsyncRequest(reln->smgr_rnode, forknum, seg->mdfd_segno))
             return;                /* passed it off successfully */

+        elog(DEBUG1, "Unable to forward fsync request, executing directly");
+
         if (FileSync(seg->mdfd_vfd) < 0)
             ereport(ERROR,
                     (errcode_for_file_access(),
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 6edb8bf..2104b53 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -78,6 +78,7 @@ extern Datum pg_stat_get_bgwriter_buf_written_checkpoints(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_bgwriter_buf_written_clean(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_bgwriter_maxwritten_clean(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_buf_written_backend(PG_FUNCTION_ARGS);
+extern Datum pg_stat_get_buf_sync_backend(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_buf_alloc(PG_FUNCTION_ARGS);

 extern Datum pg_stat_get_xact_numscans(PG_FUNCTION_ARGS);
@@ -1165,6 +1166,12 @@ pg_stat_get_buf_written_backend(PG_FUNCTION_ARGS)
 }

 Datum
+pg_stat_get_buf_sync_backend(PG_FUNCTION_ARGS)
+{
+    PG_RETURN_INT64(pgstat_fetch_global()->buf_fsync_backend);
+}
+
+Datum
 pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 {
     PG_RETURN_INT64(pgstat_fetch_global()->buf_alloc);
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 4f444ae..7fc6f05 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3117,6 +3117,8 @@ DATA(insert OID = 2773 ( pg_stat_get_bgwriter_maxwritten_clean PGNSP PGUID 12 1
 DESCR("statistics: number of times the bgwriter stopped processing when it had written too many buffers while
cleaning");
 DATA(insert OID = 2775 ( pg_stat_get_buf_written_backend PGNSP PGUID 12 1 0 0 f f f t f s 0 0 20 "" _null_ _null_
_null__null_ pg_stat_get_buf_written_backend _null_ _null_ _null_ )); 
 DESCR("statistics: number of buffers written by backends");
+DATA(insert OID = 3063 ( pg_stat_get_buf_sync_backend PGNSP PGUID 12 1 0 0 f f f t f s 0 0 20 "" _null_ _null_ _null_
_null_pg_stat_get_buf_sync_backend _null_ _null_ _null_ )); 
+DESCR("statistics: number of backend buffer writtes that did their own sync");
 DATA(insert OID = 2859 ( pg_stat_get_buf_alloc            PGNSP PGUID 12 1 0 0 f f f t f s 0 0 20 "" _null_ _null_
_null__null_ pg_stat_get_buf_alloc _null_ _null_ _null_ )); 
 DESCR("statistics: number of buffer allocations");

diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index d49bd80..42bf9c4 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -360,6 +360,7 @@ typedef struct PgStat_MsgBgWriter
     PgStat_Counter m_buf_written_clean;
     PgStat_Counter m_maxwritten_clean;
     PgStat_Counter m_buf_written_backend;
+    PgStat_Counter m_buf_fsync_backend;
     PgStat_Counter m_buf_alloc;
 } PgStat_MsgBgWriter;

@@ -562,6 +563,7 @@ typedef struct PgStat_GlobalStats
     PgStat_Counter buf_written_clean;
     PgStat_Counter maxwritten_clean;
     PgStat_Counter buf_written_backend;
+    PgStat_Counter buf_fsync_backend;
     PgStat_Counter buf_alloc;
 } PgStat_GlobalStats;

diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index c7a796a..32cda64 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1293,7 +1293,7 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
 pg_stat_activity            | SELECT s.datid, d.datname, s.procpid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_port, s.backend_start, s.xact_start, s.query_start, s.waiting, s.current_query FROM pg_database d, pg_stat_get_activity(NULL::integer) s(datid, procpid, usesysid, application_name, current_query, waiting, xact_start, query_start, backend_start, client_addr, client_port), pg_authid u WHERE ((s.datid = d.oid) AND (s.usesysid = u.oid));
 pg_stat_all_indexes         | SELECT c.oid AS relid, i.oid AS indexrelid, n.nspname AS schemaname, c.relname, i.relname AS indexrelname, pg_stat_get_numscans(i.oid) AS idx_scan, pg_stat_get_tuples_returned(i.oid) AS idx_tup_read, pg_stat_get_tuples_fetched(i.oid) AS idx_tup_fetch FROM (((pg_class c JOIN pg_index x ON ((c.oid = x.indrelid))) JOIN pg_class i ON ((i.oid = x.indexrelid))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"]));
 pg_stat_all_tables          | SELECT c.oid AS relid, n.nspname AS schemaname, c.relname, pg_stat_get_numscans(c.oid) AS seq_scan, pg_stat_get_tuples_returned(c.oid) AS seq_tup_read, (sum(pg_stat_get_numscans(i.indexrelid)))::bigint AS idx_scan, ((sum(pg_stat_get_tuples_fetched(i.indexrelid)))::bigint + pg_stat_get_tuples_fetched(c.oid)) AS idx_tup_fetch, pg_stat_get_tuples_inserted(c.oid) AS n_tup_ins, pg_stat_get_tuples_updated(c.oid) AS n_tup_upd, pg_stat_get_tuples_deleted(c.oid) AS n_tup_del, pg_stat_get_tuples_hot_updated(c.oid) AS n_tup_hot_upd, pg_stat_get_live_tuples(c.oid) AS n_live_tup, pg_stat_get_dead_tuples(c.oid) AS n_dead_tup, pg_stat_get_last_vacuum_time(c.oid) AS last_vacuum, pg_stat_get_last_autovacuum_time(c.oid) AS last_autovacuum, pg_stat_get_last_analyze_time(c.oid) AS last_analyze, pg_stat_get_last_autoanalyze_time(c.oid) AS last_autoanalyze, pg_stat_get_vacuum_count(c.oid) AS vacuum_count, pg_stat_get_autovacuum_count(c.oid) AS autovacuum_count, pg_stat_get_analyze_count(c.oid) AS analyze_count, pg_stat_get_autoanalyze_count(c.oid) AS autoanalyze_count FROM ((pg_class c LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"])) GROUP BY c.oid, n.nspname, c.relname;
- pg_stat_bgwriter            | SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint, pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean, pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean, pg_stat_get_buf_written_backend() AS buffers_backend, pg_stat_get_buf_alloc() AS buffers_alloc;
+ pg_stat_bgwriter            | SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint, pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean, pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean, pg_stat_get_buf_written_backend() AS buffers_backend, pg_stat_get_buf_sync_backend() AS buffers_backend_sync, pg_stat_get_buf_alloc() AS buffers_alloc;
 pg_stat_database            | SELECT d.oid AS datid, d.datname, pg_stat_get_db_numbackends(d.oid) AS numbackends, pg_stat_get_db_xact_commit(d.oid) AS xact_commit, pg_stat_get_db_xact_rollback(d.oid) AS xact_rollback, (pg_stat_get_db_blocks_fetched(d.oid) - pg_stat_get_db_blocks_hit(d.oid)) AS blks_read, pg_stat_get_db_blocks_hit(d.oid) AS blks_hit, pg_stat_get_db_tuples_returned(d.oid) AS tup_returned, pg_stat_get_db_tuples_fetched(d.oid) AS tup_fetched, pg_stat_get_db_tuples_inserted(d.oid) AS tup_inserted, pg_stat_get_db_tuples_updated(d.oid) AS tup_updated, pg_stat_get_db_tuples_deleted(d.oid) AS tup_deleted FROM pg_database d;
 pg_stat_sys_indexes         | SELECT pg_stat_all_indexes.relid, pg_stat_all_indexes.indexrelid, pg_stat_all_indexes.schemaname, pg_stat_all_indexes.relname, pg_stat_all_indexes.indexrelname, pg_stat_all_indexes.idx_scan, pg_stat_all_indexes.idx_tup_read, pg_stat_all_indexes.idx_tup_fetch FROM pg_stat_all_indexes WHERE ((pg_stat_all_indexes.schemaname = ANY (ARRAY['pg_catalog'::name, 'information_schema'::name])) OR (pg_stat_all_indexes.schemaname ~ '^pg_toast'::text));
 pg_stat_sys_tables          | SELECT pg_stat_all_tables.relid, pg_stat_all_tables.schemaname, pg_stat_all_tables.relname, pg_stat_all_tables.seq_scan, pg_stat_all_tables.seq_tup_read, pg_stat_all_tables.idx_scan, pg_stat_all_tables.idx_tup_fetch, pg_stat_all_tables.n_tup_ins, pg_stat_all_tables.n_tup_upd, pg_stat_all_tables.n_tup_del, pg_stat_all_tables.n_tup_hot_upd, pg_stat_all_tables.n_live_tup, pg_stat_all_tables.n_dead_tup, pg_stat_all_tables.last_vacuum, pg_stat_all_tables.last_autovacuum, pg_stat_all_tables.last_analyze, pg_stat_all_tables.last_autoanalyze, pg_stat_all_tables.vacuum_count, pg_stat_all_tables.autovacuum_count, pg_stat_all_tables.analyze_count, pg_stat_all_tables.autoanalyze_count FROM pg_stat_all_tables WHERE ((pg_stat_all_tables.schemaname = ANY (ARRAY['pg_catalog'::name, 'information_schema'::name])) OR (pg_stat_all_tables.schemaname ~ '^pg_toast'::text));

Attachments

Re: Count backend self-sync calls

From:
Robert Haas
Date:
On Sun, Nov 14, 2010 at 5:07 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> The patch also adds some logging to the innards involved here, to help with
> understanding problems in this area.  I don't think that should be in the
> version committed as is.  May want to drop the logging level or make it
> disabled in regular builds, since it is sitting somewhere it generates a lot
> of log data and adds overhead.

I think this one could be removed:

+    if (n > 0)
+        elog(DEBUG1,"Absorbing %d fsync requests",n);

But if this is generating a lot of log data or adding a lot of
overhead, then you have bigger problems anyway:

+        elog(DEBUG1, "Unable to forward fsync request, executing directly");

I'm inclined to change that to an ereport(), but otherwise it seems
OK.  Also, how about referring to this as buffers_backend_fsync
consistently throughout, instead of dropping the "f" in some places?

With those changes, I think this is committable, and will do so,
barring objections.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Count backend self-sync calls

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> With those changes, I think this is committable, and will do so,
> barring objections.

Obey message style guidelines, please, especially if you're going
to promote any of those to ereports.
        regards, tom lane


Re: Count backend self-sync calls

From:
Robert Haas
Date:
On Sun, Nov 14, 2010 at 6:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> With those changes, I think this is committable, and will do so,
>> barring objections.
>
> Obey message style guidelines, please, especially if you're going
> to promote any of those to ereports.

The only new message would be the one Greg has as:

Unable to forward fsync request, executing directly

For that, we could just go with:

could not forward fsync request

(Lower case, avoid use of unable, telegram style with program as
implicit subject.)

It might be even better to mention that the reason why we couldn't
forward the fsync request is that the fsync request queue is full.
I'm not sure exactly how to phrase that.  I thought about:

fsync request queue is full

But that seems not to answer the "so what" question.  There is an
example like this in the docs:

could not forward fsync request (fsync request queue is full)

...but I'm not sure I like that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Count backend self-sync calls

From:
Greg Smith
Date:
Robert Haas wrote:
> I think this one could be removed:
>
> +    if (n > 0)
> +        elog(DEBUG1,"Absorbing %d fsync requests",n);
>   

Right; that one is just there to let you know the patch is working, and 
how much the background writer does for you per pass, mainly for the 
purpose of reviewing the patch.


> But if this is generating a lot of log data or adding a lot of
> overhead, then you have bigger problems anyway:
>
> +        elog(DEBUG1, "Unable to forward fsync request, executing directly");
>   

The argument against this log line even existing is that if the field is 
added to pg_stat_bgwriter, that's probably how you want to monitor this 
data anyway.  I don't think there's actually much value to it, which is 
one reason I didn't worry too much about matching style guidelines, 
translation, etc.  You've found the two things that I think both need to 
go away before commit, but I left them in because I think they're 
valuable for testing that the patch itself does what it's supposed to do.

> Also, how about referring to this as buffers_backend_fsync
> consistently throughout, instead of dropping the "f" in some places?
>   

I started out touching code that called it just "sync", but then crossed 
to other code that called it "fsync", and made the external UI use that 
name.  No objections to sorting that out within my patch so it's consistent.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us




Re: Count backend self-sync calls

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> It might be even better to mention that the reason why we couldn't
> forward the fsync request is that the fsync request queue is full.
> I'm not sure exactly how to phrase that.  I thought about:

> fsync request queue is full

> But that seems not to answer the "so what" question.  There is an
> example like this in the docs:

> could not forward fsync request (fsync request queue is full)

> ...but I'm not sure I like that.

Well, that example is meant to cover cases where you have to assemble a
couple of independently created phrases.  In this case I'd suggest

	could not forward fsync request because request queue is full

or, if you think there might sometime be a need to have a strerror
variant, ie

	could not forward fsync request: %m

then maybe this would make the most sense:

	could not forward fsync request: request queue is full

        regards, tom lane


Re: Count backend self-sync calls

From:
Robert Haas
Date:
On Sun, Nov 14, 2010 at 7:19 PM, Greg Smith <greg@2ndquadrant.com> wrote:
>> But if this is generating a lot of log data or adding a lot of
>> overhead, then you have bigger problems anyway:
>>
>> +               elog(DEBUG1, "Unable to forward fsync request, executing
>> directly");
>>
>
> The argument against this log line even existing is that if the field is
> added to pg_stat_bgwriter, that's probably how you want to monitor this data
> anyway.

I'll remove it if you really want it gone, but personally I think it's
useful to have.  I've more than once had to debug a problem given a
PostgreSQL log file with the debug level cranked up and not a whole
lot else.  Rare events that cause performance to tank are worth
logging, IMHO.

> I started out touching code that called it just "sync", but then crossed to
> other code that called it "fsync", and made the external UI use that name.
>  No objections to sorting that out within my patch so it's consistent.

OK, I'll do that before I commit it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Count backend self-sync calls

From:
Robert Haas
Date:
On Sun, Nov 14, 2010 at 7:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> It might be even better to mention that the reason why we couldn't
>> forward the fsync request is that the fsync request queue is full.
>> I'm not sure exactly how to phrase that.  I thought about:
>
>> fsync request queue is full
>
>> But that seems not to answer the "so what" question.  There is an
>> example like this in the docs:
>
>> could not forward fsync request (fsync request queue is full)
>
>> ...but I'm not sure I like that.
>
> Well, that example is meant to cover cases where you have to assemble a
> couple of independently created phrases.  In this case I'd suggest
>        could not forward fsync request because request queue is full

Sounds good to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Count backend self-sync calls

From:
Robert Haas
Date:
On Sun, Nov 14, 2010 at 8:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Nov 14, 2010 at 7:19 PM, Greg Smith <greg@2ndquadrant.com> wrote:
>>> But if this is generating a lot of log data or adding a lot of
>>> overhead, then you have bigger problems anyway:
>>>
>>> +               elog(DEBUG1, "Unable to forward fsync request, executing
>>> directly");
>>>
>>
>> The argument against this log line even existing is that if the field is
>> added to pg_stat_bgwriter, that's probably how you want to monitor this data
>> anyway.
>
> I'll remove it if you really want it gone, but personally I think it's
> useful to have.  I've more than once had to debug a problem given a
> PostgreSQL log file with the debug level cranked up and not a whole
> lot else.  Rare events that cause performance to tank are worth
> logging, IMHO.
>
>> I started out touching code that called it just "sync", but then crossed to
>> other code that called it "fsync", and made the external UI use that name.
>>  No objections to sorting that out within my patch so it's consistent.
>
> OK, I'll do that before I commit it.

Committed with (I think) all the changes discussed, plus a catversion bump.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Count backend self-sync calls

From:
Jeff Janes
Date:
On Sun, Nov 14, 2010 at 2:07 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> The attached patch adds a new field to pg_stat_bgwriter, counting the number
> of times backends execute their own fsync calls.  Normally, when a backend
> needs to fsync data, it passes a request to the background writer, which
> then absorbs the call into its own queue of work to do.  However, under some
> types of heavy system load, the associated queue can fill.  When this
> happens, backends are forced to do their own fsync call.  This is
> potentially much worse than when they do a regular write.
>
> The really nasty situation is when the background writer is busy because
> it's executing a checkpoint.  In that case, it's possible for the backend
> fsync calls to start competing with the ones the background writer is trying
> to get done,


Do you know where this competition is happening?  Is it on the
platters, or is it in the hard drive write cache (I thought high-end
hardware had tagged writes to avoid that), or in the kernel?

...
>
> DEBUG:  Absorbing 4096 fsync requests
> DEBUG:  Absorbing 150 fsync requests
> DEBUG:  Unable to forward fsync request, executing directly
> CONTEXT:  writing block 158638 of relation base/16385/16398
>
> Here 4096 is the most entries the BGW will ever absorb at once, and all 90
> of the missed sync calls are logged so you can see what files they came
> from.

Looking in src/backend/postmaster/bgwriter.c line 1071:

 * Note: we presently make no attempt to eliminate duplicate requests
 * in the requests[] queue.  The bgwriter will have to eliminate dups
 * internally anyway, so we may as well avoid holding the lock longer
 * than we have to here.

This makes sense if we just need to append to a queue.  But once the
queue is full and we are about to do a backend fsync, might it make
sense to do a little more work to look for dups?

Cheers,

Jeff


Re: Count backend self-sync calls

From:
Greg Smith
Date:
Jeff Janes wrote:
> Do you know where this competition is happening?  Is it on the
> platters, or is it in the hard drive write cache (I thought high-end
> hardware had tagged writes to avoid that), or in the kernel?
>   

Kernel.  Linux systems with lots of memory will happily queue up
gigabytes of dirty data in their write cache, only getting serious about
writing it out to disk when fsync demands it.


> This makes sense if we just need to append to a queue.  But once the
> queue is full and we are about to do a backend fsync, might it make
> sense to do a little more work to look for dups?
>   

One of the paths I'd like to follow is experimenting with both sorting
writes by file and looking for duplication in the queues.  I think a
basic, simple sync spreading approach needs to get finished first,
though; this sort of thing would then be an optimization on top of it.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us