Re: Use of fsync; was Re: Pg_upgrade speed for many tables

Поиск
Список
Период
Сортировка
От Bruce Momjian
Тема Re: Use of fsync; was Re: Pg_upgrade speed for many tables
Дата
Msg-id 20121204035032.GA30893@momjian.us
обсуждение исходный текст
Ответ на Re: Use of fsync; was Re: Pg_upgrade speed for many tables  (Bruce Momjian <bruce@momjian.us>)
Список pgsql-hackers
Applied.

---------------------------------------------------------------------------

On Fri, Nov 30, 2012 at 10:43:29PM -0500, Bruce Momjian wrote:
> On Mon, Nov 26, 2012 at 02:43:19PM -0500, Bruce Momjian wrote:
> > > >> In any event, I think the documentation should caution that the
> > > >> upgrade should not be deemed to be a success until after a system-wide
> > > >> sync has been done.  Even if we use the link rather than copy method,
> > > >> are we sure that that is safe if the directories recording those links
> > > >> have not been fsynced?
> > > >
> > > > OK, the above is something I have been thinking about, and obviously you
> > > > have too.  If you change fsync from off to on in a cluster, and restart
> > > > it, there is no guarantee that the dirty pages you read from the kernel
> > > > are actually on disk, because Postgres doesn't know they are dirty.
> > > > They probably will be pushed to disk by the kernel in less than one
> > > > minute, but still, it doesn't seem reliable. Should this be documented
> > > > in the fsync section?
> > > >
> > > > Again, another reason not to use fsync=off, though your example of the
> > > > file copy is a good one.  As you stated, this is a problem with the file
> > > > copy/link, independent of how Postgres handles the files.  We can tell
> > > > people to use 'sync' as root on Unix, but what about Windows?
> > > 
> > > I'm pretty sure someone mentioned the way to do that on Windows in
> > > this list in the last few months, but I can't seem to find it.  I
> > > thought it was the initdb fsync thread.
> > 
> > Yep, the code is already in initdb to fsync a directory --- we just need
> > a way for pg_upgrade to access it.
> 
> I have developed the attached patch that does this.  It basically adds
> an --sync-only option to initdb, then turns off all durability in
> pg_upgrade and has pg_upgrade run initdb --sync-only;  this give us
> another nice speedup!
> 
>              ------ SSD ---- -- magnetic ---
>                 git    patch    git    patch
>         1      11.11   11.11   11.10   11.13
>      1000      20.57   19.89   20.72   19.30
>      2000      28.02   25.81   28.50   27.53
>      4000      42.00   43.59   46.71   46.84
>      8000      89.66   74.16   89.10   73.67
>     16000     157.66  135.98  159.97  153.48
>     32000     316.24  296.90  334.74  308.59
>     64000     814.97  715.53  797.34  727.94
> 
> (I am very happy with these times.  Thanks to Jeff Janes for his
> suggestions.)
> 
> I have also added documentation to the 'fsync' configuration variable
> warning about dirty buffers and recommending flushing them to disk
> before the cluster is crash-recovery safe.
> 
> I consider this patch ready for 9.3 application (meaning it is not a
> prototype).
> 
> -- 
>   Bruce Momjian  <bruce@momjian.us>        http://momjian.us
>   EnterpriseDB                             http://enterprisedb.com
> 
>   + It's impossible for everything to be true. +

> diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c
> new file mode 100644
> index c12f15b..63df529
> *** a/contrib/pg_upgrade/pg_upgrade.c
> --- b/contrib/pg_upgrade/pg_upgrade.c
> *************** main(int argc, char **argv)
> *** 150,155 ****
> --- 150,161 ----
>                 new_cluster.pgdata);
>       check_ok();
>   
> +     prep_status("Sync data directory to disk");
> +     exec_prog(UTILITY_LOG_FILE, NULL, true,
> +               "\"%s/initdb\" --sync-only \"%s\"", new_cluster.bindir,
> +               new_cluster.pgdata);
> +     check_ok();
> + 
>       create_script_for_cluster_analyze(&analyze_script_file_name);
>       create_script_for_old_cluster_deletion(&deletion_script_file_name);
>   
> diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
> new file mode 100644
> index 49d4c8f..05d8cc0
> *** a/contrib/pg_upgrade/server.c
> --- b/contrib/pg_upgrade/server.c
> *************** start_postmaster(ClusterInfo *cluster)
> *** 209,217 ****
>        * a gap of 2000000000 from the current xid counter, so autovacuum will
>        * not touch them.
>        *
> !      *    synchronous_commit=off improves object creation speed, and we only
> !      *    modify the new cluster, so only use it there.  If there is a crash,
> !      *    the new cluster has to be recreated anyway.
>        */
>       snprintf(cmd, sizeof(cmd),
>                "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s%s%s\" start",
> --- 209,217 ----
>        * a gap of 2000000000 from the current xid counter, so autovacuum will
>        * not touch them.
>        *
> !      * Turn off durability requirements to improve object creation speed, and
> !      * we only modify the new cluster, so only use it there.  If there is a
> !      * crash, the new cluster has to be recreated anyway.
>        */
>       snprintf(cmd, sizeof(cmd),
>                "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s%s%s\" start",
> *************** start_postmaster(ClusterInfo *cluster)
> *** 219,225 ****
>                (cluster->controldata.cat_ver >=
>                 BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
>                " -c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
> !              (cluster == &new_cluster) ? " -c synchronous_commit=off" : "",
>                cluster->pgopts ? cluster->pgopts : "", socket_string);
>   
>       /*
> --- 219,226 ----
>                (cluster->controldata.cat_ver >=
>                 BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? " -b" :
>                " -c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
> !              (cluster == &new_cluster) ?
> !                 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
>                cluster->pgopts ? cluster->pgopts : "", socket_string);
>   
>       /*
> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> new file mode 100644
> index b56070b..b7df8ce
> *** a/doc/src/sgml/config.sgml
> --- b/doc/src/sgml/config.sgml
> *************** include 'filename'
> *** 1697,1702 ****
> --- 1697,1711 ----
>          </para>
>   
>          <para>
> +         For reliable recovery when changing <varname>fsync</varname>
> +         off to on, it is necessary to force all modified buffers in the
> +         kernel to durable storage.  This can be done while the cluster
> +         is shutdown or while fsync is on by running <command>initdb
> +         --sync-only</command>, running <command>sync</>, unmounting the
> +         file system, or rebooting the server.
> +        </para>
> + 
> +        <para>
>           In many situations, turning off <xref linkend="guc-synchronous-commit">
>           for noncritical transactions can provide much of the potential
>           performance benefit of turning off <varname>fsync</varname>, without
> diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
> new file mode 100644
> index 08ee37e..a1e46eb
> *** a/doc/src/sgml/ref/initdb.sgml
> --- b/doc/src/sgml/ref/initdb.sgml
> *************** PostgreSQL documentation
> *** 245,250 ****
> --- 245,261 ----
>        </varlistentry>
>   
>        <varlistentry>
> +       <term><option>-S</option></term>
> +       <term><option>--sync-only</option></term>
> +       <listitem>
> +        <para>
> +         Safely write all database files to disk and exit.  This does not
> +         perform any of the normal <application>initdb</> operations.
> +        </para>
> +       </listitem>
> +      </varlistentry>
> + 
> +      <varlistentry>
>         <term><option>-T <replaceable>CFG</></option></term>
>         <term><option>--text-search-config=<replaceable>CFG</></option></term>
>         <listitem>
> diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
> new file mode 100644
> index 402504b..8c0a9f4
> *** a/src/bin/initdb/initdb.c
> --- b/src/bin/initdb/initdb.c
> *************** static const char *authmethodlocal = "";
> *** 118,123 ****
> --- 118,124 ----
>   static bool debug = false;
>   static bool noclean = false;
>   static bool do_sync = true;
> + static bool sync_only = false;
>   static bool show_setting = false;
>   static char *xlog_dir = "";
>   
> *************** usage(const char *progname)
> *** 2796,2801 ****
> --- 2797,2803 ----
>       printf(_("  -n, --noclean             do not clean up after errors\n"));
>       printf(_("  -N, --nosync              do not wait for changes to be written safely to disk\n"));
>       printf(_("  -s, --show                show internal settings\n"));
> +     printf(_("  -S, --sync-only           only sync data directory\n"));
>       printf(_("\nOther options:\n"));
>       printf(_("  -V, --version             output version information, then exit\n"));
>       printf(_("  -?, --help                show this help, then exit\n"));
> *************** main(int argc, char *argv[])
> *** 3445,3450 ****
> --- 3447,3453 ----
>           {"show", no_argument, NULL, 's'},
>           {"noclean", no_argument, NULL, 'n'},
>           {"nosync", no_argument, NULL, 'N'},
> +         {"sync-only", no_argument, NULL, 'S'},
>           {"xlogdir", required_argument, NULL, 'X'},
>           {NULL, 0, NULL, 0}
>       };
> *************** main(int argc, char *argv[])
> *** 3476,3482 ****
>   
>       /* process command-line options */
>   
> !     while ((c = getopt_long(argc, argv, "dD:E:L:nNU:WA:sT:X:", long_options, &option_index)) != -1)
>       {
>           switch (c)
>           {
> --- 3479,3485 ----
>   
>       /* process command-line options */
>   
> !     while ((c = getopt_long(argc, argv, "dD:E:L:nNU:WA:sST:X:", long_options, &option_index)) != -1)
>       {
>           switch (c)
>           {
> *************** main(int argc, char *argv[])
> *** 3522,3527 ****
> --- 3525,3533 ----
>               case 'N':
>                   do_sync = false;
>                   break;
> +             case 'S':
> +                 sync_only = true;
> +                 break;
>               case 'L':
>                   share_path = pg_strdup(optarg);
>                   break;
> *************** main(int argc, char *argv[])
> *** 3589,3594 ****
> --- 3595,3608 ----
>           exit(1);
>       }
>   
> +     /* If we only need to fsync, just to it and exit */
> +     if (sync_only)
> +     {
> +         setup_pgdata();
> +         perform_fsync();
> +         return 0;
> +     }
> +     
>       if (pwprompt && pwfilename)
>       {
>           fprintf(stderr, _("%s: password prompt and password file cannot be specified together\n"), progname);

> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Glaesemann
Дата:
Сообщение: Re: Tablespaces in the data directory
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: In pg_upgrade, remove 'set -x' from test script.