RE: Disable WAL logging to speed up data loading

Поиск
Список
Период
Сортировка
От tsunakawa.takay@fujitsu.com
Тема RE: Disable WAL logging to speed up data loading
Дата
Msg-id TYAPR01MB2990CB7F6AF30EA037B8A19EFE170@TYAPR01MB2990.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на RE: Disable WAL logging to speed up data loading  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
Список pgsql-hackers
From: Osumi, Takamichi/大墨 昂道 <osumi.takamichi@fujitsu.com>
> I wrote and attached the first patch to disable WAL logging.
> This patch passes the regression test of check-world already and is formatted

I think make check-world uses the default setting for wal_level.  You need to set wal_level = none and do make
installcheck-world.


> I compared two wal_levels both 'minimal' and 'none'.
> For both levels, I measured
> (1) cluster's restore from pg_dumpall output,
> (2) COPY initial data to a defined table as initial data loading, and
> (3) COPY data to a defined table with tuples, 3 times each for all cases.
> After that, calculated the average and the ratio of the loading speed.
> The conclusion is that wal_level=none cuts about 20% of the loading speed
> compared to 'minimal' in the three cases above.

Hmm.  I wonder why pg_dumpall's performance increases by as much as 20%.  On the contrary to my previous hope,
pg_dumpalluses COPY to restore data, so it doesn't emit WAL when wal_level = minimal.  (Is it brought by the difference
ofwhether DDL's WAL is emitted or not?) 


> Sharing other scenario to measure is welcome.

How about running multiple concurrent data loading sessions when adding data to existing tables with data, so that WAL
isthe bottleneck?  That's the use case of the target customer, isn't it? 



> The input data was generated from pgbench with 1000 scale factor.
> It's about 9.3GB. For the table definition or the initial data for appended data

IIRC, I thought the scale factor of 1,000 is 1.5 GB.  What displayed 9.3 GB?  SELECT pg_database_size() or something?


Below are review comments:


(1)
@@ -449,6 +449,13 @@ XLogInsert(RmgrId rmid, uint8 info)
         return EndPos;
     }

+    /* Issues WAL only for transaction end and check point */
+    if (wal_level == WAL_LEVEL_NONE && rmid != RM_XLOG_ID)
+    {
+        XLogResetInsertion();
+        return GetLatestCheckPointLSN();
+    }
+
     do

This does not emit transaction completion WAL records.  Their RM ID is RM_XACT_ID.  Also, RM_XLOG_ID includes other
kindsof WAL records than the checkpoint WAL record.  Correct the comment accordingly.  I don't have a good idea on how
torepresent the RM_XLOG_ID,, but the following might help: 

[rmgrlist.h]
/* symbol name, textual name, redo, desc, identify, startup, cleanup */
PG_RMGR(RM_XLOG_ID, "XLOG", xlog_redo, xlog_desc, xlog_identify, NULL, NULL, NULL)
PG_RMGR(RM_XACT_ID, "Transaction", xact_redo, xact_desc, xact_identify, NULL, NULL, NULL)


(2)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
@@ -2591,10 +2591,10 @@ include_dir 'conf.d'
         data to support WAL archiving and replication, including running
         read-only queries on a standby server. <literal>minimal</literal> removes all
         logging except the information required to recover from a crash or
-        immediate shutdown.  Finally,
+        immediate shutdown.  <literal>none</literal> generates no WAL in any case. Finally,

According to the previous code, "none" emits some kinds of WAL records.  So I think we need to come up with a good name
and/ordescription. 



(3)
         <literal>logical</literal> adds information necessary to support logical
-        decoding.  Each level includes the information logged at all lower
-        levels.  This parameter can only be set at server start.
+        decoding.  Each level except for <literal>none</literal> includes the
+        information logged at all lower levels.  This parameter can only be set at server start.
        </para>

Why is this change necessary?


(4)
+        On the other hand, an unexpected crash of the server makes the database cluster
+        inconsistent. For that reason, before utilizing this level, get a full backup of the cluster and
+        backup of the entire operations that are done under the condition that
+        <varname>wal_level</varname> is <literal>none</literal>.

This gives the impression that the user can start the database server and see inconsistent data.  The reality is that
thedatabase server does not start, isn't it? 


(5)
@@ -1751,7 +1752,8 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
     Aside from avoiding the time for the archiver or WAL sender to process the
     WAL data, doing this will actually make certain commands faster, because
     they do not to write WAL at all if <varname>wal_level</varname>
-    is <literal>minimal</literal> and the current subtransaction (or top-level
+    is either <literal>minimal</literal> or <literal>minimal</literal>
+    and the current subtransaction (or top-level
     transaction) created or truncated the table or index they change.  (They
     can guarantee crash safety more cheaply by doing
     an <function>fsync</function> at the end than by writing WAL.)

This is not correct.  In minimal, some conditions need to hold true for WAL to not be generated as described above.
OTOH,wal_level = none does not generate WAL unconditionally. 


(6)
14.4.9. Some Notes about pg_dump
...
If using WAL archiving or streaming replication, consider disabling them during the restore. To do that, set
archive_modeto off, wal_level to minimal, and max_wal_senders to zero before loading the dump. Afterwards, set them
backto the right values and take a fresh base backup. 

Why don't you refer to wal_level = none here as well?


(7)
@@ -918,10 +918,13 @@ PostmasterMain(int argc, char *argv[])
                      ReservedBackends, MaxConnections);
         ExitPostmaster(1);
     }
-    if (XLogArchiveMode > ARCHIVE_MODE_OFF && wal_level == WAL_LEVEL_MINIMAL)
+    if ((XLogArchiveMode > ARCHIVE_MODE_OFF && wal_level == WAL_LEVEL_NONE) ||
+        (XLogArchiveMode > ARCHIVE_MODE_OFF && wal_level == WAL_LEVEL_MINIMAL))
         ereport(ERROR,
-                (errmsg("WAL archival cannot be enabled when wal_level is \"minimal\"")));
-    if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
+                (errmsg("WAL archival cannot be enabled when wal_level is \"%s\"",
+                        wal_level == WAL_LEVEL_MINIMAL ? "minimal" : "none")));
+    if ((max_wal_senders > 0 && wal_level == WAL_LEVEL_NONE) ||
+        (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL))
         ereport(ERROR,
                 (errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));

This style of writing conitions is redundant.  You can just change the code to wal_level <= WAL_LEVEL_MINIMAL.
Also, the first message can be '"minimal" or "none"' like the second one.


(8)
@@ -989,6 +992,12 @@ PostmasterMain(int argc, char *argv[])
     LocalProcessControlFile(false);

     /*
+     * Check some conditions specific to wal_level='none' and ensures the
+     * database isn't inconsistent.
+     */
+    SafelyFinishedNoneLevel();
+
+    /*

This check should be moved to around the beginning of StartupXLOG().  PostmasterMain() is called in multi-user mode.
It'snot called in single-user mode (postgres --single). 
The new function is not necessary.  StartupXLOG() can see the control file contents directly.



(9)
+    /*
+     * Detect if we previously crashed under wal_level='none' or not.
+     */
+    unexpected_shutdown = ControlFile->state != DB_SHUTDOWNED &&
+        ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY;
+    if ((ControlFile->wal_level == WAL_LEVEL_NONE && unexpected_shutdown))
+    {
+        ereport(ERROR,
+                (errmsg("Server was unexpectedly shut down when WAL logging was disabled"),
+                 errhint("It looks like you need to deploy a new cluster from your full backup again.")));
+    }
+}

Refine the message according to the following message guideline.  For example, the primary message has to start with a
lowercase letter. 

https://www.postgresql.org/docs/devel/source.html



(10)
You need to edit this to add none:
src/backend/utils/misc/postgresql.conf.sample


(11)
src/include/access/xlogdefs.h
src/backend/access/transam/varsup.c

Consider modifying the comments in these files that refer to wal_level.  Maybe wal_level <= minimal is enough?


(12)
src/include/utils/rel.h

Modify the RelationNeedsWAL() so that it returns false when wal_level = none.  Adding wal_level != WAL_LEVEL_NONE is
wouldprobably be okay. 


(13)
@@ -161,7 +161,8 @@ extern int    XLogArchiveMode;
 /* WAL levels */
 typedef enum WalLevel
 {
-    WAL_LEVEL_MINIMAL = 0,
+    WAL_LEVEL_NONE = 0,
+    WAL_LEVEL_MINIMAL,
     WAL_LEVEL_REPLICA,
     WAL_LEVEL_LOGICAL
 } WalLevel;

I'm a bit concerned about if we can change the values of existing symbols, because wal_level setting is stored in
pg_controlfile.  Having a quick look at source code, there seems to be no problem.  However, I'm not sure about
pg_upgrade. Can you try, for example, 

1. Create the database cluster with an older version, say, PG 13.
2. Start and stop the database server.
3. Run pg_controldata and see that it prints replica for the wal_level value.
4. Upgrade the database cluster with pg_upgrade.
5. Run pg_controldata and see the wal_level value.

If you change the values of existing symbols like your patch, you need to bump PG_CONTROL_VERSION.
If you find we can't tchange the existing values, you can probably set WAL_LEVEL_NONE to -1.


Regards
Takayuki Tsunakawa




В списке pgsql-hackers по дате отправления:

Предыдущее
От: seinoyu
Дата:
Сообщение: Re: [PATCH] Add features to pg_stat_statements
Следующее
От: Anastasia Lubennikova
Дата:
Сообщение: Re: Implementing Incremental View Maintenance