Re: [HACKERS] WAL logging problem in 9.4.3?
From | Noah Misch |
---|---|
Subject | Re: [HACKERS] WAL logging problem in 9.4.3? |
Date | |
Msg-id | 20190910114517.GA29650@gust.leadboat.com |
In response to | Re: [HACKERS] WAL logging problem in 9.4.3? (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
List | pgsql-hackers |
[Casual readers with opinions on GUC naming: consider skipping to the end.]

MarkBufferDirtyHint() writes WAL even when rd_firstRelfilenodeSubid or
rd_createSubid is set; see attached test case.  It needs to skip WAL whenever
RelationNeedsWAL() returns false.

On Tue, Aug 27, 2019 at 03:49:32PM +0900, Kyotaro Horiguchi wrote:
> At Sun, 25 Aug 2019 22:08:43 -0700, Noah Misch <noah@leadboat.com> wrote in <20190826050843.GB3153606@rfd.leadboat.com>
> > Consider a one-page relfilenode.  Doing all the things you list for a single
> > page may be cheaper than locking millions of buffer headers.
>
> If I understand you correctly, I would say that *all* buffers
> that don't belong to in-transaction-created files are skipped
> before taking locks. No lock conflict happens with other
> backends.
>
> FlushRelationBuffers uses double-checked-locking as follows:

I had misread the code; you're right.

> > This should be GUC-controlled, especially since this is back-patch material.
>
> Is this size of patch back-patchable?

Its size is not an obstacle.  It's not ideal to back-patch such a user-visible
performance change, but it would be worse to leave back branches able to
corrupt data during recovery.

On Wed, Aug 28, 2019 at 03:42:10PM +0900, Kyotaro Horiguchi wrote:
> - Use log_newpage instead of fsync for small tables.
> I'm trying to measure performance difference on WAL/fsync.

I would measure it with simultaneous pgbench instances:

1. DDL pgbench instance repeatedly creates and drops a table of X kilobytes,
   using --rate to make this happen a fixed number of times per second.
2. Regular pgbench instance runs the built-in script at maximum qps.

For each X, try one test run with effective_io_block_size = X-1 and one with
effective_io_block_size = X.  If the regular pgbench instance gets materially
higher qps with effective_io_block_size = X-1, the ideal default is <X.
Otherwise, the ideal default is >=X.
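
The decision rule above can be written down as a small helper for reading the
benchmark results.  This is only a sketch: the function names and the 5% cutoff
for "materially higher" are assumptions made for illustration, not something
the thread specifies, and the qps figures have to come from real pgbench runs.

```python
def classify_default(x_kb, qps_below, qps_at, min_gain=0.05):
    """Apply the rule for one table size X (in kB).

    qps_below: regular-pgbench qps measured with the setting at X-1
    qps_at:    regular-pgbench qps measured with the setting at X
    min_gain:  hypothetical cutoff for "materially higher" (assumed 5%)
    """
    if qps_at <= 0:
        raise ValueError("qps_at must be positive")
    materially_higher = (qps_below - qps_at) / qps_at > min_gain
    return f"<{x_kb}" if materially_higher else f">={x_kb}"


def bracket_default(runs, min_gain=0.05):
    """Combine several X values into (lower, upper) bounds in kB.

    runs maps X -> (qps_below, qps_at).  The ideal default is taken to be
    >= lower and < upper; either bound is None if no run established it.
    """
    lower, upper = None, None
    for x_kb, (qps_below, qps_at) in sorted(runs.items()):
        verdict = classify_default(x_kb, qps_below, qps_at, min_gain)
        if verdict.startswith("<"):
            upper = x_kb if upper is None else min(upper, x_kb)
        else:
            lower = x_kb if lower is None else max(lower, x_kb)
    return lower, upper
```

Running the pair of test runs for a few values of X (say, powers of two) and
feeding the results to bracket_default() narrows the default to a range, which
can then be bisected with further runs.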
> +     <varlistentry id="guc-effective-io-block-size" xreflabel="effective_io_block_size">
> +      <term><varname>effective_io_block_size</varname> (<type>integer</type>)
> +      <indexterm>
> +       <primary><varname>effective_io_block_size</varname> configuration parameter</primary>
> +      </indexterm>
> +      </term>
> +      <listitem>
> +       <para>
> +        Specifies the expected maximum size of a file for which <function>fsync</function> returns in the minimum required duration. It is approximately the size of a track or cylinder for magnetic disks.
> +        The value is specified in kilobytes and the default is <literal>64</literal> kilobytes.
> +       </para>
> +       <para>
> +        When <xref linkend="guc-wal-level"/> is <literal>minimal</literal>,
> +        WAL-logging is skipped for tables created in-transaction.  If a table
> +        is smaller than that size at commit, it is WAL-logged instead of
> +        issuing <function>fsync</function> on it.
> +
> +       </para>
> +      </listitem>
> +     </varlistentry>

Cylinder and track sizes are obsolete as user-visible concepts.  (They're not
constant for a given drive, and I think modern disks provide no way to read
the relevant parameters.)  I like the name "wal_skip_threshold", and
my second choice would be "wal_skip_min_size".  Possibly documented as
follows:

  When wal_level is minimal and a transaction commits after creating or
  rewriting a permanent table, materialized view, or index, this setting
  determines how to persist the new data.  If the data is smaller than this
  setting, write it to the WAL log; otherwise, use an fsync of the data file.
  Depending on the properties of your storage, raising or lowering this value
  might help if such commits are slowing concurrent transactions.  The default
  is 64 kilobytes (64kB).

Any other opinions on the GUC name?
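
For readers weighing the name, the behavior described in the proposed wording
amounts to a single comparison at commit.  The following is a model of that
wording only, not PostgreSQL source; the function name and the return strings
are hypothetical, and the strict "smaller than" comparison is taken from the
proposed documentation text.

```python
def persist_method(data_size_bytes, wal_skip_threshold_kb=64):
    """Model of the proposed setting, assuming wal_level = minimal.

    Returns "wal" when the new data is smaller than the threshold and
    "fsync" otherwise.  The 64 kB default is the one proposed above.
    """
    if data_size_bytes < 0:
        raise ValueError("data_size_bytes must be non-negative")
    threshold_bytes = wal_skip_threshold_kb * 1024
    return "wal" if data_size_bytes < threshold_bytes else "fsync"
```

Under this reading, a single 8 kB page goes to WAL with the default setting,
while a relation of exactly 64 kB is fsynced, since it is not smaller than the
threshold.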
Attachments