Re: [HACKERS] WAL logging problem in 9.4.3?

From Noah Misch
Subject Re: [HACKERS] WAL logging problem in 9.4.3?
Msg-id 20190910114517.GA29650@gust.leadboat.com
In response to Re: [HACKERS] WAL logging problem in 9.4.3?  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
[Casual readers with opinions on GUC naming: consider skipping to the end.]

MarkBufferDirtyHint() writes WAL even when rd_firstRelfilenodeSubid or
rd_createSubid is set; see attached test case.  It needs to skip WAL whenever
RelationNeedsWAL() returns false.
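
A minimal sketch of that rule, using a simplified stand-in model rather than
PostgreSQL's actual structures.  ModelRelation, model_relation_needs_wal() and
model_hint_should_write_wal() are hypothetical names, and hint_wal_enabled is
an assumed stand-in for whatever condition makes hint-bit changes WAL-logged
at all.  The real RelationNeedsWAL() and MarkBufferDirtyHint() consider more
state than this.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relcache fields named above; 0 means "unset". */
typedef struct ModelRelation
{
    bool         is_permanent;             /* permanent, not temp or unlogged */
    unsigned int rd_createSubid;           /* nonzero if created in this xact */
    unsigned int rd_firstRelfilenodeSubid; /* nonzero if given a new relfilenode in this xact */
} ModelRelation;

/*
 * Simplified model of RelationNeedsWAL(): under wal_level = minimal, a
 * relation created or rewritten in the current transaction skips WAL.
 */
bool
model_relation_needs_wal(const ModelRelation *rel, bool wal_level_minimal)
{
    if (!rel->is_permanent)
        return false;
    if (wal_level_minimal &&
        (rel->rd_createSubid != 0 || rel->rd_firstRelfilenodeSubid != 0))
        return false;
    return true;
}

/*
 * The fix described above: a hint-bit update writes WAL only if hint WAL is
 * enabled AND the relation itself needs WAL.
 */
bool
model_hint_should_write_wal(const ModelRelation *rel, bool wal_level_minimal,
                            bool hint_wal_enabled)
{
    return hint_wal_enabled && model_relation_needs_wal(rel, wal_level_minimal);
}
```

In this model, a relation with rd_createSubid set never produces hint-bit WAL
under wal_level = minimal, which is the behavior the test case expects.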

On Tue, Aug 27, 2019 at 03:49:32PM +0900, Kyotaro Horiguchi wrote:
> At Sun, 25 Aug 2019 22:08:43 -0700, Noah Misch <noah@leadboat.com> wrote in <20190826050843.GB3153606@rfd.leadboat.com>
> > Consider a one-page relfilenode.  Doing all the things you list for a single
> > page may be cheaper than locking millions of buffer headers.
> 
> If I understand you correctly, I would say that *all* buffers
> that don't belong to in-transaction-created files are skipped
> before taking locks. No lock conflict happens with other
> backends.
> 
> FlushRelationBuffers uses double-checked-locking as follows:

I had misread the code; you're right.

> > This should be GUC-controlled, especially since this is back-patch material.
> 
> Is this size of patch back-patchable?

Its size is not an obstacle.  It's not ideal to back-patch such a user-visible
performance change, but it would be worse to leave back branches able to
corrupt data during recovery.

On Wed, Aug 28, 2019 at 03:42:10PM +0900, Kyotaro Horiguchi wrote:
> - Use log_newpage instead of fsync for small tables.

> I'm trying to measure performance difference on WAL/fsync.

I would measure it with simultaneous pgbench instances:

1. DDL pgbench instance repeatedly creates and drops a table of X kilobytes,
   using --rate to make this happen a fixed number of times per second.
2. Regular pgbench instance runs the built-in script at maximum qps.

For each X, try one test run with effective_io_block_size = X-1 and one with
effective_io_block_size = X.  If the regular pgbench instance gets materially
higher qps with effective_io_block_size = X-1, the ideal default is <X.
Otherwise, the ideal default is >=X.
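
The comparison for one value of X could be sketched as below.  This is only an
illustration of the decision rule: classify_default() and DefaultBound are
hypothetical names, and material_fraction is an assumed way to define
"materially higher" (for example, 0.05 for a 5% margin); the right margin
depends on run-to-run noise in the benchmark.

```c
#include <stdbool.h>

typedef enum
{
    IDEAL_DEFAULT_BELOW_X,      /* ideal default is < X */
    IDEAL_DEFAULT_AT_LEAST_X    /* ideal default is >= X */
} DefaultBound;

/*
 * qps_at_x_minus_1: regular pgbench qps with the setting at X-1
 * qps_at_x:         regular pgbench qps with the setting at X
 * material_fraction: assumed relative margin that counts as "materially higher"
 */
DefaultBound
classify_default(double qps_at_x_minus_1, double qps_at_x,
                 double material_fraction)
{
    if (qps_at_x_minus_1 > qps_at_x * (1.0 + material_fraction))
        return IDEAL_DEFAULT_BELOW_X;
    return IDEAL_DEFAULT_AT_LEAST_X;
}
```

Repeating this for increasing X brackets the ideal default between the largest
X classified as ">= X" and the smallest classified as "< X".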

> +     <varlistentry id="guc-effective-io-block-size" xreflabel="effective_io_block_size">
> +      <term><varname>effective_io_block_size</varname> (<type>integer</type>)
> +      <indexterm>
> +       <primary><varname>effective_io_block_size</varname> configuration parameter</primary>
> +      </indexterm>
> +      </term>
> +      <listitem>
> +       <para>
> +        Specifies the expected maximum size of a file for which <function>fsync</function> returns in the minimum required duration. It is approximately the size of a track or sylinder for magnetic disks.
> +        The value is specified in kilobytes and the default is <literal>64</literal> kilobytes.
> +       </para>
> +       <para>
> +        When <xref linkend="guc-wal-level"/> is <literal>minimal</literal>,
> +        WAL-logging is skipped for tables created in-trasaction.  If a table
> +        is smaller than that size at commit, it is WAL-logged instead of
> +        issueing <function>fsync</function> on it.
> +
> +       </para>
> +      </listitem>
> +     </varlistentry>

Cylinder and track sizes are obsolete as user-visible concepts.  (They're not
constant for a given drive, and I think modern disks provide no way to read
the relevant parameters.)  I like the name "wal_skip_threshold", and my second
choice would be "wal_skip_min_size".  Possibly documented as follows:

  When wal_level is minimal and a transaction commits after creating or
  rewriting a permanent table, materialized view, or index, this setting
  determines how to persist the new data.  If the data is smaller than this
  setting, write it to the WAL log; otherwise, use an fsync of the data file.
  Depending on the properties of your storage, raising or lowering this value
  might help if such commits are slowing concurrent transactions.  The default
  is 64 kilobytes (64kB).
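
As a sketch of the commit-time choice that wording describes (not code from
the patch): choose_persist_method() and PersistMethod are hypothetical names,
and sizes are in kilobytes to match the proposed unit.

```c
#include <stddef.h>

typedef enum
{
    PERSIST_VIA_WAL,    /* write the new data to WAL */
    PERSIST_VIA_FSYNC   /* fsync the data file instead */
} PersistMethod;

/*
 * Data strictly smaller than the threshold goes to WAL; anything else is
 * persisted with an fsync of the data file.
 */
PersistMethod
choose_persist_method(size_t data_size_kb, size_t wal_skip_threshold_kb)
{
    if (data_size_kb < wal_skip_threshold_kb)
        return PERSIST_VIA_WAL;
    return PERSIST_VIA_FSYNC;
}
```

With the proposed default of 64, an 8kB table would be WAL-logged at commit,
while a 64kB or larger one would be fsynced.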

Any other opinions on the GUC name?
