Re: Syncrep and improving latency due to WAL throttling

From: Bharath Rupireddy
Subject: Re: Syncrep and improving latency due to WAL throttling
Date:
Msg-id: CALj2ACWDwB-pF2iZK6dT3mPA3eOH+Mup0JVuyWDHXNA5JthLuQ@mail.gmail.com
In response to: Re: Syncrep and improving latency due to WAL throttling  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses: Re: Syncrep and improving latency due to WAL throttling  (Jakub Wartak <jakub.wartak@enterprisedb.com>)
List: pgsql-hackers
On Sat, Jan 28, 2023 at 6:06 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> >
> > That's not the sole goal, from my end: I'd like to avoid writing out +
> > flushing the WAL in too small chunks.  Imagine a few concurrent vacuums or
> > COPYs or such - if we're unlucky they'd each end up exceeding their "private"
> > limit close to each other, leading to a number of small writes of the
> > WAL. Which could end up increasing local commit latency / iops.
> >
> > If we instead decide to only ever flush up to something like
> >   last_page_boundary - 1/8 * throttle_pages * XLOG_BLCKSZ
> >
> > we'd make sure that the throttling mechanism won't cause a lot of small
> > writes.
> >
>
> I'm not saying we shouldn't do this, but I still don't see how this
> could make a measurable difference. At least assuming a sensible value
> of the throttling limit (say, more than 256kB per backend), and OLTP
> workload running concurrently. That means ~64 extra flushes/writes per
> 16MB segment (at most). Yeah, a particular page might get unlucky and be
> flushed by multiple backends, but the average still holds. Meanwhile,
> the OLTP transactions will generate (at least) an order of magnitude
> more flushes.

I think measuring the number of WAL flushes postgres generates with and
without this feature would be a great way to understand its effect on
IOPS. It would probably be even better to measure with various values of
synchronous_commit_flush_wal_after or the throttling limits.
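For instance (assuming PostgreSQL 14 or later, where the pg_stat_wal view
is available), a crude way is to snapshot the cumulative counters around a
benchmark run:

```sql
-- counts of WAL write() calls and WAL fsync calls since the last reset;
-- the *_time columns are populated only when track_wal_io_timing = on
SELECT wal_write, wal_sync, wal_write_time, wal_sync_time
FROM pg_stat_wal;

-- reset the shared WAL stats between runs
SELECT pg_stat_reset_shared('wal');
```

Comparing the deltas of wal_write/wal_sync with the feature on and off
would show whether the throttling adds a meaningful number of extra
flushes.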

On Sat, Jan 28, 2023 at 6:08 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 1/27/23 22:19, Andres Freund wrote:
> > Hi,
> >
> > On 2023-01-27 12:06:49 +0100, Jakub Wartak wrote:
> >> On Thu, Jan 26, 2023 at 4:49 PM Andres Freund <andres@anarazel.de> wrote:
> >>
> >>> Huh? Why did you remove the GUC?
> >>
> >> After reading previous threads, my optimism level of getting it ever
> >> in shape of being widely accepted degraded significantly (mainly due
> >> to the discussion of wider category of 'WAL I/O throttling' especially
> >> in async case, RPO targets in async case and potentially calculating
> >> global bandwidth).
> >
> > I think it's quite reasonable to limit this to a smaller scope. Particularly
> > because those other goals are pretty vague but ambitious goals. IMO the
> > problem with a lot of the threads is precisely that they aimed at a level
> > of generality that isn't achievable in one step.
> >
>
> +1 to that

Okay, I agree to limit the scope first to synchronous replication.

Although v2 is a WIP patch, I have some comments:
1. The coding style doesn't conform to the Postgres standard:
+/*
+ * Called from ProcessMessageInterrupts() to avoid waiting while
being in critical section
+ */
Exceeds the 80-character line limit.

+void HandleXLogDelayPending()
A new line is missing between the function return type and the function name.

+    elog(WARNING, "throttling WAL down on this session
(backendWalInserted=%d, LSN=%X/%X flushingTo=%X/%X)",
+        backendWalInserted,
+        LSN_FORMAT_ARGS(XactLastThrottledRecEnd),
+        LSN_FORMAT_ARGS(LastFullyWrittenXLogPage));
Indentation issue - the continuation lines after elog(WARNING, ... need to
be aligned properly.

+    elog(WARNING, "throttling WAL down on this session - end (%f
ms)", timediff);
Exceeds the 80-character line limit; timediff); should go on a new line.

+    //RESUME_INTERRUPTS();
+    //HOLD_INTERRUPTS();
C++-style // comments aren't used; multi-line /* ... */ comments are the
convention elsewhere in the code.

Better to run pgindent on the patch.

2. It'd be better to add a TAP test hitting the WAL throttling.

3. We generally don't need timings to be calculated explicitly when we
emit before and after log messages; if needed, one can derive the wait
time from the timestamps of those messages. If an explicit time difference
is still required - which I don't think it is, given that we have a
dedicated wait event and before/after log messages - it should be gated
behind track_wal_io_timing to keep things simple.

4. XLogInsertRecord() can be a hot path for write-heavy workloads, so the
effect of adding anything to it needs to be measured. It's better to run
benchmarks with this feature both enabled and disabled.

5. Documentation for this feature and the GUC is missing. I think the
documentation should describe extensively why a feature of this kind is
needed, for better adoption and usability.

6. Shouldn't the max limit be MAX_KILOBYTES?
+        &synchronous_commit_flush_wal_after,
+        0, 0, 1024L*1024L,

7. Can this GUC name be something like
synchronous_commit_wal_throttle_threshold to better reflect what it
does for a backend?
+        {"synchronous_commit_flush_wal_after", PGC_USERSET,
REPLICATION_SENDING,

8. A typo - s/confiration/confirmation
+            gettext_noop("Sets the maximum logged WAL in kbytes,
after which wait for sync commit confiration even without commit "),

9. This
"Sets the maximum logged WAL in kbytes, after which wait for sync
commit confiration even without commit "
might be better phrased as something like:
"Sets the maximum amount of WAL in kilobytes a backend generates, after
which it waits for synchronous commit confirmation even without a commit.
"

10. I think it's worth adding some assertions in
HandleXLogDelayPending():
Assert(synchronous_commit_flush_wal_after > 0);
Assert(backendWalInserted > synchronous_commit_flush_wal_after * 1024L);
Assert(!XLogRecPtrIsInvalid(XactLastThrottledRecEnd));

11. Specify the reason in the comments as to why we had to do the
following things:
Here:
+    /* flush only up to the last fully filled page */
+    XLogRecPtr     LastFullyWrittenXLogPage = XactLastThrottledRecEnd
- (XactLastThrottledRecEnd % XLOG_BLCKSZ);
Talk about how this avoids multiple flushes of the same page.

Here:
+ * Called from ProcessMessageInterrupts() to avoid waiting while
being in critical section
+ */
+void HandleXLogDelayPending()
Talk about how waiting in a critical section, i.e. in XLogInsertRecord(),
causes locks to be held for longer durations, and about its other effects.

Here:
+        /* WAL throttling */
+        backendWalInserted += rechdr->xl_tot_len;
Be a bit more verbose about why we throttle WAL and why only for
synchronous replication - the advantages, effects, etc.

12. These had better be DEBUG1 or DEBUG2 messages instead of WARNINGs, to
avoid cluttering the server logs.
+    /* XXX Debug for now */
+    elog(WARNING, "throttling WAL down on this session
(backendWalInserted=%d, LSN=%X/%X flushingTo=%X/%X)",
+        backendWalInserted,
+        LSN_FORMAT_ARGS(XactLastThrottledRecEnd),
+        LSN_FORMAT_ARGS(LastFullyWrittenXLogPage));

+    elog(WARNING, "throttling WAL down on this session - end (%f
ms)", timediff);

13. BTW, we don't need to hold interrupts while waiting for the
synchronous replication ack, as doing so may block query cancels or
proc-die pendings. You can remove these, unless there's a strong reason:
+    //HOLD_INTERRUPTS();
+    //RESUME_INTERRUPTS();

14. Add this wait event in the documentation.
+        case WAIT_EVENT_SYNC_REP_THROTTLED:
+            event_name = "SyncRepThrottled";
+            break;

15. Why can't XLogDelayPending be a volatile sig_atomic_t variable? Is it
because it's not set within a signal handler?
 extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT bool XLogDelayPending;

16.
+    instr_time    wal_throttle_time_start, wal_throttle_time_end;
+    double        timediff;
+    XLogDelayPending = false;
A blank line is needed between the variable declarations and the assignment.

17. I think adding the number of times a backend throttled WAL to
pg_stat_activity would be a good metric.

18. Can you avoid the need for the new functions SyncRepWaitForLSNInternal
and SyncRepWaitForLSNThrottled by relying on the global throttling state
to determine the correct wait event in SyncRepWaitForLSN?


--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


