Re: [HACKERS] Restricting maximum keep segments by repslots

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: [HACKERS] Restricting maximum keep segments by repslots
Дата
Msg-id 20181120050744.GJ4400@paquier.xyz
обсуждение исходный текст
Ответ на Re: [HACKERS] Restricting maximum keep segments by repslots  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: [HACKERS] Restricting maximum keep segments by repslots  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Список pgsql-hackers
On Mon, Nov 19, 2018 at 01:39:58PM +0900, Michael Paquier wrote:
> I was just coming by to look at bit at the patch series, and bumped
> into that:

So I have been looking at the last patch series 0001-0004 posted on this
thread, and coming from here:
https://postgr.es/m/20181025.215518.189844649.horiguchi.kyotaro@lab.ntt.co.jp

/* check that the slot is gone */
SELECT * FROM pg_replication_slots
It could be an idea to switch to the expanded mode here, not that it
matters much still..

+IsLsnStillAvaiable(XLogRecPtr targetLSN, uint64 *restBytes)
You mean Available here, not Avaiable.  This function is only used when
scanning for slot information with pg_replication_slots, so wouldn't it
be better to just return the status string in this case?

Not sure I see the point of the "remain" field, which can be found with
a simple calculation using the current insertion LSN, the segment size
and the amount of WAL that the slot is retaining.  It may be interesting
to document a query to do that though.

GetOldestXLogFileSegNo() has race conditions if WAL recycling runs in
parallel, no?  How is it safe to scan pg_wal on a process querying
pg_replication_slots while another process may manipulate its contents
(aka the checkpointer or just the startup process with an
end-of-recovery checkpoint.).  This routine relies on unsafe
assumptions as this is not concurrent-safe.  You can avoid problems by
making sure instead that lastRemovedSegNo is initialized correctly at
startup, which would be normally one segment older than what's in
pg_wal, which feels a bit hacky to rely on to track the oldest segment.

It seems to me that GetOldestXLogFileSegNo() should also check for
segments matching the current timeline, no?

+           if (prev_lost_segs != lost_segs)
+               ereport(WARNING,
+                       (errmsg ("some replication slots have lost
required WAL segments"),
+                        errdetail_plural(
+                            "The mostly affected slot has lost %ld
segment.",
+                            "The mostly affected slot has lost %ld
segments.",
+                            lost_segs, lost_segs)));
This can become very noisy with the time, and it would be actually
useful to know which replication slot is impacted by that.

+      slot doesn't have valid restart_lsn, this field
Missing a determinant here, and restart_lsn should have a <literal>
markup.

+    many WAL segments that they fill up the space allotted
s/allotted/allocated/.

+      available. The last two states are seen only when
+      <xref linkend="guc-max-slot-wal-keep-size"/> is non-negative. If the
+      slot doesn't have valid restart_lsn, this field
+      is <literal>unknown</literal>.
I am a bit confused by this statement.  The last two states are "lost"
and "keeping", but shouldn't "keeping" be the state showing up by
default as it means that all WAL segments are kept around.

+# Advance WAL by ten segments (= 160MB) on master
+advance_wal($node_master, 10);
+$node_master->safe_psql('postgres', "CHECKPOINT;");
This makes the tests very costly, which is something we should avoid as
much as possible.  One trick which could be used here, on top of
reducing the number of segment switches, is to use initdb
--wal-segsize=1.
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: typo fix
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: typo fix