Re: .ready and .done files considered harmful

Поиск
Список
Период
Сортировка
От Bossart, Nathan
Тема Re: .ready and .done files considered harmful
Дата
Msg-id 40DBACB6-963B-434D-99A3-B21E607FF240@amazon.com
обсуждение исходный текст
Ответ на Re: .ready and .done files considered harmful  (Dipesh Pandit <dipesh.pandit@gmail.com>)
Список pgsql-hackers
On 9/14/21, 7:23 AM, "Dipesh Pandit" <dipesh.pandit@gmail.com> wrote:
> I agree that when we are creating a .ready file we should compare 
> the current .ready file with the last .ready file to check if this file is 
> created out of order. We can store the state of the last .ready file 
> in shared memory and compare it with the current .ready file. I
> believe that archiver specific shared memory area can be used
> to store the state of the last .ready file unless I am missing
> something and this needs to be stored in a separate shared
> memory area.
>
> With this change, we have the flexibility to move the current archiver
> state out of shared memory and keep it local to archiver. I have 
> incorporated these changes and updated a new patch.

I wonder if this can be simplified even further.  If we don't bother
trying to catch out-of-order .ready files in XLogArchiveNotify() and
just depend on the per-checkpoint/restartpoint directory scans, we can
probably remove lastReadySegNo from archiver state completely.

+    /* Force a directory scan if we are archiving anything but a regular
+     * WAL file or if this WAL file is being created out-of-order.
+     */
+    if (!IsXLogFileName(xlog))
+        PgArchForceDirScan();
+    else
+    {
+        TimeLineID tli;
+        XLogSegNo last_segno;
+        XLogSegNo this_segno;
+
+        last_segno = PgArchGetLastReadySegNo();
+        XLogFromFileName(xlog, &tli, &this_segno, wal_segment_size);
+
+        /*
+         * Force a directory scan in case if this .ready file created out of
+         * order.
+         */
+        last_segno++;
+        if (last_segno != this_segno)
+            PgArchForceDirScan();
+
+        PgArchSetLastReadySegNo(this_segno);
+    }

This is an interesting idea, but the "else" block here seems prone to
race conditions.  I think we'd have to hold arch_lck to prevent that.
But as I mentioned above, if we are okay with depending on the
fallback directory scans, I think we can remove the "else" block
completely.

+    /* Initialize the current state of archiver */
+    xlogState.lastSegNo = MaxXLogSegNo;
+    xlogState.lastTli = MaxTimeLineID;

It looks like we have two ways to force a directory scan.  We can
either set force_dir_scan to true, or lastSegNo can be set to
MaxXLogSegNo.  Why not just set force_dir_scan to true here so that we
only have one way to force a directory scan?

+                    /*
+                     * Failed to archive, make sure that archiver performs a
+                     * full directory scan in the next cycle to avoid missing
+                     * the WAL file which could not be archived due to some
+                     * failure in current cycle.
+                     */
+                    PgArchForceDirScan();

Don't we also need to force a directory scan in the other cases we
return early from pgarch_ArchiverCopyLoop()?  We will have already
advanced the archiver state in pgarch_readyXlog(), so I think we'd end
up skipping files if we didn't.  For example, if archive_command isn't
set, we'll just return, and the next call to pgarch_readyXlog() might
return the next file.

+            /* Continue directory scan until we find a regular WAL file */
+            SpinLockAcquire(&PgArch->arch_lck);
+            PgArch->force_dir_scan = true;
+            SpinLockRelease(&PgArch->arch_lck);

nitpick: I think we should just call PgArchForceDirScan() here.

Nathan


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: Remove duplicate static function check_permissions in slotfuncs.c and logicalfuncs.c
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Getting ERROR "subplan "SubPlan 1" was not initialized" in EXISTS subplan when using for list partition.