Re: [HACKERS] Proposal : For Auto-Prewarm.

Поиск
Список
Период
Сортировка
От Mithun Cy
Тема Re: [HACKERS] Proposal : For Auto-Prewarm.
Дата
Msg-id CAD__OuhXaGPy7Xfhvm+aCaipXQ967tmfPXjYHbnbr2TqcJOM+g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Proposal : For Auto-Prewarm.  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [HACKERS] Proposal : For Auto-Prewarm.  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Wed, Aug 16, 2017 at 2:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jul 14, 2017 at 8:17 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> [ new patch ]
> It's quite possible that in making all of these changes I've
> introduced some bugs, so I think this needs some testing and review.
> It's also possible that some of the changes that I made are actually
> not improvements and should be reverted, but it's always hard to tell
> that about your own code.  Anyway, please see the attached version.

Sorry, Robert, I was on vacation so could not pick this immediately. I
have been testing and reviewing the patch and I found following
issues.

1. Hang in apw_detach_shmem.
+/*
+ * Clear our PID from autoprewarm shared state.
+ */
+static void
+apw_detach_shmem(int code, Datum arg)
+{
+   LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+   if (apw_state->pid_using_dumpfile == MyProcPid)
+   apw_state->pid_using_dumpfile = InvalidPid;
+   if (apw_state->bgworker_pid == MyProcPid)
+   apw_state->bgworker_pid = InvalidPid;
+   LWLockRelease(&apw_state->lock);
+}

The reason is that we might already be under the apw_state->lock when
we error out and jump to apw_detach_shmem. So we should not be trying
to take the lock again. For example, in autoprewarm_dump_now(),
apw_dump_now() will error out under the lock if bgworker is already
using dump file.

=======
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+    int64 num_blocks;
+
+    apw_init_shmem();
+
+    PG_TRY();
+    {
+         num_blocks = apw_dump_now(false, true);
+    }
+    PG_CATCH();
+    {
+         apw_detach_shmem(0, 0);
+         PG_RE_THROW();
+    }
+    PG_END_TRY();
+
+    PG_RETURN_INT64(num_blocks);
+}

=======
+ LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+ if (apw_state->pid_using_dumpfile == InvalidPid)
+ apw_state->pid_using_dumpfile = MyProcPid;
+ else
+ {
+     if (!is_bgworker)
+         ereport(ERROR,
+                    (errmsg("could not perform block dump because
dump file is being used by PID %d",
+                     apw_state->pid_using_dumpfile)));

This attempt to take lock again hangs the autoprewarm module. I think
there is no need to take lock while we reset those variables as we
reset only if we have set it ourselves.

2) I also found one issue which was my own mistake in my previous patch 19.
In "apw_dump_now" I missed calling FreeFile() on first write error,
whereas on othercases I am already calling the same.
ret = fprintf(file, "<<" INT64_FORMAT ">>\n", num_blocks);
+ if (ret < 0)
+ {
+ int save_errno = errno;
+
+ unlink(transient_dump_file_path);

Other than this, the patch is working as it was previously doing. If
you think my presumed fix(not to take lock) to hang issue is right I
will produce a patch for the same.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Khandekar
Дата:
Сообщение: Re: [HACKERS] expanding inheritance in partition bound order
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] recovery_target_time = 'now' is not an error but stillimpractical setting