On Wed, Aug 16, 2017 at 2:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jul 14, 2017 at 8:17 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> [ new patch ]
> It's quite possible that in making all of these changes I've
> introduced some bugs, so I think this needs some testing and review.
> It's also possible that some of the changes that I made are actually
> not improvements and should be reverted, but it's always hard to tell
> that about your own code. Anyway, please see the attached version.
Sorry, Robert, I was on vacation so could not pick this immediately. I
have been testing and reviewing the patch and I found following
issues.
1. Hang in apw_detach_shmem.
+/*
+ * Clear our PID from autoprewarm shared state.
+ */
+static void
+apw_detach_shmem(int code, Datum arg)
+{
+ LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+ if (apw_state->pid_using_dumpfile == MyProcPid)
+ apw_state->pid_using_dumpfile = InvalidPid;
+ if (apw_state->bgworker_pid == MyProcPid)
+ apw_state->bgworker_pid = InvalidPid;
+ LWLockRelease(&apw_state->lock);
+}
The reason is that we might already be under the apw_state->lock when
we error out and jump to apw_detach_shmem. So we should not be trying
to take the lock again. For example, in autoprewarm_dump_now(),
apw_dump_now() will error out under the lock if bgworker is already
using dump file.
=======
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+ int64 num_blocks;
+
+ apw_init_shmem();
+
+ PG_TRY();
+ {
+ num_blocks = apw_dump_now(false, true);
+ }
+ PG_CATCH();
+ {
+ apw_detach_shmem(0, 0);
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ PG_RETURN_INT64(num_blocks);
+}
=======
+ LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+ if (apw_state->pid_using_dumpfile == InvalidPid)
+ apw_state->pid_using_dumpfile = MyProcPid;
+ else
+ {
+ if (!is_bgworker)
+ ereport(ERROR,
+ (errmsg("could not perform block dump because
dump file is being used by PID %d",
+ apw_state->pid_using_dumpfile)));
This attempt to take lock again hangs the autoprewarm module. I think
there is no need to take lock while we reset those variables as we
reset only if we have set it ourselves.
2) I also found one issue which was my own mistake in my previous patch 19.
In "apw_dump_now" I missed calling FreeFile() on first write error,
whereas on othercases I am already calling the same.
ret = fprintf(file, "<<" INT64_FORMAT ">>\n", num_blocks);
+ if (ret < 0)
+ {
+ int save_errno = errno;
+
+ unlink(transient_dump_file_path);
Other than this, the patch is working as it was previously doing. If
you think my presumed fix(not to take lock) to hang issue is right I
will produce a patch for the same.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com