Обсуждение: Re: [HACKERS] Proposal : For Auto-Prewarm.

Поиск
Список
Период
Сортировка

Re: [HACKERS] Proposal : For Auto-Prewarm.

От
Amit Kapila
Дата:
On Wed, Jul 5, 2017 at 6:25 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Mon, Jul 3, 2017 at 3:34 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> Few comments on the latest patch:
>>
>> 1.
>> + LWLockRelease(&apw_state->lock);
>> + if (!is_bgworker)
>> + ereport(ERROR,
>> + (errmsg("could not perform block dump because dump file is being
>> used by PID %d",
>> + apw_state->pid_using_dumpfile)));
>> + ereport(LOG,
>> + (errmsg("skipping block dump because it is already being performed by PID %d",
>> + apw_state->pid_using_dumpfile)));
>>
>>
>> The above code looks confusing as both the messages are saying the
>> same thing in different words.  I think you keep one message (probably
>> the first one) and decide error level based on if this is invoked for
>> bgworker.  Also, you can move LWLockRelease after error message,
>> because if there is any error, then it will automatically release all
>> lwlocks.
>
> ERROR is used for autoprewarm_dump_now which is called from the backend.
> LOG is used for bgworker.
> wordings used are to match the context if failing to dump is
> acceptable or not. In the case of bgworker, it is acceptable we are
> not particular about the start time of dump but the only interval
> between the dumps. So if already somebody doing it is acceptable. But
> one who calls autoprewarm_dump_now might be particular about the start
> time of dump so we throw error making him retry same.
>
> The wording's are suggested by Robert(below snapshot) in one of his
> previous comments and I also agree with it. If you think I should
> reconsider this and I am missing something I am open to suggestions.
>

Not an issue, if you and Robert think having two different messages is
better, then let's leave it.  One improvement we could do here is to
initialize a boolean global variable for AutoPrewarmWorker, then use
it wherever required.


>> 3.
>> +dump_now(bool is_bgworker)
>> {
>> ..
>> + if (buf_state & BM_TAG_VALID)
>> + {
>> + block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
>> + block_info_array[num_blocks].tablespace = bufHdr->tag.rnode.spcNode;
>> + block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
>> + block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
>> + block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
>> + ++num_blocks;
>> + }
>> ..
>> }
>>I think there is no use of writing Unlogged buffers unless the dump is
>>for the shutdown.  You might want to use BufferIsPermanent to detect the same.
>
> -- I do not think that is true pages of the unlogged table are also
> read into buffers for read-only purpose. So if we miss to dump them
> while we shut down then the previous dump should be used.
>

I am not able to understand what you want to say. Unlogged tables
should be empty in case of crash recovery.  Also, we never flush
unlogged buffers except for shutdown checkpoint, refer BufferAlloc and
in particular below comment:

* Make sure BM_PERMANENT is set for buffers that must be written at every
* checkpoint.  Unlogged buffers only need to be written at shutdown
* checkpoints, except for their "init" forks, which need to be treated
* just like permanent relations.


>> 4.
>> +static uint32
>> +dump_now(bool is_bgworker)
>> {
>> ..
>> + for (num_blocks = 0, i = 0; i < NBuffers; i++)
>> + {
>> + uint32 buf_state;
>> +
>> + if (!is_bgworker)
>> + CHECK_FOR_INTERRUPTS();
>> ..
>> }
>>
>> Why checking for interrupts is only for non-bgwroker cases?
>
> -- autoprewarm_dump_now is directly called from the backend. In such
> case, we have to handle signals registered for backend in dump_now().
> For bgworker dump_block_info_periodically caller of dump_now() handles
> SIGTERM, SIGUSR1 which we are interested in.
>

Okay, but what about signal handler for SIGUSR1
(procsignal_sigusr1_handler).  Have you verified that it will never
set the InterruptPending flag?

>
>> 6.
>> +dump_now(bool is_bgworker)
>> {
>> ..
>> + (void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
>> + apw_state->pid_using_dumpfile = InvalidPid;
>> ..
>> }
>>
>> How will pid_using_dumpfile be set to InvalidPid in the case of error
>> for non-bgworker cases?
>
> -- I have a try() - catch() in autoprewarm_dump_now I think that is okay.
>

Okay, then that will work.

>>
>> 7.
>> +dump_now(bool is_bgworker)
>> {
>> ..
>> + (void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
>>
>> ..
>> }
>>
>> How will transient_dump_file_path be unlinked in the case of error in
>> durable_rename?  I think you need to use PG_TRY..PG_CATCH to ensure
>> same?
>
> -- If durable_rename is failing that seems basic functionality of
> autoperwarm is failing so I want it to be an ERROR. I do not want to
> remove the temp file as we always truncate before reusing it again. So
> I think there is no need to catch all ERROR in dump_now() just to
> remove the temp file.
>

I am not getting your argument here, do you mean to say that if
writing to a transient file is failed then we should remove the
transient file but if the rename is failed then there is no need to
remove it?  It sounds strange to me, but maybe you have reason to do
it like that.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Proposal : For Auto-Prewarm.

От
Mithun Cy
Дата:
On Thu, Jul 6, 2017 at 10:52 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I am not able to understand what you want to say. Unlogged tables
> should be empty in case of crash recovery.  Also, we never flush
> unlogged buffers except for shutdown checkpoint, refer BufferAlloc and
> in particular below comment:

-- Sorry I said that because of my lack of knowledge about unlogged
tables. Yes, what you say is right "an unlogged table is automatically
truncated after a crash or unclean shutdown". So it will be enough if
we just dump them during shutdown time.


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Proposal : For Auto-Prewarm.

От
Mithun Cy
Дата:
On Thu, Jul 6, 2017 at 10:52 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> 3.
>> -- I do not think that is true pages of the unlogged table are also
>> read into buffers for read-only purpose. So if we miss to dump them
>> while we shut down then the previous dump should be used.
>>
>
> I am not able to understand what you want to say. Unlogged tables
> should be empty in case of crash recovery.  Also, we never flush
> unlogged buffers except for shutdown checkpoint, refer BufferAlloc and
> in particular below comment:
>
> * Make sure BM_PERMANENT is set for buffers that must be written at every
> * checkpoint.  Unlogged buffers only need to be written at shutdown
> * checkpoints, except for their "init" forks, which need to be treated
> * just like permanent relations.

+ if (buf_state & BM_TAG_VALID &&
+ ((buf_state & BM_PERMANENT) || dump_unlogged))
I have changed it now the final call to dump_now during shutdown or if
called through autoprewarm_dump_now() only we dump blockinfo of
unlogged tables.

>>
>> -- autoprewarm_dump_now is directly called from the backend. In such
>> case, we have to handle signals registered for backend in dump_now().
>> For bgworker dump_block_info_periodically caller of dump_now() handles
>> SIGTERM, SIGUSR1 which we are interested in.
>>
>
> Okay, but what about signal handler for c
> (procsignal_sigusr1_handler).  Have you verified that it will never
> set the InterruptPending flag?

Okay now CHECK_FOR_INTERRUPTS is called for both.

>
>>>
>>> 7.
>>> +dump_now(bool is_bgworker)
>>> {
>>> ..
>>> + (void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
>>>
>>> ..
>>> }
>>>
>>> How will transient_dump_file_path be unlinked in the case of error in
>>> durable_rename?  I think you need to use PG_TRY..PG_CATCH to ensure
>>> same?
>>
>> -- If durable_rename is failing that seems basic functionality of
>> autoperwarm is failing so I want it to be an ERROR. I do not want to
>> remove the temp file as we always truncate before reusing it again. So
>> I think there is no need to catch all ERROR in dump_now() just to
>> remove the temp file.
>>
> I am not getting your argument here, do you mean to say that if
> writing to a transient file is failed then we should remove the
> transient file but if the rename is failed then there is no need to
> remove it?  It sounds strange to me, but maybe you have reason to do
> it like that.

my intention is to unlink when ever possible and when ever control is
within the function. I thought it is okay if we error inside called
function. If temp file is left there that will not be a problem as it
will be reused(truncated first) for next dump. If you think it is
needed I will add a try() catch() around, to catch any error and then
remove the file.
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Proposal : For Auto-Prewarm.

От
Robert Haas
Дата:
On Fri, Jul 14, 2017 at 8:17 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> [ new patch ]

I spent some time going over this patch.  I initially thought it only
needed minor cosmetic tweaking but the more I poked at it the more
things I found that seemed like they should be changed, so the
attached version looks pretty significantly different from what was
last posted.  It's not actually as different as it looks because a lot
of the changes are purely cosmetic.

Changes:

- Rewrote the documentation, many of the comments, and some of the
other messages significantly.
- Renamed private functions so they all start with apw_ instead of
having what seemed to be a mix of naming conventions.
- Reorganized the file so that the important functions are at the top.
- Added prototypes for the static functions that lacked them.
- Got rid of AutoPrewarmTask.
- Got rid of skip_prewarm_on_restart.
- Added LWLockAcquire/LWLockRelease calls in many places where they
were left out.  This may make no difference but it seems safer.
- Refactored the worker-starting code into two separate functions, one
for the main worker and one for the per-database worker.
- Inlined some functions that were only called from one place.
- Rewrote the delay loop.  Previously this used a struct timeval but
tv_usec was always 0 and the actual struct was never passed to any
system function, so I think this loop couldn't have been accurate to
more than the nearest second and depending unnecessarily on the
operating system structure seems pointless.  I changed also changed it
to be more explicit about the autoprewarm_interval == 0 case and to
bump the last dump time before, rather than after, dumping.
Otherwise, the time between dumps will be increased by the amount of
time the dump itself takes, which is not what the user will expect.
- Used the correct PG_RETURN macro -- the return type of
autoprewarm_dump_now is int8, so PG_RETURN_INT64 must be used.
- Updated various other places to use int64 for consistency.
- Possibly a few other things I'm forgetting about right now.

It's quite possible that in making all of these changes I've
introduced some bugs, so I think this needs some testing and review.
It's also possible that some of the changes that I made are actually
not improvements and should be reverted, but it's always hard to tell
that about your own code.  Anyway, please see the attached version.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Proposal : For Auto-Prewarm.

От
Mithun Cy
Дата:
On Wed, Aug 16, 2017 at 2:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jul 14, 2017 at 8:17 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> [ new patch ]
> It's quite possible that in making all of these changes I've
> introduced some bugs, so I think this needs some testing and review.
> It's also possible that some of the changes that I made are actually
> not improvements and should be reverted, but it's always hard to tell
> that about your own code.  Anyway, please see the attached version.

Sorry, Robert, I was on vacation so could not pick this immediately. I
have been testing and reviewing the patch and I found following
issues.

1. Hang in apw_detach_shmem.
+/*
+ * Clear our PID from autoprewarm shared state.
+ */
+static void
+apw_detach_shmem(int code, Datum arg)
+{
+   LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+   if (apw_state->pid_using_dumpfile == MyProcPid)
+   apw_state->pid_using_dumpfile = InvalidPid;
+   if (apw_state->bgworker_pid == MyProcPid)
+   apw_state->bgworker_pid = InvalidPid;
+   LWLockRelease(&apw_state->lock);
+}

The reason is that we might already be under the apw_state->lock when
we error out and jump to apw_detach_shmem. So we should not be trying
to take the lock again. For example, in autoprewarm_dump_now(),
apw_dump_now() will error out under the lock if bgworker is already
using dump file.

=======
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+    int64 num_blocks;
+
+    apw_init_shmem();
+
+    PG_TRY();
+    {
+         num_blocks = apw_dump_now(false, true);
+    }
+    PG_CATCH();
+    {
+         apw_detach_shmem(0, 0);
+         PG_RE_THROW();
+    }
+    PG_END_TRY();
+
+    PG_RETURN_INT64(num_blocks);
+}

=======
+ LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+ if (apw_state->pid_using_dumpfile == InvalidPid)
+ apw_state->pid_using_dumpfile = MyProcPid;
+ else
+ {
+     if (!is_bgworker)
+         ereport(ERROR,
+                    (errmsg("could not perform block dump because
dump file is being used by PID %d",
+                     apw_state->pid_using_dumpfile)));

This attempt to take lock again hangs the autoprewarm module. I think
there is no need to take lock while we reset those variables as we
reset only if we have set it ourselves.

2) I also found one issue which was my own mistake in my previous patch 19.
In "apw_dump_now" I missed calling FreeFile() on first write error,
whereas on othercases I am already calling the same.
ret = fprintf(file, "<<" INT64_FORMAT ">>\n", num_blocks);
+ if (ret < 0)
+ {
+ int save_errno = errno;
+
+ unlink(transient_dump_file_path);

Other than this, the patch is working as it was previously doing. If
you think my presumed fix(not to take lock) to hang issue is right I
will produce a patch for the same.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Proposal : For Auto-Prewarm.

От
Robert Haas
Дата:
On Fri, Aug 18, 2017 at 2:23 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> 1. Hang in apw_detach_shmem.
> +/*
> + * Clear our PID from autoprewarm shared state.
> + */
> +static void
> +apw_detach_shmem(int code, Datum arg)
> +{
> +   LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
> +   if (apw_state->pid_using_dumpfile == MyProcPid)
> +   apw_state->pid_using_dumpfile = InvalidPid;
> +   if (apw_state->bgworker_pid == MyProcPid)
> +   apw_state->bgworker_pid = InvalidPid;
> +   LWLockRelease(&apw_state->lock);
> +}
>
> The reason is that we might already be under the apw_state->lock when
> we error out and jump to apw_detach_shmem. So we should not be trying
> to take the lock again. For example, in autoprewarm_dump_now(),
> apw_dump_now() will error out under the lock if bgworker is already
> using dump file.

Ah, good catch.  While I agree that there is probably no great harm
from skipping the lock here, I think it would be better to just avoid
throwing an error while we hold the lock.  I think apw_dump_now() is
the only place where that could happen, and in the attached version,
I've fixed it so it doesn't do that any more.  Independent of the
correctness issue, I think the code is easier to read this way.

I also realize that it's not formally sufficient to use
PG_TRY()/PG_CATCH() here, because a FATAL would leave us in a bad
state.  Changed to PG_ENSURE_ERROR_CLEANUP().

> 2) I also found one issue which was my own mistake in my previous patch 19.
> In "apw_dump_now" I missed calling FreeFile() on first write error,
> whereas on othercases I am already calling the same.
> ret = fprintf(file, "<<" INT64_FORMAT ">>\n", num_blocks);
> + if (ret < 0)
> + {
> + int save_errno = errno;
> +
> + unlink(transient_dump_file_path);

Changed in the attached version.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Proposal : For Auto-Prewarm.

От
Mithun Cy
Дата:
On Fri, Aug 18, 2017 at 9:43 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Aug 18, 2017 at 2:23 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> Ah, good catch.  While I agree that there is probably no great harm
> from skipping the lock here, I think it would be better to just avoid
> throwing an error while we hold the lock.  I think apw_dump_now() is
> the only place where that could happen, and in the attached version,
> I've fixed it so it doesn't do that any more.  Independent of the
> correctness issue, I think the code is easier to read this way.
>
> I also realize that it's not formally sufficient to use
> PG_TRY()/PG_CATCH() here, because a FATAL would leave us in a bad
> state.  Changed to PG_ENSURE_ERROR_CLEANUP().
>
>> 2) I also found one issue which was my own mistake in my previous patch 19.
>> In "apw_dump_now" I missed calling FreeFile() on first write error,
>> whereas on othercases I am already calling the same.
>> ret = fprintf(file, "<<" INT64_FORMAT ">>\n", num_blocks);
>> + if (ret < 0)
>> + {
>> + int save_errno = errno;
>> +
>> + unlink(transient_dump_file_path);
>
> Changed in the attached version.

Thanks for the patch, I have tested the above fix now it works as
described. From my test patch looks good, I did not find any other
issues.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Proposal : For Auto-Prewarm.

От
Robert Haas
Дата:
On Mon, Aug 21, 2017 at 2:42 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> Thanks for the patch, I have tested the above fix now it works as
> described. From my test patch looks good, I did not find any other
> issues.

Considering the totality of the circumstances, it seemed appropriate
to me to commit this.  So I did.

Thanks for all your work on this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company