Re: Progress report removal of temp files and temp relation files using ereport_startup_progress

From: Bharath Rupireddy
Subject: Re: Progress report removal of temp files and temp relation files using ereport_startup_progress
Msg-id: CALj2ACW-ELOF5JT2zPavs95wbZ0BrLPrqvSZ7Ac+pjxCkmXtEQ@mail.gmail.com
In reply to: Re: Progress report removal of temp files and temp relation files using ereport_startup_progress (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Replies: Re: Progress report removal of temp files and temp relation files using ereport_startup_progress (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List: pgsql-hackers
On Mon, May 2, 2022 at 6:26 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> Hi Bharath,
>
>
> On Sat, Apr 30, 2022 at 11:08 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Hi,
> >
> > At times, there can be many temp files (under pgsql_tmp) and temp
> > relation files (under …) whose removal after a crash may take a long
> > time, during which users have no clue about what's going on in the
> > server before it comes up online.
> >
> > Here's a proposal to use ereport_startup_progress to report the
> > progress of the file removal.
> >
> > Thoughts?
>
> The patch looks good to me.
>
> With this patch, the user would at least know which directory is being
> scanned and how much time has elapsed.

There's a problem with the patch: the timeout mechanism isn't used by
the postmaster process. The postmaster doesn't call
InitializeTimeouts() and doesn't register STARTUP_PROGRESS_TIMEOUT. I
tried to make the postmaster do that (attached v2 patch), but make
check fails.

Now I'm wondering whether it's a good idea to let the postmaster use
timeouts at all.
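If the postmaster should not use timeouts at all, one alternative is to drop the timer and have the removal loop itself check the elapsed time after each file. Below is a minimal standalone C sketch, assuming a monotonic clock read per file is acceptable in that code path. ProgressThrottle, progress_throttle_init(), progress_report_due(), elapsed_ms_since() and maybe_log_progress() are hypothetical names, not existing PostgreSQL functions, and the message goes to stderr rather than through ereport().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical throttle: decides when a progress message is due
 * without arming any timer, so no InitializeTimeouts() is needed. */
typedef struct ProgressThrottle
{
    long interval_ms;    /* minimum gap between two reports */
    long next_report_ms; /* elapsed time at which the next report is due */
} ProgressThrottle;

void progress_throttle_init(ProgressThrottle *pt, long interval_ms)
{
    assert(interval_ms > 0);
    pt->interval_ms = interval_ms;
    pt->next_report_ms = interval_ms;
}

/* True if a report is due at elapsed_ms; if so, schedules the next one. */
bool progress_report_due(ProgressThrottle *pt, long elapsed_ms)
{
    if (elapsed_ms < pt->next_report_ms)
        return false;
    pt->next_report_ms = elapsed_ms + pt->interval_ms;
    return true;
}

/* Whole milliseconds elapsed since *start on the monotonic clock. */
long elapsed_ms_since(const struct timespec *start)
{
    struct timespec now;
    long long ns;

    clock_gettime(CLOCK_MONOTONIC, &now);
    ns = (long long) (now.tv_sec - start->tv_sec) * 1000000000LL +
         (now.tv_nsec - start->tv_nsec);
    return (long) (ns / 1000000LL);
}

/* Call once per removed file; prints a line only when a report is due. */
void maybe_log_progress(ProgressThrottle *pt, const struct timespec *start,
                        const char *current_path)
{
    long ms = elapsed_ms_since(start);

    if (progress_report_due(pt, ms))
        fprintf(stderr,
                "removing temporary files, elapsed time: %ld.%03ld s, current path: %s\n",
                ms / 1000, ms % 1000, current_path);
}
```

In a removal loop, one would record the start time with clock_gettime(CLOCK_MONOTONIC, &start) before the first opendir() and call maybe_log_progress() after each unlink(). The cost is one clock read per file, which is presumably small next to an unlink(), but that assumption would need measuring.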

> It would be better to know how much work is remaining. I could not
> find a way to estimate the number of files in the directory so that
> we can extrapolate from the elapsed time and estimate the remaining
> time. Well, we could loop over the output of opendir() twice, first
> to estimate and then for the actual work. This might actually work
> if the time to delete all the files is very high compared to the
> time it takes to scan all the files/directories.
>
> Another possibility is to scan the sorted output of opendir() thus
> using the current file name to estimate remaining files in a very
> crude and inaccurate way. That doesn't look attractive either. I can't
> think of any better way to estimate the remaining time.
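The two-pass idea can be sketched as follows: a first pass only counts directory entries, and during the second (removal) pass the remaining time is extrapolated linearly from the files handled so far. This is a standalone C sketch; count_dir_entries() and estimate_remaining_ms() are hypothetical helpers. The count covers only one directory level and can go stale if files appear or disappear between the two passes.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <string.h>

/* First pass: count entries other than "." and ".."; -1 if unreadable. */
long count_dir_entries(const char *dirpath)
{
    DIR *dir = opendir(dirpath);
    struct dirent *de;
    long n = 0;

    if (dir == NULL)
        return -1;
    while ((de = readdir(dir)) != NULL)
    {
        if (strcmp(de->d_name, ".") != 0 && strcmp(de->d_name, "..") != 0)
            n++;
    }
    closedir(dir);
    return n;
}

/*
 * Linear extrapolation: if `done` of `total` files took elapsed_ms,
 * the rest is assumed to take proportionally as long. Returns -1 when
 * nothing has been processed yet or the first-pass count is stale.
 */
long estimate_remaining_ms(long elapsed_ms, long done, long total)
{
    if (done <= 0 || total < done)
        return -1;
    return elapsed_ms * (total - done) / done;
}
```

As noted above, this only pays off when deleting is much slower than scanning; the linear model also assumes files take roughly equal time to remove, which may not hold for large relation files.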

I think estimating how much work (how many files) remains to be
processed is a generic problem; it applies to, for instance, snapshot
and mapping files, old WAL file processing and so on. I don't think we
can do much about it.

> But at least with this patch, a user knows which files have been
> deleted and can guess how far the process has reached in the
> directory structure. S/he can then look at the remaining contents of
> the directory to estimate how long to wait.

I'm not sure we will be able to use the timeout mechanism within the
postmaster. Another idea is to have a generic GUC, something like
log_file_processing_traffic = {none, medium, high} (a similar idea is
proposed for WAL file processing during replay/recovery at [1]),
defaulting to none. When set to medium, a log message is emitted for
every, say, 128 or 256 files processed (just an arbitrary number); when
set to high, a log message is emitted for every file processed (too
verbose). I think this generic GUC log_file_processing_traffic could be
used in many other file-processing areas.
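A minimal sketch of how the proposed log_file_processing_traffic levels could gate messages in a file-processing loop. The enum names, the batch size of 256, should_log_file_progress() and log_file_progress() are all hypothetical (the proposal leaves the batch size open), and the message is printed to stderr instead of going through ereport() and the GUC machinery.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical values for the proposed log_file_processing_traffic GUC. */
typedef enum FileProcessingTraffic
{
    FILE_TRAFFIC_NONE,   /* default: no per-file messages */
    FILE_TRAFFIC_MEDIUM, /* one message per batch of files */
    FILE_TRAFFIC_HIGH    /* one message per file (very verbose) */
} FileProcessingTraffic;

/* Arbitrary batch size, per "say 128 or 256" in the proposal. */
#define FILE_TRAFFIC_MEDIUM_BATCH 256

/* nprocessed is the number of files processed so far, counting from 1. */
bool should_log_file_progress(FileProcessingTraffic level, long nprocessed)
{
    switch (level)
    {
        case FILE_TRAFFIC_HIGH:
            return true;
        case FILE_TRAFFIC_MEDIUM:
            return nprocessed > 0 &&
                   nprocessed % FILE_TRAFFIC_MEDIUM_BATCH == 0;
        case FILE_TRAFFIC_NONE:
        default:
            return false;
    }
}

void log_file_progress(FileProcessingTraffic level, long nprocessed,
                       const char *path)
{
    if (should_log_file_progress(level, nprocessed))
        fprintf(stderr, "processed %ld files, current file: %s\n",
                nprocessed, path);
}
```

Unlike the timeout-based approach, this needs no timer at all, so it would work the same in the postmaster; the trade-off is that the message rate depends on file count rather than wall-clock time.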

Thoughts?

[1] https://www.postgresql.org/message-id/CALj2ACVnhbx4pLZepvdqOfeOekvZXJ2F%3DwJeConGzok%2B6kgCVA%40mail.gmail.com

Regards,
Bharath Rupireddy.

Attachments
