Re: [PATCH] Initial progress reporting for COPY command

From Josef Šimánek
Subject Re: [PATCH] Initial progress reporting for COPY command
Date
Msg-id CAFp7QwqWSwhmEcCEoJqRJofURMQ2Sffu0+-Brt+LBUqU-ds-cw@mail.gmail.com
In response to Re: [PATCH] Initial progress reporting for COPY command  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [PATCH] Initial progress reporting for COPY command  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers


On Mon, Jun 22, 2020 at 14:14, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On Sun, Jun 21, 2020 at 01:40:34PM +0200, Josef Šimánek wrote:
>Thanks for all comments. I have updated code to support more options
>(including STDIN/STDOUT) and added some documentation.
>
>Patch is attached and can be found also at
>https://github.com/simi/postgres/pull/5.
>
>Diff version: https://github.com/simi/postgres/pull/5.diff
>Patch version: https://github.com/simi/postgres/pull/5.patch
>
>I'm also attaching screenshot of HTML documentation and html documentation
>file.
>
>I'll do my best to get this to commitfest now.
>

I see we're not showing the total number of bytes the COPY is expected
to process, which makes it hard to estimate how far we actually are.
Clearly there are cases when we really don't know that (exports, import
from stdin/program), but why not show the file size for imports from a
file? I'd expect that to be the most common case.

For COPY FROM a file, fstat is already called and the file size is available at https://github.com/postgres/postgres/blob/fe186b4c200b76a5c0f03379fe8645ed1c70a844/src/backend/commands/copy.c#L1934. It should be easy to store the file size in one of the progress parameters (param6, for example) and expose it in the progress view. When the size is not available, this column can be NULL.

Would that be enough?
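
As a rough illustration of the idea (in Python rather than the backend's C, and not the patch's actual code), the size of an already-open file can be read with fstat, with None standing in for the NULL column when the source is not a regular file. The helper name `total_bytes_or_none` is hypothetical.

```python
import os
import stat


def total_bytes_or_none(fileobj):
    """Return the size in bytes of an open regular file, else None.

    None plays the role of a NULL column value for sources whose size
    cannot be known up front (pipes, terminals, program output).
    """
    try:
        st = os.fstat(fileobj.fileno())
    except (OSError, ValueError, AttributeError):
        # No usable file descriptor (closed file or in-memory stream).
        return None
    if not stat.S_ISREG(st.st_mode):
        return None
    return st.st_size
```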

On the other hand, anyone can check the file size manually to get the expected total and compare it to the reported bytes_processed. Alternatively, "wc -l" gives the number of lines, which can be compared to the lines_processed column. Should the patch also count the lines and populate another column with the total (using the configured separator)? AFAIK that would need a full file scan, which can be slow for huge files.
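
A minimal client-side sketch of that manual check, assuming the client can read the same file the server imports. The helper names are hypothetical. Counting lines reads the whole file, and counting newline bytes overcounts rows when quoted CSV fields contain embedded newlines.

```python
import os


def expected_bytes(path):
    """Total file size in bytes, to compare against bytes_processed."""
    return os.path.getsize(path)


def expected_lines(path, chunk_size=1 << 20):
    """Count newline bytes, similar to `wc -l`; needs a full file scan."""
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
    return count
```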
 
I wonder if it made sense to show some estimates in the other cases. For
example when exporting query result, maybe we could show the estimated
number of rows and size? Of course, that's prone to estimation errors
and it's more a wild idea for the future, I don't expect this patch to
implement that.

My plan was to expose the numbers that are not currently available and let clients get the rest of the information on their own.

For example:
- for "COPY (query) TO file": an EXPLAIN or COUNT variant of the query could be executed beforehand to get the expected number of rows
- for "COPY table FROM file": the file size or the number of lines in the file can be inspected first to get the expected number of bytes or rows to process
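
For the first case, one way a client might obtain an expected row count is to run EXPLAIN on the query and read the planner's estimate from the top plan line. The sketch below only parses that text output; the function name is hypothetical, the sample line in the usage note is illustrative, and the estimate can be far from the real row count.

```python
import re

# Matches the "(cost=... rows=N width=...)" part of an EXPLAIN plan line.
_ROWS_RE = re.compile(r"\(cost=\S+ rows=(\d+) width=\d+\)")


def estimated_rows(explain_output):
    """Return the planner's row estimate from the first plan line, or None."""
    for line in explain_output.splitlines():
        match = _ROWS_RE.search(line)
        if match:
            return int(match.group(1))
    return None
```

For a plan line such as `Seq Scan on t  (cost=0.00..35.50 rows=2550 width=4)`, the function returns 2550.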

I see the current system view in my patch (and all the other progress report views currently available) more as a scaffold for building custom tools.

For example, CLI tools can use this to provide some kind of progress indication.
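
A minimal sketch of how such a CLI tool might render progress, assuming it polls the reported bytes_processed value and optionally knows the expected total. The function name and output layout are made up for illustration.

```python
def progress_line(bytes_processed, total_bytes=None, width=20):
    """Render a one-line progress indicator for a running COPY."""
    if not total_bytes:
        # Total unknown (e.g. STDIN or program input): show the raw counter.
        return f"{bytes_processed} bytes processed"
    fraction = min(bytes_processed / total_bytes, 1.0)
    filled = int(fraction * width)
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {fraction * 100:5.1f}% ({bytes_processed}/{total_bytes} bytes)"
```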
 
regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
