Re: Add new COPY option REJECT_LIMIT

Поиск
Список
Период
Сортировка
От Fujii Masao
Тема Re: Add new COPY option REJECT_LIMIT
Дата
Msg-id 644a094c-2c0c-4b5f-b151-3677c7de4ee7@oss.nttdata.com
обсуждение исходный текст
Ответ на Re: Add new COPY option REJECT_LIMIT  (torikoshia <torikoshia@oss.nttdata.com>)
Ответы Re: Add new COPY option REJECT_LIMIT
Список pgsql-hackers

On 2024/07/22 21:37, torikoshia wrote:
> On Fri, Jul 19, 2024 at 11:48 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> Thanks for the comment.
> 
>> In patch 0002, the ratio is calculated by the already skipped/processed
>> rows, but what if a user wants to copy 1000 rows, and he/she can tolerate
>> 10 error rows, so he/she might set *reject_limit 0.01*, but one bad row in the
>> first 100 rows will fail the entire command, this might surprise the user.
> 
> Since the ratio is calculated after all data is processed, the case "one bad row in the first 100 rows will fail the
entirecommand" doesn't happen:
 

Yes, but is this the desired behavior when using the ratio threshold?
I was thinking that COPY should fail once the error ratio (errors vs.
total rows in the input file) reaches the threshold. Since it's
difficult to know the total number of rows in the input file,
implementing this seems not easy, though.

> Updated the patch.

Thanks for updating the patch!


+      This option must be used with <literal>ON_ERROR</literal> to be set to
+      other than <literal>stop</literal>.

Regarding the ON_ERROR option, now it only has two values.
Instead of saying "other than stop," we should simply use "ignore"
for clarity and intuition?


+      When specified <literal>INFINITY</literal>, <command>COPY</command> ignores all
+      the errors. This is a synonym for <literal>ON_ERROR</literal> <literal>ignore</literal>.

For the INFINITY value, the description is a bit unclear.
As I understand it, INFINITY is the default for REJECT_LIMIT.
So, if ON_ERROR=ignore is set without specifying REJECT_LIMIT,
COPY will ignore all errors, similar to when REJECT_LIMIT=INFINITY is used.


In line with my previous suggestion, if we support only REJECT_LIMIT
without ON_ERROR, having INFINITY makes sense. However,
with ON_ERROR supported, REJECT_LIMIT=INFINITY seems redundant.
Users can just set ON_ERROR=ignore to ignore all errors,
making INFINITY unnecessary for REJECT_LIMIT. This is open for
discussion, but I believe it's better to remove INFINITY from
the REJECT_LIMIT options.


+        else if (strcmp(defel->defname, "reject_limit") == 0)
+        {
+            if (reject_limit_specified)
+                errorConflictingDefElem(defel, pstate);
+            if (!opts_out->on_error)
+                ereport(ERROR,
+                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                         errmsg("REJECT_LIMIT requires ON_ERROR to be set to other than stop")));

Using "ignore" instead of "other than stop" in the error message
is clearer and more intuitive.


Checking if ON_ERROR and REJECT_LIMIT are specified should be
done after the foreach() processing of the options. Otherwise,
if REJECT_LIMIT is specified before ON_ERROR, unexpected errors can occur.

---------------
=# copy test from '...' WITH (REJECT_LIMIT 7, ON_ERROR 'ignore');
ERROR:  REJECT_LIMIT requires ON_ERROR to be set to other than stop
---------------


+                ereport(ERROR,
+                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                         errmsg("exceeded the number specified by REJECT_LIMIT \"%lld\"",
+                                (long long) cstate->opts.reject_limits.num_err)));

The error message isn't clear about what exceeded REJECT_LIMIT.
How about saying "skipped more than REJECT_LIMIT rows" or something instead?


+/*
+ * A struct to hold reject_limit options, in a parsed form.
+ * More values to be added in another patch.
+ */

The latter comment doesn't seem necessary or helpful.


Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Allow logical failover slots to wait on synchronous replication
Следующее
От: Bakhtiyar Neyman
Дата:
Сообщение: Restrictive combination of GRANT and POLICY