Re: Add new COPY option REJECT_LIMIT

Поиск
Список
Период
Сортировка
От torikoshia
Тема Re: Add new COPY option REJECT_LIMIT
Дата
Msg-id 920a29a36032befbeb355cbdfbbe0f63@oss.nttdata.com
обсуждение исходный текст
Ответ на Re: Add new COPY option REJECT_LIMIT  (torikoshia <torikoshia@oss.nttdata.com>)
Ответы Re: Add new COPY option REJECT_LIMIT
Re: Add new COPY option REJECT_LIMIT
Список pgsql-hackers
On 2024-07-03 02:07, Fujii Masao wrote:
> However, if we support REJECT_LIMIT, I'm not sure if the ON_ERROR 
> option is still necessary.

I remembered another reason for the necessity of ON_ERROR.

ON_ERROR defines how to behave when encountering an error and it just 
accepts 'ignore' and 'stop' currently, but is expected to support other 
options such as saving details of errors to a table[1].
ON_ERROR=stop is a synonym for REJECT_LIMIT=infinity, but I imagine 
REJECT_LIMIT would not replace future options of ON_ERROR.

Considering this and the option we want to add this time is to specify 
an upper limit on the number or ratio of errors, the name of this option 
like "reject_limit" seems better than "ignore_errors".

On Fri, Jul 5, 2024 at 4:13 PM torikoshia <torikoshia@oss.nttdata.com> 
wrote:
> On 2024-07-05 12:59, Fujii Masao wrote:
>> On 2024/07/04 12:05, torikoshia wrote:
>>> I'm going to update it after discussing the option format as 
>>> described
>>> below.

Updated the patch.
0001 sets limit by the absolute number of error rows and 0002 sets limit 
by ratio of the error.

>> If we choose "all" as the keyword, renaming the option to 
>> IGNORE_ERRORS
>> might be more intuitive and easier to understand than REJECT_LIMIT.

> I feel that 'infinite' and 'unlimited' are unfamiliar values for
> PostgreSQL parameters, so 'all' might be better and IGNORE_ERRORS would
> be a better parameter name as your suggestion.

As described above, attached patch adopts REJECT_LIMIT, so it uses 
"infinity".

>> This makes me think it might be better to treat REJECT_LIMIT as
>> an additional option for ON_ERROR=stop instead of ON_ERROR=ignore
>> if we adopt your patch. Since ON_ERROR=stop is the default,
>> users could set the maximum number of allowed errors by specifying
>> only REJECT_LIMIT. Otherwise, they would need to specify both
>> ON_ERROR=ignore and REJECT_LIMIT.

> That makes sense.

On my second thought, whatever value ON_ERROR is specified(e.g. ignore, 
stop, table), it seems fine to use REJECT_LIMIT.
I feel REJECT_LIMIT has both "ignore" and "stop" characteristics, 
meaning it ignores errors until it reaches REJECT_LIMIT and stops when 
it exceeds the REJECT_LIMIT.
And REJECT_LIMIT seems orthogonal to 'table',  which specifies where to 
save error details.

Attached patch allows using REJECT_LIMIT regardless of the ON_ERROR 
option value.


[1] 
https://www.postgresql.org/message-id/flat/CACJufxH_OJpVra=0c4ow8fbxHj7heMcVaTNEPa5vAurSeNA-6Q@mail.gmail.com

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alena Rybakina
Дата:
Сообщение: Re: POC, WIP: OR-clause support for indexes
Следующее
От: Alexander Pyhalov
Дата:
Сообщение: Asynchronous MergeAppend