Re: Conflict handling for COPY FROM

Поиск
Список
Период
Сортировка
От Surafel Temesgen
Тема Re: Conflict handling for COPY FROM
Дата
Msg-id CALAY4q8rs26AOHf48mC67bGgN4yHfrycnq5BgihyVpUe7hypyQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Conflict handling for COPY FROM  (Alexey Kondratov <a.kondratov@postgrespro.ru>)
Ответы Re: Conflict handling for COPY FROM  (Anthony Nowocien <anowocien@gmail.com>)
Список pgsql-hackers
Hi Alexey,
Thank you for looking at it

On Tue, Jul 2, 2019 at 7:57 PM Alexey Kondratov <a.kondratov@postgrespro.ru> wrote:
On 28.06.2019 16:12, Alvaro Herrera wrote:
>> On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <andres@anarazel.de> wrote:
>>> Or even just return it as a row. CopyBoth is relatively widely supported
>>> these days.
>> i think generating warning about it also sufficiently meet its propose of
>> notifying user about skipped record with existing logging facility
>> and we use it for similar propose in other place too. The different
>> i see is the number of warning that can be generated
> Warnings seem useless for this purpose.  I'm with Andres: returning rows
> would make this a fine feature.  If the user wants the rows in a table
> as Andrew suggests, she can use wrap the whole thing in an insert.

I agree with previous commentators that returning rows will make this
feature more versatile.

I agree. am looking at the options

Also, I would prefer having an option to ignore all errors, e.g. with
option ERROR_LIMIT set to -1. Because it is rather difficult to estimate
a number of future errors if you are playing with some badly structured
data, while always setting it to 100500k looks ugly.


Good idea 

 
1) Calculation of processed rows isn't correct (I've checked). You do it
in two places, and

-            processed++;
+            if (!cstate->error_limit)
+                processed++;

is never incremented if ERROR_LIMIT is specified and no errors
occurred/no constraints exist, so the result will always be 0. However,
if primary column with constraints exists, then processed is calculated
correctly, since another code path is used:


Correct. i will fix

+                        if (specConflict)
+                        {
+                            ...
+                        }
+                        else
+                            processed++;

I would prefer this calculation in a single place (as it was before
patch) for simplicity and in order to avoid such problems.


ok


2) This ExecInsertIndexTuples call is only executed now if ERROR_LIMIT
is specified and was exceeded, which doesn't seem to be correct, does it?

-                        if (resultRelInfo->ri_NumIndices > 0)
+                        if (resultRelInfo->ri_NumIndices > 0 &&
cstate->error_limit == 0)
                              recheckIndexes = ExecInsertIndexTuples(myslot,


No it alwase executed . I did it this way to avoid
inserting index tuple twice but i see its unlikely


3) Trailing whitespaces added to error messages and tests for some reason:

+                    ereport(WARNING,
+                            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                             errmsg("skipping \"%s\" --- missing data
for column \"%s\" ",

+                    ereport(ERROR,
+                            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                             errmsg("missing data for column \"%s\" ",

-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e"
  CONTEXT:  COPY x, line 1: "2000    230    23    23"

-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e"
  CONTEXT:  COPY x, line 1: "2001    231    \N    \N"
 

regards
Surafel

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Surafel Temesgen
Дата:
Сообщение: Re: FETCH FIRST clause WITH TIES option
Следующее
От: John Naylor
Дата:
Сообщение: Re: benchmarking Flex practices