Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

From: Alexey Kondratov
Subject: Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
Date:
Msg-id: 09534CC6-D594-4070-97BC-5AA93F107477@gmail.com
In response to: Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers

Thank you for your comments, Peter; there are some points that I had not thought about before.

On 9 Jun 2017, at 01:09, Peter Geoghegan <pg@bowt.ie> wrote:

>> Adding a full support of ON CONFLICT DO NOTHING/UPDATE to COPY seems
>> to be a large separated task and is out of the current project scope, but
>> maybe there is a relatively simple way to somehow perform internally tuples
>> insert with ON CONFLICT DO NOTHING? I have added Peter Geoghegan to cc, as
>> I understand he is the major contributor of UPSERT in PostgreSQL. It would
>> be great if he will answer this question.
>
> I think that there is a way of making COPY use "speculative
> insertion", so that it behaves the same as ON CONFLICT DO NOTHING with
> no inference specification. Whether or not this is useful depends on a
> lot of things.


I am not going to start on "speculative insertion" right now, but it would be very
useful if you could point me to where to start. At the very least I could try to
evaluate the complexity of the problem.

> I think that you need to more formally identify what errors your new
> COPY error handling will need to swallow.
> ...
> My advice right now is: see if you can figure out a way of doing what
> you want without subtransactions at all, possibly by cutting some
> scope. For example, maybe it would be satisfactory to have the
> implementation just ignore constraint violations, but still raise
> errors for invalid input for types.

Initially I was thinking only about malformed rows, e.g. fewer or extra columns.
Honestly, I did not know that there are so many levels and ways in which an error
can occur. So currently (and especially after your comments) I prefer to focus
only on the following list of errors:

1) File format issues
a. Fewer columns than needed
b. Extra columns

2) I have doubts about type mismatches. It is possible to imagine a situation where,
e.g., some integers are exported as int and some as "int", but I am not sure
that this is a common situation.

3) Some constraint violations, e.g. unique index.

The first turned out to be easily achievable without subtransactions. I have created a
proof-of-concept version of COPY in which error handling is turned on by default.
Please see the small patch attached (applicable to 76b11e8a43eca4612dfccfe7f3ebd293fb8a46ec).
It emits warnings instead of errors for malformed lines with fewer or extra columns
and reports the line number.
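
To make the intended behaviour concrete, here is a minimal Python model of it. This is not the attached patch and not PostgreSQL code: the function name `copy_rows`, the tab delimiter, and the warning wording are assumptions made for illustration only.

```python
def copy_rows(lines, n_columns, delimiter="\t"):
    """Keep well-formed rows; collect a warning (with line number) for the rest."""
    rows, warnings = [], []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) < n_columns:
            # Too few columns: skip the line instead of aborting the whole load.
            missing = n_columns - len(fields)
            warnings.append(f"line {line_no}: missing data for {missing} column(s)")
        elif len(fields) > n_columns:
            # Too many columns: likewise skipped and reported.
            warnings.append(f"line {line_no}: extra data after last expected column")
        else:
            rows.append(fields)
    return rows, warnings


rows, warns = copy_rows(
    ["1\talice\n", "2\n", "3\tbob\textra\n", "4\tcarol\n"], n_columns=2
)
```

Because a malformed line is rejected before any tuple is built from it, skipping it leaves nothing to undo, which is why this case does not need subtransactions.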

The second is probably achievable without subtransactions via PG_TRY/PG_CATCH
around heap_form_tuple, since the tuple has not yet been inserted into the heap.
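
The control flow I have in mind can be sketched in Python, with try/except standing in for PG_TRY/PG_CATCH. This is only a model under stated assumptions: `load`, `converters`, and the list used as a "heap" are hypothetical names, and the sketch does not model the memory-context or resource cleanup that real backend code would have to take care of after catching an error.

```python
def load(rows, converters):
    """Convert each row to typed values; skip rows whose conversion fails."""
    heap, warnings = [], []
    for line_no, fields in enumerate(rows, start=1):
        try:
            # Build the complete tuple first. Nothing is appended to `heap`
            # until every field has been converted successfully.
            tup = tuple(conv(value) for conv, value in zip(converters, fields))
        except ValueError as exc:
            # The failure happened before the insert, so there is nothing to undo.
            warnings.append(f"line {line_no}: {exc}")
            continue
        heap.append(tup)
    return heap, warnings


# The second row carries the integer as a quoted string, as in the "int" example.
heap, warns = load([["1", "a"], ['"2"', "b"], ["3", "c"]], [int, str])
```

The point of the sketch is the ordering: the error is caught while the tuple exists only in local memory, so the rows loaded before it are unaffected.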

But the third is questionable without subtransactions: even if we check the
constraints once, there may be various before/after triggers that modify the
tuple so that it no longer satisfies them. The corresponding comment inside copy.c states:
"Note that a BR trigger might modify tuple such that the partition constraint is
no longer satisfied, so we need to check in that case." Thus, as I understand it,
several different situations are possible here. However, this is a point where
"speculative insertion" may be able to help.
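
A small Python model shows why checking the input once is not enough. Everything here is hypothetical (`insert_all`, the `lowercase_email` trigger, the in-memory set used as a unique index); it only illustrates the ordering problem and the "skip on conflict" outcome, not how speculative insertion is implemented.

```python
def insert_all(rows, before_row_trigger, unique_key):
    """Run the BR trigger, then check uniqueness; skip conflicting rows."""
    table, seen, skipped = [], set(), []
    for line_no, row in enumerate(rows, start=1):
        row = before_row_trigger(dict(row))  # the trigger may rewrite the row
        key = row[unique_key]
        if key in seen:
            # Conflict detected only after the trigger ran: skip the row,
            # which is the outcome ON CONFLICT DO NOTHING would give.
            skipped.append(line_no)
            continue
        seen.add(key)
        table.append(row)
    return table, skipped


def lowercase_email(row):
    """Hypothetical BEFORE ROW trigger that normalizes the key column."""
    row["email"] = row["email"].lower()
    return row


rows = [
    {"email": "a@example.com"},
    {"email": "A@example.com"},  # distinct in the file, duplicate after the trigger
    {"email": "b@example.com"},
]
table, skipped = insert_all(rows, lowercase_email, "email")
```

The three input rows are all distinct in the file, so a check made before the trigger would pass, yet the second row collides once the trigger has run.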

These three cases should cover most real-life scenarios.

> Is there really much value in ignoring errors due to invalid encoding?

Now I have some doubts about it too. If there is an encoding problem,
it probably affects the whole file, not just a few rows.


Alexey


