Re: [HACKERS] GSOC'17 project introduction: Parallel COPY executionwith errors handling

Поиск
Список
Период
Сортировка
От Stephen Frost
Тема Re: [HACKERS] GSOC'17 project introduction: Parallel COPY executionwith errors handling
Дата
Msg-id 20180123022122.GD2416@tamriel.snowman.net
обсуждение исходный текст
Ответ на Re: [HACKERS] GSOC'17 project introduction: Parallel COPY executionwith errors handling  (Alexey Kondratov <kondratov.aleksey@gmail.com>)
Ответы Re: [HACKERS] GSOC'17 project introduction: Parallel COPY executionwith errors handling
Список pgsql-hackers
Alexey,

* Alexey Kondratov (kondratov.aleksey@gmail.com) wrote:
> On Fri, Dec 1, 2017 at 1:58 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > On a *very* quick look, please use an enum to return from NextCopyFrom
> > rather than 'int'.  The chunks that change bool to int are very
> > odd-looking.  This would move the comment that explains the value from
> > copy.c to copy.h, obviously.  Also, you seem to be using non-ASCII dashes
> > in the descriptions of those values; please don't.
>
> I will fix it, thank you.
>
> > Or maybe I misunderstood the patch completely.
>
> I hope so. Here is my thoughts how it all works, please correct me,
> where I am wrong:
>
> 1) First, I have simply changed ereport level to WARNING for specific
> validations (extra or missing columns, etc) if INGONE_ERRORS option is
> used. All these checks are inside NextCopyFrom. Thus, this patch
> performs here pretty much the same as before, excepting that it is
> possible to skip bad lines, and this part should be safe as well
>
> 2) About PG_TRY/CATCH. I use it to catch the only one specific
> function call inside NextCopyFrom--it is InputFunctionCall--which is
> used just to parse datatype from the input string. I have no idea how
> WAL write or trigger errors could get here
>
> All of these is done before actually forming a tuple, putting it into
> the heap, firing insert-related triggers, etc. I am not trying to
> catch all errors during the row processing, only input data errors. So
> why is it unsafe?

The issue is that InputFunctionCall is actually calling out to other
functions and they could be allocating memory and doing other things.
What you're missing by using PG_TRY() here without doing a PG_RE_THROW()
is all the cleanup work that AbortTransaction() and friends do- ensuring
there's a valid memory context (cleaning up the ones that might have
been allocated by the input function), releasing locks, resetting user
and security contexts, and a whole slew of other things.

Is all of that always needed for an error thrown by an input function?
Hard to say, really, since input functions can be provided by
extensions.  You might get away with doing this for int4in(), but we
also have to be thinking about extensions like PostGIS which introduce
their own geometry and geography data types that are a great deal more
complicated.

In the end, this isn't an acceptable approach to this problem.  The
approach outlined previously using sub-transactions which attempted
insert some number of records and then, on failure, restarted and
inserted up until the failed record and then skipped it, starting over
again with the next batch, seemed pretty reasonable to me.  If you have
time to work on that, that'd be great, otherwise we'll probably need to
mark this as returned with feedback.

Thanks!

Stephen

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] A design for amcheck heapam verification
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Handling better supported channel binding types for SSLimplementations