Re: VLDB Features

Поиск
Список
Период
Сортировка
От Trent Shipley
Тема Re: VLDB Features
Дата
Msg-id 200712141830.16877.trent_shipley@qwest.net
обсуждение исходный текст
Ответ на Re: VLDB Features  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)  (Alex Shulgin <ash@commandprompt.com>)
Список pgsql-hackers
On Friday 2007-12-14 16:22, Tom Lane wrote:
> Neil Conway <neilc@samurai.com> writes:
> > By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
> > to drop (and log) rows that contain malformed data. That is, rows with
> > too many or too few columns, rows that result in constraint violations,
> > and rows containing columns where the data type's input function raises
> > an error. The last case is the only thing that would be a bit tricky to
> > implement, I think: you could use PG_TRY() around the InputFunctionCall,
> > but I guess you'd need a subtransaction to ensure that you reset your
> > state correctly after catching an error.
>
> Yeah.  It's the subtransaction per row that's daunting --- not only the
> cycles spent for that, but the ensuing limitation to 4G rows imported
> per COPY.

You could extend the COPY FROM syntax with a COMMIT EVERY n clause.  This 
would help with the 4G subtransaction limit.  The cost to the ETL process is 
that a simple rollback would not be guaranteed send the process back to it's 
initial state.  There are easy ways to deal with the rollback issue though.  

A {NO} RETRY {USING algorithm} clause might be useful.   If the NO RETRY 
option is selected then the COPY FROM can run without subtransactions and in 
excess of the 4G per transaction limit.  NO RETRY should be the default since 
it preserves the legacy behavior of COPY FROM.

You could have an EXCEPTIONS TO {filename|STDERR} clause. I would not give the 
option of sending exceptions to a table since they are presumably malformed, 
otherwise they would not be exceptions.  (Users should re-process exception 
files if they want an if good then table a else exception to table b ...)

EXCEPTIONS TO and NO RETRY would be mutually exclusive.


> If we could somehow only do a subtransaction per failure, things would
> be much better, but I don't see how.




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Neil Conway
Дата:
Сообщение: Re: EXPLAIN ANALYZE printing logical and hardware I/O per-node
Следующее
От: Tom Lane
Дата:
Сообщение: Re: VLDB Features