Re: error-tolerant COPY FROM

Поиск
Список
Период
Сортировка
От Christopher Browne
Тема Re: error-tolerant COPY FROM
Дата
Msg-id 60zmyk7y2c.fsf@dba2.int.libertyrms.com
обсуждение исходный текст
Ответ на error-tolerant COPY FROM  ("Joolz" <joolz@arbodienst-limburg.nl>)
Список pgsql-general
joolz@arbodienst-limburg.nl ("Joolz") writes:

> Hello everyone,
>
> I'm building a postgresql db which will have to get lots of data
> from "the outside" (customers, that is). The db has lots of
> constraints, and I'm sure that our customers will offer lots of
> invalid information. We receive the information in csv format. My
> first thought was to read them into the database with COPY, but
> "COPY stops operation at the first error."
>
> What I need is an import where all valid lines from the csv files
> are read into the db, and I also get a logfile for all invalid
> lines, stating the line number plus the pg error message so I can
> see which constraint was violated.
>
> I can't think of a direct, elegant solution for this, does anyone
> have any suggestions? Thanks a lot!

First, load it into a table that has really, really _weak_
constraints, but which also adds a sequence column at the end that
gets automagically populated.

- The sequence gets you the "line numbers" you want.

- You then run queries that separate the "good" from the "crud."

This might ultimately turn into having 9 tables:

 - The original data, that you NEVER touch again, as the "pristine"
   form to look at if you find a problem;

 - 3 tables that collect entries with 3 different sorts of problems
   that mandate discarding the data [well, sending it back for
   retreading...]

 - 4 tables that contain entries that could get rewritten in some
   automatic fashion based on 4 policies you discovered

 - 1 "final results" table that aggregates those 4 tables along with
   the entries that were OK, which are fine to use as the resulting data
   import that can safely go into your application.

I'm making up numbers, but you hopefully get the idea...
--
"cbbrowne","@","ca.afilias.info"
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)

В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Safely Killing Backends (Was: Applications that leak connections)
Следующее
От: phil campaigne
Дата:
Сообщение: Re: Duration between two timestamps