Re: [GENERAL] import CSV file to a table

Поиск
Список
Период
Сортировка
От Karl Czajkowski
Тема Re: [GENERAL] import CSV file to a table
Дата
Msg-id 20170308161341.GA12449@moraine.isi.edu
обсуждение исходный текст
Ответ на Re: [GENERAL] import CSV file to a table  (John McKown <john.archie.mckown@gmail.com>)
Ответы Re: [GENERAL] import CSV file to a table
Re: [GENERAL] import CSV file to a table
Список pgsql-general
On Mar 08, John McKown modulated:
...
> ​I agree. I went with a "pure BASH" approach because it is what the
> user asked for & I wasn't sure what language she might be comfortable
> with. I use PERL a lot. Or maybe I should say that I abuse PERL a lot.
> Such as a PERL script with writes out another PERL script, based on
> some input files & parameters, then runs the just written PERL script,
> which does the load into a PostgreSQL database (multiple tables). Ya, a
> bit perverted.​
>

Well, you could follow a similar meta-programming/code-generating
pattern to have the BASH script output a single SQL file to run with
psql. You could even generate PL/pgsql code to defer more data
processing to the database itself.

I think the only robust "pure BASH" approach is to use a temporary
table, so you aren't trying to parse CSV content in BASH. Using csvkit
sounds good if you can introduce these third-party dependencies.

With the temporary table, you can use SQL for most validation or data
interrogation, but you need to know at least enough schema information
in advance to form the COPY statement. Parsing the CSV header row to
plan your work puts you right back to requiring a robust CSV parser
unless you can constrain your input scenarios to only handle very
trivial headers.

If you play games with a defaulting serial column and fixed column
names like "id, c1, c2, ..., cN" for the temporary table, you might
use the id column as a DB-assigned "row number" during COPY and
validation. In this case, you could even tell Postgres there is no
header, and then let it parse the header as another data record so you
can use SQL statements to determine the actual header names and
ordering in the input.  But this still requires knowing the column
count in advance of the COPY.

I also think using something like Python with structured data
processing would be wiser, unless you know enough about the schema in
advance to avoid any CSV parsing on the client side.


Karl


В списке pgsql-general по дате отправления:

Предыдущее
От: Adrian Klaver
Дата:
Сообщение: Re: [GENERAL] Unable to start postgresql
Следующее
От: Adrian Klaver
Дата:
Сообщение: Re: [GENERAL] Unable to start postgresql