Re: [GENERAL] import CSV file to a table

From: Rob Sargent
Subject: Re: [GENERAL] import CSV file to a table
Date:
Msg-id: 7858a99b-0ae4-4d0d-c2cc-56571f00d51f@gmail.com
In reply to: Re: [GENERAL] import CSV file to a table  (Karl Czajkowski <karlcz@isi.edu>)
Responses: Re: [GENERAL] import CSV file to a table  (Karl Czajkowski <karlcz@isi.edu>)
List: pgsql-general

Since bash has been bandied about in this thread, I presume awk is available.  Here's how I would check just how 'csv'-ish the incoming file is.

awk -F',' '{a[NF]++} END {for (i in a) printf "%d lines have %d fields (columns)\n", a[i], i}' csvfilename

If this produces more than one line of output, you have to suspect quoted values containing commas (or whatever delimiter you chose); then you need a real CSV parser.

If it's just one line, I'd use cut to get rid of unwanted columns, then let COPY do its thing.
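A minimal sketch of that cut-then-COPY idea; the file names, column choices, and the "target" table are invented for illustration:

```shell
# Stand-in for the real CSV:
cat > /tmp/sample.csv <<'EOF'
id,name,junk,score
1,alice,x,10
2,bob,y,20
EOF

# Keep columns 1, 2 and 4; drop the unwanted third column:
cut -d',' -f1,2,4 /tmp/sample.csv > /tmp/trimmed.csv
cat /tmp/trimmed.csv

# COPY would then take over, e.g. (not run here; "target" is hypothetical):
#   psql -c "\copy target FROM '/tmp/trimmed.csv' WITH (FORMAT csv, HEADER true)"
```

Note this only works once you've verified (as above) that no field contains an embedded delimiter, since cut knows nothing about CSV quoting.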

On 03/08/2017 09:13 AM, Karl Czajkowski wrote:
On Mar 08, John McKown modulated:
...
I agree. I went with a "pure BASH" approach because it is what the
user asked for & I wasn't sure what language she might be comfortable
with. I use PERL a lot. Or maybe I should say that I abuse PERL a lot.
Such as a PERL script which writes out another PERL script, based on
some input files & parameters, then runs the just-written PERL script,
which does the load into a PostgreSQL database (multiple tables). Ya, a
bit perverted.

Well, you could follow a similar meta-programming/code-generating
pattern to have the BASH script output a single SQL file to run with
psql. You could even generate PL/pgsql code to defer more data
processing to the database itself.
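A sketch of that code-generating pattern, assuming an invented staging table, paths, and column layout:

```shell
# Bash emits a single SQL file; psql runs it in one shot afterwards.
OUT=/tmp/load.sql
CSV=/tmp/input.csv

cat > "$OUT" <<EOF
BEGIN;
CREATE TABLE IF NOT EXISTS staging (c1 text, c2 text, c3 text);
\\copy staging FROM '$CSV' WITH (FORMAT csv, HEADER true)
COMMIT;
EOF

cat "$OUT"
# The one-shot run would be:  psql -f "$OUT"
```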

I think the only robust "pure BASH" approach is to use a temporary
table, so you aren't trying to parse CSV content in BASH. Using csvkit
sounds good if you can introduce these third-party dependencies.

With the temporary table, you can use SQL for most validation or data
interrogation, but you need to know at least enough schema information
in advance to form the COPY statement. Parsing the CSV header row to
plan your work puts you right back to requiring a robust CSV parser
unless you can constrain your input scenarios to only handle very
trivial headers.

If you play games with a defaulting serial column and fixed column
names like "id, c1, c2, ..., cN" for the temporary table, you might
use the id column as a DB-assigned "row number" during COPY and
validation. In this case, you could even tell Postgres there is no
header, and then let it parse the header as another data record so you
can use SQL statements to determine the actual header names and
ordering in the input.  But this still requires knowing the column
count in advance of the COPY.
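That fixed-name trick might be sketched like this, assuming the column count N is known ahead of the COPY (table and path names are invented):

```shell
# Build "id serial" plus c1..cN text columns, and COPY with HEADER false so
# the CSV header lands in the table as an ordinary first row, inspectable
# with plain SQL afterwards.
N=4
cols="id serial"
collist="c1"
for i in $(seq 1 "$N"); do cols="$cols, c$i text"; done
for i in $(seq 2 "$N"); do collist="$collist, c$i"; done

cat > /tmp/stage.sql <<EOF
CREATE TEMP TABLE stage ($cols);
\\copy stage ($collist) FROM '/tmp/input.csv' WITH (FORMAT csv, HEADER false)
EOF

cat /tmp/stage.sql
```

Listing c1..cN explicitly in the COPY leaves the id column to its serial default, giving each row a DB-assigned row number.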

I also think using something like Python with structured data
processing would be wiser, unless you know enough about the schema in
advance to avoid any CSV parsing on the client side.


Karl



In the pgsql-general list, by date:

Previous
From: "David G. Johnston"
Date:
Message: Re: [GENERAL] import CSV file to a table
Next
From: Karl Czajkowski
Date:
Message: Re: [GENERAL] import CSV file to a table