Thread: [GENERAL] import CSV file to a table


[GENERAL] import CSV file to a table

From: Günce Kaya
Hi all,

I want to import the contents of a CSV file into a table via a bash script, without creating a temporary table, and I also want to skip some columns in the CSV file (for instance, the CSV file has 12 columns and the main table has only 2 columns; if possible I would use only 2 columns from the CSV file). Is there any way to do it?

Regards,

--
Gunce Kaya

Re: [GENERAL] import CSV file to a table

From: Vick Khera
Since you're using bash, I will assume you are not averse to using a slightly complicated pipeline. First, install this: https://github.com/wireservice/csvkit

Then use that to cut out the columns you want, and pipe the result into psql with an appropriate \copy command.
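
A minimal sketch of that pipeline, assuming columns 5 and 7 are the wanted fields; the database, schema, table, and column names are placeholders:

# csvcut (from csvkit) is a real CSV parser, so quoted commas are handled.
# Drop the "header" option if the file has no header row.
csvcut -c 5,7 input.csv | \
  psql mydb -c "\copy schema1.table1(col_a, col_b) from stdin with (format csv, header)"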

On Wed, Mar 8, 2017 at 4:13 AM, Günce Kaya <guncekaya14@gmail.com> wrote:
> [...]

Re: [GENERAL] import CSV file to a table

From: John McKown
On Wed, Mar 8, 2017 at 3:13 AM, Günce Kaya <guncekaya14@gmail.com> wrote:
> [...]

Not too difficult, but "unusual". However, there is a restriction: the data cannot have an embedded comma. That is, you can't have something like "a,b",c and want two columns with a,b and c as the values.

[tsh009@it-johnmckown-linux junk]$ cat ptest.csv 
a,b,c,d,e,f,g,h,i
1,2,3,4,5,6,7,8,9
z,y,x,w,v,u,t,s,r,q

[tsh009@it-johnmckown-linux junk]$ cat ptest.sh 
#!/bin/bash
printf "%s\n" 'COPY schema1.table1(column1from5, column2from7) FROM stdin;'
export IFS=','
while read -r i; do # read the records until EOF; -r keeps backslashes literal
        test -n "${i}" && { # ignore blank lines!
                set -- ${i} # split on IFS into positional parameters $1, $2, ...
                printf "%s\t%s\n" $5 $7 # write out columns 5 & 7
        }
done
printf "%s\n" '\.' # write end-of-data delimiter for COPY

[tsh009@it-johnmckown-linux junk]$ ./ptest.sh <ptest.csv 
COPY schema1.table1(column1from5, column2from7) FROM stdin;
e       g
5       7
v       t
\.



--
"Irrigation of the land with seawater desalinated by fusion power is ancient. It's called 'rain'." -- Michael McClary, in alt.fusion

Maranatha! <><
John McKown

Re: [GENERAL] import CSV file to a table

From: "Martijn Tonies (Upscene Productions)"
Hi,
 
If this is a one-time thing, you can use the Import Data tool in Database Workbench; see http://www.upscene.com.
 
Hope this helps.
 
With regards,

Martijn Tonies
Upscene Productions
http://www.upscene.com

Database Workbench - developer tool for Oracle, MS SQL Server, PostgreSQL,
SQL Anywhere, MySQL, InterBase, NexusDB and Firebird.
 
Sent: Wednesday, March 08, 2017 10:13 AM
Subject: [GENERAL] import CSV file to a table
 
> [...]

Re: [GENERAL] import CSV file to a table

From: vinny
On 2017-03-08 10:13, Günce Kaya wrote:
> [...]

This is more a programming question than a database question, and there are many possible solutions.

Do *not*, whatever you do, try to write your own piece of code to read the CSV. There are lots of unexpected ways that the CSV file can be slightly different from what you expect, and figuring all those out is a waste of time. The example of embedded commas is just one way; there could also be newlines, line breaks, UTF-8 escape characters, etc.

Personally I'd go the Python route because it's simple and straightforward, but anything you are comfortable with will do. If you are going to install additional software to do this, then remember that you'll need that same software again if you need to do this again, or when you need to move this code to a different server.
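
A minimal sketch of the Python route, driven from bash as elsewhere in this thread; the column positions (5 and 7) and all database names are assumptions:

#!/bin/bash
# Python's csv module handles quoted commas and embedded newlines correctly.
python3 - input.csv <<'PY' | psql mydb -c "\copy main_table(col_a, col_b) from stdin with (format csv)"
import csv, sys
out = csv.writer(sys.stdout)
with open(sys.argv[1], newline='') as f:
    for row in csv.reader(f):
        out.writerow([row[4], row[6]])  # columns 5 and 7 (0-based 4 and 6)
PY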


Re: [GENERAL] import CSV file to a table

From: John McKown
On Wed, Mar 8, 2017 at 8:45 AM, vinny <vinny@xs4all.nl> wrote:
> [...]

I agree. I went with a "pure BASH" approach because it is what the user asked for & I wasn't sure what language she might be comfortable with. I use PERL a lot. Or maybe I should say that I abuse PERL a lot. Such as a PERL script which writes out another PERL script, based on some input files & parameters, then runs the just-written PERL script, which does the load into a PostgreSQL database (multiple tables). Ya, a bit perverted.

--
"Irrigation of the land with seawater desalinated by fusion power is ancient. It's called 'rain'." -- Michael McClary, in alt.fusion

Maranatha! <><
John McKown

Re: [GENERAL] import CSV file to a table

From: Karl Czajkowski
On Mar 08, John McKown modulated:
> [...]

Well, you could follow a similar meta-programming/code-generating pattern to have the BASH script output a single SQL file to run with psql. You could even generate PL/pgSQL code to defer more data processing to the database itself.

I think the only robust "pure BASH" approach is to use a temporary
table, so you aren't trying to parse CSV content in BASH. Using csvkit
sounds good if you can introduce these third-party dependencies.

With the temporary table, you can use SQL for most validation or data
interrogation, but you need to know at least enough schema information
in advance to form the COPY statement. Parsing the CSV header row to
plan your work puts you right back to requiring a robust CSV parser
unless you can constrain your input scenarios to only handle very
trivial headers.

If you play games with a defaulting serial column and fixed column
names like "id, c1, c2, ..., cN" for the temporary table, you might
use the id column as a DB-assigned "row number" during COPY and
validation. In this case, you could even tell Postgres there is no
header, and then let it parse the header as another data record so you
can use SQL statements to determine the actual header names and
ordering in the input.  But this still requires knowing the column
count in advance of the COPY.
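
A sketch of that trick, assuming the 12-column file from the original question; every name here is a placeholder:

#!/bin/bash
psql mydb <<'SQL'
CREATE TEMP TABLE raw_csv (
    id serial,  -- DB-assigned "row number"
    c1 text, c2 text, c3 text, c4  text, c5  text, c6  text,
    c7 text, c8 text, c9 text, c10 text, c11 text, c12 text
);
-- No HEADER option: the header line becomes the record with id = 1,
-- so its names and ordering can be inspected with plain SQL afterwards.
\copy raw_csv (c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12) from 'input.csv' with (format csv)
INSERT INTO main_table (col_a, col_b)
    SELECT c5, c7 FROM raw_csv WHERE id > 1;
SQL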

I also think using something like Python with structured data
processing would be wiser, unless you know enough about the schema in
advance to avoid any CSV parsing on the client side.


Karl


Re: [GENERAL] import CSV file to a table

From: "David G. Johnston"
On Wed, Mar 8, 2017 at 9:13 AM, Karl Czajkowski <karlcz@isi.edu> wrote:
> [...]

You can write the entire contents of the CSV into a psql variable and process the text blob from there using intermediate arrays.
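
A minimal sketch of that idea; it only holds up for trivial CSV with no quoted commas, and every name here is a placeholder:

#!/bin/bash
psql mydb <<'EOF'
-- Slurp the whole file into a psql variable via a backtick shell escape.
\set csv_blob `cat input.csv`
-- Split into lines, then pick fields; split_part does no quote handling.
INSERT INTO main_table (col_a, col_b)
    SELECT split_part(line, ',', 5), split_part(line, ',', 7)
    FROM unnest(string_to_array(:'csv_blob', E'\n')) AS line
    WHERE line <> '';
EOF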

David J.

Re: [GENERAL] import CSV file to a table

From: Rob Sargent

Since bash has been bandied about in this thread, I presume awk is available.  Here's how I would check just how 'csv'-ish the incoming file is:

awk -F"," '{a[NF]++}END{for(i in a){printf "%d lines have %d fields(columns)\n", a[i], i}}' csvfilename

If this doesn't produce one line of output, you have to suspect quoted values including commas (or whatever char you chose); then you need a real CSV parser.

If it's just one line, I'd use cut to get rid of unwanted columns, then let COPY do its thing.
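
A sketch of that clean-file path, with placeholder names; only safe when no field contains a comma:

cut -d, -f5,7 input.csv | \
  psql mydb -c "\copy main_table(col_a, col_b) from stdin with (format csv)"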

On 03/08/2017 09:13 AM, Karl Czajkowski wrote:
> [...]



Re: [GENERAL] import CSV file to a table

From: Karl Czajkowski
I believe that in its full glory, you cannot reliably locate CSV record boundaries except by parsing each field in order, including quote processing.  Individual records may have arbitrary numbers of field and record separator characters within the values.

Karl


On Mar 08, Rob Sargent modulated:
> Since bash has been bandied about in this thread I presume awk is
> available.  Here's how I would check just how 'csv'ish the incoming
> file is.
> ...


Re: [GENERAL] import CSV file to a table

From: Rob Sargent

On 03/08/2017 09:36 AM, Karl Czajkowski wrote:
> [...]
Yes Karl, I agree.  I admitted as much.  But if it's clean, as in free of quoted commas, life is much simpler.  I've lost sight of whether or not the OP knows his situation w.r.t. this.  The awk line will tell him, and for a one-off load this can make a world of difference in complexity: two bash lines and a COPY.



Re: [GENERAL] import CSV file to a table

From: Karl Czajkowski
On Mar 08, Rob Sargent modulated:
> [...]

Maybe I didn't understand your awk? I thought it was counting commas
in lines.  This isn't the same as counting commas in records.

    this,is,record,one
    "this,,","is
    ,,record","two
    ,,,"

This has three commas on each line and is definitely not suitable for naive CSV handling.


Karl


Re: [GENERAL] import CSV file to a table

From: Bret Stern
I'll throw in.

If tab-delimited is available, perhaps that option will work better... or use Access to find the violations of the quote/comma-delimited assumptions, then export from Access and import.
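
A sketch of the tab-delimited path, assuming no embedded tabs or backslashes in the values; names are placeholders (cut's default delimiter is already tab, and so is COPY's in text format):

cut -f5,7 input.tsv | \
  psql mydb -c "\copy main_table(col_a, col_b) from stdin"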

Bret


On Wed, 2017-03-08 at 08:36 -0800, Karl Czajkowski wrote:
> [...]



Re: [GENERAL] import CSV file to a table

From: Rob Sargent

On 03/08/2017 09:52 AM, Karl Czajkowski wrote:
> [...]
In essence it does count commas, but plus one :).  NF is the number of fields defined by commas, so one more field than the number of commas.  If you think/hope the file is simple and well formatted, this is a pretty quick check.  But if you're looking for a general solution, you need a real CSV parser.  I recall being quite surprised and amused to learn there is an actual standard for the CSV format (RFC 4180).  (Naturally, if you have a real parser to hand, you don't need the awk line.)


Re: [GENERAL] import CSV file to a table

From: Günce Kaya
Hi,

When I opened this thread, I didn't know exactly which words to search for. I've read all of your posts, and I think CSV parsing is the key point for me. I've created my script and I'm sharing it for the record.

#cat cargo2.sh
#!/bin/bash
# Split each line on commas into named variables; this assumes no quoted
# commas in the data (see the discussion above).
while IFS=, read -r uor_desc crime_type zip_code ucr_hierarchy date_reported date_occured
do
    echo "select * from insertorders('$uor_desc', '$crime_type', '$zip_code', '$ucr_hierarchy', '$date_reported', '$date_occured');"
done < test.txt
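
The generated statements can then be piped straight into psql; the database name is a placeholder:

./cargo2.sh | psql mydb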

Thank you for your advice and help.

Regards,

Gunce

2017-03-08 20:00 GMT+03:00 Rob Sargent <robjsargent@gmail.com>:
> [...]



--
Gunce Kaya