Обсуждение: copy command

Поиск
Список
Период
Сортировка

copy command

От
"Zhang, Anna"
Дата:
I have a problem to load data to postgres database using copy command. The
problem is that we have one column called address which is multi-line text,
the taxt file looks like this:

aab770|awkc.com administration|sultan 23
Bogota, na00000|CO

above shows one record with '|' as delimiters. Column 3 has two lines which
caused the record is splited into two lines. If I use copy command to load
this record, it is always treated as two records. Is there any way to tell
postgres to read until reach a specified char? not stop at '\n'?
Or maybe edit the text file to make it one line, but how?  We have milions
of such records in the text file.
Anyone can provide a solution, thanks a million!

Anna Zhang


Re: copy command

От
Isabelle Brette
Дата:
Hello,

On Thu, Jan 03, 2002 at 12:10:01PM -0500, Zhang, Anna wrote:
> I have a problem to load data to postgres database using copy command. The
> problem is that we have one column called address which is multi-line text,
> the taxt file looks like this:
>
> aab770|awkc.com administration|sultan 23
> Bogota, na00000|CO
>
> above shows one record with '|' as delimiters. Column 3 has two lines which
> caused the record is splited into two lines. If I use copy command to load
> this record, it is always treated as two records. Is there any way to tell
> postgres to read until reach a specified char? not stop at '\n'?
> Or maybe edit the text file to make it one line, but how?  We have milions
> of such records in the text file.

You can use some powerful editor, such as VI, to recognize any line that
does not begin with a proper pattern, and join it with the one before.
Not to lose the line feed you'll probably have to add something to
replace it (such as, let's say, a tab) and convert it back once you've
imported (postgres has some nice pattern-matching features).

I've done this before to prepare an import (through a perl script,
because I needed the information previously in one table to be split
within several ones, but this does not make a major difference, because
a perl script only reads one line at a time), the condition is that the
beginning of each line is easily recognizable. If your first column is
_not_ easily recognizable, maybe you can find a pattern showing a line
is incomplete and join with the next line (bad number of |, for
example).

Hope I've been clear enough (I'm not that good at explaining things in
English) and that it helps.

--
Isabelle Brette - isabelle@apartia.fr

Re: copy command

От
DHSC Webmaster
Дата:

Isabelle Brette wrote:
>
> Hello,
>
> On Thu, Jan 03, 2002 at 12:10:01PM -0500, Zhang, Anna wrote:
> > I have a problem to load data to postgres database using copy command. The
> > problem is that we have one column called address which is multi-line text,
> > the taxt file looks like this:
> >
> > aab770|awkc.com administration|sultan 23
> > Bogota, na00000|CO
> >
> > above shows one record with '|' as delimiters. Column 3 has two lines which
> > caused the record is splited into two lines. If I use copy command to load
> > this record, it is always treated as two records. Is there any way to tell
> > postgres to read until reach a specified char? not stop at '\n'?
> > Or maybe edit the text file to make it one line, but how?  We have milions
> > of such records in the text file.
>
> You can use some powerful editor, such as VI, to recognize any line that
> does not begin with a proper pattern, and join it with the one before.
> Not to lose the line feed you'll probably have to add something to
> replace it (such as, let's say, a tab) and convert it back once you've
> imported (postgres has some nice pattern-matching features).
>
> I've done this before to prepare an import (through a perl script,
> because I needed the information previously in one table to be split
> within several ones, but this does not make a major difference, because
> a perl script only reads one line at a time), the condition is that the
> beginning of each line is easily recognizable. If your first column is
> _not_ easily recognizable, maybe you can find a pattern showing a line
> is incomplete and join with the next line (bad number of |, for
> example).
>
> Hope I've been clear enough (I'm not that good at explaining things in
> English) and that it helps.
>
> --
> Isabelle Brette - isabelle@apartia.fr
>

If you use a perl script to 'count' your columns, you can escape your
newlines with a \ in your addresses and postgres will transparently
convert them to newlines in the data. By doing this it will process more
than one line in your text file as a row.

network=# create table tmp2 (id int, name text, address text);
CREATE
network=# insert into tmp2 (id,name,address) values(100,'bill','456
harmony ln.
network'# hickup, NH');
INSERT 86195110 1
network=# select * from tmp2;
 id  | name |          address
-----+------+----------------------------
 100 | bill | 456 harmony ln.
hickup, NH
(1 row)

Insert another row.

network=# insert into tmp2 (id,name,address)values(200,'Sam','Is
linebreak here?\newline');
INSERT 86195203 1
network=# select * from tmp2;
 id  | name |           address
-----+------+-----------------------------
 100 | bill | 456 harmony ln.
hickup, NH
 200 | Sam  | Is linebreak here?
ewline
(3 rows)

Here is the export file I copied out from tmp2
100|bill|456 harmony ln.\
hickup, NH
200|Sam|Is linebreak here?\
ewline

This file imported correctly to another table with the newlines embedded
in the data.

> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html

--
Bill MacArthur
Webmaster
DHS Club