Обсуждение: Using psql to feed a file line by line to a table column

Поиск
Список
Период
Сортировка

Using psql to feed a file line by line to a table column

От
Alexander Farber
Дата:
Hello,

I have a list of 400000 non-english words,
each on a separate line and in UTF8 format,
which I'd like to put in the "word" column
of the following table (also in UTF8 and 8.4.13):

create table good_words (
        word varchar(64) primary key,
        verified boolean not null default false,
        stamp timestamp default current_timestamp
);

Is there maybe a psql trick for that
(the "psql --help" doesn't mention
an "input field separator" option)?

Or do I have to write a Perl-script for that task?

Thank you
Alex


Re: Using psql to feed a file line by line to a table column

От
Ian Lawrence Barwick
Дата:
2013/3/13 Alexander Farber <alexander.farber@gmail.com>:
> Hello,
>
> I have a list of 400000 non-english words,
> each on a separate line and in UTF8 format,
> which I'd like to put in the "word" column
> of the following table (also in UTF8 and 8.4.13):
>
> create table good_words (
>         word varchar(64) primary key,
>         verified boolean not null default false,
>         stamp timestamp default current_timestamp
> );
>
> Is there maybe a psql trick for that
> (the "psql --help" doesn't mention
> an "input field separator" option)?
>
> Or do I have to write a Perl-script for that task?

This should work from psql:

\copy good_words(word) from '/path/to/file.txt'

HTH

Ian Barwick


Re: Using psql to feed a file line by line to a table column

От
Alexander Farber
Дата:
Unfortunately doesn't work -

On Tue, Mar 12, 2013 at 5:53 PM, Ian Lawrence Barwick <barwick@gmail.com> wrote:
> 2013/3/13 Alexander Farber <alexander.farber@gmail.com>:
>>
>> I have a list of 400000 non-english words,
>> each on a separate line and in UTF8 format,
>> which I'd like to put in the "word" column
>> of the following table (also in UTF8 and 8.4.13):
>>
>> create table good_words (
>>         word varchar(64) primary key,
>>         verified boolean not null default false,
>>         stamp timestamp default current_timestamp
>> );
>>

>
> This should work from psql:
>
> \copy good_words(word) from '/path/to/file.txt'

I try:

bukvy=> \copy bukvy_good_words(word) from WORDS ;
\copy: parse error at ";"
bukvy=> \copy bukvy_good_words(word) from 'WORDS' ;
\copy: parse error at ";"
bukvy=> \copy bukvy_good_words(word) from "WORDS" ;
\copy: parse error at ";"
bukvy=> \copy bukvy_good_words(word) from '/home/afarber/WORDS' ;
\copy: parse error at ";"
bukvy=> \copy bukvy_good_words(word) from "/home/afarber/WORDS" ;
\copy: parse error at ";"

(sorry, lied you about the table name :-)

The file is in the home dir and readable:

# ls -al WORDS
-rw-rw-r-- 1 afarber afarber 8263539 Mar 12  2013 WORDS

Any ideas, what is wrong there for 8.4.13 ?

Thank you
Alex


Re: Using psql to feed a file line by line to a table column

От
Ian Lawrence Barwick
Дата:
2013/3/13 Alexander Farber <alexander.farber@gmail.com>:
> Unfortunately doesn't work -
>
> On Tue, Mar 12, 2013 at 5:53 PM, Ian Lawrence Barwick <barwick@gmail.com> wrote:
>> 2013/3/13 Alexander Farber <alexander.farber@gmail.com>:
>>>
>>> I have a list of 400000 non-english words,
>>> each on a separate line and in UTF8 format,
>>> which I'd like to put in the "word" column
>>> of the following table (also in UTF8 and 8.4.13):
>>>
>>> create table good_words (
>>>         word varchar(64) primary key,
>>>         verified boolean not null default false,
>>>         stamp timestamp default current_timestamp
>>> );
>>>
>
>>
>> This should work from psql:
>>
>> \copy good_words(word) from '/path/to/file.txt'
>
> I try:
>
> bukvy=> \copy bukvy_good_words(word) from WORDS ;
> \copy: parse error at ";"
> bukvy=> \copy bukvy_good_words(word) from 'WORDS' ;
> \copy: parse error at ";"
> bukvy=> \copy bukvy_good_words(word) from "WORDS" ;
> \copy: parse error at ";"
> bukvy=> \copy bukvy_good_words(word) from '/home/afarber/WORDS' ;
> \copy: parse error at ";"
> bukvy=> \copy bukvy_good_words(word) from "/home/afarber/WORDS" ;
> \copy: parse error at ";"
>
> (sorry, lied you about the table name :-)
>
> The file is in the home dir and readable:
>
> # ls -al WORDS
> -rw-rw-r-- 1 afarber afarber 8263539 Mar 12  2013 WORDS
>
> Any ideas, what is wrong there for 8.4.13 ?

Yup:

  \copy: parse error at ";"

which is psql telling you it doesn't like the semicolon. Try
leaving it out...


Re: Using psql to feed a file line by line to a table column

От
Alexander Farber
Дата:
Thank you, this was indeed the
(uneeded) semicolon at end of the COPY line.

May I ask another question -

On Tue, Mar 12, 2013 at 6:24 PM, Ian Lawrence Barwick <barwick@gmail.com> wrote:
>>> 2013/3/13 Alexander Farber <alexander.farber@gmail.com>:
>>>>
>>>> I have a list of 400000 non-english words,
>>>> each on a separate line and in UTF8 format,
>>>> which I'd like to put in the "word" column
>>>> of the following table (also in UTF8 and 8.4.13):
>>>>
>>>> create table good_words (
>>>>         word varchar(64) primary key,
>>>>         verified boolean not null default false,
>>>>         stamp timestamp default current_timestamp
>>>> );

>> bukvy=> \copy good_words(word) from '/home/afarber/WORDS' ;
>> \copy: parse error at ";"


When I add few more words to my text file
and then try to load it into my table again,
then the COPY command will fail,
because of the already stored words:

bukvy=> \copy good_words(word) from WORDS
ERROR:  duplicate key value violates unique constraint "good_words_pkey"
CONTEXT:  COPY good_words, line 1: "абажур"

Can't I change the behaviour to silently
ignore inserting such words?

I also have an INSERT trigger on my table,
can I return a NULL from it or something similar?

Below is my complete code:

create table good_words (
        word varchar(64) primary key,
        letters integer[33],
        verified boolean not null default false,
        stamp timestamp default current_timestamp
);

create or replace function count_letters() returns trigger as $body$
        declare
                alphabet varchar[];
                i integer;
        begin
                alphabet :=
'{А,Б,В,Г,Д,Е,Ё,Ж,З,И,Й,К,Л,М,Н,О,П,Р,С,Т,У,Ф,Х,Ц,Ч,Ш,Щ,Ъ,Ы,Ь,Э,Ю,Я}';

                for i in 1 .. 33 loop
                        -- raise notice '%: %', i, alphabet[i];
                        new.letters[i] := length(new.word) -
length(replace(new.word, alphabet[i], ''));
                end loop;
                return new;
        end;
$body$ language plpgsql;

create trigger count_letters
        before insert on good_words
        for each row execute procedure count_letters();



Regards
Alex


Re: Using psql to feed a file line by line to a table column

От
Ian Lawrence Barwick
Дата:
2013/3/13 Alexander Farber <alexander.farber@gmail.com>:
> Thank you, this was indeed the
> (uneeded) semicolon at end of the COPY line.
>
> May I ask another question -
(...)
> When I add few more words to my text file
> and then try to load it into my table again,
> then the COPY command will fail,
> because of the already stored words:
>
> bukvy=> \copy good_words(word) from WORDS
> ERROR:  duplicate key value violates unique constraint "good_words_pkey"
> CONTEXT:  COPY good_words, line 1: "абажур"
>
> Can't I change the behaviour to silently
> ignore inserting such words?
>
> I also have an INSERT trigger on my table,
> can I return a NULL from it or something similar?

Yes, if you test for the presence of the word you can return NULL
and the row will be discarded. See example below.

Regards

Ian Barwick

testdb=# CREATE TABLE foo (word TEXT NOT NULL PRIMARY KEY);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"foo_pkey" for table "foo"
CREATE TABLE
testdb=#
testdb=# INSERT INTO foo VALUES('bar'),('baz');
INSERT 0 2
testdb=# INSERT INTO foo VALUES('bar');
ERROR:  duplicate key value violates unique constraint "foo_pkey"
DETAIL:  Key (word)=(bar) already exists.


CREATE OR REPLACE FUNCTION foo_check()
  RETURNS TRIGGER
  LANGUAGE 'plpgsql'
AS
$$
  BEGIN
    PERFORM TRUE
      FROM foo
     WHERE word = NEW.word;
    IF FOUND THEN
      RETURN NULL;
    END IF;
    RETURN NEW;
  END;
$$;

CREATE TRIGGER tr_foo_check
  BEFORE INSERT ON foo
  FOR EACH ROW EXECUTE PROCEDURE foo_check();

testdb=# INSERT INTO foo VALUES('bar');
INSERT 0 0


Re: Using psql to feed a file line by line to a table column

От
Alexander Farber
Дата:
Thank you -

On Thu, Mar 14, 2013 at 10:40 AM, Ian Lawrence Barwick
<barwick@gmail.com> wrote:
>> I also have an INSERT trigger on my table,
>> can I return a NULL from it or something similar?
>
> Yes, if you test for the presence of the word you can return NULL
> and the row will be discarded. See example below.
>
>
> testdb=# CREATE TABLE foo (word TEXT NOT NULL PRIMARY KEY);
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
> "foo_pkey" for table "foo"
> CREATE TABLE
> testdb=#
> testdb=# INSERT INTO foo VALUES('bar'),('baz');
> INSERT 0 2
> testdb=# INSERT INTO foo VALUES('bar');
> ERROR:  duplicate key value violates unique constraint "foo_pkey"
> DETAIL:  Key (word)=(bar) already exists.
>
>
> CREATE OR REPLACE FUNCTION foo_check()
>   RETURNS TRIGGER
>   LANGUAGE 'plpgsql'
> AS
> $$
>   BEGIN
>     PERFORM TRUE
>       FROM foo
>      WHERE word = NEW.word;
>     IF FOUND THEN
>       RETURN NULL;
>     END IF;
>     RETURN NEW;
>   END;
> $$;
>
> CREATE TRIGGER tr_foo_check
>   BEFORE INSERT ON foo
>   FOR EACH ROW EXECUTE PROCEDURE foo_check();
>
> testdb=# INSERT INTO foo VALUES('bar');
> INSERT 0 0

so the return value of an insert trigger
is actually what get's inserted?

And it has to be an BEFORE trigger?

Regards
Alex


Re: Using psql to feed a file line by line to a table column

От
Ian Lawrence Barwick
Дата:
2013/3/14 Alexander Farber <alexander.farber@gmail.com>:
> Thank you -
>
> On Thu, Mar 14, 2013 at 10:40 AM, Ian Lawrence Barwick
> <barwick@gmail.com> wrote:
>>> I also have an INSERT trigger on my table,
>>> can I return a NULL from it or something similar?
>>
>> Yes, if you test for the presence of the word you can return NULL
>> and the row will be discarded. See example below.
>>
>>
>> testdb=# CREATE TABLE foo (word TEXT NOT NULL PRIMARY KEY);
>> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
>> "foo_pkey" for table "foo"
>> CREATE TABLE
>> testdb=#
>> testdb=# INSERT INTO foo VALUES('bar'),('baz');
>> INSERT 0 2
>> testdb=# INSERT INTO foo VALUES('bar');
>> ERROR:  duplicate key value violates unique constraint "foo_pkey"
>> DETAIL:  Key (word)=(bar) already exists.
>>
>>
>> CREATE OR REPLACE FUNCTION foo_check()
>>   RETURNS TRIGGER
>>   LANGUAGE 'plpgsql'
>> AS
>> $$
>>   BEGIN
>>     PERFORM TRUE
>>       FROM foo
>>      WHERE word = NEW.word;
>>     IF FOUND THEN
>>       RETURN NULL;
>>     END IF;
>>     RETURN NEW;
>>   END;
>> $$;
>>
>> CREATE TRIGGER tr_foo_check
>>   BEFORE INSERT ON foo
>>   FOR EACH ROW EXECUTE PROCEDURE foo_check();
>>
>> testdb=# INSERT INTO foo VALUES('bar');
>> INSERT 0 0
>
> so the return value of an insert trigger
> is actually what get's inserted?

Yup, normally that would be the contents of the NEW record.

> And it has to be an BEFORE trigger?

Yes, because an AFTER trigger is fired after the row is updated, so
the row can't be changed.

Regards

Ian Barwick