Обсуждение: numbering rows on import from file

Поиск
Список
Период
Сортировка

numbering rows on import from file

От
Alexy Khrabrov
Дата:
Greetings -- I have a huge table of the form  
(integer,integer,smallint,date).  Its origin is an ASCII file which I  
load with \copy.  Now I want to number the rows, adding an id column  
as an autoincrement from a sequence.  How should I do the import now  
for the sequence to work -- should I add the id column last, so it  
will not be filled by copy and presumably autoincrement?

Or, once the table is already in, can I add a column and force it to  
be filled with consecutive numbers, effectively numbering the rows?

Cheers,
Alexy


Re: numbering rows on import from file

От
chester c young
Дата:
--- Alexy Khrabrov <deliverable@gmail.com> wrote:

> Now I want to number the rows, adding an id column 
> as an autoincrement from a sequence.  How should I do the import now 
> for the sequence to work -- should I add the id column last, so it  
> will not be filled by copy and presumably autoincrement?

use a sequence
restart sequence to 1 before copy
have column id default to nextval('seq')

when doing copy don't have a column matching your id (duh)


     ____________________________________________________________________________________
Be a better friend, newshound, and 
know-it-all with Yahoo! Mobile.  Try it now.  http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ


Re: numbering rows on import from file

От
"Scott Marlowe"
Дата:
On Fri, May 2, 2008 at 3:26 PM, Alexy Khrabrov <deliverable@gmail.com> wrote:
> Greetings -- I have a huge table of the form
> (integer,integer,smallint,date).  Its origin is an ASCII file which I load
> with \copy.  Now I want to number the rows, adding an id column as an
> autoincrement from a sequence.  How should I do the import now for the
> sequence to work -- should I add the id column last, so it will not be
> filled by copy and presumably autoincrement?

Add the sequence to the column.  something like this:

create table mytable (id serial primary key, int1 integer, int2
integer, smallint1 smallint, date1 date);
copy mytable (int1,int2, smallint1, date1) from STDIN;
120,2304,4,'2007-01-01'
204,3204,2,'2007-01-02'
(and so on)
\.

with a lot of other dbs, and a lot of languages, you're taught to
perform discrete steps when operating on your data.  Generally
speaking, PostgreSQL is much better at doing the most NOW, not later.
If you've got derived data you want to put into the table, put all the
data into a loading table, and transfer it to the real table with ONE
insert into select from query.

>  Or, once the table is already in, can I add a column and force it to be
> filled with consecutive numbers, effectively numbering the rows?

Bad idea.  As mentioned before, every update or insert, whether it
succeeds or not will create a dead row in the table.  If you update
(or attempt to update) all rows in a 10,000,000 row table three times,
you now have 30,000,000 dead rows in your table.


Re: numbering rows on import from file

От
Steve Crawford
Дата:
Alexy Khrabrov wrote:
> Greetings -- I have a huge table of the form 
> (integer,integer,smallint,date).  Its origin is an ASCII file which I 
> load with \copy.  Now I want to number the rows, adding an id column 
> as an autoincrement from a sequence.  How should I do the import now 
> for the sequence to work -- should I add the id column last, so it 
> will not be filled by copy and presumably autoincrement?
>
> Or, once the table is already in, can I add a column and force it to 
> be filled with consecutive numbers, effectively numbering the rows?
>
Depends on your goal.

Do you just want a unique ID, make the id column a serial. Done. If you 
already have the table, add the column, update the table setting 
id=nextval('the serial columns sequence name') before bringing in 
additional data. If you were using plain \copy, you will now have to 
name the input columns since you don't have data for the id column: 
\copy (int1, int2, smallint3, date4) from ..... This is usually good 
practice anyway as additions to your table or column ordering changes 
won't affect your import.

Do you want the IDs to match row numbers in the source file? Use 
something like nl or whatever scripting language you like to add numbers 
in the source file.

Alternately, if the table is static, you can create a temporary sequence 
to fill the id column on import.

You can also fill the id column after import if necessary by updating 
the table setting id=nextval('yoursequence') but this will generate lots 
of empty space by updating all tuples (not good with a "huge" table) and 
may not associate tuples with source-file line-numbers.

Regardless of how you initially fill the id column, if you add/delete 
records your numbers will not be consecutive. Deletes and rollbacks will 
cause holes.

If you just need consecutive row-numbering on output (not in the table) 
and if the row numbering doesn't need to match the same record each 
time, you can create a temporary sequence and select 
nextval('tempsequence'),.... from yourtable.

Cheers,
Steve