Discussion: Newbie question about importing text files...
So I've been looking at the documentation for COPY, and I'm curious about a number of features which do not appear to be included, and whether these functions are found someplace else:

1) How do I skip an arbitrary # of "header" lines (e.g. > 1 header line) to begin reading in data?
2) Is it possible to screen out lines which begin with a comment character (common outputs for csv/txt files from various programs)?
3) Is there a way to read in fixed width files?

Thanks!

--j

--
Jonathan A. Greenberg, PhD
NRC Research Associate
NASA Ames Research Center
MS 242-4
Moffett Field, CA 94035-1000
Office: 650-604-5896
Cell: 415-794-5043
AIM: jgrn307
MSN: jgrn307@hotmail.com
On 10/10/06, Jonathan Greenberg <jgreenberg@arc.nasa.gov> wrote:
> So I've been looking at the documentation for COPY, and I'm curious about a
> number of features which do not appear to be included, and whether these
> functions are found someplace else:
>
> 1) How do I skip an arbitrary # of "header" lines (e.g. > 1 header line) to
> begin reading in data?

If in 'csv' mode, you can set the header flag. If not, standard unix tools fit the bill:

cat import.txt | tail -n +2 > output.txt   <-- from bash
copy foo from '/home/output.txt';

On windows? Get cygwin!

Or, if you prefer a more sql-ish solution, load your text data into scratch tables (all text fields) as-is and do the filtering there. This works pretty well actually:

copy foo from '/home/import.txt';
create table bar as select * from foo offset 3;

There's a million ways to do this; most involve processing before or after the COPY statement, unless you happen to be importing csv (often, but not always, works as-is) or data generated from postgresql.

> 2) Is it possible to screen out lines which begin with a comment character
> (common outputs for csv/txt files from various programs)?

See notes above:
1. import all data to a scratch table
2. use sql

Alternative: master perl (I prefer the sql approach usually, perl scares me!)

> 3) Is there a way to read in fixed width files?

Here again you could load the data into postgresql, one big text field per row even, and process as such:

create table import(bigfield text);
copy import from 'foo.txt';
create table foo as
  select substr(bigfield, 1, 3)::int as a,
         substr(bigfield, 4, 2)::char(2) as b,
         [...];

voila!

merlin
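The same preprocessing (skip N header lines, drop comment lines, then hand the result to COPY) can be done in a few lines of Python as well — a minimal sketch; the helper name and the in-memory sample data are made up for illustration:

```python
# Skip an arbitrary number of header lines and drop comment lines,
# yielding cleaned text you could write out and feed to COPY.
import io

def clean_lines(src, skip_headers=0, comment_char="#"):
    """Yield data lines, skipping headers and lines starting with comment_char."""
    for i, line in enumerate(src):
        if i < skip_headers:
            continue
        if line.lstrip().startswith(comment_char):
            continue
        yield line

# Demo on an in-memory "file" with two header lines and one comment line.
raw = io.StringIO("header one\nheader two\n# a comment\n1,foo\n2,bar\n")
print("".join(clean_lines(raw, skip_headers=2)))
```

This keeps the whole transformation outside the database, which is handy when the scratch-table approach above feels like overkill.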
On 10/10/06 01:44, Merlin Moncure wrote:
> On 10/10/06, Jonathan Greenberg <jgreenberg@arc.nasa.gov> wrote:
>> So I've been looking at the documentation for COPY, and I'm curious
>> about a number of features which do not appear to be included, and
>> whether these functions are found someplace else:
>>
>> 1) How do I skip an arbitrary # of "header" lines (e.g. > 1 header
>> line) to begin reading in data?
>
> if in 'csv' mode, you can set the header flag. if not, standard unix
> tools fit the bill:
>
> cat import.txt | tail -n +2 > output.txt   <-- from bash

That's not a very efficient way to handle large data sets.

--
Ron Johnson, Jr.
Jefferson LA USA

Is "common sense" really valid? For example, it is "common sense" to white-power racists that whites are superior to blacks, and that those with brown skins are mud people. However, that "common sense" is obviously wrong.
On 10/09/06 22:43, Jonathan Greenberg wrote:
> So I've been looking at the documentation for COPY, and I'm curious about a
> number of features which do not appear to be included, and whether these
> functions are found someplace else:
>
> 1) How do I skip an arbitrary # of "header" lines (e.g. > 1 header line) to
> begin reading in data?
> 2) Is it possible to screen out lines which begin with a comment character
> (common outputs for csv/txt files from various programs)?
> 3) Is there a way to read in fixed width files?

Both Python & Perl have CSV parsing modules, and can of course deal with fixed-width data, let you skip comments, commit every N rows, skip over committed records in case the load crashes, etc, etc, etc.

Probably not what you wanted to read, though, since performance takes a big hit. But it definitely works...

--
Ron Johnson, Jr.
Jefferson LA USA
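A rough sketch of what Ron describes, using Python's csv module. The comment filter, the batch size, and the stubbed-out commit callback are illustrative assumptions — in real use the callback would run an INSERT batch and COMMIT against PostgreSQL:

```python
# Parse CSV, skip comment lines, and "commit" every batch_size rows.
# The commit argument is a stand-in for real database work.
import csv
import io

def load_in_batches(f, commit, batch_size=2):
    """Read CSV rows from f, passing them to commit() in batches."""
    rows, committed = [], 0
    reader = csv.reader(line for line in f if not line.startswith("#"))
    for row in reader:
        rows.append(row)
        if len(rows) >= batch_size:
            commit(rows)
            committed += len(rows)
            rows = []
    if rows:                      # flush the final partial batch
        commit(rows)
        committed += len(rows)
    return committed

batches = []
data = io.StringIO("# comment line\na,1\nb,2\nc,3\n")
n = load_in_batches(data, batches.append)
print(n, batches)  # -> 3 [[['a', '1'], ['b', '2']], [['c', '3']]]
```

Committing per batch is what makes it possible to resume after a crash: you only re-do the last uncommitted batch, not the whole file.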
On Tue, 2006-10-10 at 04:16, Ron Johnson wrote:
> On 10/09/06 22:43, Jonathan Greenberg wrote:
>> So I've been looking at the documentation for COPY, and I'm curious about a
>> number of features which do not appear to be included, and whether these
>> functions are found someplace else:
>>
>> 1) How do I skip an arbitrary # of "header" lines (e.g. > 1 header line) to
>> begin reading in data?

Using something like bash, you can do this:

tail -n $(( $(wc -l < bookability-pg.sql) - 2 )) bookability-pg.sql

Make it a function, call it skipper, and have it take arguments. Put this in .bashrc and re-source the .bashrc file ( . ~/.bashrc ):

skipper() {
  tail -n $(( $(wc -l < "$1") - $2 )) "$1"
}

>> 2) Is it possible to screen out lines which begin with a comment character
>> (common outputs for csv/txt files from various programs)?

grep -vP "^#" filename

will remove all lines that start with #. grep is your friend in unix. If you don't have unix, get cygwin as recommended elsewhere.

>> 3) Is there a way to read in fixed width files?

If you don't mind playing about with sed, you could use it and bash scripting to do it. I have before. It's ugly looking but easy enough to do. But I'd recommend a beginner use a scripting language they like; one of the ones that starts with p is usually a good choice (perl, python, php, ruby (wait, that's not a p!), etc...)

> Both Python & Perl have CSV parsing modules, and can of course deal
> with fixed-width data, let you skip comments, commit every N rows,
> skip over committed records in case the load crashes, etc, etc, etc.

php has fgetcsv() built in as well. It breaks down csv into an array and is really easy to work with.
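For question 3, the substr() trick Merlin shows on the SQL side has a direct equivalent in Python string slicing, done before the data ever reaches the database — a small sketch, with made-up column widths:

```python
# Split one fixed-width line into fields by column width,
# mirroring the substr(bigfield, start, len) approach in SQL.
def parse_fixed(line, widths):
    """Return the fields of a fixed-width record, stripped of padding."""
    fields, pos = [], 0
    for w in widths:
        fields.append(line[pos:pos + w].strip())
        pos += w
    return fields

print(parse_fixed("001ab", [3, 2]))  # -> ['001', 'ab']
```

From there you would cast each field and emit tab- or comma-separated output for COPY to consume.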
I made a HUGE mistake, and used 'UK' as the abbreviation for the United Kingdom (the ISO abbreviation is 'GB').

I've got a database where 8 tables have an FKEY on a table 'location_country', using the text 'uk' as the value -- so I've got 9 tables that I need to swap data out on.

Can anyone suggest a non-nightmarish way for me to do this?
> I made a HUGE mistake, and used 'UK' as the abbreviation for the
> United Kingdom (the ISO abbreviation is 'GB').
>
> I've got a database where 8 tables have an FKEY on a table
> 'location_country', using the text 'uk' as the value -- so I've got
> 9 tables that I need to swap data out on.
>
> Can anyone suggest a non-nightmarish way for me to do this?

If your tables are set up with "ON UPDATE CASCADE" then you are fine. Just update the main table and PostgreSQL will take care of the rest.

http://www.postgresql.org/docs/8.1/interactive/sql-createtable.html

Regards,
Richard Broersma Jr.
On 10/11/06 16:10, Richard Broersma Jr wrote:
>> I made a HUGE mistake, and used 'UK' as the abbreviation for the
>> United Kingdom (the ISO abbreviation is 'GB').
>>
>> I've got a database where 8 tables have an FKEY on a table
>> 'location_country', using the text 'uk' as the value -- so I've got
>> 9 tables that I need to swap data out on.
>>
>> Can anyone suggest a non-nightmarish way for me to do this?
>
> If your tables are set up with "ON UPDATE CASCADE" then you are fine.
>
> Just update the main table and PostgreSQL will take care of the rest.

It doesn't appear that ALTER TABLE can change constraint characteristics. You'd have to drop/recreate, no?

--
Ron Johnson, Jr.
Jefferson LA USA
>>> Can anyone suggest a non-nightmarish way for me to do this?
>> If your tables are set up with "ON UPDATE CASCADE" then you are fine.
>> Just update the main table and PostgreSQL will take care of the rest.
> It doesn't appear that ALTER TABLE can change constraint characteristics.
> You'd have to drop/recreate, no?

Now that you mention it, I've never tried it or seen it done. Here's what I came up with:

CREATE TABLE foo (
  id text constraint foo_pri_key Primary Key
);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "foo_pri_key" for table "foo"
CREATE TABLE

INSERT INTO foo (id) values('hello');
INSERT 0 1

CREATE TABLE bar (
  id serial constraint bar_pri_key Primary Key,
  fooid text constraint bar_foo_for_key References foo (id)
);
NOTICE:  CREATE TABLE will create implicit sequence "bar_id_seq" for serial column "bar.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "bar_pri_key" for table "bar"
CREATE TABLE

INSERT INTO bar (fooid) VALUES ('hello');
INSERT 0 1

update foo SET id = 'goodbye';
ERROR:  update or delete on "foo" violates foreign key constraint "bar_foo_for_key" on "bar"
DETAIL:  Key (id)=(hello) is still referenced from table "bar".

alter TABLE bar DROP CONSTRAINT bar_foo_for_key;
ALTER TABLE

ALTER TABLE bar ADD constraint new_bar_foo_for_key
  foreign key (fooid) references foo (id) on update cascade;
ALTER TABLE

update foo set id = 'goodbye';
UPDATE 1

select * from bar;
 id |  fooid
----+---------
  1 | goodbye
(1 row)

It is nice to see things work so well. :-)

Regards,
Richard Broersma Jr.
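For anyone who wants to replay the gist of Richard's session without a PostgreSQL server handy, roughly the same ON UPDATE CASCADE behavior can be demonstrated with Python's bundled sqlite3 module. This is a sketch, not the PostgreSQL session itself: sqlite stands in for PostgreSQL, and it only enforces foreign keys after PRAGMA foreign_keys = ON:

```python
# Demonstrate ON UPDATE CASCADE: updating the parent key rewrites
# the child rows automatically.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")   # sqlite-specific: FKs are off by default
db.execute("CREATE TABLE foo (id TEXT PRIMARY KEY)")
db.execute("""CREATE TABLE bar (
    id INTEGER PRIMARY KEY,
    fooid TEXT REFERENCES foo (id) ON UPDATE CASCADE)""")
db.execute("INSERT INTO foo (id) VALUES ('hello')")
db.execute("INSERT INTO bar (fooid) VALUES ('hello')")

db.execute("UPDATE foo SET id = 'goodbye'")          # cascades to bar
print(db.execute("SELECT fooid FROM bar").fetchone())  # -> ('goodbye',)
```

The syntax and the cascade semantics match what the PostgreSQL transcript shows; only the implicit-index NOTICE chatter is missing.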
On 10/11/06 18:53, Richard Broersma Jr wrote:
>>>> Can anyone suggest a non-nightmarish way for me to do this?
>>> If your tables are set up with "ON UPDATE CASCADE" then you are
>>> fine. Just update the main table and PostgreSQL will take
>>> care of the rest.
>> It doesn't appear that ALTER TABLE can change constraint
>> characteristics. You'd have to drop/recreate, no?
>
> Now that you mention it, I've never tried it or seen it done.
> Here's what I came up with:
> [snip]
>
> It is nice to see things work so well. :-)

It would be interesting to see how well it works on a 50M row table.

--
Ron Johnson, Jr.
Jefferson LA USA
>>> It doesn't appear that ALTER TABLE can change constraint
>>> characteristics. You'd have to drop/recreate, no?
>>
>> Now that you mention it, I've never tried it or seen it done.
>> Here's what I came up with:
>> [snip]
>>
>> It is nice to see things work so well. :-)
>
> It would be interesting to see how well it works on a 50M row table.

Good point, but hopefully only a small fraction of the 50M rows would be affected, and hopefully updating primary keys isn't a common occurrence.

Regards,
Richard Broersma Jr.
On 10/11/06 19:15, Richard Broersma Jr wrote:
>>>> It doesn't appear that ALTER TABLE can change constraint
>>>> characteristics. You'd have to drop/recreate, no?
>>> Now that you mention it, I've never tried it or seen it done.
>>> Here's what I came up with:
>> [snip]
>>>
>>> It is nice to see things work so well. :-)
>> It would be interesting to see how well it works on a 50M row
>> table.
>
> Good point, but hopefully only a small fraction of the 50M rows
> would be affected, and hopefully updating primary keys isn't a
> common occurrence.

I was thinking of the ALTER TABLE ... ADD CONSTRAINT step.

--
Ron Johnson, Jr.
Jefferson LA USA
Jonathan Vanasco wrote:
> I made a HUGE mistake, and used 'UK' as the abbreviation for the
> United Kingdom (the ISO abbreviation is 'GB').
>
> I've got a database where 8 tables have an FKEY on a table
> 'location_country', using the text 'uk' as the value -- so I've got
> 9 tables that I need to swap data out on.
>
> Can anyone suggest a non-nightmarish way for me to do this?

Umm, I think this should work, & isn't all that bad:

1) insert a 'gb' record in location_country
2) update <each of 8 tables> set country='gb' where country='uk'
3) delete the 'uk' record from location_country

Brent Wood
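Brent's three steps can be sanity-checked with Python's stdlib sqlite3 module as a stand-in for PostgreSQL. This is only a sketch: a single child table stands in for the eight real ones, and the table and column names are illustrative:

```python
# Three-step key swap without ON UPDATE CASCADE:
# insert the new parent row, repoint the children, drop the old parent row.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # sqlite-specific: FKs are off by default
db.execute("CREATE TABLE location_country (code TEXT PRIMARY KEY)")
db.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    country TEXT REFERENCES location_country (code))""")
db.execute("INSERT INTO location_country VALUES ('uk')")
db.execute("INSERT INTO orders (country) VALUES ('uk')")

# 1) insert a 'gb' record in location_country
db.execute("INSERT INTO location_country VALUES ('gb')")
# 2) update each referencing table
db.execute("UPDATE orders SET country = 'gb' WHERE country = 'uk'")
# 3) delete the 'uk' record, now unreferenced
db.execute("DELETE FROM location_country WHERE code = 'uk'")

print(db.execute("SELECT country FROM orders").fetchone())  # -> ('gb',)
```

The ordering matters: the 'gb' row must exist before the child updates, and the 'uk' row can only be deleted once nothing references it. Wrapping all three steps in one transaction would keep the data consistent throughout.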