I'm trying to update a table but it's taking a very long time. I would appreciate any tips folks may have about ways to speed it up.
The table is paprospect2, as below:
Column | Type | Modifiers
pfeature_id | integer | not null default nextval('unison.pfeature_pfeature_id_seq'::text)pseq_id | integer | not nullpftype_id | integer | not nullstart | integer |stop | integer |confidence | real |run_id | integer | not null[snip 13 integer and real columns]
run_id_new | integer |
Indexes: paprospect2_redundant_alignment unique btree (pseq_id, "start", stop, run_id, pmodel_id),
p2thread_p2params_id btree (run_id),
p2thread_pmodel_id btree (pmodel_id)
Foreign Key constraints: pftype_id_exists FOREIGN KEY (pftype_id) REFERENCES pftype(pftype_id) ON UPDATE CASCADE ON DELETE CASCADE,
p2thread_pmodel_id_exists FOREIGN KEY (pmodel_id) REFERENCES pmprospect2(pmodel_id) ON UPDATE CASCADE ON DELETE CASCADE,
pseq_id_exists FOREIGN KEY (pseq_id) REFERENCES pseq(pseq_id) ON UPDATE CASCADE ON DELETE CASCADE
The columns pfeature_id..confidence and run_id_new (in red) are from an inherited table. Although the inheritance itself is probably not relevant here (correction welcome), I suspect it may be relevant that all existing rows were written before the table definition included run_id_new. p2thread_i_trigger is defined fires on insert only (not update).
paprospect2 contains ~40M rows. The goal now is to migrate the data to the supertable-inherited column with
update paprospect2 set run_id_new=run_id;
The update's been running for 5 hours (unloaded dual 2.4 GHz Xeon w/2GB RAM, SCSI160 10K drive). There are no other jobs running. Load is ~1.2 and the update's using ~3-5% of the CPU.
Presumably the large number of blocks written (bo) versus blocks read (bi) reflects an enormous amount of bookkeeping that has to be done for MVCC, logging, perhaps rewriting a row for the new definition (a guess -- I don't know how this is handled), indicies, etc. There's no swapping and no processes are waiting. In short, it seems that this is ENTIRELY an I/O issue. Obviously, faster drives will help (but probably only by small factor).
Any ideas how I might speed this up? Presumably this is all getting wrapped in a transaction -- does that hurt me for such a large update?
Bonus diversionary topic: In case it's not obvious, the motivation for this is that the subtable (paprospect2) contains a column (run_id) whose definition I would like to migrate to the inherited table (i.e., the 'super-table'). Although postgresql permits adding a column to a supertable with the same name as an extant column in a subtable, it appears that such "merged definition" columns do not have the same properties as a typical inherited column. In particular, dropping the column from the supertable does not drop it from the subtable (but renaming it does change both names). Hmm.
Reece Hart, Ph.D. , http://www.gene.com/
Genentech, Inc. 650/225-6133 (voice), -5389 (fax)
Bioinformatics and Protein Engineering
1 DNA Way, MS-93 http://www.in-machina.com/~reece/
South San Francisco, CA 94080-4990 , GPG: 0x25EC91A0