Scott,
> > This only works up to the limit of the memory you have available for
> > Postgres. If the updates in one transaction exceed your available
> > memory, you'll see a lot of swaps to disk log that will slow things
> > down by a factor of 10-50 times.
>
> Sorry, but that isn't true. MVCC means we don't have to hold all the data
> in memory, we can have multiple versions of the same tuples on disk, and
> use memory for what it's meant for, buffering.
Sorry, you're absolutely correct. I don't know what I was thinking of; 's the
problem with an off-the-cuff response.
Please disregard the previous quote. Instead:
Doing several large updates in a single transaction can lower performance if
the number of updates is sufficient to affect index usability and a VACUUM is
really needed between them. For example, a series of large data
transformation statements on a single table or set of related tables should
have VACCUUM statements between them, thus preventing you from putting them
in a single transaction.
Example, the series:
1. INSERT 10,000 ROWS INTO table_a;
2. UPDATE 100,000 ROWS IN table_a WHERE table_b;
3. UPDATE 100,000 ROWS IN table_c WHERE table_a;
WIll almost certainly need a VACUUM or even VACUUM FULL table_a after 2),
requiring you to split the update series into 2 transactions. Otherwise, the
"where table_a" condition in step 3) will be extremely slow.
> Also note that many folks have replaced foreign keys with triggers and
> gained in performance, as fks in pgsql still have some deadlock issues to
> be worked out.
Yeah. I think Neil Conway is overhauling FKs, which everyone considers a bit
of a hack in the current implementation, including Jan who wrote it.
> > It can be dangerous though ... in the event of a power outage, for
> > example, your database could be corrupted and difficult to recover. So
> > ... "at your own risk".
>
> No, the database will not be corrupted, at least not in my experience.
> however, you MAY lose data from transactions that you thought were
> committed. I think Tom posted something about this a few days back.
Hmmm ... have you done this? I'd like the performance gain, but I don't want
to risk my data integrity. I've seen some awful things in databases (such as
duplicate primary keys) from yanking a power cord repeatedly.
> update table set field1=field1+1
>
> are killers in an MVCC database as well.
Yeah -- don't I know it.
--
-Josh Berkus
Aglio Database Solutions
San Francisco