Re: Update table performance

Поиск
Список
Период
Сортировка
От Erik Jones
Тема Re: Update table performance
Дата
Msg-id 1C6EEA75-603F-4B02-8127-920F17AC5411@myemma.com
обсуждение исходный текст
Ответ на Re: Update table performance  (Mark Makarowsky <bedrockconstruction@yahoo.com>)
Ответы Re: Update table performance  (Heikki Linnakangas <heikki@enterprisedb.com>)
Re: Update table performance  (Decibel! <decibel@decibel.org>)
Список pgsql-performance
On Aug 7, 2007, at 6:13 PM, Mark Makarowsky wrote:

> Can you provide more detail on what you mean by your
> two suggestions below:
>
> Yeah, I've used "vertical partitioning" very
> successfully in the past, though I've never done it
> for just a single field. I'll typically leave the few
> most common fields in the "main" table and pull
> everything else into a second table.

Vertical partitioning is where you split up your table on disk by
columns, i.e on the vertical lines.  He quoted it because Postgres
doesn't actually support it transparently but you can always fake it
by splitting up your table.  For example, given the following table
wherein column bar gets updated a lot but the others don't:

create table foo (
id    int     not null,
bar    int,
baz     int,

primary key (id)
);

You could split it up like so:

create table foo_a (
id     int,
baz    int,

primary key (id)
);

create table foo_b (
foo_id    int,
bar        int,

foreign key foo_a_id (foo_id) references foo_a (id)
);

The reason you'd ever want to do this is that when Postgres goes to
update a row what it actually does is inserts a new row with the new
value(s) that you changed and marks the old one as deleted.  So, if
you have a wide table and frequently update only certain columns,
you'll take a performance hit as you're having to re-write a lot of
static values.

>
> I should mention that if you can handle splitting the
> update into multiple transactions, that will help a
> lot since it means you won't be doubling the size of
> the table.

As I mentioned above, when you do an update you're actually inserting
a new row and deleting the old one.  That deleted row is still
considered part of the table (for reasons of concurrency, read up on
the concurrency chapter in the manual for the details) and once it is
no longer visible by any live transactions can be re-used by future
inserts.  So, if you update one column on every row of a one million
row table all at once, you have to allocate and write out one million
new rows.  But, if you do the update a quarter million at a time, the
last three updates would be able to re-use many of the rows deleted
in earlier updates.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



В списке pgsql-performance по дате отправления:

Предыдущее
От: Mark Makarowsky
Дата:
Сообщение: Re: Update table performance
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Update table performance