Discussion: tune postgres for UPDATE


tune postgres for UPDATE

From
Sebastian Böhm
Date:
Hi,

I have a table with a lot of columns (text and integer).

It currently has about 3 million rows.

Updating one integer column in all rows takes forever (days).

The column I update is not indexed.

How can I tune postgres to do this much more quickly?

VMstat looks like this:
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  1    188  14160  16080 867064    0    0   880   888  168  479  1  2  0 97
 1  1    188  15288  16080 865980    0    0   832   512  152  474  7  2  0 91
 0  1    188  15464  16080 865348    0    0   872   592  144  461  2  1  0 97

so mostly iowait.

iostat shows about 10,000 block writes per second.

My system is Debian Lenny (PostgreSQL 8.3.5).

I already increased checkpoint_segments to 32 and shared_buffers to 200MB.

I also tried to disable autovacuum.


here is a sample statement:

update users set price = (select price from prices where type =
'normal_price' and currency = users.currency)

(the prices table only has 30 rows)

thank you very much!
sebastian


Re: tune postgres for UPDATE

From
"Grzegorz Jaśkiewicz"
Date:
try rewriting it to something like:

update users set price = p.price from prices p where p.type =
'normal_price' and p.currency = users.currency;

Re: tune postgres for UPDATE

From
"Filip Rembiałkowski"
Date:


2008/12/8 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
> try rewriting it to something like:
>
> update users set price = p.price from prices p where p.type =
> 'normal_price' and p.currency = users.currency;

also avoid "fake" updates:

update users set price = p.price from prices p where p.type =
'normal_price' and p.currency = users.currency
and users.price is distinct from p.price;



If changing prices is a frequent operation, rethink the design:
maybe you could keep a pointer to a "pricing group" instead of keeping
a separate price for every user (see the sketch below).
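
A minimal sketch of that design (table and column names here are
illustrative, not from the original post):

-- one row per (type, currency) combination instead of a price per user
create table pricing_groups (
    id       serial primary key,
    type     text not null,          -- e.g. 'normal_price'
    currency text not null,
    price    integer not null,
    unique (type, currency)
);

create table users (
    id               serial primary key,
    name             text,
    pricing_group_id integer not null references pricing_groups (id)
);

-- a price change now touches one row instead of three million:
update pricing_groups
set price = 999
where type = 'normal_price' and currency = 'EUR';

-- the current price is read with a join instead of being stored per user:
select u.name, g.price
from users u
join pricing_groups g on g.id = u.pricing_group_id;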




--
Filip Rembiałkowski

Re: tune postgres for UPDATE

From
"Scott Marlowe"
Date:
On Mon, Dec 8, 2008 at 3:28 AM, Sebastian Böhm <seb@exse.net> wrote:
> Hi,
>
> I have a table with a lot of columns (text and integer).
>
> It currently has about 3 million rows.
>
> Updating one integer column in all rows takes forever (days).

I'm afraid you may not understand how postgresql's MVCC implementation
works here.  Updating a row creates a new copy of the row and leaves
the old copy in place.  Running such an update several times in a row
can result in a table that is mostly dead space and very slow to
access, both for reads and writes.
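
A quick way to watch that happen (a sketch; pg_relation_size exists
since 8.1, and the table name is assumed from the sample statement):

select pg_relation_size('users');  -- note the size
update users set price = price;    -- writes a new version of every row
select pg_relation_size('users');  -- roughly doubles until vacuumed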

What does vacuum verbose tablename say about your table?
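
For example:

vacuum verbose users;
-- the per-table output reports how many removable (dead) and
-- nonremovable row versions were found, and in how many pages; a
-- dead-row count close to the live-row count means heavy bloat.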

Is there a valid reason you're updating every row?  Do they all really
need to change?

> How can I tune postgres to do this much more quickly?

Get a faster hard drive.

> VMstat looks like this:
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  1    188  14160  16080 867064    0    0   880   888  168  479  1  2  0 97
>  1  1    188  15288  16080 865980    0    0   832   512  152  474  7  2  0 91
>  0  1    188  15464  16080 865348    0    0   872   592  144  461  2  1  0 97

Wow, that's a REALLY REALLY slow drive subsystem.  Here are the numbers
from my laptop while updating a similar table with 1.2 million rows
(an "update xxx set y = y + 1" kind of query):

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 2  0  43124  30344  79804 2709708    0    0    16 21156  913 2456 22  4 38 36
 0  3  43124  26472  79808 2713384    0    0    80 20232  725 2163 22  2 44 32
 0  2  43124  25656  79508 2714084    0    0   148 24200  706 2187 31  4 36 29
 0  2  43124  29336  79400 2710700    0    0     0 23616  788 2577 36  5 33 26

Note that I'm writing out at 20+ megs a second; you're not even hitting
1 meg.  I've got pretty slow USB memory sticks that hit 8 to 10 megs a
second.

> so mostly iowait.
>
> iostat shows about 10,000 block writes per second.

Then either iostat or vmstat is lying to you.  10,000 1k blocks per
second would be about ten times what we're seeing in vmstat.

>
> My system is Debian Lenny (PostgreSQL 8.3.5).
>
> I already increased checkpoint_segments to 32 and shared_buffers to 200MB.
>
> I also tried to disable autovacuum.

Probably not your best move.  It's there for a good reason.  You can
tune it to make it more or less aggressive, but this kind of update is
likely causing plenty of bloat, and turning off autovacuum is likely
counterproductive.
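
For example, "more aggressive" could look like this in postgresql.conf
(8.3 settings; the values are illustrative, not a recommendation):

autovacuum = on
autovacuum_naptime = 30s                # check tables more often (default 1min)
autovacuum_vacuum_scale_factor = 0.1    # vacuum at ~10% dead rows (default 0.2)
autovacuum_vacuum_cost_delay = 10ms     # throttle autovacuum less (default 20ms)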

>
>
> here is a sample statement:
>
> update users set price = (select price from prices where type =
> 'normal_price' and currency = users.currency)

Any way to make that selective so it only updates the prices that need
to be updated?

> (the prices table only has 30 rows)

Then why don't you just use an FK pointing to it instead?

Re: tune postgres for UPDATE

From
"Albe Laurenz"
Date:
Scott Marlowe wrote:
> Sebastian Böhm wrote:
> > I have a table with a lot of columns (text and integer).
> >
> > It currently has about 3 million rows.
> >
> > Updating one integer column in all rows takes forever (days).
>
> I'm afraid you may not understand how postgresql's MVCC implementation
> works here.  Updating a row creates a new copy of the row and leaves
> the old copy in place.  Running such an update several times in a row
> can result in a table that is mostly dead space and very slow to
> access, both for reads and writes.

As far as I know, that problem can be alleviated to some extent by using
PostgreSQL 8.3 and creating the table with a fillfactor substantially less
than 100.

Then free space is left in database blocks during insert which can be
used for later updates. This reduces the number of blocks accessed per
update and also the number of index updates if the changed column is not
indexed.

Moreover, row pruning can kick in if the row is updated more than once,
reducing the amount of dead space.
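
For illustration, a lower fillfactor can be set like this (syntax
available since 8.2; the value and the new table name are just examples):

-- at creation time:
create table users_ff (like users) with (fillfactor = 70);

-- or on an existing table; this affects only future inserts and
-- updates, existing pages are repacked only by CLUSTER or a rewrite:
alter table users set (fillfactor = 70);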

Yours,
Laurenz Albe

Re: tune postgres for UPDATE

From
"Scott Marlowe"
Date:
On Tue, Dec 9, 2008 at 3:17 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
> Scott Marlowe wrote:
>> Sebastian Böhm wrote:
>> > I have a table with a lot of columns (text and integer).
>> >
>> > It currently has about 3 million rows.
>> >
>> > Updating one integer column in all rows takes forever (days).
>>
>> I'm afraid you may not understand how postgresql's MVCC implementation
>> works here.  Updating a row creates a new copy of the row and leaves
>> the old copy in place.  Running such an update several times in a row
>> can result in a table that is mostly dead space and very slow to
>> access, both for reads and writes.
>
> As far as I know, that problem can be alleviated to some extent by using
> PostgreSQL 8.3 and creating the table with a fillfactor substantially less
> than 100.
>
> Then free space is left in database blocks during insert which can be
> used for later updates. This reduces the number of blocks accessed per
> update and also the number of index updates if the changed column is not
> indexed.
>
> Moreover, row pruning can kick in if the row is updated more than once,
> reducing the amount of dead space.

I'm pretty sure you'd still have to vacuum in between runs or the
extra fillfactor space would only get used the first time.  I.e.:

create table with fillfactor 50%
load data into table
update whole table -- 50% free space gets used.
(should vacuum here but didn't)
update whole table -- boom, new tuples are added onto the end of the table.

What I don't know is if the new tuples added at the end of the table
will have a fill factor of 50%.  I'd expect so, in which case it might
help a bit.

Re: tune postgres for UPDATE

From
"Pavan Deolasee"
Date:


On Tue, Dec 9, 2008 at 9:13 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:


> I'm pretty sure you'd still have to vacuum in between runs or the
> extra fillfactor space would only get used the first time.  I.e.:
>
> create table with fillfactor 50%
> load data into table
> update whole table -- 50% free space gets used.
> (should vacuum here but didn't)
> update whole table -- boom, new tuples are added onto the end of the table.


Not really with 8.3. Assuming there are no long-running transactions covering both updates, HOT will free up the space used by the old (now DEAD) tuples, and the second update will reuse that, at least in the most common cases.
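
One way to check whether HOT is actually kicking in (a sketch;
n_tup_hot_upd is available in pg_stat_user_tables as of 8.3, and
'users' is assumed from the original post):

select relname, n_tup_upd, n_tup_hot_upd
from pg_stat_user_tables
where relname = 'users';
-- n_tup_hot_upd close to n_tup_upd means most updates were HOT and the
-- freed space gets reused within the same pages.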

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com