Обсуждение: how to efficiently update tuple in many-to-many relationship?

От:
Drew Wilson
Дата:

I have 2 tables (A,B) joined in a many-to-many relationship via a
join table ("membership"), where updating table A based on table B
takes a very long time.

Tables A and B have oid primary keys (a_id and b_id respectively).
The join table, "membership", has its own oid primary key
(membership_id), as well as foreign keys "a_id" and "b_id".

A SELECT query across all 3 tables takes 12 seconds.
"SELECT count(*) FROM a JOIN membership USING(a_id) JOIN b USING
(b_id) WHERE b.is_public = true"

But a simple UPDATE using the same SELECT query takes 30 minutes to
an hour.
"UPDATE A set is_public=true WHERE a_id IN (SELECT count(*) FROM a
JOIN membership USING(a_id) JOIN b USING(b_id) WHERE b.is_public =
true)".

What am I doing wrong here? I'm not sure how to diagnose this further.

Here's the output from explain:
db=# EXPLAIN SELECT a_id  FROM a JOIN membership USING(a_id) JOIN b
USING(b_id) WHERE b.is_public = true;
------------------------------------------------------------------------
-----------------------------------------
Hash Join  (cost=167154.78..173749.48 rows=51345 width=4)
    Hash Cond: (a.a_id = membership.a_id)
    ->  Function Scan on a  (cost=0.00..12.50 rows=1000 width=4)
    ->  Hash  (cost=144406.48..144406.48 rows=1819864 width=4)
          ->  Hash Join  (cost=417.91..144406.48 rows=1819864 width=4)
                Hash Cond: (membership.b_id = b.b_id)
                ->  Seq Scan on membership  (cost=0.00..83623.83
rows=4818983 width=8)
                ->  Hash  (cost=348.52..348.52 rows=5551 width=4)
                      ->  Index Scan using b_is_public on b
(cost=0.00..348.52 rows=5551 width=4)
                            Index Cond: (is_public = true)
                            Filter: is_public
(11 rows)


db=# EXPLAIN UPDATE a SET is_public = true WHERE a_id IN
         ( SELECT a_id FROM a JOIN membership USING(a_id) JOIN b USING
(b_id) WHERE b.is_public = true);
------------------------------------------------------------------------
-----------------------------------------
hash in join  (cost=281680.17..370835.63 rows=1819864 width=90)
    hash cond: (public.a.a_id = public.a.a_id)
    ->  seq scan on a  (cost=0.00..47362.09 rows=2097309 width=90)
    ->  hash  (cost=258931.87..258931.87 rows=1819864 width=8)
          ->  hash join  (cost=73996.36..258931.87 rows=1819864 width=8)
                hash cond: (membership.a_id = public.a.a_id)
                ->  hash join  (cost=417.91..144406.48 rows=1819864
width=4)
                      hash cond: (membership.b_id = b.b_id)
                      ->  seq scan on membership
(cost=0.00..83623.83 rows=4818983 width=8)
                      ->  hash  (cost=348.52..348.52 rows=5551 width=4)
                            ->  index scan using
loc_submission_is_public on b  (cost=0.00..348.52 rows=5551 width=4)
                                  index cond: (is_public = true)
                                  filter: is_public
                ->  hash  (cost=47362.09..47362.09 rows=2097309 width=4)
                      ->  seq scan on a  (cost=0.00..47362.09
rows=2097309 width=4)

Thanks,

Drew

От:
"Merlin Moncure"
Дата:

On 4/9/07, Drew Wilson <> wrote:
> I have 2 tables (A,B) joined in a many-to-many relationship via a
> join table ("membership"), where updating table A based on table B
> takes a very long time.
>
> Tables A and B have oid primary keys (a_id and b_id respectively).
> The join table, "membership", has its own oid primary key
> (membership_id), as well as foreign keys "a_id" and "b_id".
>
> A SELECT query across all 3 tables takes 12 seconds.
> "SELECT count(*) FROM a JOIN membership USING(a_id) JOIN b USING
> (b_id) WHERE b.is_public = true"
>
> But a simple UPDATE using the same SELECT query takes 30 minutes to
> an hour.
> "UPDATE A set is_public=true WHERE a_id IN (SELECT count(*) FROM a
> JOIN membership USING(a_id) JOIN b USING(b_id) WHERE b.is_public =
> true)".
>
> What am I doing wrong here? I'm not sure how to diagnose this further.
>
> Here's the output from explain:
> db=# EXPLAIN SELECT a_id  FROM a JOIN membership USING(a_id) JOIN b
> USING(b_id) WHERE b.is_public = true;
> ------------------------------------------------------------------------
> -----------------------------------------
> Hash Join  (cost=167154.78..173749.48 rows=51345 width=4)
>     Hash Cond: (a.a_id = membership.a_id)
>     ->  Function Scan on a  (cost=0.00..12.50 rows=1000 width=4)
>     ->  Hash  (cost=144406.48..144406.48 rows=1819864 width=4)
>           ->  Hash Join  (cost=417.91..144406.48 rows=1819864 width=4)
>                 Hash Cond: (membership.b_id = b.b_id)
>                 ->  Seq Scan on membership  (cost=0.00..83623.83
> rows=4818983 width=8)
>                 ->  Hash  (cost=348.52..348.52 rows=5551 width=4)
>                       ->  Index Scan using b_is_public on b
> (cost=0.00..348.52 rows=5551 width=4)
>                             Index Cond: (is_public = true)
>                             Filter: is_public
> (11 rows)
>
>
> db=# EXPLAIN UPDATE a SET is_public = true WHERE a_id IN
>          ( SELECT a_id FROM a JOIN membership USING(a_id) JOIN b USING
> (b_id) WHERE b.is_public = true);
> ------------------------------------------------------------------------
> -----------------------------------------
> hash in join  (cost=281680.17..370835.63 rows=1819864 width=90)
>     hash cond: (public.a.a_id = public.a.a_id)
>     ->  seq scan on a  (cost=0.00..47362.09 rows=2097309 width=90)
>     ->  hash  (cost=258931.87..258931.87 rows=1819864 width=8)
>           ->  hash join  (cost=73996.36..258931.87 rows=1819864 width=8)
>                 hash cond: (membership.a_id = public.a.a_id)
>                 ->  hash join  (cost=417.91..144406.48 rows=1819864
> width=4)
>                       hash cond: (membership.b_id = b.b_id)
>                       ->  seq scan on membership
> (cost=0.00..83623.83 rows=4818983 width=8)
>                       ->  hash  (cost=348.52..348.52 rows=5551 width=4)
>                             ->  index scan using
> loc_submission_is_public on b  (cost=0.00..348.52 rows=5551 width=4)
>                                   index cond: (is_public = true)
>                                   filter: is_public
>                 ->  hash  (cost=47362.09..47362.09 rows=2097309 width=4)
>                       ->  seq scan on a  (cost=0.00..47362.09
> rows=2097309 width=4)


why don't you rewrite your update statement to use joins (joins >
where exists > where in)?

WHERE a_id IN (SELECT count(*) FROM a
the above looks wrong maybe?

merlin

От:
Tom Lane
Дата:

Drew Wilson <> writes:
> I have 2 tables (A,B) joined in a many-to-many relationship via a
> join table ("membership"), where updating table A based on table B
> takes a very long time.
> ...
>     ->  Function Scan on a  (cost=0.00..12.50 rows=1000 width=4)

I think you've left out some relevant details ... there's nothing
in what you said about a set-returning function ...

            regards, tom lane

От:
Drew Wilson
Дата:

My apologies. That function call was some test code to verify that my
subselect was only being called once.

Let me try again, please.

Here's the query plan for a SELECT statement that returns 1,207,161
rows in 6 seconds.
MatchBox=# explain select count(translation_pair_id) from
translation_pair_data
         join instance i using(translation_pair_id)
         join loc_submission ls using(loc_submission_id)
         where ls.is_public = true;
                                                           QUERY PLAN
------------------------------------------------------------------------
-----------------------------------
Aggregate  (cost=299276.72..299276.73 rows=1 width=4)
    ->  Hash Join  (cost=59962.72..294036.83 rows=2095954 width=4)
          Hash Cond: (i.translation_pair_id =
translation_pair_data.translation_pair_id)
          ->  Hash Join  (cost=369.15..177405.01 rows=2095954 width=4)
                Hash Cond: (i.loc_submission_id = ls.loc_submission_id)
                ->  Seq Scan on instance i  (cost=0.00..99016.16
rows=5706016 width=8)
                ->  Hash  (cost=296.92..296.92 rows=5778 width=4)
                      ->  Index Scan using loc_submission_is_public
on loc_submission ls  (cost=0.00..296.92 rows=5778 width=4)
                            Index Cond: (is_public = true)
                            Filter: is_public
          ->  Hash  (cost=31861.92..31861.92 rows=1690292 width=4)
                ->  Seq Scan on translation_pair_data
(cost=0.00..31861.92 rows=1690292 width=4)


And here's the query plan for the UPDATE query that seems to never
complete. (Execution time > 30 minutes.)
MatchBox=# explain update translation_pair_data set is_public = true
         where translation_pair_id in
             (select translation_pair_id from translation_pair_data
                  join instance i using(translation_pair_id)
                  join loc_submission ls using(loc_submission_id)
                  where ls.is_public = true);
                                                              QUERY PLAN
------------------------------------------------------------------------
-------------------------------------------------------------
Hash IN Join  (cost=328000.49..453415.65 rows=1690282 width=90)
    Hash Cond: (public.translation_pair_data.translation_pair_id =
public.translation_pair_data.translation_pair_id)
    ->  Seq Scan on translation_pair_data  (cost=0.00..31861.82
rows=1690282 width=90)
    ->  Hash  (cost=293067.74..293067.74 rows=2067660 width=8)
          ->  Hash Join  (cost=59958.35..293067.74 rows=2067660 width=8)
                Hash Cond: (i.translation_pair_id =
public.translation_pair_data.translation_pair_id)
                ->  Hash Join  (cost=365.00..177117.92 rows=2067660
width=4)
                      Hash Cond: (i.loc_submission_id =
ls.loc_submission_id)
                      ->  Seq Scan on instance i
(cost=0.00..99016.16 rows=5706016 width=8)
                      ->  Hash  (cost=293.75..293.75 rows=5700 width=4)
                            ->  Index Scan using
loc_submission_is_public on loc_submission ls  (cost=0.00..293.75
rows=5700 width=4)
                                  Index Cond: (is_public = true)
                                  Filter: is_public
                ->  Hash  (cost=31861.82..31861.82 rows=1690282 width=4)
                      ->  Seq Scan on translation_pair_data
(cost=0.00..31861.82 rows=1690282 width=4)


I figure I must be doing something wrong here. Thanks for the help,

Drew

On Apr 9, 2007, at 2:43 PM, Tom Lane wrote:

> Drew Wilson <> writes:
>> I have 2 tables (A,B) joined in a many-to-many relationship via a
>> join table ("membership"), where updating table A based on table B
>> takes a very long time.
>> ...
>>     ->  Function Scan on a  (cost=0.00..12.50 rows=1000 width=4)
>
> I think you've left out some relevant details ... there's nothing
> in what you said about a set-returning function ...
>
>             regards, tom lane


От:
Tom Lane
Дата:

Drew Wilson <> writes:
> Here's the query plan for a SELECT statement that returns 1,207,161
> rows in 6 seconds.
> ...
> And here's the query plan for the UPDATE query that seems to never
> complete. (Execution time > 30 minutes.)

Well, the subplan is certainly the same as before, so it seems there are
two possibilities:

* there's something unreasonably inefficient about the hash join being
used to perform the IN (work_mem too small? inefficient-to-compare
datatype? bad data distribution?)

* the time is actually going into the UPDATE operation proper, or
perhaps some triggers it fires (have you got any foreign keys involving
this table?  what's checkpoint_segments set to?)

You could narrow it down by checking the runtime for
select count(*) from translation_pair_data
  where translation_pair_id in
        (select translation_pair_id from translation_pair_data ...

If that's slow it's the topmost hash join's fault, else we have
to look at the UPDATE's side effects.

            regards, tom lane

От:
Drew Wilson
Дата:

Thanks for the suggestions, Tom. But I'm still stumped.

On Apr 9, 2007, at 7:13 PM, Tom Lane wrote:

> Drew Wilson <> writes:
>> Here's the query plan for a SELECT statement that returns 1,207,161
>> rows in 6 seconds.
>> ...
>> And here's the query plan for the UPDATE query that seems to never
>> complete. (Execution time > 30 minutes.)
>
> Well, the subplan is certainly the same as before, so it seems
> there are
> two possibilities:
>
> * there's something unreasonably inefficient about the hash join being
> used to perform the IN (work_mem too small? inefficient-to-compare
> datatype? bad data distribution?)
I'm not sure why. The ids are OIDs generated from a sequence, with no
deletions.

> * the time is actually going into the UPDATE operation proper, or
> perhaps some triggers it fires (have you got any foreign keys
> involving
> this table?  what's checkpoint_segments set to?)

> You could narrow it down by checking the runtime for
> select count(*) from translation_pair_data
>   where translation_pair_id in
>         (select translation_pair_id from translation_pair_data ...
After I increasing work_mem from 1M to 32M, checkpoint_segments from
3 to 8, (and reloading), the UPDATE operation still takes about 15
minutes (944 seconds) to update 637,712 rows.

Whereas replacing the the "UPDATE ... WHERE translation_pair_id IN"
with "SELECT count(*) WHERE translation_pair_id IN" drops the time
from 15 minutes to 19 seconds (returning the same 637712 rows.)

> If that's slow it's the topmost hash join's fault, else we have
> to look at the UPDATE's side effects.

The SELECT is not slow, so its a side effect of the update... Looking
at the table definition, there is a "BEFORE ON DELETE" trigger
defined, two CHECK constraints for this table, and three foreign
keys. Nothing looks suspicious to me.
Any clues in the table description below?

Here's the table definition. (And I've appended updated query plans
descriptions.)

MatchBox=# \d translation_pair_data               Table
"public.translation_pair_data"
        Column        |            Type             |   Modifiers
---------------------+-----------------------------+---------------
translation_pair_id | oid                         | not null
translation_id      | oid                         | not null
history_id          | oid                         | not null
source_id           | oid                         | not null
created_ts          | timestamp without time zone | default now()
last_added_ts       | timestamp without time zone | default now()
obsolete            | boolean                     |
style               | character(1)                |
localizability      | boolean                     |
ui_restricted       | boolean                     |
linguistic          | boolean                     |
gender              | character(1)                |
platforms           | character varying[]         |
is_public           | boolean                     |
Indexes:
     "translation_pair_pkey" PRIMARY KEY, btree (translation_pair_id)
     "translation_pair_source_id_key" UNIQUE, btree (source_id,
translation_id)
     "translation_pair_created_date" btree (date(created_ts))
     "translation_pair_data_is_public" btree (is_public)
     "translation_pair_source_id" btree (source_id)
     "translation_pair_source_id_is_not_obsolete" btree (source_id,
obsolete) WHERE obsolete IS NOT TRUE
     "translation_pair_translation_id" btree (translation_id)
Check constraints:
     "translation_pair_gender_check" CHECK (gender = 'M'::bpchar OR
gender = 'F'::bpchar OR gender = 'N'::bpchar)
     "translation_pair_style_check" CHECK (style = 'P'::bpchar OR
style = 'O'::bpchar OR style = 'N'::bpchar)
Foreign-key constraints:
     "translation_pair_history_id_fkey" FOREIGN KEY (history_id)
REFERENCES history(history_id)
     "translation_pair_source_id_fkey" FOREIGN KEY (source_id)
REFERENCES source_data(source_id)
     "translation_pair_translation_id_fkey" FOREIGN KEY
(translation_id) REFERENCES translation_data(translation_id)
Triggers:
     del_tp_prodtype BEFORE DELETE ON translation_pair_data FOR EACH
ROW EXECUTE PROCEDURE eme_delete_tp_prodtype()


Thanks for all your help,

Drew

p.s. here are the updated query plans after bumping work_mem to 32M.

MatchBox=# explain select count(*) from
translation_pair_data

                             where translation_pair_id in (select
translation_pair_id from translation_pair_data join instance i using
(translation_pair_id) join loc_submission ls using(loc_submission_id)
where ls.is_public = true);

QUERY PLAN
------------------------------------------------------------------------
-------------------------------------------------------------------
Aggregate  (cost=424978.46..424978.47 rows=1 width=0)
    ->  Hash IN Join  (cost=324546.91..420732.64 rows=1698329 width=0)
          Hash Cond:
(public.translation_pair_data.translation_pair_id =
public.translation_pair_data.translation_pair_id)
          ->  Seq Scan on translation_pair_data  (cost=0.00..38494.29
rows=1698329 width=4)
          ->  Hash  (cost=290643.93..290643.93 rows=2006718 width=8)
                ->  Hash Join  (cost=66710.78..290643.93 rows=2006718
width=8)
                      Hash Cond: (i.translation_pair_id =
public.translation_pair_data.translation_pair_id)
                      ->  Hash Join  (cost=352.38..169363.36
rows=2006718 width=4)
                            Hash Cond: (i.loc_submission_id =
ls.loc_submission_id)
                            ->  Seq Scan on instance i
(cost=0.00..99016.16 rows=5706016 width=8)
                            ->  Hash  (cost=283.23..283.23 rows=5532
width=4)
                                  ->  Index Scan using
loc_submission_is_public on loc_submission ls  (cost=0.00..283.23
rows=5532 width=4)
                                        Index Cond: (is_public = true)
                                        Filter: is_public
                      ->  Hash  (cost=38494.29..38494.29 rows=1698329
width=4)
                            ->  Seq Scan on translation_pair_data
(cost=0.00..38494.29 rows=1698329 width=4)

The SELECT above takes approx 20s, whereas this UPDATE below takes
944s (15 minutes)

MatchBox=# explain update translation_pair_data set is_public = true
where translation_pair_id in (select translation_pair_id from
translation_pair_data join instance i using(translation_pair_id) join
loc_submission ls using(loc_submission_id) where ls.is_public = true);
                                                              QUERY PLAN
------------------------------------------------------------------------
-------------------------------------------------------------
Hash IN Join  (cost=324546.91..457218.64 rows=1698329 width=90)
    Hash Cond: (public.translation_pair_data.translation_pair_id =
public.translation_pair_data.translation_pair_id)
    ->  Seq Scan on translation_pair_data  (cost=0.00..38494.29
rows=1698329 width=90)
    ->  Hash  (cost=290643.93..290643.93 rows=2006718 width=8)
          ->  Hash Join  (cost=66710.78..290643.93 rows=2006718 width=8)
                Hash Cond: (i.translation_pair_id =
public.translation_pair_data.translation_pair_id)
                ->  Hash Join  (cost=352.38..169363.36 rows=2006718
width=4)
                      Hash Cond: (i.loc_submission_id =
ls.loc_submission_id)
                      ->  Seq Scan on instance i
(cost=0.00..99016.16 rows=5706016 width=8)
                      ->  Hash  (cost=283.23..283.23 rows=5532 width=4)
                            ->  Index Scan using
loc_submission_is_public on loc_submission ls  (cost=0.00..283.23
rows=5532 width=4)
                                  Index Cond: (is_public = true)
                                  Filter: is_public
                ->  Hash  (cost=38494.29..38494.29 rows=1698329 width=4)
                      ->  Seq Scan on translation_pair_data
(cost=0.00..38494.29 rows=1698329 width=4)


От:
Tom Lane
Дата:

Drew Wilson <> writes:
> The SELECT is not slow, so its a side effect of the update... Looking
> at the table definition, there is a "BEFORE ON DELETE" trigger
> defined, two CHECK constraints for this table, and three foreign
> keys. Nothing looks suspicious to me.

Since this is an update we can ignore the before-delete trigger, and
the check constraints don't look expensive to test.  Outgoing foreign
key references are normally not a problem either, since there must
be an index on the other end.  But *incoming* foreign key references
might be an issue --- are there any linking to this table?

Also, the seven indexes seem a bit excessive.  I'm not sure if that's
where the update time is going, but they sure aren't helping, and
some of them seem redundant anyway.  In particular I think that the
partial index WHERE obsolete IS NOT TRUE is probably a waste (do you
have any queries you know use it? what do they look like?) and you
probably don't need all three combinations of source_id and
translation_id --- see discussion here:
http://www.postgresql.org/docs/8.2/static/indexes-bitmap-scans.html

BTW, I don't think you ever mentioned what PG version this is exactly?
If it's 8.1 or later it would be worth slogging through EXPLAIN ANALYZE
on the update, or maybe an update of 10% or so of the rows if you're
impatient.  That would break out the time spent in the triggers, which
would let us eliminate them (or not) as the cause of the problem.

            regards, tom lane

От:
Drew Wilson
Дата:

On Apr 10, 2007, at 6:54 AM, Tom Lane wrote:

> Drew Wilson <> writes:
>> The SELECT is not slow, so its a side effect of the update... Looking
>> at the table definition, there is a "BEFORE ON DELETE" trigger
>> defined, two CHECK constraints for this table, and three foreign
>> keys. Nothing looks suspicious to me.
>
> Since this is an update we can ignore the before-delete trigger, and
> the check constraints don't look expensive to test.  Outgoing foreign
> key references are normally not a problem either, since there must
> be an index on the other end.  But *incoming* foreign key references
> might be an issue --- are there any linking to this table?
There is only one incoming foreign key - the one coming in from the
many-to-many join table ('instance').

>
> Also, the seven indexes seem a bit excessive.  I'm not sure if that's
> where the update time is going, but they sure aren't helping, and
> some of them seem redundant anyway.  In particular I think that the
> partial index WHERE obsolete IS NOT TRUE is probably a waste (do you
> have any queries you know use it? what do they look like?) and you
> probably don't need all three combinations of source_id and
> translation_id --- see discussion here:
> http://www.postgresql.org/docs/8.2/static/indexes-bitmap-scans.html
99% of our queries use obsolete IS NOT TRUE, so we have an index on
this.

> BTW, I don't think you ever mentioned what PG version this is exactly?
> If it's 8.1 or later it would be worth slogging through EXPLAIN
> ANALYZE
> on the update, or maybe an update of 10% or so of the rows if you're
> impatient.  That would break out the time spent in the triggers, which
> would let us eliminate them (or not) as the cause of the problem.
Sorry. I'm using 8.2.3 on Mac OS X 10.4.9, w/ 2.Ghz Intel Core Duo,
and 2G RAM.

If I understand the EXPLAIN ANALYZE results below, it looks like the
time spent applying the "set is_public = true" is much much more than
the fetch. I don't see any triggers firing. Is there something else I
can look for in the logs?

Here is the explain analyze output:
MatchBox=# EXPLAIN ANALYZE UPDATE translation_pair_data SET is_public
= true
          WHERE translation_pair_id IN
             (SELECT translation_pair_id FROM translation_pair_data
                  JOIN instance i using(translation_pair_id)
                  JOIN loc_submission ls using(loc_submission_id)
                      WHERE ls.is_public = true);
                        QUERY PLAN
------------------------------------------------------------------------
---------------------------------------------
Hash IN Join  (cost=324546.91..457218.64 rows=1698329 width=90)
(actual time=12891.309..33621.801 rows=637712 loops=1)
    Hash Cond: (public.translation_pair_data.translation_pair_id =
public.translation_pair_data.translation_pair_id)
    ->  Seq Scan on translation_pair_data  (cost=0.00..38494.29
rows=1698329 width=90) (actual time=0.045..19352.184 rows=1690272
loops=1)
    ->  Hash  (cost=290643.93..290643.93 rows=2006718 width=8)
(actual time=10510.411..10510.411 rows=1207161 loops=1)
          ->  Hash Join  (cost=66710.78..290643.93 rows=2006718
width=8) (actual time=1810.299..9821.862 rows=1207161 loops=1)
                Hash Cond: (i.translation_pair_id =
public.translation_pair_data.translation_pair_id)
                ->  Hash Join  (cost=352.38..169363.36 rows=2006718
width=4) (actual time=11.369..6273.439 rows=1207161 loops=1)
                      Hash Cond: (i.loc_submission_id =
ls.loc_submission_id)
                      ->  Seq Scan on instance i
(cost=0.00..99016.16 rows=5706016 width=8) (actual
time=0.029..3774.705 rows=5705932 loops=1)
                      ->  Hash  (cost=283.23..283.23 rows=5532
width=4) (actual time=11.277..11.277 rows=5563 loops=1)
                            ->  Index Scan using
loc_submission_is_public on loc_submission ls  (cost=0.00..283.23
rows=5532 width=4) (actual time=0.110..7.717 rows=5563 loops=1)
                                  Index Cond: (is_public = true)
                                  Filter: is_public
                ->  Hash  (cost=38494.29..38494.29 rows=1698329
width=4) (actual time=1796.574..1796.574 rows=1690272 loops=1)
                      ->  Seq Scan on translation_pair_data
(cost=0.00..38494.29 rows=1698329 width=4) (actual
time=0.012..917.006 rows=1690272 loops=1)
Total runtime: 1008985.005 ms

Thanks for your help,

Drew

От:
Tom Lane
Дата:

Drew Wilson <> writes:
> If I understand the EXPLAIN ANALYZE results below, it looks like the
> time spent applying the "set is_public = true" is much much more than
> the fetch. I don't see any triggers firing.

Nope, there aren't any.  8.2 is smart enough to bypass firing FK
triggers on UPDATE if the key columns didn't change.  Of course that
check takes a certain amount of time, but I don't think it's a large
amount.  So basically we're looking at index update and WAL logging
as the major costs here, I think.

[ thinks for a bit... ]  Part of the problem might be that the working
set for updating all the indexes is too large.  What do you have
shared_buffers set to, and can you increase it?

Oh, and checkpoint_segments at 8 is probably still not nearly enough;
if you have disk space to spare, try something like 30 (which'd eat
about 1Gb of disk space).  It might be educational to set
checkpoint_warning to 300 or so and watch the log to see how often
checkpoints happen during the update --- you want them at least a couple
minutes apart even during max load.

Also, bumping up wal_buffers a little might help some, for large updates
like this.  I've heard people claim that values as high as 64 are helpful.

            regards, tom lane

От:
Steve
Дата:

Hey there;

I'm trying to tune the memory usage of a new machine that has a -lot- of
memory in it (32 gigs).  We're upgrading from a machine that had 16 gigs
of RAM and using a database that's around 130-some gigs on disc.  Our
largest tables have in the order of close to 10 million rows.

Problem is, the postgres documentation isn't always clear about what
different memory things are used for and it's definitely not clear about
what 'useful values' would be for various things.  Further, looking
online, gets a lot of random stuff and most of the configuration
information out there is for pre-8.1 versions that don't have all these
new and strange values :)

This machine exists only for the database.  With that in mind, a few
questions.


- I've set up a configuration (I'll show important values below), and
I"m wondering if there's any way I can actually see the distribution of
memory in the DB and how the memory is being used.

- What is temp_buffers used for exactly?  Does this matter for, say,
nested queries or anything in specific?  Is there any case where having
this as a large number actually -helps-?

- Do full_page_writes and wal_buffers settings matter AT ALL for a machine
where fysnc = off ?

- What does wal_buffers mean and does increasing this value actually help
anything?

- Any idea if this is a smart configuration for this machine?  It's a
Redhat Enterprise Linux machine (kernel 2.6.18), 8 dual-core AMD 64bit
processors, 32 gigs of RAM, 4x 176 (or whatever the exact number is) gig
SCSI hard drives in a stripe.  Only values I have modified are mentioned,
everything else left at default:

shared_buffers = 16GB
temp_buffers = 128MB
max_prepared_transactions = 0

# This value is going to probably set off cries of using this as a set
# command instead of a big global value; however there's more big queries
# than small ones and the number of simultaneous users is very small so
# 'for now' this can be set globally big and if it shows improvement
# I'll implement it as set commands later.
#
# Question; does this mean 2 gigs will be immediately allocated to
# every query, or is this just how big the work memory is allowed to
# grow per transaction?
work_mem=2G

maintenance_work_mem = 4GB
max_stack_depth = 16MB

# Vacuum suggested I make this 'over 3600000' on the old machine, so
# I use this value; if it's too big, this is a symptom of another problem,
# I'd be interested to know :)
max_fsm_pages = 5000000

# For a lot of reasons, it doesn't make any sense to use fsync for this
# DB.  Read-only during the day, backed up daily, UPS'd, etc.
fsync = off
full_page_writes = off
wal_buffers = 512MB

# Leaving this low makes the DB complain, but I'm not sure what's
# reasonable.
checkpoint_segments = 128

random_page_cost = 1.5
cpu_tuple_cost = 0.001
cpu_index_tuple_cost = 0.0005
cpu_operator_cost = 0.00025
effective_cache_size = 8GB

default_statistics_target = 100




Thanks for all your help!

Steve

От:
Tom Lane
Дата:

Steve <> writes:
> - What is temp_buffers used for exactly?

Temporary tables.  Pages of temp tables belonging to your own backend
don't ever get loaded into the main shared-buffers arena, they are read
into backend-local memory.  temp_buffers is the max amount (per backend)
of local memory to use for this purpose.

> - Do full_page_writes and wal_buffers settings matter AT ALL for a machine
> where fysnc = off ?

Yes.

> - What does wal_buffers mean and does increasing this value actually help
> anything?

It's the amount of space available to buffer WAL log data that's not
been written to disk.  If you have a lot of short transactions then
there's not much benefit to increasing it (because the WAL will be
getting forced to disk frequently anyway) but I've heard reports that
for workloads involving long single transactions bumping it up to 64
or 100 or so helps.

> - Any idea if this is a smart configuration for this machine?

Um ... you didn't mention which PG version?

> # This value is going to probably set off cries of using this as a set
> # command instead of a big global value;

No kidding.  You do NOT want work_mem that high, at least not without an
extremely predictable, simple workload.

> wal_buffers = 512MB

I haven't heard any reports that there's a point in values even as high
as 1 meg for this.

            regards, tom lane

От:
Steve
Дата:

> Steve <> writes:
>> - What is temp_buffers used for exactly?
>
> Temporary tables.  Pages of temp tables belonging to your own backend
> don't ever get loaded into the main shared-buffers arena, they are read
> into backend-local memory.  temp_buffers is the max amount (per backend)
> of local memory to use for this purpose.

     Are these only tables explicitly stated as 'temporary' (which as I
recall is a create table option) or are temporary tables used for other
things to like, say, nested queries or other lil in the background things?

>> - Any idea if this is a smart configuration for this machine?
>
> Um ... you didn't mention which PG version?
>

     The latest and greatest stable as downloaded a couple days ago.
8.2.3. :)


Thanks for the info!


Steve

От:
Greg Smith
Дата:

On Tue, 10 Apr 2007, Steve wrote:

> - I've set up a configuration (I'll show important values below), and I"m
> wondering if there's any way I can actually see the distribution of memory in
> the DB and how the memory is being used.

I didn't notice anyone address this for you yet.  There is a tool in
contrib/pg_buffercache whose purpose in life is to show you what the
shared buffer cache has inside it.  The documentation in that directory
leads through installing it.  The additional variable you'll likely never
know is what additional information is inside the operating system's
buffer cache.

> # Leaving this low makes the DB complain, but I'm not sure what's #
> reasonable.
> checkpoint_segments = 128

That's a reasonable setting for a large server.  The main downside to
setting it that high is longer recovery periods after a crash, but I doubt
that's a problem for you if you're so brazen as to turn off fsync.

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

От:
Steve
Дата:

> I didn't notice anyone address this for you yet.  There is a tool in
> contrib/pg_buffercache whose purpose in life is to show you what the shared
> buffer cache has inside it.  The documentation in that directory leads
> through installing it.  The additional variable you'll likely never know is
> what additional information is inside the operating system's buffer cache.

     Okay -- thanks!  I'll take a look at this.

>> # Leaving this low makes the DB complain, but I'm not sure what's #
>> reasonable.
>> checkpoint_segments = 128
>
> That's a reasonable setting for a large server.  The main downside to setting
> it that high is longer recovery periods after a crash, but I doubt that's a
> problem for you if you're so brazen as to turn off fsync.

     Hahaha yeah.  It's 100% assumed that if something goes bad we're
restoring from the previous day's backup.  However because the DB is read
only for -most- of the day and only read/write at night it's acceptable
risk for us anyway.  But good to know that's a reasonable value.


Steve

От:
Andrew McMillan
Дата:

On Tue, 2007-04-10 at 15:28 -0400, Steve wrote:
>
> I'm trying to tune the memory usage of a new machine that has a -lot- of
> memory in it (32 gigs).

...
>
> shared_buffers = 16GB

Really?

Wow!

Common wisdom in the past has been that values above a couple of hundred
MB will degrade performance.  Have you done any benchmarks on 8.2.x that
show that you get an improvement from this, or did you just take the
"too much of a good thing is wonderful" approach?

Cheers,
                    Andrew.

-------------------------------------------------------------------------
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/            PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201      MOB: +64(272)DEBIAN      OFFICE: +64(4)499-2267
 You have an unusual equipment for success.  Be sure to use it properly.
-------------------------------------------------------------------------


От:
Steve
Дата:

> Really?
>
> Wow!
>
> Common wisdom in the past has been that values above a couple of hundred
> MB will degrade performance.  Have you done any benchmarks on 8.2.x that
> show that you get an improvement from this, or did you just take the
> "too much of a good thing is wonderful" approach?
>

     Not to be rude, but there's more common wisdom on this particular
subject than anything else in postgres I'd say ;)  I think I recently read
someone else on this list who's laundry-listed the recommended memory
values that are out there these days and pretty much it ranges from
what you've just said to "half of system memory".

     I've tried many memory layouts, and in my own experience with
this huge DB, more -does- appear to be better but marginally so; more
memory alone won't fix a speed problem.  It may be a function of how much
reading/writing is done to the DB and if fsync is used or not if that
makes any sense :)  Seems there's no "silver bullet" to the shared_memory
question.  Or if there is, nobody can agree on it ;)


Anyway, talk to you later!


Steve

От:
Ron
Дата:

At 12:38 PM 4/13/2007, Steve wrote:
>>Really?
>>
>>Wow!
>>
>>Common wisdom in the past has been that values above a couple of hundred
>>MB will degrade performance.  Have you done any benchmarks on 8.2.x that
>>show that you get an improvement from this, or did you just take the
>>"too much of a good thing is wonderful" approach?
>
>         Not to be rude, but there's more common wisdom on this
> particular subject than anything else in postgres I'd say ;)  I
> think I recently read someone else on this list who's
> laundry-listed the recommended memory values that are out there
> these days and pretty much it ranges from what you've just said to
> "half of system memory".
>
>         I've tried many memory layouts, and in my own experience
> with this huge DB, more -does- appear to be better but marginally
> so; more memory alone won't fix a speed problem.  It may be a
> function of how much reading/writing is done to the DB and if fsync
> is used or not if that makes any sense :)  Seems there's no "silver
> bullet" to the shared_memory question.  Or if there is, nobody can
> agree on it ;)

One of the reasons for the wide variance in suggested values for pg
memory use is that pg 7.x and pg 8.x are =very= different beasts.

If you break the advice into pg 7.x and pg 8.x categories, you find
that there is far less variation in the suggestions.

Bottom line: pg 7.x could not take advantage of larger sums of memory
anywhere near as well as pg 8.x can.

Cheers,
Ron


От:
Carlos Moreno
Дата:

Steve wrote:
>>
>> Common wisdom in the past has been that values above a couple of hundred
>> MB will degrade performance.

The annotated config file talks about setting shared_buffers to a third
of the
available memory --- well, it says "it should be no more than 1/3 of the
total
amount of memory"  (quoting off the top of my head).  Don't recall seeing
any warning about not exceeding a few hundred megabytes.

My eternal curiosity when it comes to this memory and shared_buffers thing:

How does PG take advantage of the available memory?  I mean, if I have a
machine with, say, 4 or 8GB of memory, how will those GBs would end
up being used?   They just do??   (I mean, I would find that a vaild
answer;
but I ask, because this configuration parameters stuff makes me think that
perhaps PG does not simply use whatever memory is in there, but it has
to go through the parameters in the config file to allocate whatever it has
to use).

So, is it just like that?   We put more memory and PG will automatically
make use of it?

Carlos
--


От:
"Jan de Visser"
Дата:

On Friday 13 April 2007 14:53:53 Carlos Moreno wrote:
> How does PG take advantage of the available memory?  I mean, if I have a
> machine with, say, 4 or 8GB of memory, how will those GBs would end
> up being used?   They just do??   (I mean, I would find that a vaild
> answer;

On linux the filesystem cache will gobble them up, which means indirectly
pgsql profits as well (assuming no other apps poison the fs cache).

jan

--
--------------------------------------------------------------
Jan de Visser                     

                Baruk Khazad! Khazad ai-menu!
--------------------------------------------------------------

От:
Tom Lane
Дата:

Ron <> writes:
> One of the reasons for the wide variance in suggested values for pg
> memory use is that pg 7.x and pg 8.x are =very= different beasts.
> If you break the advice into pg 7.x and pg 8.x categories, you find
> that there is far less variation in the suggestions.
> Bottom line: pg 7.x could not take advantage of larger sums of memory
> anywhere near as well as pg 8.x can.

Actually I think it was 8.1 that really broke the barrier in terms of
scalability of shared_buffers.  Pre-8.1, the buffer manager just didn't
scale well enough to make it useful to use more than a few hundred meg.
(In fact, we never even bothered to fix the shared-memory-sizing
calculations to be able to deal with >2GB shared memory until 8.1;
if you try it in 8.0 it'll probably just crash.)

            regards, tom lane