Discussion: slow full table update

From:
Date:

Hi,

I have a table with about 60,000 rows, and
when I run a query such as:
 UPDATE table SET column = 0;
it is still running after 10 minutes, so I have to cancel it :(

I'm running Postgres 8.1 with all default settings in postgresql.conf.

Where is the problem?

Thank you for any tips.

Best regards,
Marek Fiala







From:
tv@fuzzy.cz
Date:

Sorry, but you have to provide much more information about the table. The
information you've provided is really not sufficient - the rows might be
large or small. My guess is the table is bloated, with a lot of dead rows.

Try this:

ANALYZE table;
SELECT relpages, reltuples FROM pg_class WHERE relname = 'table';

Anyway, is autovacuum running? What are its parameters? Try executing

VACUUM table;

and then run the two commands above. That might 'clean' the table and
improve the update performance. Don't forget that each such UPDATE
actually creates a new copy of every modified row (that's how PostgreSQL's
MVCC works), so if you don't run VACUUM periodically or keep the
autovacuum daemon running, the table will bloat (occupy much more disk
space than it should).

If it does not help, try to determine whether the UPDATE is CPU or disk
bound. I'd guess the problem is an I/O bottleneck (due to the bloat).

regards
Tomas

> Hi,
>
> I have a table with about 60,000 rows, and
> when I run a query such as:
>  UPDATE table SET column = 0;
> it is still running after 10 minutes, so I have to cancel it :(
>
> I'm running Postgres 8.1 with all default settings in postgresql.conf.
>
> Where is the problem?
>
> Thank you for any tips.
>
> Best regards,
> Marek Fiala
>
> --
> Sent via pgsql-performance mailing list ()
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance



From:
Date:

Hi,

Thank you for your reply.

Here is some additional information:

The problem occurs on every table, with both small and large rows.
Autovacuum is running.

relpages  reltuples
    6213      54743

The tables are almost write-only.
Munin graphs show that the problem is an I/O bottleneck.

I found out that
updating 100 rows takes 0.3 s,
but updating 1000 rows takes 50 s.

Is this better information?

Thanks for any help.

best regards
Marek Fiala


From:
tv@fuzzy.cz
Date:

Hi,

so the table occupies about 50 MB, i.e. each row takes about 1 kB, right?
Updating 1000 rows should mean about 1 MB of data to be updated.

There might be a problem with the execution plan of the updates - I guess
the 100-row update uses an index scan while the 1000-row update might use
a seq scan.

Anyway, the table is not too big, so I wouldn't expect such an I/O
bottleneck on a properly tuned system. Have you checked the
postgresql.conf settings? What are the values for:

1) shared_buffers - number of 8kB pages used as the buffer cache (try
increasing this a little, for example to 1000, i.e. 8MB, or even more)

2) checkpoint_segments - number of 16MB checkpoint segments, aka
transaction logs; this usually improves write / update performance a
lot, so try increasing the default value (3) to at least 8

3) wal_buffers - number of 8kB pages used to store WAL (usually minimal
effect, but try increasing it to 16 - 64, just to be sure)

There is a nicely annotated config, with recommendations on how to set the
values based on usage etc. See this:

http://www.powerpostgresql.com/Downloads/annotated_conf_80.html
http://www.powerpostgresql.com/PerfList

regards
Tomas




From:
Date:

Hi,

I've changed the settings,
but with no effect on speed.

I tried EXPLAIN ANALYZE on the query, with this result
for 10,000 rows: update songs set views = 0 where sid > 20000 and sid < 30000

Bitmap Heap Scan on songs  (cost=151.59..6814.29 rows=8931 width=526) (actual time=4.848..167.855 rows=8945 loops=1)
  Recheck Cond: ((sid > 20000) AND (sid < 30000))
  ->  Bitmap Index Scan on pk_songs2  (cost=0.00..151.59 rows=8931 width=0) (actual time=4.071..4.071 rows=9579 loops=1)
        Index Cond: ((sid > 20000) AND (sid < 30000))

Is there a way to run this query in a single pass, with no Recheck Cond?

Thank you.

best regards
Marek Fiala



From:
Richard Huxton
Date:

 wrote:
> Hi,
>
> I've changed settings,
> but with no effect on speed.
>
> I try explain query with this result
> for 10.000 rows > update songs set views = 0 where sid > 20000 and sid < 30000
>
> Bitmap Heap Scan on songs  (cost=151.59..6814.29 rows=8931 width=526) (actual time=4.848..167.855 rows=8945 loops=1)

This query says it is taking 167 milliseconds, not 10 minutes as your
first message said. Is this query actually slow?

>
>   Recheck Cond: ((sid > 20000) AND (sid < 30000))
>
>   ->  Bitmap Index Scan on pk_songs2  (cost=0.00..151.59 rows=8931 width=0) (actual time=4.071..4.071 rows=9579 loops=1)
>
>         Index Cond: ((sid > 20000) AND (sid < 30000))
>
> Is there a way to run this query in a single pass, with no Recheck Cond?

Only a sequential scan.
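
As a diagnostic sketch (not something run in this thread), the bitmap
scan can be disabled for a single session to see the alternative plan -
the enable_bitmapscan setting exists in 8.1:

```sql
-- Diagnostic only: let the planner choose between a plain index scan
-- and a seq scan, then compare timings against the bitmap-scan plan.
-- Note that EXPLAIN ANALYZE actually executes the UPDATE.
SET enable_bitmapscan = off;
EXPLAIN ANALYZE
UPDATE songs SET views = 0 WHERE sid > 20000 AND sid < 30000;
RESET enable_bitmapscan;
```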

--
  Richard Huxton
  Archonet Ltd

From:
"Vladimir Sitnikov"
Date:


>
>   Recheck Cond: ((sid > 20000) AND (sid < 30000))
>
>   ->  Bitmap Index Scan on pk_songs2  (cost=0.00..151.59 rows=8931 width=0) (actual time=4.071..4.071 rows=9579 loops=1)
>
>         Index Cond: ((sid > 20000) AND (sid < 30000))
>
> Is there a way to run this query on sigle  throughpass with no Recheck Cond?

"Recheck Cond" is somewhat misleading here.

A Bitmap Index Scan has almost no "recheck" overhead as long as the whole
bitmap fits in work_mem. That means a bitmap scan only degrades when the
number of rows in the table (not the number of returned rows) exceeds
work_mem * 1024 * 8. A bitmap over 60,000 rows needs about
60,000 / 8 = 7,500 bytes ~ 8 kB of memory to run without any additional
recheck, so I do not believe it hurts you in this particular case.
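
That back-of-envelope arithmetic can be written out directly (a
simplified model - the real TID bitmap is a per-page structure and can
turn lossy - but the scale is right):

```sql
-- Exact bitmap: roughly 1 bit per heap row. For a 60,000-row table:
SELECT 60000 / 8       AS bitmap_bytes,      -- 7500 bytes, about 8 kB
       1024 * 1024     AS default_work_mem;  -- 1 MB in 8.1, ample headroom
```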


Regards,
Vladimir Sitnikov
From:
Tom Lane
Date:

Richard Huxton <> writes:
>  wrote:
>> I tried EXPLAIN ANALYZE on the query, with this result
>> for 10,000 rows: update songs set views = 0 where sid > 20000 and sid < 30000
>>
>> Bitmap Heap Scan on songs  (cost=151.59..6814.29 rows=8931 width=526) (actual time=4.848..167.855 rows=8945 loops=1)

> This query says it is taking 167 milliseconds, not 10 minutes as your
> first message said. Is this query actually slow?

The explain plan tree only shows the time to fetch/compute the new rows,
not to actually perform the update, update indexes, or fire triggers.
If there is a big discrepancy then the extra time must be going into
one of those steps.

8.1 does show trigger execution time separately, so the most obvious
problem (unindexed foreign key reference) seems to be excluded, unless
the OP just snipped that part of the output ...

            regards, tom lane

From:
"Scott Marlowe"
Date:

On Mon, Nov 10, 2008 at 9:30 AM,  <> wrote:
> Hi,
>
> I have table with cca 60.000 rows and
> when I run query as:
>  Update table SET column=0;
> after 10 minutes i must stop query, but it still running :(

What does

vacuum verbose table;

say?  I'm wondering if it's gotten overly bloated.

How long does

select count(*) from table;

take to run? (Use \timing to time it.)

From:
Date:

Hi,

 select count(*) from songs;
 count
-------
 54909
(1 row)

Time: 58.182 ms

update songs set views = 0;
UPDATE 54909
Time: 101907.837 ms
The time is actually less than 10 minutes, but it is still very long :(

VACUUM said:

VACUUM VERBOSE songs;
INFO:  vacuuming "public.songs"
INFO:  index "pk_songs2" now contains 54909 row versions in 595 pages
DETAIL:  193 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.06 sec.
INFO:  index "fk_albums_aid_index" now contains 54909 row versions in 1330 pages
DETAIL:  193 index row versions were removed.
812 index pages have been deleted, 812 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.04 sec.
INFO:  index "fk_artists_artid_index" now contains 54910 row versions in 628 pages
DETAIL:  193 index row versions were removed.
114 index pages have been deleted, 114 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.10 sec.
INFO:  index "fk_users_uid_karaoke_index" now contains 54910 row versions in 2352 pages
DETAIL:  193 index row versions were removed.
2004 index pages have been deleted, 2004 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.95 sec.
INFO:  index "datum_tag_indx" now contains 54910 row versions in 2083 pages
DETAIL:  193 index row versions were removed.
1728 index pages have been deleted, 1728 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.47 sec.
INFO:  index "datum_video_indx" now contains 54910 row versions in 1261 pages
DETAIL:  193 index row versions were removed.
826 index pages have been deleted, 826 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.06 sec.
INFO:  "songs": removed 193 row versions in 164 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  "songs": found 193 removable, 54909 nonremovable row versions in 6213 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 132969 unused item pointers.
0 pages are entirely empty.
CPU 0.07s/0.04u sec elapsed 1.74 sec.
INFO:  vacuuming "pg_toast.pg_toast_28178"
INFO:  index "pg_toast_28178_index" now contains 2700 row versions in 13 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_28178": found 0 removable, 2700 nonremovable row versions in 645 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 88 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 1750.460 ms

best regards
Marek Fiala


From:
"Scott Marlowe"
Date:

This is the critical point.  You have this line:

There were 132969 unused item pointers.

which says there are 132k or so dead row slots in your table.  Which means
vacuum / autovacuum isn't keeping up.  Did you try to stop the update
several times?  Each time the update starts and then gets killed, it
creates dead rows.

Try doing a VACUUM FULL followed by a REINDEX, or a CLUSTER on this
table, and see if that helps.
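
Concretely, that could look like the following (a sketch - VACUUM FULL
and CLUSTER both take an exclusive lock and rewrite the table, and
pk_songs2 is the index name from the VACUUM VERBOSE output above):

```sql
-- Rewrite the table to reclaim dead space, then rebuild its indexes:
VACUUM FULL songs;
REINDEX TABLE songs;

-- ...or do both in one step by clustering on the primary key
-- (pre-8.3 syntax: CLUSTER index ON table):
CLUSTER pk_songs2 ON songs;
```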

From:
Tomas Vondra
Date:

> This is the critical point.  You have this line:
>
> There were 132969 unused item pointers.
>
> Which says there's 132k or so dead rows in your table.  Which means
> vacuum / autovacuum isn't keeping up.  Did you try and stop the update
> several times?  Each time it starts then gets killed it creates dead
> rows.

Try running just ANALYZE on the table and then run

SELECT relpages, reltuples FROM pg_class WHERE relname = 'table';

again. It should report about 20k pages, i.e. 160MB. That might slow
things down ;-)

> Try doing a vacuum full followed by a reindex OR a cluster on this
> table and see if that helps.

Well, maybe the vacuum will fix the problem - have you executed the query
that took 167 ms (according to the EXPLAIN ANALYZE output you posted)
over a clean table? But I doubt the growth from 6,000 to 20,000 pages
alone could cause degradation from 170 ms to several minutes ...

regards
Tomas

From:
Tomas Vondra
Date:

> The explain plan tree only shows the time to fetch/compute the new rows,
> not to actually perform the update, update indexes, or fire triggers.
> If there is a big discrepancy then the extra time must be going into
> one of those steps.
>
> 8.1 does show trigger execution time separately, so the most obvious
> problem (unindexed foreign key reference) seems to be excluded, unless
> the OP just snipped that part of the output ...

Yeah, that's a quite frequent problem with updates. Try creating a
separate copy of the table, i.e.

CREATE TABLE test_table AS SELECT * FROM table;

and try to execute the query on it.

Which tables reference the original table using a foreign key? Do they
have indexes on the foreign key column? How large are the referencing
tables? Are these tables updated heavily and vacuumed properly (i.e.
aren't they bloated with dead rows)?
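
The referencing tables can be listed straight from the catalogs (a
sketch; 'songs' stands in for the table in question):

```sql
-- Which foreign keys point at this table, and from which tables?
SELECT conname            AS constraint_name,
       conrelid::regclass AS referencing_table
FROM pg_constraint
WHERE contype = 'f'
  AND confrelid = 'songs'::regclass;
```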

I'm not sure whether the FK constraints are always checked, or only when
the referenced column is updated. I guess the FK check is performed only
on DELETE or when the value in the FK column is modified (and updates of
the primary key are probably not very frequent).

Are there any triggers and / or rules on the table?

regards
Tomas

From:
PFC
Date:

> update songs set views = 0;
> UPDATE 54909
> Time: 101907.837 ms
> time is actually less than 10 minutes, but it is still very long :(

    Wow.

test=> CREATE TABLE test (id SERIAL PRIMARY KEY, value INTEGER);
test=> INSERT INTO test (value) SELECT n FROM generate_series( 1,100000 )
AS n;
Temps : 1706,495 ms
test=> UPDATE test SET value=0;
Temps : 1972,420 ms

    Note this is 8.3.3 on a desktop PC with the database and xlog on a Linux
Software RAID1 of rather slow drives (about 50 MB/s).
    Anyway, your 10 minutes is clearly wrong.

    The first thing to check is whether there is a problem with your IO
subsystem: try the example queries above; you should get timings in the
same ballpark. If you are 10x slower than that, you have a problem.

    Are the rows large? I would believe so, because a "songs" table will
probably contain things like artist, title, comments, and lots of other
information in strings that are too small to be TOASTed. Perhaps your
problem is in index updates, too.

    So, make a copy of the songs table, with no indexes and no foreign
keys:

    CREATE TABLE songs2 AS SELECT * FROM songs;

    Then try your UPDATE on this. How slow is it?

    Now drop this table and recreate it with the foreign keys. Test the
update again.
    Now drop this table and recreate it with the foreign keys and indexes.
Test the update again.

    This will give you some meaningful information.

    You will probably update the 'views' column quite often; it may well
be the most frequently updated column in your application. In that case,
you could try moving it to a separate table with just (song_id, views) -
that way each UPDATE rewrites a very small table.
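
A sketch of that split (hypothetical names; the foreign key assumes
songs.sid is the primary key, as the plan output earlier in the thread
suggests):

```sql
-- Narrow side table holding only the hot counter.
CREATE TABLE song_views (
    sid   integer PRIMARY KEY REFERENCES songs (sid),
    views integer NOT NULL DEFAULT 0
);

-- Seed it once from the existing data.
INSERT INTO song_views (sid, views)
SELECT sid, views FROM songs;

-- The mass reset now rewrites a much smaller table:
UPDATE song_views SET views = 0;
```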