Thread: Postgresql is very slow


Postgresql is very slow

From:
bijayant kumar
Date:
Hello to list,

We have a CentOS-5 server with postgresql-8.1.8 installed. I am struggling with postgresql performance. Any query, say
select * from tablename, takes 10-15 mins to give the output, and while executing the query the system load goes up like
anything. After the query output, the system load starts decreasing.

Any query (select, insert, update, simple or complex) behaves in the same way as explained above.

System Specification:

OS :- CentOs 5
Postgresql 8.1.8
RAM :- 1 GB
SWAP 2 GB

Some relevant (uncommented) parts of my /var/lib/pgsql/data/postgresql.conf:

listen_addresses = 'localhost'
max_connections = 100
shared_buffers = 1000

One more strange thing is that with the same settings on another server, postgresql runs very smoothly. I had also
run vacuum some time back.

Please help me out and let me know if you need any other information.

Thanks & Regards,

Bijayant Kumar


Re: Postgresql is very slow

From:
Jan de Visser
Date:
On Monday 23 June 2008 07:06:54 bijayant kumar wrote:
> Hello to list,
>
> We have a CentOS-5 server with postgresql-8.1.8 installed. I am struggling
> with postgresql performance. Any query, say select * from tablename, takes
> 10-15 mins to give the output, and while executing the query the system
> load goes up like anything. After the query output, the system load starts
> decreasing.

Sounds like a vacuum problem.

>
> Any query (select, insert, update, simple or complex) behaves in the same
> way as explained above.
>
> System Specification:
>
> OS :- CentOs 5
> Postgresql 8.1.8
> RAM :- 1 GB
> SWAP 2 GB
>
> Some relevant (uncommented) parts of my /var/lib/pgsql/data/postgresql.conf:
>
> listen_addresses = 'localhost'
> max_connections = 100
> shared_buffers = 1000

Your shared_buffers seems low.
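
For reference, 8.1 counts shared_buffers in 8kB buffers rather than as a memory size, so 1000 is only about 8MB. A
rough sketch of a more typical setting for a machine with 1GB of RAM (the exact value depends on the workload, and
changing it requires a postgres restart):

shared_buffers = 25000          # ~200MB; 8.1 expects a buffer count, not a size like '200MB'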

>
> One more strange thing is that with the same settings on another server,
> postgresql runs very smoothly. I had also run vacuum some time back.

You are aware that vacuum is supposed to be an ongoing maintenance activity,
right?

>
> Please help me out and let me know if you need any other information.
>
> Thanks & Regards,
>
> Bijayant Kumar
>

jan

Re: Postgresql is very slow

From:
Jeremy Harris
Date:
bijayant kumar wrote:
>  select * from tablename takes 10-15 mins to give the output


There are better ways to dump data than using a database; that's
not a useful query.


> Any query (select, insert, update, simple or complex) behaves in the same way

Have you set up suitable indexes for your operations (and then run analyze)?
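
A minimal sketch of both steps, assuming the USERS table and email column discussed in this thread (note that a plain
btree index only helps LIKE patterns anchored at the start, such as 'bijayant%', not ones with a leading wildcard):

CREATE INDEX users_email_idx ON users (email);   -- index name is illustrative
ANALYZE users;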

Cheers,
   Jeremy

Re: Postgresql is very slow

From:
tv@fuzzy.cz
Date:
Hi,

> Hello to list,
>
> We have a CentOS-5 server with postgresql-8.1.8 installed. I am struggling
> with postgresql performance. Any query, say select * from tablename, takes
> 10-15 mins to give the output, and while executing the query the system
> load goes up like anything. After the query output, the system load starts
> decreasing.

I doubt the 'select * from tablename' is a good candidate for tuning, but
give us more information about the table. What is its size - how many
rows does it have, and how much space does it occupy on the disk? What is the
typical usage of the table - is it modified (update / delete) frequently?
How is it maintained - is there an autovacuum running, or did you set up a
routine vacuum (and analyze) job to maintain the database?

I guess one of the servers (the slow one) has been running for a long time
without proper db maintenance (vacuum / analyze), and you dumped / loaded
the db onto a new server. So the 'new server' has much more 'compact'
tables and thus gives responses much faster. And this holds for more
complicated queries (with indexes etc.) too.

Output from the 'EXPLAIN' (or 'EXPLAIN ANALYZE') command would give a much
better overview.
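
For example (EXPLAIN ANALYZE actually executes the query and reports real row counts and timings next to the planner's
estimates):

EXPLAIN ANALYZE SELECT * FROM users WHERE email LIKE '%bijayant.kumar%';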

Tomas


Re: Postgresql is very slow

From:
"Gregory S. Youngblood"
Date:
> System Specification:

> OS :- CentOs 5
> Postgresql 8.1.8
> RAM :- 1 GB
> SWAP 2 GB

[Greg says]
How much memory is actually free? Can you include the output of the command "free" in your reply? What else runs on
this server? What is the system load before and during your query?

While it's more likely that the other comments about vacuum point to the ultimate cause, performance will also be degraded,
sometimes significantly, if your system has too many other things running and you are actively using swap space.

Greg


Re: Postgresql is very slow

From:
bijayant kumar
Date:
Hi,

Thanks for the reply. Many gentlemen have replied to my question; thanks to all of them. I have tried to answer all
questions in one mail.

--- On Mon, 23/6/08, tv@fuzzy.cz <tv@fuzzy.cz> wrote:

> From: tv@fuzzy.cz <tv@fuzzy.cz>
> Subject: Re: [PERFORM] Postgresql is very slow
> To: bijayant4u@yahoo.com
> Cc: pgsql-performance@postgresql.org
> Date: Monday, 23 June, 2008, 7:20 PM
> Hi,
>
> > Hello to list,
> >
> > We have a CentOS-5 server with postgresql-8.1.8
> installed. I am struggling
> > with postgresql performance. Any query say select *
> from tablename takes
> > 10-15 mins to give the output, and while executing the
> query system loads
> > goes up like anything. After the query output, system
> loads starts
> > decresing.
>
> I doubt the 'select * from tablename' is a good
> candidate for tuning, but
> give us more information about the table. What is it's
> size - how many
> rows does it have and how much space does it occupy on the
> disk? What is a
> typical usage of the table - is it modified (update /
> delete) frequently?
> How is it maintained - is there a autovacuum running, or
> did you set a
> routine vacuum (and analyze) job to maintain the database?
>
> I guess one of the servers (the slow one) is running for a
> long time
> without a proper db maintenance (vacuum / analyze) and you
> dumped / loaded
> the db onto a new server. So the 'new server' has
> much more 'compact'
> tables and thus gives the responses much faster. And this
> holds for more
> complicated queries (with indexes etc) too.
>
> An output from 'EXPLAIN' (or 'EXPLAIN
> ANALYZE') command would give a much
> better overview.
>

We maintain a mail server; its data is stored in postgresql. There are 24 tables in total but only two are used.
Basically one table, say USERS, stores the users' information like mail id and password, and there are 1669 rows in
this table. The other table stores the domain names; insertions, updates and deletions happen very rarely there, and
this table is touched about once a month.
But the second table, USERS, is modified frequently (on average about 10 times daily) because users change their
passwords, new users are added, and old ones are deleted.

We have created this database from the dump of our old server, and with the same dump the database is running fine on
the new server but not on the slow server.

I was not aware of the VACUUM functionality earlier, but some time back I read about it and ran it on the server; I did
not achieve anything in terms of performance. The server has been running for 1 to 1.5 years and we have done VACUUM
only once.

Is this the cause of the slow database?  One more thing: if I recreate the database, will it help?

The output of ANALYZE:

ANALYZE verbose USERS;
INFO:  analyzing "public.USERS"
INFO:  "USERS": scanned 3000 of 54063 pages, containing 128 live rows and 1 dead rows; 128 rows in sample, 2307 estimated total rows
ANALYZE

The output of the EXPLAIN query:

select * from USERS where email like '%bijayant.kumar%';
This simple query took 10 minutes, and the server load went from 0.35 to 16.94.

EXPLAIN select * from USERS where email like '%bijayant.kumar%';
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on USERS  (cost=0.00..54091.84 rows=1 width=161)
   Filter: ((email)::text ~~ '%bijayant.kumar%'::text)
(2 rows)


I hope I have covered everything needed to troubleshoot my problem.

> Tomas
>
>


Re: Postgresql is very slow

From:
"Scott Marlowe"
Date:
On Mon, Jun 23, 2008 at 11:48 PM, bijayant kumar <bijayant4u@yahoo.com> wrote:

OK, you don't have a ton of updates each day, but they add up over time.

> I was not aware of the VACUUM functionality earlier, but some time back I read about it and ran it on the server; I
> did not achieve anything in terms of performance. The server has been running for 1 to 1.5 years and we have done
> VACUUM only once.

Vacuuming isn't so much about performance as about maintenance.  You
don't change the oil in your car to make it go faster, you do it to
keep it running smoothly.  Don't change it for 1.5 years and you could
have problems.  Sludge build-up / dead tuple build-up.  Kinda similar.

> Is this the cause of the slow database?  One more thing: if I recreate the database, will it help?

Most likely.  What does

vacuum verbose;

on the main database say?

> The output of ANALYZE
>
> ANALYZE verbose USERS;
> INFO:  analyzing "public.USERS"
> INFO:  "USERS": scanned 3000 of 54063 pages, containing 128 live rows and 1 dead rows; 128 rows in sample, 2307
> estimated total rows
> ANALYZE

So, 54063 pages hold roughly 2300 live database rows.  A page is 8k.  That
means you're storing a couple of thousand live rows in an approximately 400+
megabyte file.

> The output of EXPLAIN query;
>
> select * from USERS where email like '%bijayant.kumar%';
> This simple query took 10 minutes, and the server load went from 0.35 to 16.94.
>
> EXPLAIN select * from USERS where email like '%bijayant.kumar%';
>                          QUERY PLAN
> --------------------------------------------------------------
>  Seq Scan on USERS  (cost=0.00..54091.84 rows=1 width=161)
>   Filter: ((email)::text ~~ '%bijayant.kumar%'::text)
> (2 rows)

You're scanning ~ 54094 sequential pages to retrieve 1 row.   Note
that explain analyze is generally a better choice; it gives more data
useful for troubleshooting.

Definitely need a vacuum full on this table, likely followed by a reindex.

Re: Postgresql is very slow

From:
"Scott Marlowe"
Date:
> Definitely need a vacuum full on this table, likely followed by a reindex.

Or a cluster on the table...
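
A sketch of both options, using the 8.1-era CLUSTER syntax; users_pkey here stands in for whatever the primary key
index is actually called:

VACUUM FULL VERBOSE users;
REINDEX TABLE users;
-- or, as a single table-rewriting step:
CLUSTER users_pkey ON users;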

Re: Postgresql is very slow

From:
tv@fuzzy.cz
Date:
>> I was not aware of the VACUUM functionality earlier, but some times back
>> i read and run this on the server but i did not achieve anything in
>> terms of performance. The server is running from 1 to 1.5 years and we
>> have done VACUUM only once.
>
> vacuuming isn't so much about performance as about maintenance.  You
> don't change the oil in your car to make it go faster, you do it to
> keep it running smoothly.  Don't change it for 1.5 years and you could
> have problems.  sludge build up / dead tuple build up.  Kinda similar.
>

I have to disagree - VACUUM is a maintenance task, but one with a direct
impact on performance. The point is that Postgresql holds dead rows (old
versions, deleted rows, etc.) until they are freed by vacuum, and these rows
need to be checked every time (are they still visible to the transaction?).
So on a heavily modified table you may easily end up with a table consisting
mostly of dead tuples.

>> The output of EXPLAIN query;
>>
>> select * from USERS where email like '%bijayant.kumar%';
>> This simple query took 10 minutes, and the server load went from 0.35 to
>> 16.94.
>>
>> EXPLAIN select * from USERS where email like '%bijayant.kumar%';
>>                          QUERY PLAN
>> --------------------------------------------------------------
>>  Seq Scan on USERS  (cost=0.00..54091.84 rows=1 width=161)
>>   Filter: ((email)::text ~~ '%bijayant.kumar%'::text)
>> (2 rows)
>
> You're scanning ~ 54094 sequential pages to retrieve 1 row.   Note
> that explain analyze is generally a better choice, it gives more data
> useful for troubleshooting.

Not necessarily - the 'cost' depends on seq_page_cost, which might be set to
a value other than 1 (the default). A better approach is

SELECT relpages, reltuples FROM pg_class WHERE relname = 'users';

which reads the values from the system catalogue.
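
If you also want the on-disk size, 8.1 already ships the size helper functions; a small sketch:

SELECT relname, relpages, reltuples,
       pg_size_pretty(pg_relation_size('users')) AS on_disk
  FROM pg_class
 WHERE relname = 'users';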

> Definitely need a vacuum full on this table, likely followed by a reindex.

Yes, that's true. I guess the table holds a lot of dead tuples. I'm not
sure why this happens on one server (the new one) and not on the other
one. I guess the original one uses some automatic vacuuming (autovacuum,
cron job, or something like that).

As someone already posted, clustering the table (by primary key for
example) should be much faster than vacuuming and give better performance
in the end. See

http://www.postgresql.org/docs/8.3/interactive/sql-cluster.html

The plain reindex won't help here - it won't remove dead tuples.

Tomas


Re: Postgresql is very slow

From:
bijayant kumar
Date:
--- On Tue, 24/6/08, tv@fuzzy.cz <tv@fuzzy.cz> wrote:

> From: tv@fuzzy.cz <tv@fuzzy.cz>
> Subject: Re: [PERFORM] Postgresql is very slow
> To: "Scott Marlowe" <scott.marlowe@gmail.com>
> Cc: bijayant4u@yahoo.com, tv@fuzzy.cz, pgsql-performance@postgresql.org
> Date: Tuesday, 24 June, 2008, 1:47 PM
> >> I was not aware of the VACUUM functionality
> earlier, but some times back
> >> i read and run this on the server but i did not
> achieve anything in
> >> terms of performance. The server is running from 1
> to 1.5 years and we
> >> have done VACUUM only once.
> >
> > vacuuming isn't so much about performance as about
> maintenance.  You
> > don't change the oil in your car to make it go
> faster, you do it to
> > keep it running smoothly.  Don't change it for 1.5
> years and you could
> > have problems.  sludge build up / dead tuple build up.
>  Kinda similar.
> >
>
> I have to disagree - the VACUUM is a maintenance task, but
> with a direct
> impact on performance. The point is that Postgresql holds
> dead rows (old
> versions, deleted, etc.) until freed by vacuum, and these
> rows need to be
> checked every time (are they still visible to the
> transaction?). So on a
> heavily modified table you may easily end up with most of
> the tuples being
> dead and table consisting of mostly dead tuples.
>
> >> The output of EXPLAIN query;
> >>
> >> select * from USERS where email like
> '%bijayant.kumar%';
> >> This simplest query tooks 10 minutes and server
> loads goes from 0.35 to
> >> 16.94.
> >>
> >> EXPLAIN select * from USERS where email like
> '%bijayant.kumar%';
> >>                          QUERY PLAN
> >>
> --------------------------------------------------------------
> >>  Seq Scan on USERS  (cost=0.00..54091.84 rows=1
> width=161)
> >>   Filter: ((email)::text ~~
> '%bijayant.kumar%'::text)
> >> (2 rows)
> >
> > You're scanning ~ 54094 sequential pages to
> retrieve 1 row.   Note
> > that explain analyze is generally a better choice, it
> gives more data
> > useful for troubleshooting.
>
> Not necessarily, the 'cost' depends on
> seq_page_cost and there might be
> other value than 1 (which is the default). A better
> approach is
>
> SELECT relpages, reltuples FROM pg_class WHERE relname =
> 'users';
>
> which reads the values from system catalogue.
>
The output of the query on the slow server:

SELECT relpages, reltuples FROM pg_class WHERE relname ='users';
 relpages | reltuples
----------+-----------
    54063 |      2307
(1 row)

The output of the query on the old server, which is fast:

 relpages | reltuples
----------+-----------
       42 |      1637


> > Definitely need a vacuum full on this table, likely
> followed by a reindex.
>

The slow server's load increases whenever I run a simple query. Is it a good idea to run VACUUM FULL on the live
server's database now, or should it be run when the traffic is very low, maybe at the weekend?

> Yes, that's true. I guess the table holds a lot of dead
> tuples. I'm not
> sure why this happens on one server (the new one) and not
> on the other
> one. I guess the original one uses some automatic vacuuming
> (autovacuum,
> cron job, or something like that).

There was nothing related to VACUUM of the database in the crontab.
>
> As someone already posted, clustering the table (by primary
> key for
> example) should be much faster than vacuuming and give
> better performance
> in the end. See
>
> http://www.postgresql.org/docs/8.3/interactive/sql-cluster.html
>
> The plain reindex won't help here - it won't remove
> dead tuples.
>
I am new to the Postgres database; I didn't understand the "indexing" part. Is it related to the PRIMARY KEY column of
the table?

Should I run: CLUSTER USERS using 'username';

> Tomas


Re: Postgresql is very slow

From:
"Ian Barwick"
Date:
2008/6/24 Scott Marlowe <scott.marlowe@gmail.com>:
> On Mon, Jun 23, 2008 at 11:48 PM, bijayant kumar <bijayant4u@yahoo.com> wrote:
(...)
>> The output of EXPLAIN query;
>>
>> select * from USERS where email like '%bijayant.kumar%';
>> This simple query took 10 minutes, and the server load went from 0.35 to 16.94.
>>
>> EXPLAIN select * from USERS where email like '%bijayant.kumar%';
>>                          QUERY PLAN
>> --------------------------------------------------------------
>>  Seq Scan on USERS  (cost=0.00..54091.84 rows=1 width=161)
>>   Filter: ((email)::text ~~ '%bijayant.kumar%'::text)
>> (2 rows)
>
> You're scanning ~ 54094 sequential pages to retrieve 1 row.   Note
> that explain analyze is generally a better choice, it gives more data
> useful for troubleshooting.
>
> Definitely need a vacuum full on this table, likely followed by a reindex.

This is a LIKE query with a wildcard at the start of the string to
match, so reindexing won't help much.


Ian Barwick

Re: Postgresql is very slow

From:
tv@fuzzy.cz
Date:
>> Not necessarily, the 'cost' depends on
>> seq_page_cost and there might be
>> other value than 1 (which is the default). A better
>> approach is
>>
>> SELECT relpages, reltuples FROM pg_class WHERE relname =
>> 'users';
>>
>> which reads the values from system catalogue.
>>
> The Output of query on the Slow Server
>
> SELECT relpages, reltuples FROM pg_class WHERE relname ='users';
>  relpages | reltuples
> ----------+-----------
>     54063 |      2307
> (1 row)
>
> The Output of query on the old server which is fast
>
>  relpages | reltuples
> ----------+-----------
>        42 |      1637
>
>

This definitely confirms the suspicion about dead tuples etc. On the old
server the table has 1637 tuples and occupies just 42 pages (i.e. 330kB
with 8k pages), which is about 0.025 pages (0.2kB) per row.

Let's suppose the characteristics of the data (row sizes, etc.) are the same
on both servers - in that case the 2307 rows should occupy about 58
pages, but as you can see from the first output they occupy 54063, i.e.
400MB instead of 450kB.

>> > Definitely need a vacuum full on this table, likely
>> followed by a reindex.
>>
>
> The slow server's load increases whenever I run a simple query. Is it a
> good idea to run VACUUM FULL on the live server's database now, or should
> it be run when the traffic is very low, maybe at the weekend?

The load increases because, with the queries you've sent, the database has
to read the whole table (sequential scan), which may be spread across the
disk (thus the disk has to seek).

I'd recommend running CLUSTER instead of VACUUM - that should be much
faster in this case. It will lock the table, but the performance already
sucks, so I'd probably prefer a short downtime followed by much faster
processing afterwards.

>
>> Yes, that's true. I guess the table holds a lot of dead
>> tuples. I'm not
>> sure why this happens on one server (the new one) and not
>> on the other
>> one. I guess the original one uses some automatic vacuuming
>> (autovacuum,
>> cron job, or something like that).
>
> There was nothing related to VACUUM of database in the crontab.

In that case something else must be running vacuum on that server - maybe
autovacuum (see postgresql.conf) or something similar.
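
For reference, on 8.1 autovacuum is off by default and depends on the statistics collector; the relevant
postgresql.conf lines would look roughly like this:

autovacuum = on
stats_start_collector = on      # required by autovacuum in 8.1
stats_row_level = on            # required by autovacuum in 8.1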

>> As someone already posted, clustering the table (by primary
>> key for
>> example) should be much faster than vacuuming and give
>> better performance
>> in the end. See
>>
>> http://www.postgresql.org/docs/8.3/interactive/sql-cluster.html
>>
>> The plain reindex won't help here - it won't remove
>> dead tuples.
>>
> I am new to Postgres database, i didnt understand the "indexing" part. Is
> it related to PRIMARY_KEY column of the table?

Not sure what you mean by the 'indexing' part.

The principle of clustering is quite simple - by sorting the table according
to an index (by the columns in the index) you may get better performance
when using the index. Another 'bonus' is that it compacts the table on
disk, so disk seeking is less frequent. These two effects may mean a
serious increase in performance. You may cluster according to any index on
the table, not just the primary key - just choose the most frequently used
index.

Sure, there are some drawbacks - it locks the table, so you cannot use it
while the command is running. It's not an incremental operation, and the
order is not enforced when the table is later modified - when you modify a
row, the new version won't respect the order, so you have to run the CLUSTER
command from time to time. And it's possible to cluster by only one index.

>
> Should I run: CLUSTER USERS using 'username';

I guess 'username' is a column, so it won't work. You have to choose an
index (I'd recommend the primary key index, i.e. the one with _pk at the
end).
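
A sketch of how to look up the index name and then cluster on it (the _pk name is illustrative):

SELECT indexname FROM pg_indexes WHERE tablename = 'users';
CLUSTER users_pk ON users;      -- 8.1 syntax: CLUSTER indexname ON tablename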

Tomas


Re: Postgresql is very slow

From:
bijayant kumar
Date:
Thank you all very much for your support. Now I have understood the problem with my server. I will try the suggested
things like CLUSTER and then let you all know what happens after that.

Once again, thank you all.

Bijayant Kumar


--- On Tue, 24/6/08, tv@fuzzy.cz <tv@fuzzy.cz> wrote:

> From: tv@fuzzy.cz <tv@fuzzy.cz>
> Subject: Re: [PERFORM] Postgresql is very slow
> To: bijayant4u@yahoo.com
> Cc: pgsql-performance@postgresql.org
> Date: Tuesday, 24 June, 2008, 3:32 PM
> >> Not necessarily, the 'cost' depends on
> >> seq_page_cost and there might be
> >> other value than 1 (which is the default). A
> better
> >> approach is
> >>
> >> SELECT relpages, reltuples FROM pg_class WHERE
> relname =
> >> 'users';
> >>
> >> which reads the values from system catalogue.
> >>
> > The Output of query on the Slow Server
> >
> > SELECT relpages, reltuples FROM pg_class WHERE relname
> ='users';
> >  relpages | reltuples
> > ----------+-----------
> >     54063 |      2307
> > (1 row)
> >
> > The Output of query on the old server which is fast
> >
> >  relpages | reltuples
> > ----------+-----------
> >        42 |      1637
> >
> >
>
> This definitely confirms the suspicion about dead tuples
> etc. On the old
> server the table has 1637 tuples and occupies just 42 pages
> (i.e. 330kB
> with 8k pages), which gives about 0.025 of a page (0.2kB
> per) per row.
>
> Let's suppose the characteristics of data (row sizes,
> etc.) are the same
> on both servers - in that case the 2307 rows should occuppy
> about 58
> pages, but as you can see from the first output it occupies
> 54063, i.e.
> 400MB instead of 450kB.
>
> >> > Definitely need a vacuum full on this table,
> likely
> >> followed by a reindex.
> >>
> >
> > The Slow server load increases whenever i run a simple
> query, is it the
> > good idea to run VACUUM full on the live server's
> database now or it
> > should be run when the traffic is very low may be in
> weekend.
>
> The load increases because with the queries you've sent
> the database has
> to read the whole table (sequential scan) and may be spread
> through the
> disk (thus the disk has to seek).
>
> I'd recommend running CLUSTER instead of VACUUM - that
> should be much
> faster in this case. It will lock the table, but the
> performance already
> sucks, so I'd probably prefer a short downtime with a
> much faster
> processing after that.
>
> >
> >> Yes, that's true. I guess the table holds a
> lot of dead
> >> tuples. I'm not
> >> sure why this happens on one server (the new one)
> and not
> >> on the other
> >> one. I guess the original one uses some automatic
> vacuuming
> >> (autovacuum,
> >> cron job, or something like that).
> >
> > There was nothing related to VACUUM of database in the
> crontab.
>
> In that case there's something running vacuum - maybe
> autovacuum (see
> postgresql.conf), or so.
>
> >> As someone already posted, clustering the table
> (by primary
> >> key for
> >> example) should be much faster than vacuuming and
> give
> >> better performance
> >> in the end. See
> >>
> >>
> http://www.postgresql.org/docs/8.3/interactive/sql-cluster.html
> >>
> >> The plain reindex won't help here - it
> won't remove
> >> dead tuples.
> >>
> > I am new to Postgres database, i didnt understand the
> "indexing" part. Is
> > it related to PRIMARY_KEY column of the table?
>
> Not sure what you mean by the 'indexing' part.
>
> Principle of clustering is quite simple - by sorting the
> table according
> to an index (by the columns in the index) you may get
> better performance
> when using the index. Another 'bonus' is that it
> compacts the table on the
> disk,  so disk seeking is less frequent. These two effects
> may mean a
> serious increase of performance. You may cluster according
> to any index on
> the table, not just by primary key - just choose the most
> frequently used
> index.
>
> Sure, there are some drawbacks - it locks the table, so you
> may not use it
> when the command is running. It's not an incremental
> operation, the order
> is not enforced when modifying the table - when you modify
> a row the new
> version won't respect the order and you have to run the
> CLUSTER command
> from time to time. And it's possible to cluster by one
> index only.
>
> >
> > Should i have to run:- CLUSTER USERS using
> 'username';
>
> I guess 'username' is a column, so it won't
> work. You have to choose an
> index (I'd recommend the primary key index, i.e. the
> one with _pk at the
> end).
>
> Tomas
>
>


Re: Postgresql is very slow

From:
PFC
Date:
> SELECT relpages, reltuples FROM pg_class WHERE relname ='users';
>  relpages | reltuples
> ----------+-----------
>     54063 |      2307
> (1 row)

    This is a horribly bloated table.

> The Output of query on the old server which is fast
>
>  relpages | reltuples
> ----------+-----------
>        42 |      1637


    This is a healthy table.

    You need to clean up the users table.
    For this the easiest way is either to VACUUM FULL or CLUSTER it. CLUSTER
will be faster in your case. Use whatever index makes sense, or even the
PK.

> The Slow server load increases whenever i run a simple query, is it the
> good idea to run VACUUM full on the live server's database now or it
> should be run when the traffic is very low may be in weekend.

    Use CLUSTER.
    It is blocking, so your traffic will suffer during the operation, which
should not take very long. Since you have very few rows, most of the time
needed will be spent reading the table from disk. I would suggest doing it
right now.

    CLUSTER users_pk ON users;

    Then, configure your autovacuum so it runs often enough. On a small table
like this (once cleaned up) VACUUM will be very fast; 42 pages should take
just a couple of tens of ms to vacuum, so you can do it often.
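
    If autovacuum cannot be enabled for some reason, a scheduled plain vacuum achieves much the same; for example,
running something like this from a periodic job:

    VACUUM ANALYZE users;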





Postgresql update op is very very slow

From:
"jay"
Date:
I have a table with about 34601755 rows. When I execute 'update msg_table set
type=0;' it is very, very slow - it has cost several hours and is still not complete.

Why is postgresql so slow? Is it the PG MVCC problem?

But when I tried it on Mysql with the same table and rows, it only cost about 340
seconds.

Any ideas about the problem?


My machine config:
    Memory 8G, 8 x 15K disks, 2 CPUs (Quad-Core AMD)
    OS: Red Hat AS4

The main parameters of my postgresql.conf are the following:


shared_buffers = 5GB                    # min 128kB or max_connections*16kB
                                        # (change requires restart)
temp_buffers = 512MB                    # min 800kB
work_mem = 400MB                                # min 64kB
maintenance_work_mem = 600MB            # min 1MB
max_fsm_pages = 262144 # 2G min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 2000                # min 100, ~70 bytes each

bgwriter_delay = 20ms                   # 10-10000ms between rounds
bgwriter_lru_maxpages = 500             # 0-1000 max buffers written/round
bgwriter_lru_multiplier = 2.0           # 0-10.0 multiplier on buffers


Re: [SOLVED] Postgresql is very slow

From:
bijayant kumar
Date:
Thank you all very, very much. After running CLUSTER on the "USERS" table, the speed is now very good. I have also
understood the importance of VACUUM and ANALYZE.

Once again thank you all very very much. You guys rock.


--- On Tue, 24/6/08, tv@fuzzy.cz <tv@fuzzy.cz> wrote:

> From: tv@fuzzy.cz <tv@fuzzy.cz>
> Subject: Re: [PERFORM] Postgresql is very slow
> To: bijayant4u@yahoo.com
> Cc: pgsql-performance@postgresql.org
> Date: Tuesday, 24 June, 2008, 3:32 PM
> >> Not necessarily, the 'cost' depends on
> >> seq_page_cost and there might be
> >> other value than 1 (which is the default). A
> better
> >> approach is
> >>
> >> SELECT relpages, reltuples FROM pg_class WHERE
> relname =
> >> 'users';
> >>
> >> which reads the values from system catalogue.
> >>
> > The Output of query on the Slow Server
> >
> > SELECT relpages, reltuples FROM pg_class WHERE relname
> ='users';
> >  relpages | reltuples
> > ----------+-----------
> >     54063 |      2307
> > (1 row)
> >
> > The Output of query on the old server which is fast
> >
> >  relpages | reltuples
> > ----------+-----------
> >        42 |      1637
> >
> >
>
> This definitely confirms the suspicion about dead tuples
> etc. On the old
> server the table has 1637 tuples and occupies just 42 pages
> (i.e. 330kB
> with 8k pages), which gives about 0.025 of a page (0.2kB
> per) per row.
>
> Let's suppose the characteristics of data (row sizes,
> etc.) are the same
> on both servers - in that case the 2307 rows should occuppy
> about 58
> pages, but as you can see from the first output it occupies
> 54063, i.e.
> 400MB instead of 450kB.
>
> >> > Definitely need a vacuum full on this table,
> likely
> >> followed by a reindex.
> >>
> >
> > The Slow server load increases whenever i run a simple
> query, is it the
> > good idea to run VACUUM full on the live server's
> database now or it
> > should be run when the traffic is very low may be in
> weekend.
>
> The load increases because with the queries you've sent
> the database has
> to read the whole table (sequential scan) and may be spread
> through the
> disk (thus the disk has to seek).
>
> I'd recommend running CLUSTER instead of VACUUM - that
> should be much
> faster in this case. It will lock the table, but the
> performance already
> sucks, so I'd probably prefer a short downtime with a
> much faster
> processing after that.
>
> >
> >> Yes, that's true. I guess the table holds a
> lot of dead
> >> tuples. I'm not
> >> sure why this happens on one server (the new one)
> and not
> >> on the other
> >> one. I guess the original one uses some automatic
> vacuuming
> >> (autovacuum,
> >> cron job, or something like that).
> >
> > There was nothing related to VACUUM of database in the
> crontab.
>
> In that case there's something running vacuum - maybe
> autovacuum (see
> postgresql.conf), or so.
>
> >> As someone already posted, clustering the table
> (by primary
> >> key for
> >> example) should be much faster than vacuuming and
> give
> >> better performance
> >> in the end. See
> >>
> >>
> http://www.postgresql.org/docs/8.3/interactive/sql-cluster.html
> >>
> >> The plain reindex won't help here - it
> won't remove
> >> dead tuples.
> >>
> > I am new to Postgres database, i didnt understand the
> "indexing" part. Is
> > it related to PRIMARY_KEY column of the table?
>
> Not sure what you mean by the 'indexing' part.
>
> Principle of clustering is quite simple - by sorting the
> table according
> to an index (by the columns in the index) you may get
> better performance
> when using the index. Another 'bonus' is that it
> compacts the table on the
> disk,  so disk seeking is less frequent. These two effects
> may mean a
> serious increase of performance. You may cluster according
> to any index on
> the table, not just by primary key - just choose the most
> frequently used
> index.
>
> Sure, there are some drawbacks - it locks the table, so you
> may not use it
> when the command is running. It's not an incremental
> operation, the order
> is not enforced when modifying the table - when you modify
> a row the new
> version won't respect the order and you have to run the
> CLUSTER command
> from time to time. And it's possible to cluster by one
> index only.
>
> >
> > Should i have to run:- CLUSTER USERS using
> 'username';
>
> I guess 'username' is a column, so it won't
> work. You have to choose an
> index (I'd recommend the primary key index, i.e. the
> one with _pk at the
> end).
>
> Tomas
>
>


Re: Postgresql update op is very very slow

From:
Rusty Conover
Date:
On Jun 24, 2008, at 9:12 PM, jay wrote:

>
> I have a table with about 34601755 rows. When I execute 'update
> msg_table set type=0;' it is very, very slow - it has cost several
> hours and is still not complete.
>
> Why is postgresql so slow? Is it the PG MVCC problem?
>
> But when I tried it on Mysql with the same table and rows, it only
> cost about 340 seconds.
>
> Any ideas about the problem?
>
>
> My machine config:
>     Memory 8G, 8 x 15K disks, 2 CPUs (Quad-Core AMD)
>     OS: Red Hat AS4
>
> The main parameters of my postgresql.conf are the following:
>
>

Hi Jay,

Is the "type" column used in an index?  Have you properly increased your
number of checkpoint segments?  Any warnings in your log file about
excessive checkpointing?

Cheers,

Rusty
--
Rusty Conover
InfoGears Inc.
http://www.infogears.com







Re: [PERFORM] Postgresql update op is very very slow

From:
"jay"
Date:

Hi Rusty,

     The "type" is not in an index. The number of checkpoint segments is 64 and the PG version is 8.3.3.

After turning on logging, I found something about checkpoints:

LOG:  00000: checkpoint complete: wrote 174943 buffers (26.7%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=207.895 s, sync=12.282 s, total=220.205 s
LOCATION:  LogCheckpointEnd, xlog.c:5640
LOG:  00000: checkpoint starting: xlog
LOCATION:  LogCheckpointStart, xlog.c:5604
LOG:  00000: duration: 11060.593 ms  statement: select * from pg_stat_bgwriter;
LOCATION:  exec_simple_query, postgres.c:1063
LOG:  00000: checkpoint complete: wrote 173152 buffers (26.4%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=217.455 s, sync=5.059 s, total=222.874 s
LOCATION:  LogCheckpointEnd, xlog.c:5640
LOG:  00000: checkpoint starting: xlog
LOCATION:  LogCheckpointStart, xlog.c:5604

postgres=# select * from pg_stat_bgwriter;
 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
               292 |              93 |           16898561 |        243176 |             2303 |         3989550 |       3694189
(1 row)

     Is too-frequent checkpointing the cause of the problem?
If it is, how do I solve it?

-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Rusty Conover
Sent: 25 June 2008 13:02
To: jay
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Postgresql update op is very very slow

On Jun 24, 2008, at 9:12 PM, jay wrote:

> I have a table with about 34601755 rows. When I execute 'update
> msg_table set type=0;' it is very, very slow - it has cost several
> hours and is still not complete.
> Why is postgresql so slow? Is it the PG MVCC problem?
> But when I tried it on Mysql with the same table and rows, it only
> cost about 340 seconds.
> Any ideas about the problem?
> My machine config:
>    Memory 8G, 8 x 15K disks, 2 CPUs (Quad-Core AMD)
>    OS: Red Hat AS4
> The main parameters of my postgresql.conf are the following:

Hi Jay,

Is the "type" column used in an index?  Have you properly increased your
number of checkpoint segments?  Any warnings in your log file about
excessive checkpointing?

Cheers,

Rusty
--
Rusty Conover
InfoGears Inc.
http://www.infogears.com


PostgreSQL and Ruby on Rails - better accessibility

From:
"Amol Pujari"
Date:

Hi, could anybody comment on the postgres-pr driver from a performance point
of view? Is it faster than the others?

What other options are available to access postgresql from ruby/ruby on rails?
Which of them is the most popular, or the best?

regards
Amol


Re: PostgreSQL and Ruby on Rails - better accessibility

From:
Nikhils
Date:
Hi,

> Could anybody comment on the postgres-pr driver, from performance point
> of view, is it faster than others?

I guess a more appropriate place to check for ruby/rails postgres drivers would be rubyforge.org itself. There is a libpq-based postgres driver available there (ruby-postgres), but YMMV.

> What other options are available to access postgresql in ruby/ruby on rails?
> which of them is most popular, better?

Again, refer to rubyforge.org; there is a RubyES project, amongst others. And lastly, and more importantly, I think this list is appropriate for Postgres database backend related performance questions only.

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com

Re: Postgresql update op is very very slow

From:
"Heikki Linnakangas"
Date:
jay wrote:
> I have a table with about 34601755 rows. When I execute 'update msg_table set
> type=0;' it is very, very slow - it has cost several hours and is still not complete.
>
> Why is postgresql so slow? Is it the PG MVCC problem?

Possibly. Because of MVCC, a full-table update will actually create a
new version of each row.

I presume that's a one-off query, or a seldom-run batch operation, and
not something your application needs to do often. In that case, you
could drop all indexes, and recreate them after the update, which should
help a lot:

BEGIN;
DROP INDEX <index name>, <index name 2>, ...; -- for each index
UPDATE msg_table SET type = 0;
CREATE INDEX ... -- Recreate indexes
COMMIT;

Or even better, instead of using UPDATE, do a SELECT INTO a new table,
drop the old one, and rename the new one in its place. That has the
advantage that the new table doesn't contain the old row version, so you
don't need to vacuum right away to reclaim the space.
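
That approach might look roughly like this - a sketch only, with a hypothetical id column and the rest of the column
list elided; indexes, constraints and privileges on the old table have to be recreated by hand:

BEGIN;
CREATE TABLE msg_table_new AS
    SELECT id, 0 AS type /* ..., remaining columns */ FROM msg_table;
DROP TABLE msg_table;
ALTER TABLE msg_table_new RENAME TO msg_table;
-- recreate indexes, constraints and grants here
COMMIT;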

Actually, there's an even more clever trick to do roughly the same thing:

ALTER TABLE msg_table ALTER COLUMN type TYPE int4 USING 0;

(assuming type is int4, replace with the actual data type if necessary)

This will rewrite the table, similar to a DROP + CREATE, and rebuild all
indexes. But all in one command.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Postgresql update op is very very slow

From:
Greg Smith
Date:
On Wed, 25 Jun 2008, jay wrote:

> Why is postgresql so slow? Is it the PG MVCC problem?

Update is extremely intensive not just because of MVCC, but because a
new version of every row is being written out.  This forces both lots
of database commits and lots of complicated disk I/O.

Couple of suggestions:
-Increase checkpoint_segments a lot; start with a 10X increase to 30.
-If you can afford some potential for data loss in case of a crash,
consider using async commit:
http://www.postgresql.org/docs/8.3/static/wal-async-commit.html
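
In postgresql.conf terms (8.3), those two suggestions would look roughly like this; synchronous_commit = off risks
losing the last few reported commits after a crash, but not database consistency:

checkpoint_segments = 30        # 10x the default of 3
synchronous_commit = off        # 8.3 async commit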

>     Memory 8G, 8 x 15K disks, 2 CPUs (Quad-Core AMD)

Is there any sort of write cache on the controller driving those disks?
If not, or if you've turned it off, that would explain your problem right
there, because you'd be limited by how fast you can sync to disk after
each update.  Async commit is the only good way around that.  If you have
a good write cache, that feature won't buy you as much improvement.

> bgwriter_delay = 20ms                   # 10-10000ms between rounds
> bgwriter_lru_maxpages = 500             # 0-1000 max buffers written/round
> bgwriter_lru_multiplier = 2.0           # 0-10.0 multiplier on buffers

This is a bit much, and the background writer can get in the way in this
situation.  You might turn it off (bgwriter_lru_maxpages = 0) until you've
sorted through everything else, then increase that parameter again.  The
combination of 20ms and 500 pages is far faster than your disk system can
possibly handle anyway; 100ms/500 or 20ms/100 (those two are approximately
the same) would be as aggressive as I'd even consider with an 8-disk
array, and something lower is probably more appropriate for you.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: [PERFORM] Postgresql update op is very very slow

From:
"jay"
Date:
    I know the problem: there are about 35 million rows, which cost about
12G of disk space, and checkpoint_segments is 64. The update operation is in
one transaction, which quickly fills up the checkpoint segments and forces
frequent checkpoints; but checkpoints cost a lot of resources, so the update
operation becomes slower and slower, and bgwriter won't write because it's
not committed yet.
Creating a new table may be a quick solution, but it's not appropriate in
some cases.
    If we could commit every 1000 rows per round, it might resolve the
problem.
But PG does not support transactions within a function yet?

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Heikki Linnakangas
Sent: 25 June 2008 18:11
To: jay
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Postgresql update op is very very slow

jay wrote:
> I have a table with about 34601755 rows. When I execute 'update msg_table
> set type=0;' it is very, very slow - it has cost several hours and is still
> not complete.
>
> Why is postgresql so slow? Is it the PG MVCC problem?

Possibly. Because of MVCC, a full-table update will actually create a
new version of each row.

I presume that's a one-off query, or a seldom-run batch operation, and
not something your application needs to do often. In that case, you
could drop all indexes, and recreate them after the update, which should
help a lot:

BEGIN;
DROP INDEX <index name>, <index name 2>, ...; -- for each index
UPDATE msg_table SET type = 0;
CREATE INDEX ... -- Recreate indexes
COMMIT;

Or even better, instead of using UPDATE, do a SELECT INTO a new table,
drop the old one, and rename the new one in its place. That has the
advantage that the new table doesn't contain the old row version, so you
don't need to vacuum right away to reclaim the space.

Actually, there's an even more clever trick to do roughly the same thing:

ALTER TABLE msg_table ALTER COLUMN type TYPE int4 USING 0;

(assuming type is int4, replace with the actual data type if necessary)

This will rewrite the table, similar to a DROP + CREATE, and rebuild all
indexes. But all in one command.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] Postgresql update op is very very slow

From:
"Pavan Deolasee"
Date:
2008/6/26 jay <jackem.mojx@alibaba-inc.com>:

>        If we could commit every 1000 rows per round, it might resolve the
> problem.
> But PG does not support transactions within a function yet?
>

Yeah, transaction control is not supported inside functions. There are
some hacks using dblink to do transactions inside functions. You may
want to check that out.

I had suggested another hack in the past for very simplistic updates,
when you are sure that the tuple length does not change between
updates and you are ready to handle a half-updated table if there is a
crash or failure in between. Maybe for your case, where you are
updating a single column of the entire table and setting it to some
default value for all the rows, it may work fine. But please be aware
of the data consistency issues before you try that. And it must be a
once-in-a-lifetime kind of hack.

http://postgresql-in.blogspot.com/2008/04/postgresql-in-place-update.html

Thanks,
Pavan


--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Re: Postgresql update op is very very slow

From:
"Heikki Linnakangas"
Date:
jay wrote:
>     I know the problem: there are about 35 million rows, which cost about
> 12G of disk space, and checkpoint_segments is 64. The update operation is
> in one transaction, which quickly fills up the checkpoint segments and
> forces frequent checkpoints; but checkpoints cost a lot of resources, so
> the update operation becomes slower and slower, and bgwriter won't write
> because it's not committed yet.
> Creating a new table may be a quick solution, but it's not appropriate in
> some cases.
>     If we could commit every 1000 rows per round, it might resolve the
> problem.

Committing more frequently won't help you with checkpoints. The updates
will generate just as much WAL regardless of how often you commit, so
you will have to checkpoint just as often. And commits have no effect on
bgwriter either; bgwriter will write just as much regardless of how
often you commit.

One idea would be to partition the table vertically, that is, split the
table into two tables, so that the columns that you need to update like
that are in one table, together with the primary key, and the rest of
the columns are in another table. That way the update won't need to scan
or write the columns that are not changed. You can create a view on top
of the two tables to make them look like the original table to the
application.
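
A minimal sketch of that layout, with all names hypothetical:

CREATE TABLE msg_static (
    id    int PRIMARY KEY,
    body  text                  -- the wide, rarely-updated columns
);
CREATE TABLE msg_status (
    id    int PRIMARY KEY REFERENCES msg_static (id),
    type  int                   -- the frequently-updated column
);
CREATE VIEW msg_table_view AS
    SELECT s.id, t.type, s.body
      FROM msg_static s JOIN msg_status t USING (id);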

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Postgresql update op is very very slow

From:
"Holger Hoffstaette"
Date:
Hi -

I have been following this thread and find some of the recommendations
really surprising. I understand that MVCC necessarily creates overhead,
and that in-place updates would not be safe against crashes, but I have a hard
time believing that this is such a huge problem for an RDBMS in 2008. How do
large databases treat mass updates? AFAIK both DB2 and Oracle use MVCC
(maybe a different kind?) as well, but I cannot believe that large updates
still pose such big problems.
Are there no options (algorithms) for adaptively choosing different
update strategies that do not incur the full MVCC overhead?

Holger

(Disclaimer: I'm not a professional DBA, just a curious developer).


Re: Postgresql update op is very very slow

From:
Craig Ringer
Date:
Holger Hoffstaette wrote:
> Hi -
>
> I have been following this thread and find some of the recommendations
> really surprising. I understand that MVCC necessarily creates overhead,
> and that in-place updates would not be safe against crashes, but I have a hard
> time believing that this is such a huge problem for an RDBMS in 2008. How do
> large databases treat mass updates? AFAIK both DB2 and Oracle use MVCC
> (maybe a different kind?) as well, but I cannot believe that large updates
> still pose such big problems.
> Are there no options (algorithms) for adaptively choosing different
> update strategies that do not incur the full MVCC overhead?

I think Pg already does in-place updates, or close to it, if the tuples being
replaced aren't referenced by any in-flight transaction. I noticed a
while ago that when I'm doing bulk load/update work and there aren't any
other transactions, no MVCC bloat seems to occur and updates are faster.

I'd be interested to have this confirmed, as I don't think I've seen it
documented anywhere. Is it a side-effect/benefit of HOT somehow?

--
Craig Ringer


Re: Postgresql update op is very very slow

From:
Andrew Sullivan
Date:
On Thu, Jun 26, 2008 at 02:40:59PM +0200, Holger Hoffstaette wrote:

> large databases treat mass updates? AFAIK both DB2 and Oracle use MVCC
> (maybe a different kind?) as well, but I cannot believe that large updates
> still pose such big problems.

DB2 does not use MVCC.  This is why lock escalation is such a big
problem for them.

Oracle uses a kind of MVCC based on rollback segments: your work goes
into the rollback segment, so that it can be undone, and the update
happens in place.  This causes a different kind of pain: you can run
out of rollback segments (part way through a long-running transaction,
even) and then have to undo everything in order to do any work at
all.  Every system involves trade-offs, and different systems make
different ones.  The bulk update problem is PostgreSQL's weak spot,
and for that cost one gets huge other benefits.

> Are there no options (algorithms) for adaptively choosing different
> update strategies that do not incur the full MVCC overhead?

How would you pick?  But one thing you could do is create the table
with a non-standard fill factor, which might allow HOT to work its magic.
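
For example (the value 70 is just illustrative, and the freed space only benefits pages written after the setting
takes effect, so a rewrite such as CLUSTER is needed to apply it to existing data):

ALTER TABLE msg_table SET (fillfactor = 70);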

A

--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/

Re: Postgresql update op is very very slow

From:
Andrew Sullivan
Date:
On Thu, Jun 26, 2008 at 09:16:25PM +0800, Craig Ringer wrote:

> I think Pg already does in-place updates, or close to it, if the tuples being
> replaced aren't referenced by any in-flight transaction. I noticed a while
> ago that when I'm doing bulk load/update work and there aren't any other
> transactions, no MVCC bloat seems to occur and updates are faster.

Are you on 8.3?  That may be HOT working for you.  MVCC doesn't get
turned off if there are no other transactions (it can't: what if
another transaction starts part way through yours?).

A

--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/

Re: Postgresql update op is very very slow

From:
Mark Mielke
Date:
Holger Hoffstaette wrote:
> Hi -
>
> I have been following this thread and find some of the recommendations
> really surprising. I understand that MVCC necessarily creates overhead,
> and that in-place updates would not be safe against crashes, but I have a hard
> time believing that this is such a huge problem for an RDBMS in 2008. How do
> large databases treat mass updates? AFAIK both DB2 and Oracle use MVCC
> (maybe a different kind?) as well, but I cannot believe that large updates
> still pose such big problems.
> Are there no options (algorithms) for adaptively choosing different
> update strategies that do not incur the full MVCC overhead?
>

My opinion:

Any system that provides cheap UPDATE operations is either not ACID
compliant, or is not designed for highly concurrent access, possibly
both. By ACID compliant I mean that both the OLD and NEW versions need to
take space on the hard disk in order to guarantee that if a failure
occurs in the middle of the transaction, one can select only the OLD
versions for future transactions, or if it fails after the end of the
transaction, one can select only the NEW versions for future
transactions. If both must be on disk, it follows that updates are
expensive. Even with Oracle rollback segments, the rollback segments
need to be written. Perhaps they will be more sequential, and able to be
written more efficiently, but the data still needs to be written. The
other option is to make sure that only one person is doing updates at a
time, and in this case it becomes possible (although not necessarily
safe unless one implements the ACID-compliant behaviour described in the
previous point) for one operation to complete before the next begins.

The HOT changes introduced recently into PostgreSQL should reduce the
cost of updates in many cases (but not all - I imagine that updating ALL
rows is still expensive).

There is a third system I can think of, but I think it's more
theoretical than practical. That is, remember the list of changes to
each row/column and "replay" them on query. The database isn't ever
stored in a built state, but is only kept as pointers that allow any
part of the table to be re-built on access. The UPDATE statement could
be recorded cheaply, but queries against the UPDATE statement might be
very expensive. :-)

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>


Re: [PERFORM] Postgresql update op is very very slow

From:
Tom Lane
Date:
"jay" <jackem.mojx@alibaba-inc.com> writes:
>     I know the problem: there are about 35 million rows, which cost about
> 12G of disk space, and checkpoint_segments is 64. The update operation is
> in one transaction, which quickly fills up the checkpoint segments and
> forces frequent checkpoints; but checkpoints cost a lot of resources, so
> the update operation becomes slower and slower, and bgwriter won't write
> because it's not committed yet.
> Creating a new table may be a quick solution, but it's not appropriate in
> some cases.
>     If we could commit every 1000 rows per round, it might resolve the
> problem.

No, that's utterly unrelated.  Transaction boundaries have nothing to do
with checkpoints.

            regards, tom lane

Re: [PERFORM] Postgresql update op is very very slow

From:
"Scott Marlowe"
Date:
2008/6/26 Pavan Deolasee <pavan.deolasee@gmail.com>:
> 2008/6/26 jay <jackem.mojx@alibaba-inc.com>:
>
>>        If we could commit every 1000 rows per round, it might resolve the
>> problem.
>> But PG does not support transactions within a function yet?
>>
>
> Yeah, transaction control is not supported inside functions. There are
> some hacks using dblink to do transactions inside functions. You may
> want to check that out.

Only if you need autonomous transactions.  For most people, savepoints and
exception catching seem to be an acceptable form of transaction control.

> I had suggested another hack in the past for very simplistic updates,
> when you are sure that the tuple length does not change between
> updates and you are ready to handle half updated table if there is a
> crash or failure in between. May be for your case, where you are
> updating a single column of the entire table and setting it to some
> default value for all the rows, it may work fine. But please be aware
> of data consistency issues before you try that. And it must be once in
> a lifetime kind of hack.
>
> http://postgresql-in.blogspot.com/2008/04/postgresql-in-place-update.html

In a way that's what pg_bulkloader does.

Re: [PERFORM] Postgresql update op is very very slow

From:
"Scott Marlowe"
Date:
2008/6/26 Tom Lane <tgl@sss.pgh.pa.us>:
> "jay" <jackem.mojx@alibaba-inc.com> writes:
>>       I know the problem: there are about 35 million rows, which cost about
>> 12G of disk space, and checkpoint_segments is 64. The update operation is
>> in one transaction, which quickly fills up the checkpoint segments and
>> forces frequent checkpoints; but checkpoints cost a lot of resources, so
>> the update operation becomes slower and slower, and bgwriter won't write
>> because it's not committed yet.
>> Creating a new table may be a quick solution, but it's not appropriate in
>> some cases.
>>       If we could commit every 1000 rows per round, it might resolve the
>> problem.
>
> No, that's utterly unrelated.  Transaction boundaries have nothing to do
> with checkpoints.

True.  But if you update 10000 rows and vacuum you can keep the bloat
to something reasonable.

On another note, I haven't seen anyone suggest adding the appropriate
where clause to keep from updating rows that already match.  That is cheap
compared to updating the whole table, even if a large chunk of the rows
do need the update.  I.e.

... set col=0 where col <> 0;

That should  be the first thing you reach for in this situation, if it can help.
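
Combined with the batching idea upthread, a client-driven sketch might look like this, assuming a primary key column
named id; each UPDATE runs in its own transaction, and plain vacuums between batches keep the dead-tuple count bounded:

UPDATE msg_table SET type = 0
 WHERE id IN (SELECT id FROM msg_table WHERE type <> 0 LIMIT 10000);
VACUUM msg_table;
-- repeat until the UPDATE reports 0 rows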

Re: Postgresql update op is very very slow

From:
Greg Smith
Date:
On Thu, 26 Jun 2008, Holger Hoffstaette wrote:

> How do large databases treat mass updates? AFAIK both DB2 and Oracle use
> MVCC (maybe a different kind?) as well

An intro to the other approaches used by Oracle and DB2 (not MVCC) is at


http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL:_Comparing_Reliability_and_Speed_in_2007#Transaction_Locking_and_Scalability

(a URL which I really need to shorten one day).

> Are there no options (algorithms) for adaptively choosing different
> update strategies that do not incur the full MVCC overhead?

If you stare at the big picture of PostgreSQL's design, you might notice
that it usually aims to do things one way and get that implementation
right for the database's intended audience.  That intended audience cares
about data integrity and correctness and is willing to suffer the overhead
that goes along with operating that way.  There are few "I don't care about
reliability here so long as it's fast" switches you can flip, and not
having duplicate code paths to support them helps keep the code simpler
and therefore more reliable.

This whole area is one of those good/fast/cheap trios.  If you want good
transaction guarantees on updates, you either get the hardware and
settings right to handle that (!cheap), or it's slow.  The idea of
providing a !good/fast/cheap option for updates might have some
theoretical value, but I think you'd find it hard to get enough support
for that idea to get work done on it compared to the other things
developer time is being spent on right now.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Postgresql update op is very very slow

From:
Greg Smith
Date:
On Thu, 26 Jun 2008, Craig Ringer wrote:

> I'd be interested to have this confirmed, as I don't think I've seen it
> documented anywhere. Is it a side-effect/benefit of HOT somehow?

The documentation is in README.HOT, for example:
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/heap/README.HOT?rev=1.3;content-type=text%2Fplain

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD