Обсуждение: Response time increases over time

Поиск
Список
Период
Сортировка

Response time increases over time

От
Havasvölgyi Ottó
Дата:


Hi all,


I am running a load simulation on Debian  with PostgreSQL 8.4.9 (standard Debian package).
Certain number of clients do the following stepsin a transaction (read commited level) periodically (about 1.1 transaction per second / client) and concurrently:

-reads a record of table Machine and State (they each have about 300 records, read size is about 1.4 KB)
-reads a record of table Card (it has about 1200 records)
-reads some other records from other tables, all these are straightforward, single line queries (here there are even less records in the tables)
-updates Machine (1 record, updates 2-3 integer values)
-updates State (1 record, updates a bytea field, about 1,3KB)
-updates Card (1 record, updates an integer)
-inserts 1-1 record into 2 log tables

Its important, that each client updates different records, so there is no update conflict. There are no triggers or rules. Tables have simple indexes, 2 at most.

As I run the simulation with more and more clients, I can observe, that at the beginning of the simulation the transaction times are quite acceptable (20-30 ms) and quite uniform/smooth, but as the simultion progresses, it becomes higher (30-40-50-60 ms) and more and more non-uniform, but the tps interestingly remains the same during the simulation. With 100 clients this kind of behaviour can be seen very well. The simulation's duration is 500 sec.
I wonder why this happens on this server, and how I can keep the response time as low as at the beginning.

Just for comparison, I ran the same simulation on a Windows 7 notebook machine but with PostgreSQL 9.1.2 (downloaded from EnterpriseDB's site, not PostgreSQL Plus), and it did not show this problem even with 120 clients. It's transaction times were surprisingly smooth and consistent. The client code was the same in the 2 cases.
Actually I ran first the test on the Windows machine, and after that on the better Debian. I expected that it would be even better there. Network latency is quite minimal, because the clients and the database server run on VMs on a server machine in the Linux case.

Here is some important config variables from the 8.4 (9.1.2 is configured similarly):

 

ssl=false

shared_buffers=24MB (OS max currently, but should not be a problem because 9.1.2 performed quite well on Windows with 24 MB)

work_mem=1MB

maintainance_work_mem=16MB

 

fsync=on

sync_commit=on

wal_sync_method=fsync

full_page_writes=on

wal_buffers=1MB

commit_delay=0

checkpoint segments=8

 

effective_cache_size=256MB

 

vacuum: default bgwriter: default


I suspected that due to the lot of update, the tables get bloated with dead rows, but vacuum analyze verbose did not show that.
It seems that something cannot keep up with the load, but tps does not change, just the response time gets higher.
Could you please help me with what can cause this kind of behaviour on Linux?
What setting should I change perhaps?
Is there so much difference between 8.4 and 9.1, or is this something else?
Please tell me if any other info is needed.

Thanks in advance,
Otto

Re: Response time increases over time

От
Josh Berkus
Дата:
On 12/6/11 4:30 PM, Havasvölgyi Ottó wrote:
> Is there so much difference between 8.4 and 9.1, or is this something else?
> Please tell me if any other info is needed.

It is fairly likely that the difference you're seeing here is due to
improvements made in checkpointing and other operations made between 8.4
and 9.1.

Is there some reason you didn't test 9.1 on Linux to compare the two?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Response time increases over time

От
Havasvölgyi Ottó
Дата:
Thanks, Josh.
The only reason I tried 8.4 first is that it was available for Debian as compiled package, so it was simpler for me to do it. Anyway I am going to test 9.1 too. I will post about the results.

Best reagrds,
Otto


2011/12/7 Josh Berkus <josh@agliodbs.com>
On 12/6/11 4:30 PM, Havasvölgyi Ottó wrote:
> Is there so much difference between 8.4 and 9.1, or is this something else?
> Please tell me if any other info is needed.

It is fairly likely that the difference you're seeing here is due to
improvements made in checkpointing and other operations made between 8.4
and 9.1.

Is there some reason you didn't test 9.1 on Linux to compare the two?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Response time increases over time

От
Mario Splivalo
Дата:
On 12/07/2011 09:23 AM, Havasvölgyi Ottó wrote:
> Thanks, Josh.
> The only reason I tried 8.4 first is that it was available for Debian as
> compiled package, so it was simpler for me to do it. Anyway I am going
> to test 9.1 too. I will post about the results.
>

If you're using squeeze, you can get 9.1 from the debian backports.

    Mario

Re: Response time increases over time

От
Havasvölgyi Ottó
Дата:
Thanks for that Mario, I will check it out.

@All:
Anyway, I have compiled 9.1.2 from source, and unfortunately the performance haven't got better at the same load, it is consistently quite low (~70 ms average transaction time with 100 clients) on this Debian. I am quite surprised about this, it is unrealistically high.
I have run pg_test_fsync, and showed about 2600 fsync/sec, which means HDD has write caching on (it is a 7200 rpm drive, there is no HW RAID controller). However my other machine, the simple Win7 one, on which performance was so good and consistent, fsync/sec was a lot lower, only about 100 as I can remember, so it probably really flushed each transaction to disk.
I have also run load simulation on this Debian machine with InnoDb, and it performed quite well, so the machine itself is good enough to handle this. On the other hand it is quite poor on Win7, but that's another story...

So there seems to be something on this Debian machine that hinders PostgreSQL to perform better. With 8.4 I logged slow queries (with 9.1 not yet), and almost all were COMMIT, taking 10-20-30 or even more ms. But at the same time the fsync rate can be quite high based on pg_test_fsync, so probably not fsync is what makes it slow. Performance seems to degrade drastically as I increase the concurrency, mainly concurrent commit has problems as I can see.
I also checked that connection pooling works well, and clients don't close/open connections.
I also have a graph about outstanding transaction count over time, and it is quite strange: it shows that low performce (20-30 xacts at a time) and high-performace (<5 xact at a time) parts are alternating quite frequently instead of being more even.
Do anybody have any idea based on this info about what can cause such behaviour, or what I could check or try?

Thanks in advance,
Otto

2011/12/7 Mario Splivalo <mario.splivalo@megafon.hr>
On 12/07/2011 09:23 AM, Havasvölgyi Ottó wrote:
> Thanks, Josh.
> The only reason I tried 8.4 first is that it was available for Debian as
> compiled package, so it was simpler for me to do it. Anyway I am going
> to test 9.1 too. I will post about the results.
>

If you're using squeeze, you can get 9.1 from the debian backports.

       Mario

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Response time increases over time

От
Aidan Van Dyk
Дата:
On Wed, Dec 7, 2011 at 5:13 PM, Havasvölgyi Ottó
<havasvolgyi.otto@gmail.com> wrote:

> So there seems to be something on this Debian machine that hinders
> PostgreSQL to perform better. With 8.4 I logged slow queries (with 9.1 not
> yet), and almost all were COMMIT, taking 10-20-30 or even more ms. But at
> the same time the fsync rate can be quite high based on pg_test_fsync, so
> probably not fsync is what makes it slow. Performance seems to degrade
> drastically as I increase the concurrency, mainly concurrent commit has
> problems as I can see.

> Do anybody have any idea based on this info about what can cause such
> behaviour, or what I could check or try?

Let me guess, debian squeeze, with data and xlog on both on a single
ext3 filesystem, and the fsync done by your commit (xlog) is flushing
all the dirty data of the entire filesystem (including PG data writes)
out before it can return...

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: Response time increases over time

От
Havasvölgyi Ottó
Дата:
Yes, ext3 is the global file system, and you are right, PG xlog and data are on this one.
Is this really what happens Aidan at fsync?
What is be the best I can do?
Mount xlog directory to a separate file system?
If so, which file system fits the best for this purpose?
Should I also mount the data separately, or is that not so important?

The strange thing is that InnoDb data and xlog are also on the same filesystem, but on a separate one (ext4) from the global one.

Thanks,
Otto




2011/12/8 Aidan Van Dyk <aidan@highrise.ca>
On Wed, Dec 7, 2011 at 5:13 PM, Havasvölgyi Ottó
<havasvolgyi.otto@gmail.com> wrote:

> So there seems to be something on this Debian machine that hinders
> PostgreSQL to perform better. With 8.4 I logged slow queries (with 9.1 not
> yet), and almost all were COMMIT, taking 10-20-30 or even more ms. But at
> the same time the fsync rate can be quite high based on pg_test_fsync, so
> probably not fsync is what makes it slow. Performance seems to degrade
> drastically as I increase the concurrency, mainly concurrent commit has
> problems as I can see.

> Do anybody have any idea based on this info about what can cause such
> behaviour, or what I could check or try?

Let me guess, debian squeeze, with data and xlog on both on a single
ext3 filesystem, and the fsync done by your commit (xlog) is flushing
all the dirty data of the entire filesystem (including PG data writes)
out before it can return...

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: Response time increases over time

От
Marti Raudsepp
Дата:
On Thu, Dec 8, 2011 at 06:37, Aidan Van Dyk <aidan@highrise.ca> wrote:
> Let me guess, debian squeeze, with data and xlog on both on a single
> ext3 filesystem, and the fsync done by your commit (xlog) is flushing
> all the dirty data of the entire filesystem (including PG data writes)
> out before it can return...

This is fixed with the data=writeback mount option, right?
(If it's the root file system, you need to add
rootfsflags=data=writeback to your kernel boot flags)

While this setting is safe and recommended for PostgreSQL and other
transactional databases, it can cause garbage to appear in recently
written files after a crash/power loss -- for applications that don't
correctly fsync data to disk.

Regards,
Marti

Re: Response time increases over time

От
Havasvölgyi Ottó
Дата:
I have moved the data directory (xlog, base, global, and everything) to an ext4 file system. The result hasn't changed unfortuately. With the same load test the average response time: 80ms; from 40ms to 120 ms everything occurs.
This ext4 has default settings in fstab.
Have you got any other idea what is going on here?

Thanks,
Otto




2011/12/8 Marti Raudsepp <marti@juffo.org>
On Thu, Dec 8, 2011 at 06:37, Aidan Van Dyk <aidan@highrise.ca> wrote:
> Let me guess, debian squeeze, with data and xlog on both on a single
> ext3 filesystem, and the fsync done by your commit (xlog) is flushing
> all the dirty data of the entire filesystem (including PG data writes)
> out before it can return...

This is fixed with the data=writeback mount option, right?
(If it's the root file system, you need to add
rootfsflags=data=writeback to your kernel boot flags)

While this setting is safe and recommended for PostgreSQL and other
transactional databases, it can cause garbage to appear in recently
written files after a crash/power loss -- for applications that don't
correctly fsync data to disk.

Regards,
Marti

Re: Response time increases over time

От
Bob Lunney
Дата:
Otto,

Separate the pg_xlog directory onto its own filesystem and retry your tests.

Bob Lunney


From: Havasvölgyi Ottó <havasvolgyi.otto@gmail.com>
To: Marti Raudsepp <marti@juffo.org>
Cc: Aidan Van Dyk <aidan@highrise.ca>; pgsql-performance@postgresql.org
Sent: Thursday, December 8, 2011 9:48 AM
Subject: Re: [PERFORM] Response time increases over time

I have moved the data directory (xlog, base, global, and everything) to an ext4 file system. The result hasn't changed unfortuately. With the same load test the average response time: 80ms; from 40ms to 120 ms everything occurs.
This ext4 has default settings in fstab.
Have you got any other idea what is going on here?

Thanks,
Otto




2011/12/8 Marti Raudsepp <marti@juffo.org>
On Thu, Dec 8, 2011 at 06:37, Aidan Van Dyk <aidan@highrise.ca> wrote:
> Let me guess, debian squeeze, with data and xlog on both on a single
> ext3 filesystem, and the fsync done by your commit (xlog) is flushing
> all the dirty data of the entire filesystem (including PG data writes)
> out before it can return...

This is fixed with the data=writeback mount option, right?
(If it's the root file system, you need to add
rootfsflags=data=writeback to your kernel boot flags)

While this setting is safe and recommended for PostgreSQL and other
transactional databases, it can cause garbage to appear in recently
written files after a crash/power loss -- for applications that don't
correctly fsync data to disk.

Regards,
Marti



Re: Response time increases over time

От
Havasvölgyi Ottó
Дата:
I have put pg_xlog back to the ext3 partition, but nothing changed.
I have also switched off sync_commit, but nothing. This is quite interesting...
Here is a graph about the transaction time (sync_commit off, pg_xlog on separate file system): Graph
On the graph the red line up there is the tranaction/sec, it is about 110, and does not get lower as the transaction time gets higher.
Based on this, am I right that it is not the commit, that causes these high transaction times?
Kernel version is 2.6.32.
Any idea is appreciated.

Thanks,
Otto




2011/12/8 Bob Lunney <bob_lunney@yahoo.com>
Otto,

Separate the pg_xlog directory onto its own filesystem and retry your tests.

Bob Lunney


From: Havasvölgyi Ottó <havasvolgyi.otto@gmail.com>
To: Marti Raudsepp <marti@juffo.org>
Cc: Aidan Van Dyk <aidan@highrise.ca>; pgsql-performance@postgresql.org
Sent: Thursday, December 8, 2011 9:48 AM

Subject: Re: [PERFORM] Response time increases over time

I have moved the data directory (xlog, base, global, and everything) to an ext4 file system. The result hasn't changed unfortuately. With the same load test the average response time: 80ms; from 40ms to 120 ms everything occurs.
This ext4 has default settings in fstab.
Have you got any other idea what is going on here?

Thanks,
Otto




2011/12/8 Marti Raudsepp <marti@juffo.org>
On Thu, Dec 8, 2011 at 06:37, Aidan Van Dyk <aidan@highrise.ca> wrote:
> Let me guess, debian squeeze, with data and xlog on both on a single
> ext3 filesystem, and the fsync done by your commit (xlog) is flushing
> all the dirty data of the entire filesystem (including PG data writes)
> out before it can return...

This is fixed with the data=writeback mount option, right?
(If it's the root file system, you need to add
rootfsflags=data=writeback to your kernel boot flags)

While this setting is safe and recommended for PostgreSQL and other
transactional databases, it can cause garbage to appear in recently
written files after a crash/power loss -- for applications that don't
correctly fsync data to disk.

Regards,
Marti