Discussion: Performance degrade running on multicore computer
Hi,
My PostgreSQL server has two CPUs (OS: Fedora 11), each with 4 cores, for 8 cores in total. I have several clients inserting and updating the same table at the same time, each client with its own connection. I ran two tests with clients loading 20M rows in total; in each test the data is split evenly across the clients, so each client loads only its own piece of the data.
1) Long transaction: each client commits once, at the end of loading. Result: each postgres process consumes 95% CPU, and the more clients run in parallel, the slower the total running time (with 8 clients it is slowest). I expected the opposite: the more clients run in parallel, the faster all the data should load.
2) Short transaction: each client commits every 500 records. Result: each postgres process consumes about 50% CPU, and the total running time behaves as I expected: the more clients run in parallel, the faster it is (with 8 clients it is fastest).
Could anybody explain why the long-transaction case with 8 clients is slowest, and how I can solve this? I would rather not use option 2), where I have to pick a commit size each time.
Thanks a lot!!
-Afancy
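[Editor's sketch] The two commit strategies described above can be outlined as follows. This is a minimal illustration, not the poster's actual loader: the table schema is simplified, and SQLite is used as an in-process stand-in for a real PostgreSQL driver such as psycopg2 so the snippet is self-contained.

```python
import sqlite3  # stand-in for a PostgreSQL driver such as psycopg2


def load_rows(conn, rows, commit_every=None):
    """Insert rows into a simplified `page` table.

    commit_every=None  -> strategy 1: one big transaction, commit at the end.
    commit_every=500   -> strategy 2: commit after every 500 inserted rows.
    """
    cur = conn.cursor()
    for i, row in enumerate(rows, start=1):
        cur.execute("INSERT INTO page (pageid, url) VALUES (?, ?)", row)
        if commit_every is not None and i % commit_every == 0:
            conn.commit()
    conn.commit()  # final commit covers the tail batch (or the whole load)


# demo with an in-memory database (schema reduced from the thread's table)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (pageid INTEGER PRIMARY KEY, url TEXT)")
load_rows(conn, [(i, "http://example/%d" % i) for i in range(1, 1201)],
          commit_every=500)
print(conn.execute("SELECT count(*) FROM page").fetchone()[0])  # 1200
```

With a real PostgreSQL connection the structure is the same; only the driver, the placeholder style (`%s` instead of `?`), and the full column list change.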
afancy <groupme@gmail.com> writes:
> My PostgreSQL server has two CPUs (OS: Fedora 11), each with 4 cores. Total
> is 8 cores. Now I have several clients running at the same time to do insert
> and update on the same table, each client having its own connection. I have
> made two testing with clients running in parallel to load 20M data in
> total. Each testing, the data is split evenly by the client number such that
> each client only loads a piece of data.

What exactly are you doing when you "load data"? There are some code paths that are slower if they have to examine not-yet-committed tuples, and your report sounds a bit like that might be what's happening. But with so few details (not even a Postgres version number :-() it's difficult to be sure of anything.

regards, tom lane
Hi,
I am using PostgreSQL 8.4. Which code path do you mean? After a row is inserted into the table, the loader updates its "validfrom" and "validto" fields. Below are the table structure, the data, and the performance data:
xiliu=# \d page
Table "pyetlexa.page"
Column | Type | Modifiers
-----------------+-------------------+-----------
pageid | integer | not null
url | character varying |
size | integer |
validfrom | date |
validto | date |
version | integer |
domainid | integer |
serverversionid | integer |
Indexes:
"page_pkey" PRIMARY KEY, btree (pageid)
"url_version_idx" btree (url, version DESC)
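[Editor's sketch] The insert-then-update of "validfrom"/"validto" suggests a versioning (slowly-changing-dimension) pattern: when a new version of a url arrives, the previous row is closed out. The loader's exact statements are not shown in the thread, so the following is an assumed minimal form, again using SQLite so it is self-contained:

```python
import sqlite3  # stand-in; the thread uses PostgreSQL 8.4

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE page (
    pageid INTEGER PRIMARY KEY, url TEXT,
    validfrom TEXT, validto TEXT, version INTEGER)""")


def insert_version(conn, pageid, url, version, today):
    # close out the currently open version of this url, if any ...
    conn.execute("UPDATE page SET validto = ? WHERE url = ? AND validto IS NULL",
                 (today, url))
    # ... then insert the new version with an open validity interval
    conn.execute("INSERT INTO page (pageid, url, validfrom, validto, version) "
                 "VALUES (?, ?, ?, NULL, ?)", (pageid, url, today, version))
    conn.commit()


insert_version(conn, 1, "http://example.org/", 1, "2009-11-01")
insert_version(conn, 2, "http://example.org/", 2, "2009-11-22")
print(conn.execute("SELECT validto FROM page WHERE pageid = 1").fetchone()[0])
# 2009-11-22: version 1 was closed when version 2 arrived
```

If this matches the real workload, every insert triggers an UPDATE that must scan for the open row of the same url, which is exactly the kind of not-yet-committed-tuple traffic Tom's reply alludes to.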
Here is the data in this table:
http://imagebin.ca/img/KyxMDIKq.png
Here is the performance data by "top":
http://imagebin.ca/img/2ssw4wEQ.png
Regards,
afancy
On Sun, Nov 22, 2009 at 12:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What exactly are you doing when you "load data"? There are some code
> paths that are slower if they have to examine not-yet-committed tuples,
> and your report sounds a bit like that might be what's happening.
> But with so few details (not even a Postgres version number :-()
> it's difficult to be sure of anything.
>
> regards, tom lane
On 01/-10/-28163 11:59 AM, afancy wrote:
> My PostgreSQL server has two CPUs (OS: Fedora 11), each with 4 cores.
> Total is 8 cores. Now I have several clients running at the same time
> to do insert and update on the same table, each client having its own
> connection. [snip]

Since you have 2 CPUs, you may want to try setting the processor affinity for postgres (server and client programs) to the 4 cores on one of the CPUs (taskset command on Linux). Here's an excerpt from a modified /etc/init.d/postgresql:

$SU -l postgres -c "taskset -c 4-7 $PGENGINE/postmaster -p '$PGPORT' -D '$PGDATA' ${PGOPTS} &" >> "$PGLOG" 2>&1 < /dev/null

Thanks to Greg Smith for pointing this out when we had a similar issue with a 2-CPU server. NB: this was with PostgreSQL 8.3; I don't know whether 8.4+ has built-in processor affinity. (Apologies in advance for the email formatting.)
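[Editor's note] For inspecting or changing a process's core mask without taskset, Python's standard library exposes the same Linux sched_setaffinity(2) interface. A small, Linux-only sketch (the specific core chosen here is illustrative):

```python
import os

# cores the current process may run on (like `taskset -p <pid>`)
allowed = os.sched_getaffinity(0)
print(sorted(allowed))

# pin this process to the lowest allowed core (like `taskset -cp <core> <pid>`),
# then restore the original mask
first = min(allowed)
os.sched_setaffinity(0, {first})
assert os.sched_getaffinity(0) == {first}
os.sched_setaffinity(0, allowed)
```

Pinning the whole postmaster at startup, as in the init-script excerpt above, applies the mask to every backend it forks, since affinity masks are inherited across fork().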