Обсуждение: [PERFORM] Speed differences between two servers
Hello,
I use two dedicated bare metal servers (Online and Kimsufi). The first one takes much longer to execute a procedure
thatrecreates a database by truncating its tables, then copying the data from a set of text files; it is however much
fasterfor more typical SELECT and INSERT queries done by users.
Here is the timing for the procedure :
#Kimsufi server
time psql -f myfile.sql mydb
real 0m12.585s
user 0m0.200s
sys 0m0.076s
#Online server
time psql -f myfile.sql mydb
real 1m15.410s
user 0m0.144s
sys 0m0.028s
As you can see, the Kimsufi server takes 12 seconds to complete the procedure, while the Online one needs 75 seconds.
For more usual queries however, the ratio is reversed, as shown by explain analyze for a typical query:
#Kimsufi server
marica=> explain (analyze, buffers) SELECT t1.id_contentieux, t1.ref_dossier, t1.ref_assureur,
noms_des_tiers(t1.id_contentieux)as id_tiers, t1.libelle, t1.affaire, 1 as authorized
FROM tblcontentieux t1 WHERE id_contentieux IN (SELECT id_contentieux FROM tblcontentieux_log WHERE
plainto_tsquery('vol')@@ tsv_libelle) AND id_client = 13 ORDER BY 2
;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=543.29..543.56 rows=106 width=116) (actual time=19.870..19.885 rows=75 loops=1)
Sort Key: t1.ref_dossier
Sort Method: quicksort Memory: 35kB
Buffers: shared hit=689
-> Nested Loop (cost=430.59..539.73 rows=106 width=116) (actual time=4.103..19.143 rows=75 loops=1)
Buffers: shared hit=689
-> HashAggregate (cost=430.31..430.49 rows=18 width=4) (actual time=2.077..2.266 rows=124 loops=1)
Group Key: tblcontentieux_log.id_contentieux
Buffers: shared hit=112
-> Bitmap Heap Scan on tblcontentieux_log (cost=29.11..429.95 rows=142 width=4) (actual
time=0.712..1.550rows=147 loops=1)
Recheck Cond: (plainto_tsquery('vol'::text) @@ tsv_libelle)
Heap Blocks: exact=105
Buffers: shared hit=112
-> Bitmap Index Scan on tblcontentieux_log_tvs_libelle_idx (cost=0.00..29.07 rows=142 width=0)
(actualtime=0.632..0.632 rows=147 loops=1)
Index Cond: (plainto_tsquery('vol'::text) @@ tsv_libelle)
Buffers: shared hit=7
-> Index Scan using tblcontentieux_pkey on tblcontentieux t1 (cost=0.28..4.59 rows=1 width=116) (actual
time=0.018..0.019rows=1 loops=124)
Index Cond: (id_contentieux = tblcontentieux_log.id_contentieux)
Filter: (id_client = 13)
Rows Removed by Filter: 0
Buffers: shared hit=372
Planning time: 3.666 ms
Execution time: 20.176 ms
#Online server
marica=> explain (analyze,buffers) SELECT t1.id_contentieux, t1.ref_dossier, t1.ref_assureur,
noms_des_tiers(t1.id_contentieux)as id_tiers, t1.libelle, t1.affaire, 1 as authorized
FROM tblcontentieux t1 WHERE id_contentieux IN (SELECT id_contentieux FROM tblcontentieux_log WHERE
plainto_tsquery('vol')@@ tsv_libelle) AND id_client = 13 ORDER BY 2;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=492.01..492.27 rows=104 width=116) (actual time=10.660..10.673 rows=75 loops=1)
Sort Key: t1.ref_dossier
Sort Method: quicksort Memory: 35kB
Buffers: shared hit=686
-> Nested Loop (cost=390.14..488.52 rows=104 width=116) (actual time=1.363..10.066 rows=75 loops=1)
Buffers: shared hit=686
-> HashAggregate (cost=389.85..390.03 rows=18 width=4) (actual time=0.615..0.725 rows=124 loops=1)
Group Key: tblcontentieux_log.id_contentieux
Buffers: shared hit=109
-> Bitmap Heap Scan on tblcontentieux_log (cost=13.08..389.51 rows=139 width=4) (actual
time=0.156..0.465rows=147 loops=1)
Recheck Cond: (plainto_tsquery('vol'::text) @@ tsv_libelle)
Heap Blocks: exact=106
Buffers: shared hit=109
-> Bitmap Index Scan on tblcontentieux_log_tvs_libelle_idx (cost=0.00..13.04 rows=139 width=0)
(actualtime=0.126..0.126 rows=147 loops=1)
Index Cond: (plainto_tsquery('vol'::text) @@ tsv_libelle)
Buffers: shared hit=3
-> Index Scan using tblcontentieux_pkey on tblcontentieux t1 (cost=0.28..4.02 rows=1 width=116) (actual
time=0.010..0.011rows=1 loops=124)
Index Cond: (id_contentieux = tblcontentieux_log.id_contentieux)
Filter: (id_client = 13)
Rows Removed by Filter: 0
Buffers: shared hit=372
Planning time: 1.311 ms
Execution time: 10.813 ms
Both are bare metal servers, with 4GB of RAM; the dataset is small (compressed dump is 3MB). The main differences that
Ifound are in disk I/O as shown by hdparm, and processor type :
#Kimsufi server
hdparm -tT /dev/sda
Timing cached reads: 1744 MB in 2.00 seconds = 872.16 MB/sec
Timing buffered disk reads: 482 MB in 3.00 seconds = 160.48 MB/sec
Processor Intel(R) Atom(TM) CPU N2800 @ 1.86GHz (4 cores, cache size : 512 KB)
Disk 2TB, 7200rpm, db on 500MB partition
#Online server
hdparm -tT /dev/sda
Timing cached reads: 2854 MB in 2.00 seconds = 1427.05 MB/sec
Timing buffered disk reads: 184 MB in 3.00 seconds = 61.26 MB/sec
Processor Intel(R) Atom(TM) CPU C2350 @ 1.74GHz (2 cores, cache size : 1024 KB)
Disk 1TB, 7200rpm, db on 1TB partition
I've created two pastebins with the output of the following commands for each server:
# hdparm /dev/sda
# hdparm -i /dev/sda
# df
# cat /proc/cpuinfo
# cat /proc/meminfo
#Kimsufi server
https://pastebin.com/3860hS92
#Online server
https://pastebin.com/FT1HFbD7
My questions:
-Does the difference in 'buffered disk reads' explain the 6 fold increase in execution time for truncate/copy on the
Onlineserver?
-Why are regular queries much faster on this same server?
--
Bien à vous, Vincent Veyron
https://legalcase.libremen.com/
Legal case management software
On Mon, May 8, 2017 at 11:49 AM, Vincent Veyron <vv.lists@wanadoo.fr> wrote: > Hello, > > I use two dedicated bare metal servers (Online and Kimsufi). The first one takes much longer to execute a procedure thatrecreates a database by truncating its tables, then copying the data from a set of text files; it is however much fasterfor more typical SELECT and INSERT queries done by users. > > Here is the timing for the procedure : > > #Kimsufi server > time psql -f myfile.sql mydb > real 0m12.585s > user 0m0.200s > sys 0m0.076s > > #Online server > time psql -f myfile.sql mydb > real 1m15.410s > user 0m0.144s > sys 0m0.028s > > My questions: > > -Does the difference in 'buffered disk reads' explain the 6 fold increase in execution time for truncate/copy on the Onlineserver? The most likely cause of the difference would be that one server IS honoring fsync requests from the db and the other one isn't. If you run pgbench on both (something simple like pgbench -c 1 -T 60, aka one thread for 60 seconds) on a machine running on a 7200RPM hard drive, you should get approximately 120 transactions per second, or less, since that's how many times a second a disk spinning at that speed can write out data. If you get say 110 on the slow machine and 800 on the fast one, there's the culprit, the fast machine is not honoring fsync requests and is not crash-safe. I.e. if you start writing to the db and pull the power plug out the back of the machine it will likely power up with a corrupted database. > -Why are regular queries much faster on this same server? That's a whole nother subject. Most likely the faster machine can fit the whole db in memory, or has much faster memory, or the whole dataset is cached etc etc. For now concentrate on figuring out of you've got an fsync problem. If the data is just test data etc that you can afford to lose then you can leave off fsync and not worry. But in production this is rarely the case.
On Mon, 8 May 2017 12:48:29 -0600
Scott Marlowe <scott.marlowe@gmail.com> wrote:
Hi Scott,
Thank you for your input.
>
> The most likely cause of the difference would be that one server IS
> honoring fsync requests from the db and the other one isn't.
>
> If you run pgbench on both (something simple like pgbench -c 1 -T 60,
> aka one thread for 60 seconds) on a machine running on a 7200RPM hard
> drive, you should get approximately 120 transactions per second
Here are the results :
#Kimsufi
pgbench -c 1 -T 60 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 6618
latency average: 9.069 ms
tps = 110.270771 (including connections establishing)
tps = 110.283733 (excluding connections establishing)
#Online
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1150
latency average: 52.317 ms
tps = 19.114403 (including connections establishing)
tps = 19.115739 (excluding connections establishing)
>
> > -Why are regular queries much faster on this same server?
>
> That's a whole nother subject. Most likely the faster machine can fit
> the whole db in memory, or has much faster memory, or the whole
> dataset is cached etc etc.
>
The dataset is small (35 MB) and both servers have 4GB memory. It appears to be faster on the Online server.
using 'dmidecode -t 17' :
#Kimsufi
Memory Device
Array Handle: 0x0016
Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 64 bits
Size: 2048 MB
Form Factor: DIMM
Set: None
Locator: SO DIMM 0
Bank Locator: Channel A DIMM0
Type: DDR3
Type Detail: Synchronous
Speed: 1066 MHz
Manufacturer: 0x0000000000000000
Serial Number: 0x00000000
Asset Tag: Unknown
Part Number: 0x000000000000000000000000000000000000
Rank: Unknown
Configured Clock Speed: 1066 MHz
[repeated for second locator]
#Online
Memory Device
Array Handle: 0x0015
Error Information Handle: No Error
Total Width: Unknown
Data Width: Unknown
Size: 4096 MB
Form Factor: DIMM
Set: None
Locator: DIMM0
Bank Locator: BANK 0
Type: DDR3
Type Detail: Synchronous Unbuffered (Unregistered)
Speed: 1600 MHz
Manufacturer: <BAD INDEX>
Serial Number: <BAD INDEX>
Asset Tag: <BAD INDEX>
Part Number: <BAD INDEX>
Rank: 1
Configured Clock Speed: 1333 MHz
Minimum voltage: Unknown
Maximum voltage: Unknown
Configured voltage: Unknown
--
Bien à vous, Vincent Veyron
https://libremen.com
Logiciels de gestion, libres
On Mon, May 8, 2017 at 4:24 PM, Vincent Veyron <vv.lists@wanadoo.fr> wrote: > On Mon, 8 May 2017 12:48:29 -0600 > Scott Marlowe <scott.marlowe@gmail.com> wrote: > > Hi Scott, > > Thank you for your input. > >> >> The most likely cause of the difference would be that one server IS >> honoring fsync requests from the db and the other one isn't. >> >> If you run pgbench on both (something simple like pgbench -c 1 -T 60, >> aka one thread for 60 seconds) on a machine running on a 7200RPM hard >> drive, you should get approximately 120 transactions per second > > Here are the results : > > #Kimsufi > pgbench -c 1 -T 60 test > starting vacuum...end. > transaction type: TPC-B (sort of) > scaling factor: 1 > query mode: simple > number of clients: 1 > number of threads: 1 > duration: 60 s > number of transactions actually processed: 6618 > latency average: 9.069 ms > tps = 110.270771 (including connections establishing) > tps = 110.283733 (excluding connections establishing) Just under 120, looks like fsync is working. > > #Online > starting vacuum...end. > transaction type: TPC-B (sort of) > scaling factor: 1 > query mode: simple > number of clients: 1 > number of threads: 1 > duration: 60 s > number of transactions actually processed: 1150 > latency average: 52.317 ms > tps = 19.114403 (including connections establishing) > tps = 19.115739 (excluding connections establishing) OK that's horrendous. My mobile phone is likely faster. We need to figure out why it's so slow. If it's in a RAID-1 set it might be syncing. >> > -Why are regular queries much faster on this same server? >> >> That's a whole nother subject. Most likely the faster machine can fit >> the whole db in memory, or has much faster memory, or the whole >> dataset is cached etc etc. >> > > The dataset is small (35 MB) and both servers have 4GB memory. It appears to be faster on the Online server. Yeah it fits in memory. Select queries will only hit disk at bootup. First machine SNIP > Speed: 1066 MHz SNIP > Configured Clock Speed: 1066 MHz Second machine > Speed: 1600 MHz SNIP > Configured Clock Speed: 1333 MHz Yeah the second machine likely has a noticeably faster CPU than the first as well. It's about two years younger so yeah it's probably just cpu/mem that's making it fast.
On Mon, May 8, 2017 at 5:06 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Mon, May 8, 2017 at 4:24 PM, Vincent Veyron <vv.lists@wanadoo.fr> wrote: >> On Mon, 8 May 2017 12:48:29 -0600 >> Scott Marlowe <scott.marlowe@gmail.com> wrote: >>> > -Why are regular queries much faster on this same server? >>> >>> That's a whole nother subject. Most likely the faster machine can fit >>> the whole db in memory, or has much faster memory, or the whole >>> dataset is cached etc etc. >>> >> >> The dataset is small (35 MB) and both servers have 4GB memory. It appears to be faster on the Online server. > > Yeah it fits in memory. Select queries will only hit disk at bootup. > > First machine > SNIP >> Speed: 1066 MHz > SNIP >> Configured Clock Speed: 1066 MHz > > Second machine > >> Speed: 1600 MHz > SNIP >> Configured Clock Speed: 1333 MHz > > Yeah the second machine likely has a noticeably faster CPU than the > first as well. It's about two years younger so yeah it's probably just > cpu/mem that's making it fast. OK went back and looked at your original post. I seems like those two queries that are 10 and 20 ms have essentially the same plan on similar sized dbs, so it's reasonable to assume the newer machine is about twice as fast. Without seeing what your test sql file does I have no idea what the big difference in the other direction. You'll have to pull out and run the individual queries, or turn on auto explain or something to see the plans and compare. A lot of time it's just some simple tuning in postgresql.conf or maybe a database got an alter database on it to change something? Either way use show all; to compare settings and get explain (analyze) off of the slow queries.
On Mon, 8 May 2017 17:35:38 -0600
Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Without seeing what your test sql file does I have no idea what the
> big difference in the other direction.
It truncates 59 tables, copies data back from a set of text files, inserts a few single records and does a few select
setval('')to reset the serial columns.
here it is :
https://pastebin.com/LVsvFzkj
>You'll have to pull out and run
> the individual queries, or turn on auto explain or something to see
> the plans and compare.
I used log_duration; it shows that the truncate and all the \copy are much slower, while all insert/select statements
aretwice as fast
>A lot of time it's just some simple tuning in
> postgresql.conf or maybe a database got an alter database on it to
> change something?
Server setups are identical : same software, same configurations, same databases.
I've put in a ticket at the Online provider with the data to see if they have an answer (now 14H00 in Paris, so they
maytake a while to respond)
--
Bien à vous, Vincent Veyron
https://compta.libremen.com
Logiciel libre de comptabilité générale en partie double
Well, the response to the ticket was quite fast : https://status.online.net/index.php?do=details&task_id=720 Here's the stated cause : >Our tests have confirmed an issue caused by the fans of the power supplies installed in several chassis. >The fans create vibrations amplifying errors on the discs. Now on to decide whether I'm waiting for the fix or re-building a new box... Thanks a bunch for your help. -- Bien à vous, Vincent Veyron https://libremen.com Logiciels de gestion, libres
On Tue, May 9, 2017 at 7:08 AM, Vincent Veyron <vv.lists@wanadoo.fr> wrote: > > Well, the response to the ticket was quite fast : > > https://status.online.net/index.php?do=details&task_id=720 > > Here's the stated cause : > >>Our tests have confirmed an issue caused by the fans of the power supplies installed in several chassis. > >>The fans create vibrations amplifying errors on the discs. > > Now on to decide whether I'm waiting for the fix or re-building a new box... > > Thanks a bunch for your help. You're welcome, any time. As for the hard drives, can you upgrade to a pair of SSDs? If your data set fits on (and will continue to fit on) SSDs, the performance gained from SSDs is HUGE and worth a few hundred extra for the drive. Note that you want to use the Intel enterprise stuff that survives power loss, not the cheap low end SSDs. May be an easy fix for your hosting company and a big performance gain.
On Tue, 9 May 2017 10:24:20 -0600
Scott Marlowe <scott.marlowe@gmail.com> wrote:
>
> As for the hard drives, can you upgrade to a pair of SSDs? If your
> data set fits on (and will continue to fit on) SSDs, the performance
> gained from SSDs is HUGE and worth a few hundred extra for the drive.
> Note that you want to use the Intel enterprise stuff that survives
> power loss, not the cheap low end SSDs. May be an easy fix for your
> hosting company and a big performance gain.
>
Sure, but I'm getting plenty of performance already : my little machines can serve 40 requests/second with 6 or 7
queriesper request. So I'm fine for a while.
You can see for yourself if you enter the demo account for the site in my sig, and click in a couple of files (it's the
databasethat gets re-created by the procedure I mentionned in my original post)
--
Bien à vous, Vincent Veyron
https://legalcase.libremen.com/
Legal case, contract and insurance claim management software