Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL

From	Pietro Pugni
Subject	Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Date	3 April 2015, 13:43:13
Msg-id	0E6681E8-7412-4A8F-8AEC-FD6A29820B66@gmail.com
In reply to	Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL (didier <did447@gmail.com>)
List	pgsql-performance

Hi didier,

thank you for your time.

I forgot to include the output of free earlier. I had looked at it before, but I found it difficult to tell whether anything was wrong.

Before starting Postgres:

                    total   used   free  shared  buffers  cached
Mem:                 125G     9G   115G     15M     362M    8.1G
-/+ buffers/cache:          1.5G   124G
Swap:                127G     0B   127G

Here’s an example of free output while queries B_1 and B_2 are running (they’re part of the same transaction). The values generally stay the same. As far as I can tell, RAM is barely used (there’s a lot of free RAM).

                    total   used   free  shared  buffers  cached
Mem:                 125G    13G   112G    3.1G     362M     11G
-/+ buffers/cache:          1.9G   123G
Swap:                127G     0B   127G

With Postgres running, after the transaction has been executed:

                    total   used   free  shared  buffers  cached
Mem:                 125G    13G   112G    3.1G     362M     11G
-/+ buffers/cache:          1.5G   124G
Swap:                127G     0B   127G

there's also huge page
/sys/kernel/mm/transparent_hugepage/enabled
can you try to disable it?

It was enabled, and disabling it changed nothing: execution time is practically the same (131s for the same transaction tested in previous emails, which consists of queries B_1 and B_2).
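For anyone repeating this check, here is a minimal Python sketch that reports which transparent hugepage mode is active. It assumes the usual Linux sysfs format, where the active mode is the one in square brackets (for example "always madvise [never]"); the helper name active_thp_mode is hypothetical.

```python
import re
from pathlib import Path

# Path quoted in the thread; present on Linux kernels built with THP support.
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> str:
    """Return the bracketed (active) mode from the sysfs file content."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"no active mode found in {text!r}")
    return match.group(1)


if __name__ == "__main__":
    if THP_PATH.exists():
        print("THP mode:", active_thp_mode(THP_PATH.read_text()))
    else:
        print("transparent hugepage sysfs entry not found")
```

Reading the file needs no special privileges; changing the mode does, and the change does not survive a reboot unless it is also set at boot time.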

Also test on the dell:
select tmp.cf, tmp.dt from grep_studi.tmp;
and
select tmp.cf, tmp.dt from grep_studi.tmp order by tmp.cf;
in Query B_2
the sort is 9 time slower on the dell, you have to find why…

Here’s the output for the two queries:

select tmp.cf, tmp.dt from grep_studi.tmp;

"Seq Scan on grep_studi.tmp (cost=0.00..11007.74 rows=1346868 width=72) (actual time=0.082..618.709 rows=2951191 loops=1)"

" Output: cf, dt"

" Buffers: shared hit=512 read=7802 dirtied=8314"

"Planning time: 0.087 ms"

"Execution time: 745.505 ms"

select tmp.cf, tmp.dt from grep_studi.tmp order by tmp.cf;

"Sort (cost=38431.55..39104.99 rows=1346868 width=72) (actual time=3146.548..3306.179 rows=2951191 loops=1)"

" Output: cf, dt"

" Sort Key: tmp.cf"

" Sort Method: quicksort Memory: 328866kB"

" Buffers: shared hit=8317"

" -> Seq Scan on grep_studi.tmp (cost=0.00..11007.74 rows=1346868 width=72) (actual time=0.012..373.346 rows=2951191 loops=1)"

" Output: cf, dt"

" Buffers: shared hit=8314"

"Planning time: 0.034 ms"

"Execution time: 3459.065 ms"
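To put a number on the sort step alone, the two plan lines above can be compared. This is a rough Python sketch, assuming that the upper bound of "actual time" on a node is cumulative and includes its child nodes; the helper name actual_total_ms is hypothetical and not part of Postgres.

```python
import re

# Matches the "actual time=<startup>..<total>" part of an EXPLAIN ANALYZE line.
ACTUAL_TIME = re.compile(r"actual time=(\d+\.\d+)\.\.(\d+\.\d+)")


def actual_total_ms(plan_line: str) -> float:
    """Return the total (upper bound) actual time of a plan node, in ms."""
    match = ACTUAL_TIME.search(plan_line)
    if match is None:
        raise ValueError(f"no actual time found in {plan_line!r}")
    return float(match.group(2))


# Lines copied from the ORDER BY plan above.
sort_line = ("Sort (cost=38431.55..39104.99 rows=1346868 width=72) "
             "(actual time=3146.548..3306.179 rows=2951191 loops=1)")
scan_line = ("-> Seq Scan on grep_studi.tmp (cost=0.00..11007.74 rows=1346868 "
             "width=72) (actual time=0.012..373.346 rows=2951191 loops=1)")

# Time attributable to sorting: node total minus the time of its child scan.
sort_only_ms = actual_total_ms(sort_line) - actual_total_ms(scan_line)
print(f"sort step alone: {sort_only_ms:.1f} ms")
```

With these figures the sort accounts for roughly 2.9 of the 3.3 seconds, so the scan itself is not where the time goes.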

32 GB for buffers is too high for the queries in your test but it
doesn't matter.

I’ve set shared_buffers to 1/4 of the total RAM and changed the kernel settings to accommodate this value. Lowering it doesn’t improve the transaction time. Here are the results, with one run for each value of shared_buffers:

32GB: 131s

16GB: 132s

8GB: 133s

4GB: 132s

2GB: 143s

1GB: 148s

512MB: 183s

256MB: 192s

I could probably keep 4GB, but I use several partitions with tens of millions of records each, which is why I keep shared_buffers high. My application is also similar to a DWH solution with one user. As you said, large values of shared_buffers shouldn’t be an issue.
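The timings listed above can be expressed relative to the 32GB run. This is only a small Python sketch over the single-run numbers from this email, so differences of a second or two are probably noise; the name slowdown_pct is made up for illustration.

```python
# Transaction time in seconds for each shared_buffers setting (one run each).
timings_s = {
    "32GB": 131, "16GB": 132, "8GB": 133, "4GB": 132,
    "2GB": 143, "1GB": 148, "512MB": 183, "256MB": 192,
}


def slowdown_pct(seconds: float, baseline: float) -> float:
    """Percentage increase of a run over the baseline run."""
    return (seconds - baseline) / baseline * 100.0


baseline_s = timings_s["32GB"]
for size, seconds in timings_s.items():
    print(f"{size:>6}: {seconds:4d}s  {slowdown_pct(seconds, baseline_s):+6.1f}%")
```

Read this way, everything from 4GB upwards is within about 2% of the 32GB result, and the time only climbs noticeably from 2GB downwards.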

I’ve run some tests with sysbench on the Dell T420 and the MacMini.

T420 - RAM READ - 16GB / 1MB

sh-4.3# sysbench --test=memory --memory-oper=read --memory-block-size=1MB --memory-total-size=16GB run

sysbench 0.4.12: multi-threaded system evaluation benchmark

Running the test with following options:

Number of threads: 1

Doing memory operations speed test

Memory block size: 1024K

Memory transfer size: 16384M

Memory operations type: read

Memory scope type: global

Threads started!

Done.

Operations performed: 16384 (3643025.32 ops/sec)

16384.00 MB transferred (3643025.32 MB/sec)

Test execution summary:

total time: 0.0045s

total number of events: 16384

total time taken by event execution: 0.0031

per-request statistics:

min: 0.00ms

avg: 0.00ms

max: 0.02ms

approx. 95 percentile: 0.00ms

Threads fairness:

events (avg/stddev): 16384.0000/0.00

execution time (avg/stddev): 0.0031/0.00

MacMini - RAM READ - 16GB / 1MB

server:sysbench Pietro$ ./sysbench --test=memory --memory-oper=read --memory-block-size=1MB --memory-total-size=16GB run

sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:

Number of threads: 1

Random number generator seed is 0 and will be ignored

Threads started!

Operations performed: 16384 ( 5484.50 ops/sec)

16384.00 MB transferred (5484.50 MB/sec)

General statistics:

total time: 2.9873s

total number of events: 16384

total time taken by event execution: 2.9836s

response time:

min: 0.18ms

avg: 0.18ms

max: 0.24ms

approx. 95 percentile: 0.19ms

Threads fairness:

events (avg/stddev): 16384.0000/0.00

execution time (avg/stddev): 2.9836/0.00

T420 - RAM WRITE - 16GB / 1MB

sh-4.3# sysbench --test=memory --memory-oper=write --memory-block-size=1MB --memory-total-size=16GB run

sysbench 0.4.12: multi-threaded system evaluation benchmark

Running the test with following options:

Number of threads: 1

Doing memory operations speed test

Memory block size: 1024K

Memory transfer size: 16384M

Memory operations type: write

Memory scope type: global

Threads started!

Done.

Operations performed: 16384 ( 8298.97 ops/sec)

16384.00 MB transferred (8298.97 MB/sec)

Test execution summary:

total time: 1.9742s

total number of events: 16384

total time taken by event execution: 1.9723

per-request statistics:

min: 0.12ms

avg: 0.12ms

max: 0.25ms

approx. 95 percentile: 0.12ms

Threads fairness:

events (avg/stddev): 16384.0000/0.00

execution time (avg/stddev): 1.9723/0.00

MacMini - RAM WRITE - 16GB / 1MB

server:sysbench Pietro$ ./sysbench --test=memory --memory-oper=write --memory-block-size=1MB --memory-total-size=16GB run

sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:

Number of threads: 1

Random number generator seed is 0 and will be ignored

Threads started!

Operations performed: 16384 ( 5472.90 ops/sec)

16384.00 MB transferred (5472.90 MB/sec)

General statistics:

total time: 2.9937s

total number of events: 16384

total time taken by event execution: 2.9890s

response time:

min: 0.18ms

avg: 0.18ms

max: 0.32ms

approx. 95 percentile: 0.19ms

Threads fairness:

events (avg/stddev): 16384.0000/0.00

execution time (avg/stddev): 2.9890/0.00

T420 - CPU

sh-4.3# sysbench --test=cpu run

sysbench 0.4.12: multi-threaded system evaluation benchmark

Running the test with following options:

Number of threads: 1

Doing CPU performance benchmark

Threads started!

Done.

Maximum prime number checked in CPU test: 10000

Test execution summary:

total time: 13.0683s

total number of events: 10000

total time taken by event execution: 13.0674

per-request statistics:

min: 1.30ms

avg: 1.31ms

max: 1.44ms

approx. 95 percentile: 1.35ms

Threads fairness:

events (avg/stddev): 10000.0000/0.00

execution time (avg/stddev): 13.0674/0.00

MacMini - CPU

server:sysbench Pietro$ ./sysbench --test=cpu run

sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:

Number of threads: 1

Random number generator seed is 0 and will be ignored

Primer numbers limit: 10000

Threads started!

General statistics:

total time: 11.5728s

total number of events: 10000

total time taken by event execution: 11.5703s

response time:

min: 1.15ms

avg: 1.16ms

max: 2.17ms

approx. 95 percentile: 1.17ms

Threads fairness:

events (avg/stddev): 10000.0000/0.00

execution time (avg/stddev): 11.5703/0.00

I ran these tests because someone else in this discussion asked me to investigate memory bandwidth, and because I found this interesting article comparing Intel Xeon and Intel i5 across different Postgres versions: http://blog.pgaddict.com/posts/performance-since-postgresql-7-4-to-9-4-pgbench
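To compare the two machines at a glance, the throughput and timing figures from the sysbench outputs above can be turned into ratios. This is a minimal Python sketch with a hypothetical helper mb_per_sec; the numbers are copied from the runs in this email, and the two machines ran different sysbench versions (0.4.12 and 0.5), so the comparison is only indicative.

```python
import re

# Matches the "(<number> MB/sec)" part of a sysbench memory result line.
THROUGHPUT = re.compile(r"\(\s*(\d+\.\d+)\s*MB/sec\)")


def mb_per_sec(line: str) -> float:
    """Extract the MB/sec figure from a sysbench 'MB transferred' line."""
    match = THROUGHPUT.search(line)
    if match is None:
        raise ValueError(f"no throughput found in {line!r}")
    return float(match.group(1))


t420_write = mb_per_sec("16384.00 MB transferred (8298.97 MB/sec)")
mac_write = mb_per_sec("16384.00 MB transferred (5472.90 MB/sec)")
t420_read = mb_per_sec("16384.00 MB transferred (3643025.32 MB/sec)")
mac_read = mb_per_sec("16384.00 MB transferred (5484.50 MB/sec)")

write_ratio = t420_write / mac_write   # T420 relative to MacMini, higher is faster
read_ratio = t420_read / mac_read      # suspiciously large, see note below
cpu_ratio = 13.0683 / 11.5728          # total time, T420 over MacMini, higher is slower

print(f"memory write: T420 is {write_ratio:.2f}x the MacMini throughput")
print(f"memory read:  T420 is {read_ratio:.0f}x the MacMini throughput")
print(f"cpu test:     T420 takes {cpu_ratio:.2f}x the MacMini time")
```

The read ratio of several hundred looks too large to be real single-thread bandwidth, so the T420 read run (0.0045s for 16GB) may not have measured actual memory reads; that is a guess on my part, not something the output confirms. The write and CPU figures look comparable between the two machines.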

Hope this helps to better understand the problem.

Thank you very much.

Best regards,

Pietro
