Discussion: PostgreSQL v14 TPC-H performance, GCC vs Clang
Hi
I built PostgreSQL v14 from source with GCC v11.2 and with Clang v12 (without JIT), using optimisation flags such as -O3, and tested both builds with HammerDB.
On TPC-H, the GCC build performs better than the Clang build (without JIT). The performance difference is ~22%. I also noticed differences in the generated assembly code between GCC and Clang (e.g. GCC inlines functions that Clang does not).
Environment details:
--------------------
OS: RHEL 8.4
Bare metal: Apple/AMD EPYC/IBM
Test (TPC-H) benchmark environment: HammerDB
Is the performance difference mainly caused by the point below?
1. Data overflow handling and calculations such as int128 (int128.c) and C arithmetic operations (functions in float.h, e.g. float4_mul)
Please also suggest any other functionality or code paths that I should check regarding the performance difference.
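To compare how the two compilers treat a checked floating-point multiply of the kind mentioned above, a small stand-alone file can be compiled with each compiler and the assembly compared (for example with `gcc -O3 -S` and `clang -O3 -S`). The sketch below only assumes the general shape of such a function: `checked_float_mul` and `scale_values` are hypothetical names, and this is not the PostgreSQL source.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical checked multiply (not PostgreSQL code).
 * Returns false on overflow or underflow, otherwise stores the product.
 */
static inline bool
checked_float_mul(float a, float b, float *result)
{
    float r = a * b;

    /* Finite inputs that produce infinity: overflow. */
    if (isinf(r) && !isinf(a) && !isinf(b))
        return false;
    /* Non-zero inputs that produce zero: underflow. */
    if (r == 0.0f && a != 0.0f && b != 0.0f)
        return false;

    *result = r;
    return true;
}

/* Externally visible caller, so the compiler has to emit code for the loop. */
bool
scale_values(const float *in, float factor, float *out, size_t n)
{
    assert(in != NULL && out != NULL);

    for (size_t i = 0; i < n; i++)
    {
        if (!checked_float_mul(in[i], factor, &out[i]))
            return false;
    }
    return true;
}
```

Whether the helper is inlined into `scale_values`, and how the two checks are lowered, can then be read directly from each compiler's output.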
> .. optimisation flags like O3
> And please suggest ... to check on the performance difference
Phoronix has tested PostgreSQL 13 with Clang 12 and GCC 11.1 on Xeon Ice Lake:
"The CFLAGS/CXXFLAGS set throughout testing were "-O3 -march=native -flto"
as would be common for HPC systems when building performance sensitive code."
and the results:
https://www.phoronix.com/scan.php?page=article&item=clang12-gcc11-icelake&num=4 (see near the bottom of the page; only the PostgreSQL results from "GCC 11 vs. LLVM Clang 12 Benchmarks On Xeon Ice Lake" are relevant here)
Maybe you can replicate the Phoronix results (but note that this is only GCC 11.1!)
"Compare your own system(s) to this result file with the Phoronix Test Suite
by running the command: phoronix-test-suite benchmark 2105299-IB-COMPILERT91"
Regards.
Imre
(In reply to the message from arjun shetty <arjunshetty955@gmail.com>, Tue, Nov 2, 2021, 18:13.)
Hi
@imre: Thank you for sharing the link to the Phoronix test of PostgreSQL 13.
I compared my test results with the Phoronix Test Suite results. They deviate too much (maybe because of the hardware environment and the PostgreSQL version).
I think PostgreSQL v13 may have issues with autovacuum; I am currently using PostgreSQL v14.
In my environment GCC performs better than Clang (LLVM). The reason could be that int128 performs better with GCC than with Clang.
1. Clang (__int128) requires 4 additional functions, __divti3, __modti3, __udivti3 and __umodti3, and these are not required with GCC. This may lead to a performance drop with Clang.
2. __int128 aligned on 16-byte boundaries (MAXALIGN) is supported in GCC, and this may not be supported in Clang.
@postgresql-performance: kindly let me know your view on these two points.
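Both points can be checked directly instead of assumed. A minimal sketch: compile the file below with each compiler at `-O3 -S` and search the assembly for `__divti3`, `__modti3`, `__udivti3` and `__umodti3`. To my knowledge, 128-bit division on x86-64 is normally lowered to such runtime helper calls by GCC as well as by Clang, so the output is worth confirming on the actual toolchains. The function names here are hypothetical and are not taken from PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>

/* Assumes the compiler and target provide __int128 (GCC and Clang on x86-64 do). */
typedef __int128 int128;
typedef unsigned __int128 uint128;

static_assert(sizeof(int128) == 16, "int128 must be 128 bits wide");

/* Each function isolates one operation so its lowering is easy to find. */
int128  div_s128(int128 a, int128 b)   { return a / b; }
int128  mod_s128(int128 a, int128 b)   { return a % b; }
uint128 div_u128(uint128 a, uint128 b) { return a / b; }
uint128 mod_u128(uint128 a, uint128 b) { return a % b; }

/* Alignment the compiler assigns to the type, relevant to point 2. */
size_t  int128_alignment(void)         { return _Alignof(int128); }
```

Calling `int128_alignment()` from a test program built with each compiler shows whether the two really disagree on alignment for the target in question.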
(In reply to the message from Imre Samu <pella.samu@gmail.com>, Wednesday, November 3, 2021.)
Hi,
IMO this thread provides so little information it's almost impossible to answer the question. There's almost no information about the hardware, scale of the test, configuration of the Postgres instance, the exact build flags, differences in generated asm code, etc.
I find it hard to believe merely switching from clang to gcc yields 22% speedup - that's way higher than any differences we've seen in the past.
In my experience, the speedup is unlikely to be "across the board". There will be a handful of affected queries, while most remaining queries will be about the same. In that case you need to focus on those queries, see if the plans are the same, do some profiling, etc.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
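The per-query comparison suggested above can be sketched as follows, assuming per-query elapsed times have already been collected for each build by whatever means. The function names and the threshold parameter are hypothetical; the sketch only shows how to separate a handful of affected queries from the rest.

```c
#include <assert.h>
#include <stddef.h>

/* Slowdown of the Clang build relative to the GCC build, in percent. */
double
slowdown_percent(double gcc_ms, double clang_ms)
{
    assert(gcc_ms > 0.0);
    return (clang_ms - gcc_ms) / gcc_ms * 100.0;
}

/*
 * Stores the indexes of the queries whose slowdown exceeds
 * threshold_percent in affected[] (which must have room for n entries)
 * and returns how many were found.
 */
size_t
find_affected_queries(const double *gcc_ms, const double *clang_ms,
                      size_t n, double threshold_percent, size_t *affected)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (slowdown_percent(gcc_ms[i], clang_ms[i]) > threshold_percent)
            affected[count++] = i;
    }
    return count;
}
```

The queries returned are the ones whose plans and profiles are worth comparing between the two builds.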
Yes, I am currently focusing on the affected queries as well.
In the meantime, from analysis at the hardware level and from small sample examples, I noticed:
1. GCC performs better than Clang on int128.
2. Clang performs better than GCC on long long (reference example: https://stackoverflow.com/questions/63029428/why-is-int128-t-faster-than-long-long-on-x86-64-gcc).
3. GCC is enabled with "-fexcess-precision=standard" (precision of casts for floating point).
Can these 3 points cause a performance difference between GCC and Clang in PostgreSQL v14 in the Apple/AMD/() environment (the Intel environment still needs to be checked)? In these environments int128 is enabled for PostgreSQL v14.
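Points 1 and 2 can be examined in isolation with two loops that differ only in the width of the accumulator. This is a hypothetical micro-benchmark kernel, not code from PostgreSQL; it is meant to be compiled with both compilers at the same optimisation level so the generated loops can be compared or timed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumes the compiler and target provide __int128. */
typedef __int128 int128;

static_assert(sizeof(int128) == 16, "int128 must be 128 bits wide");

/* Accumulate 64-bit values into a 128-bit sum. */
int128
sum_int128(const int64_t *values, size_t n)
{
    int128 sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += values[i];
    return sum;
}

/* Same loop with a 64-bit accumulator; the caller must ensure the sum fits. */
long long
sum_long_long(const int64_t *values, size_t n)
{
    long long sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += values[i];
    return sum;
}
```

Because both functions read the same input, any difference in speed comes from how each compiler handles the accumulator type.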
(In reply to the message from Tomas Vondra <tomas.vondra@enterprisedb.com>, Friday, November 5, 2021.)
> GCC vs Clang
Related: as far as I can see, a ~12% performance increase is expected with LLVM/Clang 14.0 (x86_64, -O3) thanks to a new optimisation (probably adapted from GCC).
(In reply to the message from arjun shetty <arjunshetty955@gmail.com>, Tue, Nov 16, 2021, 11:10.)
Hi All,

I checked with LLVM/Clang 14.0 on x86-64 with -O3 in the Mac/AMD EPYC environment, but I still see GCC performing better than Clang 14.
Clang 14: https://github.com/llvm/llvm-project (main branch, pulled at commit ID 3f3fe4a5cfa1797..)

Preliminary analysis, GCC vs Clang:
(1) GCC inlines more functions than Clang in PostgreSQL.
(2) A few functions are not inlined by GCC but are inlined by Clang:
postgresqlv14/src/include/utils/float.h: float8_mul(), float8_div() (arithmetic functions).
postgresqlv14/src/backend/utils/adt/geo_ops.c: point_xxx().
(3) GCC performs better than Clang on the int128 data type (this still needs to be cross-checked at the instruction level, in the assembly code on the hardware).
(4) Regarding point (2): after removing inline from those functions in float.h and geo_ops.c, I observed a 6% performance improvement with Clang compared to the inlined version.
regards,
Arjun
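The inlining experiment in point (4) can be reproduced on a small scale without patching PostgreSQL. In the sketch below, the same helper exists in an inline-hinted form and in a form marked with the GCC/Clang `noinline` attribute. `Point2D` and all function names are hypothetical and are not the geo_ops.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2-D point; not PostgreSQL's Point type. */
typedef struct { double x; double y; } Point2D;

/* "inline" is only a hint; the compiler decides whether to inline. */
static inline double
dist_sq_inline(Point2D a, Point2D b)
{
    double dx = a.x - b.x;
    double dy = a.y - b.y;

    return dx * dx + dy * dy;
}

/* The noinline attribute (GCC and Clang) forces a real function call. */
__attribute__((noinline)) static double
dist_sq_call(Point2D a, Point2D b)
{
    double dx = a.x - b.x;
    double dy = a.y - b.y;

    return dx * dx + dy * dy;
}

/* Sum of squared distances between consecutive points, inline-hinted helper. */
double
path_cost_inline(const Point2D *pts, size_t n)
{
    double total = 0.0;

    assert(pts != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        total += dist_sq_inline(pts[i - 1], pts[i]);
    return total;
}

/* Same computation, but every step goes through an out-of-line call. */
double
path_cost_call(const Point2D *pts, size_t n)
{
    double total = 0.0;

    assert(pts != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        total += dist_sq_call(pts[i - 1], pts[i]);
    return total;
}
```

Timing the two `path_cost_*` functions under each compiler gives a rough idea of how much of the difference comes from inlining decisions alone.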
(In reply to the message from Imre Samu <pella.samu@gmail.com>, Fri, Dec 10, 2021, 11:51 PM.)