Discussion: Optimizing PostgreSQL with LLVM's PGO+LTO

Optimizing PostgreSQL with LLVM's PGO+LTO

From
João Paulo Labegalini de Carvalho
Date:
Hi all,

I am investigating the benefits of different profile-guided optimizations (PGO) and link-time optimizations (LTO) versus binary optimizers (e.g. BOLT) for applications such as PostgreSQL.

I am facing issues when applying LTO to PostgreSQL as the produced binary seems broken (the server dies quickly after it has started). This is definitely a compiler bug, but I was wondering if anyone here has experimented with LTO for PostgreSQL.

Thanks,

--
João Paulo L. de Carvalho
Ph.D Computer Science |  IC-UNICAMP | Campinas , SP - Brazil
Postdoctoral Research Fellow | University of Alberta | Edmonton, AB - Canada

Re: Optimizing PostgreSQL with LLVM's PGO+LTO

From
Darafei "Komяpa" Praliaskouski
Date:
Hi,

We implemented LTO in PostGIS's build system a couple of releases ago. It definitely gives +10% on heavy maths. Unfortunately, we did not manage to get it running under FreeBSD because of issues with the default system linker, so we had to hide it behind a --with-lto switch, which we recommend to everyone.

I did not experiment with Postgres itself but there are definitely traces of numerous LTO-enabled private builds on the web.

On Fri, Jan 27, 2023 at 8:05 PM João Paulo Labegalini de Carvalho <jaopaulolc@gmail.com> wrote:
> Hi all,
>
> I am investigating the benefits of different profile-guided optimizations (PGO) and link-time optimizations (LTO) versus binary optimizers (e.g. BOLT) for applications such as PostgreSQL.
>
> I am facing issues when applying LTO to PostgreSQL as the produced binary seems broken (the server dies quickly after it has started). This is definitely a compiler bug, but I was wondering if anyone here has experimented with LTO for PostgreSQL.
>
> Thanks,
>
> --
> João Paulo L. de Carvalho
> Ph.D Computer Science |  IC-UNICAMP | Campinas , SP - Brazil
> Postdoctoral Research Fellow | University of Alberta | Edmonton, AB - Canada

Re: Optimizing PostgreSQL with LLVM's PGO+LTO

From
Tom Lane
Date:
João Paulo Labegalini de Carvalho <jaopaulolc@gmail.com> writes:
> I am facing issues when applying LTO to PostgreSQL as the produced binary
> seems broken (the server dies quickly after it has started). This is
> definitely a compiler bug, but I was wondering if anyone here has
> experimented with LTO for PostgreSQL.

There are a lot of places where we're implicitly relying on
cross-compilation-unit optimizations NOT happening, because the
code isn't adequately decorated with memory barriers and the like.
So I wouldn't necessarily assume that the misbehavior you're seeing
represents anything that the compiler folks would consider a bug.

In the long run we might be interested in trying to make this
work better, but I don't know of anyone working on it now.

            regards, tom lane



Re: Optimizing PostgreSQL with LLVM's PGO+LTO

From
Andres Freund
Date:
Hi,

On 2023-01-27 10:05:09 -0700, João Paulo Labegalini de Carvalho wrote:
> I am investigating the benefits of different profile-guided optimizations
> (PGO) and link-time optimizations (LTO) versus binary optimizers (e.g.
> BOLT) for applications such as PostgreSQL.
> 
> I am facing issues when applying LTO to PostgreSQL as the produced binary
> seems broken (the server dies quickly after it has started). This is
> definitely a compiler bug, but I was wondering if anyone here has
> experimented with LTO for PostgreSQL.

What compiler / version / flags / OS did you try?


FWIW, I've experimented with LTO and PGO a bunch, both with gcc and clang. I
did hit a crash in gcc, but that did turn out to be a compiler bug, and
actually reduced to something not even needing LTO.

I saw quite substantial speedups with PGO, but I only tested very specific
workloads. IIRC it was >15% gain in concurrent readonly pgbench.


I dimly recall failing to get some benefit out of bolt for some reason that I
unfortunately don't even vaguely recall.

Greetings,

Andres Freund



Re: Optimizing PostgreSQL with LLVM's PGO+LTO

From
Andres Freund
Date:
Hi,

On 2023-01-27 15:06:37 -0500, Tom Lane wrote:
> There are a lot of places where we're implicitly relying on
> cross-compilation-unit optimizations NOT happening, because the code isn't
> adequately decorated with memory barriers and the like.

We have a fallback compiler barrier implementation doing that, but it
shouldn't be used on any halfway reasonable compiler. Cross-compilation-unit
calls don't provide a memory barrier - I assume you're thinking about a
compiler barrier?
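
To make the distinction concrete, here is a minimal C sketch, assuming GCC or Clang. The `sketch_` names are made up for illustration; PostgreSQL's real primitives are `pg_compiler_barrier()` and `pg_memory_barrier()` in `port/atomics.h`, and their per-platform implementations may differ from what is shown here. Without LTO, a call into another translation unit tends to act as a de facto compiler barrier simply because the compiler cannot see into the callee; once LTO can inline that call, only an explicit barrier keeps the ordering.

```c
#include <stdbool.h>

/* Hypothetical names for this sketch, not PostgreSQL's actual macros. */

/* Compiler barrier: emits no instruction, but the compiler may not move
 * loads or stores across it. It says nothing about CPU-level ordering. */
#define sketch_compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Memory barrier: a full fence, so it also orders the accesses as
 * observed by other CPUs. */
#define sketch_memory_barrier() __atomic_thread_fence(__ATOMIC_SEQ_CST)

static int sketch_payload;
static int sketch_ready;        /* accessed only through __atomic builtins */

/* Writer: store the payload, then raise the flag, in that order. */
static void
sketch_publish(int value)
{
    sketch_payload = value;
    sketch_memory_barrier();    /* payload store must not pass the flag store */
    __atomic_store_n(&sketch_ready, 1, __ATOMIC_RELAXED);
}

/* Reader: returns true and fills *out only once the flag has been seen. */
static bool
sketch_consume(int *out)
{
    if (!__atomic_load_n(&sketch_ready, __ATOMIC_RELAXED))
        return false;
    sketch_memory_barrier();    /* flag load must not pass the payload load */
    *out = sketch_payload;
    return true;
}
```

Swapping `sketch_memory_barrier()` for `sketch_compiler_barrier()` in the two functions would still stop the compiler from reordering the accesses, but would give no guarantee about the order in which another CPU observes them.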

I'm sure we have a few places that aren't that careful, but I would hope it's
not a large number. Are you thinking of specific "patterns" we've repeated all
over, or just a few cases you recall?

Greetings,

Andres Freund



Re: Optimizing PostgreSQL with LLVM's PGO+LTO

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2023-01-27 15:06:37 -0500, Tom Lane wrote:
>> There are a lot of places where we're implicitly relying on
>> cross-compilation-unit optimizations NOT happening, because the code isn't
>> adequately decorated with memory barriers and the like.

> We have a fallback compiler barrier implementation doing that, but it
> shouldn't be used on any halfway reasonable compiler. Cross-compilation-unit
> calls don't provide a memory barrier - I assume you're thinking about a
> compiler barrier?

Sorry, yeah, I was being sloppy there.

> I'm sure we have a few places that aren't that careful, but I would hope it's
> not a large number. Are you thinking of specific "patterns" we've repeated all
> over, or just a few cases you recall?

I recall that we used to have dependencies on, for example, the LWLock
functions being out-of-line.  Probably that specific pain point has
been cleaned up, but it surprises me not at all to hear that there
are more.

I agree that there are probably not a huge number of places that would
need to be fixed, but I'm not sure how we'd go about finding them.

            regards, tom lane



Re: Optimizing PostgreSQL with LLVM's PGO+LTO

From
Andres Freund
Date:
Hi,

On 2023-01-27 18:28:16 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I'm sure we have a few places that aren't that careful, but I would hope it's
> > not a large number. Are you thinking of specific "patterns" we've repeated all
> > over, or just a few cases you recall?
> 
> I recall that we used to have dependencies on, for example, the LWLock
> functions being out-of-line.  Probably that specific pain point has
> been cleaned up, but it surprises me not at all to hear that there
> are more.

We did clean up a fair bit, some via "infrastructure" fixes. E.g. our
spinlocks didn't use to be a barrier a good while back (cf. 0709b7ee72e), and
that required putting volatile on things that couldn't move across the lock
boundaries.  I think that in turn was what caused the LWLock issue you
mention, as back then lwlocks used spinlocks.

The increased use of atomics instead of "let's just do a dirty read" fixed a
few instances too.
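
As a rough illustration of why that helps, here is a sketch using C11 `<stdatomic.h>` as a stand-in for PostgreSQL's own `pg_atomic_*` API; the struct and function names are hypothetical. With a plain `bool` flag, the optimizer is allowed to read the flag once and hoist the load out of the loop, and that becomes more likely once LTO lets it see every function involved. An atomic load has to be performed on every iteration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state for this sketch. */
typedef struct SketchShared
{
    atomic_bool stop_requested;     /* set by "someone else" to end the loop */
    atomic_uint items_done;         /* progress counter */
} SketchShared;

static void
sketch_init(SketchShared *s)
{
    atomic_init(&s->stop_requested, false);
    atomic_init(&s->items_done, 0);
}

static void
sketch_request_stop(SketchShared *s)
{
    atomic_store_explicit(&s->stop_requested, true, memory_order_release);
}

/* Works until the stop flag is seen; returns the number of items processed. */
static unsigned
sketch_work_loop(SketchShared *s, unsigned stop_after)
{
    /* The atomic load is re-executed on every iteration; it cannot be hoisted. */
    while (!atomic_load_explicit(&s->stop_requested, memory_order_acquire))
    {
        unsigned done = atomic_fetch_add(&s->items_done, 1) + 1;

        /* Stands in for another process setting the flag concurrently. */
        if (done == stop_after)
            sketch_request_stop(s);
    }
    return atomic_load(&s->items_done);
}
```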


> I agree that there are probably not a huge number of places that would
> need to be fixed, but I'm not sure how we'd go about finding them.

Yea, that's the annoying part...


One thing we can look for is the use of volatile, which we used to use a lot
for preventing code rearrangement (for lack of barrier primitives in the bad
old days). Both Robert and I removed a bunch of that kind of use of volatile,
and from memory some of them wouldn't have been safe with LTO.

It's really too bad that we [have to] use volatile around signal handlers and
for PG_TRY too, otherwise it'd be easier to search for.

Kinda wondering if we ought to add a sig_volatile, err_volatile or such.
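
A sketch of what such markers could look like, purely as an assumption: `sig_volatile` and `err_volatile` are only the names floated here, defined below as plain aliases for `volatile`, and the `sketch_` functions are hypothetical. It uses standard `setjmp`/`longjmp` and `signal`/`raise` rather than `PG_TRY` and PostgreSQL's signal handling, to show the two uses of `volatile` that have to stay.

```c
#include <setjmp.h>
#include <signal.h>

/* Hypothetical aliases: documentation only, both expand to plain volatile. */
#define sig_volatile volatile
#define err_volatile volatile

/* A flag written by a signal handler must be volatile sig_atomic_t. */
static sig_volatile sig_atomic_t sketch_got_signal = 0;

static void
sketch_handler(int signo)
{
    (void) signo;
    sketch_got_signal = 1;
}

/* Installs the handler, raises SIGINT at ourselves, returns the flag. */
static int
sketch_signal_flag(void)
{
    void        (*old) (int) = signal(SIGINT, sketch_handler);

    if (old == SIG_ERR)
        return -1;
    raise(SIGINT);              /* returns only after the handler has run */
    signal(SIGINT, old);
    return sketch_got_signal;
}

/* A local modified between setjmp() and longjmp() must be volatile,
 * otherwise its value after the jump is indeterminate. */
static int
sketch_try_catch(void)
{
    jmp_buf     env;
    err_volatile int attempts = 0;

    if (setjmp(env) == 0)
    {
        attempts++;             /* the "try" part */
        longjmp(env, 1);        /* the "error" */
    }
    return attempts;            /* the "catch" part reads the updated value */
}
```

With aliases like these, a search for bare `volatile` would mostly turn up the remaining code-ordering uses, which are the ones worth auditing for LTO.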


But the main thing probably is to just regularly test LTO and look for
problems. Perhaps worth adding a BF animal that uses -O3 + LTO?

I don't immediately see how to squeeze using PGO into the BF build process
(since we'd have to build without PGO, run some workload, build with PGO -
without any source modifications in between)...

Greetings,

Andres Freund



Re: Optimizing PostgreSQL with LLVM's PGO+LTO

From
João Paulo Labegalini de Carvalho
Date:

> What compiler / version / flags / OS did you try?

I am running experiments on a machine with:
  • Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz
  • Ubuntu 18.04.6 LTS
  • LLVM/Clang 15.0.6 (built from source)
These are the flags I am using:

CFLAGS = -O3 -fuse-ld=lld -gline-tables-only -fprofile-instr-generate
LDFLAGS = -fuse-ld=lld -Wl,-q


> FWIW, I've experimented with LTO and PGO a bunch, both with gcc and clang. I
> did hit a crash in gcc, but that did turn out to be a compiler bug, and
> actually reduced to something not even needing LTO.

Good to hear that it works. I just need to figure out what is going wrong on my end then.
 
> I saw quite substantial speedups with PGO, but I only tested very specific
> workloads. IIRC it was >15% gain in concurrent readonly pgbench.

I successfully applied PGO only and obtained similar gains with TPC-C & TPC-H workloads.

> I dimly recall failing to get some benefit out of bolt for some reason that I
> unfortunately don't even vaguely recall.

With BOLT I got gains slightly higher than with PGO, but not for all queries in TPC-H. In fact, I observed small (2-4%) regressions with BOLT.

--
João Paulo L. de Carvalho
Ph.D Computer Science |  IC-UNICAMP | Campinas , SP - Brazil
Postdoctoral Research Fellow | University of Alberta | Edmonton, AB - Canada

Re: Optimizing PostgreSQL with LLVM's PGO+LTO

From
Andres Freund
Date:
Hi,

On 2023-01-30 10:24:02 -0700, João Paulo Labegalini de Carvalho wrote:
> > What compiler / version / flags / OS did you try?
> >
> 
> I am running experiment on a machine with:
> 
>    - Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz
>    - Ubuntu 18.04.6 LTS
>    - LLVM/Clang 15.0.6 (build from source)
> 
> These are the flags I am using:
> 
> CFLAGS = -O3 -fuse-ld=lld -gline-tables-only -fprofile-instr-generate
> LDFLAGS = -fuse-ld=lld -Wl,-q

For some reason my notes for using LTO include changing RANLIB to point to
gcc/llvm-ranlib of the appropriate version. Won't even be used on HEAD, but
before that it can make a difference.

Depending on how you built clang, it could be that the above recipe ends up
using the system lld, which might be too old.

What are the crashes you're getting?

Greetings,

Andres Freund



Re: Optimizing PostgreSQL with LLVM's PGO+LTO

From
João Paulo Labegalini de Carvalho
Date:

On Mon, Jan 30, 2023 at 10:47 AM Andres Freund <andres@anarazel.de> wrote:
> For some reason my notes for using LTO include changing RANLIB to point to
> gcc/llvm-ranlib of the appropriate version. Won't even be used on HEAD, but
> before that it can make a difference.

I will try that.
 
> Depending on how you built clang, it could be that the above recipe ends up
> using the system lld, which might be too old.

I double checked and I am using the lld that I built from source.
 
> What are the crashes you're getting?

When I run make check, the server starts up fine but the test queries do not seem to execute. I don't see any errors; the check step just quits after a while.

2023-02-01 13:00:38.703 EST postmaster[28750] LOG:  starting PostgreSQL 14.5 on x86_64-pc-linux-gnu, compiled by clang version 15.0.6, 64-bit
2023-02-01 13:00:38.703 EST postmaster[28750] LOG:  listening on Unix socket "/tmp/pg_regress-h8Fmqu/.s.PGSQL.58085"
2023-02-01 13:00:38.704 EST startup[28753] LOG:  database system was shut down at 2023-02-01 13:00:38 EST
2023-02-01 13:00:38.705 EST postmaster[28750] LOG:  database system is ready to accept connections

--
João Paulo L. de Carvalho
Ph.D Computer Science |  IC-UNICAMP | Campinas , SP - Brazil
Postdoctoral Research Fellow | University of Alberta | Edmonton, AB - Canada