Re: BUG #16076: JIT causes huge delays in a complex query. jit=off solves it.

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: BUG #16076: JIT causes huge delays in a complex query. jit=off solves it.
Дата
Msg-id 20210530212639.67te76g2btc6bdzz@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: BUG #16076: JIT causes huge delays in a complex query. jit=off solves it.  (Yuri Astrakhan <yuriastrakhan@gmail.com>)
Список pgsql-bugs
Hi,

I'm afraid not :(. I really could use some help around this...

Regards,

Andres

On 2021-05-21 17:54:16 -0400, Yuri Astrakhan wrote:
> Hi Andres, I just heard v14 beta is out (congrats!). Do you know if the
> very large query optimizations you mentioned (see last paragraph) have made
> it into v13 or v14b1?  Thanks!
> 
> On Thu, Oct 24, 2019 at 9:04 PM Andres Freund <andres@anarazel.de> wrote:
> 
> > Hi,
> >
> > On 2019-10-24 15:12:16 +0000, PG Bug reporting form wrote:
> > > The following bug has been logged on the website:
> > >
> > > Bug reference:      16076
> > > Logged by:          Yuri Astrakhan
> > > Email address:      yuriastrakhan@gmail.com
> > > PostgreSQL version: 12.0
> > > Operating system:   Ubuntu 19.10
> > > Description:
> > >
> > > (huge thanks to @Komzpa - Darafei Praliaskouski who figured out the
> > > culprit)
> > >
> > > PostgreSQL 12.0 (Ubuntu 12.0-2.pgdg19.10+1) on x86_64-pc-linux-gnu,
> > compiled
> > > by gcc (Ubuntu 9.2.1-9ubuntu1) 9.2.1 20191008, 64-bit
> > > POSTGIS="3.0.0 r17983" [EXTENSION] PGSQL="120" GEOS="3.8.0-CAPI-1.13.1 "
> > > PROJ="Rel. 5.2.0, September 15th, 2018" LIBXML="2.9.4" LIBJSON="0.13.1"
> > > LIBPROTOBUF="1.3.1" WAGYU="0.4.3 (Internal)"
> > >
> > > I have a full OpenStreetMap loaded using OpenMapTiles configuration into
> > > multiple machine configurations. A single query (wrapped as a UDF)
> > generates
> > > the whole MVT tile blob, and uses many layers, sub-queries, and UDFs.
> > When
> > > JIT is enabled, the query takes 5-10 seconds to run. Disabling JIT makes
> > the
> > > query run in milliseconds.  Same queries on PostgreSQL 10 + Postgis
> > > 2.5.2/GEOS 3.6 have no issues (as there was no JIT).
> >
> > Ugh.
> >
> >
> > > Details:
> > > The query is a concatenation of multiple map layers (binary blobs in MVT
> > > format from ST_AsMVT()). All layers are "UNION ALL-ed" from subqueries,
> > one
> > > query per layer. The "explain analyze" shows huge cost of the first layer
> > > query, regardless of the order of those layers.  Every layer added about
> > > 200-300ms to the execution (not planning!) time, totalling about 5-7
> > > seconds. Yet, I noticed that when I was testing a portion of layers,
> > > removing layer_waterway made query run in milliseconds (I suspect there
> > are
> > > more than one layer that causes this bug).
> >
> > > I will try to reproduce this error using OpenMapTiles's quickstart setup,
> > > but that will involve some heavy customization.  Is this a known bug, or
> > > should I try to get an easy to set up repo?  Thanks!
> >
> > A repro would be very helpful.
> >
> >
> > > * "perf top" results -
> > > https://gist.github.com/nyurik/fa0cb0dd935d50e796cd5d58d8d41067
> >
> > Hm. That's a flat looking profile, making it hard to glean information
> > about what's going wrong. It might just be the size of the emitted
> > code...
> >
> >
> > If it's too hard to generate a quick repro, it might already help if you
> > sent me the bitcode for such a slow query. You can do so by setting
> > 'jit_dump_bitcode' and looking for the *.bc files in the data
> > directory.
> >
> >
> > > * A few explore analyze runs with different layer orderings:
> > >   * https://explain.depesz.com/s/fecC
> > >   * https://explain.depesz.com/s/MpMo
> > >   * https://explain.depesz.com/s/d7RU
> > > * main query -
> > > https://gist.github.com/nyurik/ed80c5e6d20f67fd14d37ea2503a9cbd
> > > * helpers -
> > https://gist.github.com/nyurik/00c4c9d69b7d4fa67c86159b9ce83027
> >
> > This looks like it creates pretty darn huge queries. There's a *lot* of
> > table references here, and each of them has their own targetlist etc.
> >
> > I want to figure out why it's *this* slow, but this many expressions are
> > going to have overhead to JIT. There's 330 generated functions in that
> > last explain. A lot of them never actually executed, because there's
> > basically no content. Presumably there's a lot of redundancy, with the
> > expressions being basically the same - but we're unable to take
> > advantage of that in 12, due to the pointer constants making the
> > expressions not actually be the same, c.f. patch at[1] (which also is
> > why there's no caching).
> >
> > Unfortunately I don't think there's going to be full blown fix for this
> > in 12, although we might be able to improve things a bit. So you'll
> > probably end up having to just disable jit. But if you can try to get me
> > enough data to work with, the situation in 13 might be better.
> >
> > Greetings,
> >
> > Andres Freund
> >
> > [1]
> > https://www.postgresql.org/message-id/20191023163849.sosqbfs5yenocez3%40alap3.anarazel.de
> >
Greetings,

Andres Freund



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Re: BUG #16827: macOS interrupted syscall leads to a crash
Следующее
От: Tom Lane
Дата:
Сообщение: Re: XX000: iso-8859-1 type of jsonb container.