Discussion: BUG #16183: PREPARED STATEMENT slowed down by jit


BUG #16183: PREPARED STATEMENT slowed down by jit

From: PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16183
Logged by:          Christian Quest
Email address:      github@cquest.org
PostgreSQL version: 12.1
Operating system:   Ubuntu 18.04
Description:

I'm running several OpenStreetMap map rendering servers. They are all based
on PG/postgis.
After an upgrade from v10 to 12, the usual update process took 10x more time
than it used to.

After digging for a few days, it appears that the default jit=on is the cause.

The update tool (osm2pgsql) makes heavy use of PREPARED statements. Could
there be a bug in jit+PREPARED causing the statement to be recompiled on
each execute?

The other side of the issue is described here:
https://github.com/openstreetmap/osm2pgsql/issues/1045


Re: BUG #16183: PREPARED STATEMENT slowed down by jit

From: Jeff Janes
Date:
> The update tool (osm2pgsql) makes heavy use of PREPARED statements. Could
> there be a bug in jit+PREPARED causing the statement to be recompiled on
> each execute?

I don't think that is a bug; rather, it is just how JIT works at the moment (it is recompiled for each execution).  That might be improved in a future version.

For the slowdown to be 10x, though, does seem rather extreme.  Could you isolate the query and post an execution with "EXPLAIN (ANALYZE)"?  I suspect it is a very cheap query (e.g. a single-row update), but for some reason the planner thinks it is expensive, and so JIT kicks in when it shouldn't.  Because of the high overhead, JIT is only supposed to activate for expensive queries; see jit_above_cost.
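For reference, jit_above_cost and the related thresholds are ordinary settings; a minimal sketch of how to inspect them and the estimated cost they are compared against, assuming PostgreSQL 12 defaults rather than the reporter's actual configuration:

-- JIT fires when the estimated total plan cost exceeds jit_above_cost
-- (default 100000); inlining and full optimization have higher thresholds.
SHOW jit;                      -- on by default in PostgreSQL 12
SHOW jit_above_cost;           -- default 100000
SHOW jit_inline_above_cost;    -- default 500000
SHOW jit_optimize_above_cost;  -- default 500000

-- Plain EXPLAIN shows the estimated total cost to compare against those
-- thresholds; EXPLAIN (ANALYZE) additionally reports a JIT section when
-- compilation actually happened.  (Statement name and id are taken from
-- later in this thread.)
EXPLAIN EXECUTE mark_ways_by_node(1836953770);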

Cheers,

Jeff

Re: BUG #16183: PREPARED STATEMENT slowed down by jit

From: Christian Quest
Date:

The queries are very simple, like:


PREPARE mark_ways_by_node(" POSTGRES_OSMID_TYPE ") AS select id from planet_osm_ways WHERE nodes && ARRAY[$1];


They are all located here in the osm2pgsql source code:

https://github.com/openstreetmap/osm2pgsql/blob/master/src/middle-pgsql.cpp#L786
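POSTGRES_OSMID_TYPE appears to be a compile-time token in that C++ source; with 64-bit OSM IDs it resolves to bigint, so the statement the server actually sees is effectively the plain-SQL form prepared by hand a little further down:

-- Assumed expansion of the osm2pgsql statement above (POSTGRES_OSMID_TYPE -> bigint):
PREPARE mark_ways_by_node(bigint) AS
SELECT id FROM planet_osm_ways WHERE nodes && ARRAY[$1];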


Here are the EXPLAIN ANALYZE results, first without jit and then with jit:


osm=# explain analyze execute mark_ways_by_node(1836953770);
                                                              QUERY PLAN                                                             
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on planet_osm_ways  (cost=2468.37..305182.32 rows=301467 width=8) (actual time=0.039..0.042 rows=2 loops=1)
   Recheck Cond: (nodes && '{1836953770}'::bigint[])
   Heap Blocks: exact=2
   ->  Bitmap Index Scan on planet_osm_ways_nodes  (cost=0.00..2393.00 rows=301467 width=0) (actual time=0.029..0.029 rows=2 loops=1)
         Index Cond: (nodes && '{1836953770}'::bigint[])
 Planning Time: 0.171 ms
 Execution Time: 0.077 ms
(7 rows)


osm=# PREPARE mark_ways_by_node(bigint) AS select id from planet_osm_ways WHERE nodes && ARRAY[$1];
PREPARE
osm=# explain analyze execute mark_ways_by_node(1836953770);
                                                              QUERY PLAN                                                             
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on planet_osm_ways  (cost=2468.37..305182.32 rows=301467 width=8) (actual time=5.775..5.905 rows=2 loops=1)
   Recheck Cond: (nodes && '{1836953770}'::bigint[])
   Heap Blocks: exact=2
   ->  Bitmap Index Scan on planet_osm_ways_nodes  (cost=0.00..2393.00 rows=301467 width=0) (actual time=0.512..0.512 rows=2 loops=1)
         Index Cond: (nodes && '{1836953770}'::bigint[])
 Planning Time: 3.667 ms
 JIT:
   Functions: 4
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.466 ms, Inlining 0.000 ms, Optimization 0.354 ms, Emission 4.634 ms, Total 5.454 ms
 Execution Time: 30.393 ms
(11 rows)




Re: BUG #16183: PREPARED STATEMENT slowed down by jit

From: Andres Freund
Date:
Hi,

On 2020-01-02 22:29:31 +0100, Christian Quest wrote:
> osm=# PREPARE mark_ways_by_node(bigint) AS select id from planet_osm_ways WHERE nodes && ARRAY[$1];
> PREPARE
> osm=# explain analyze execute mark_ways_by_node(1836953770);
>                                                               QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on planet_osm_ways  (cost=2468.37..305182.32 rows=301467 width=8) (actual time=5.775..5.905 rows=2 loops=1)
>    Recheck Cond: (nodes && '{1836953770}'::bigint[])
>    Heap Blocks: exact=2
>    ->  Bitmap Index Scan on planet_osm_ways_nodes  (cost=0.00..2393.00 rows=301467 width=0) (actual time=0.512..0.512 rows=2 loops=1)
>          Index Cond: (nodes && '{1836953770}'::bigint[])
>  Planning Time: 3.667 ms
>  JIT:
>    Functions: 4
>    Options: Inlining false, Optimization false, Expressions true, Deforming true
>    Timing: Generation 0.466 ms, Inlining 0.000 ms, Optimization 0.354 ms, Emission 4.634 ms, Total 5.454 ms
>  Execution Time: 30.393 ms
> (11 rows)

I'm not too surprised the heuristics don't work if the row estimate is
off by ~5 orders of magnitude... Caching could "fix" this by making the
JIT cost less noticeable, but you're very liable to get bad plan
decisions as long as you're that far off with estimated plan costs.
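As a quick sanity check on that figure, using the row counts from the plan quoted above (plain arithmetic, nothing server-specific):

-- Estimated vs. actual rows from the bitmap scans above:
-- 301467 estimated vs. 2 actual, an overestimate of roughly 150,000x,
-- i.e. about five orders of magnitude.
SELECT 301467.0 / 2 AS overestimate_factor;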

Greetings,

Andres Freund



Re: BUG #16183: PREPARED STATEMENT slowed down by jit

From: Jeff Janes
Date:
On Thu, Jan 2, 2020 at 5:03 PM Christian Quest <cquest@cquest.org> wrote:

> osm=# explain analyze execute mark_ways_by_node(1836953770);
>
>                                                               QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on planet_osm_ways  (cost=2468.37..305182.32 rows=301467 width=8) (actual time=0.039..0.042 rows=2 loops=1)
>    Recheck Cond: (nodes && '{1836953770}'::bigint[])

I think your estimation here is falling victim to a deficiency in how stats are computed on array types when all values in the array (across all rows) are rare.  See the discussion of this at https://www.postgresql.org/message-id/flat/CAMkU%3D1x2W1gpEP3AQsrSA30uxQk1Sau5VDOLL4LkhWLwrOY8Lw%40mail.gmail.com

(My quick and dirty patch posted there still compiles and works, if you would like to test that it fixes the problem for you.)
 
Because the number of rows is vastly overestimated, so is the cost, which then causes JIT to kick in counter-productively, due to the deranged cost exceeding jit_above_cost.
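One way to see what the planner has to work with is to check whether ANALYZE collected most-common-element statistics for the array column at all; a minimal sketch against the standard pg_stats view, using the table and column names from this thread:

-- If most_common_elems is NULL, the selectivity of "nodes && ARRAY[$1]"
-- cannot use element statistics and falls back on a much cruder estimate.
SELECT most_common_elems IS NULL               AS mce_missing,
       array_length(most_common_elem_freqs, 1) AS n_element_freqs
FROM pg_stats
WHERE tablename = 'planet_osm_ways'
  AND attname = 'nodes';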

Cheers,

Jeff

Re: BUG #16183: PREPARED STATEMENT slowed down by jit

From: Christian Quest
Date:


This wrong cost may have other side effects, like launching parallel workers.

Another person ran into the same problem, but my simple fix of disabling jit did not do the trick for him. My tests were done on a smaller database (an OpenStreetMap extract covering only France); his was on a full planet dataset, and the estimated row counts were 10x higher.

We found a workaround (disabling jit and parallel workers for the session), but a more general fix for this wrong row estimate should be considered for other cases ;)
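For readers hitting the same thing, a sketch of that kind of session-level workaround using standard settings (the exact values used in these osm2pgsql runs are not shown in this thread, so treat them as illustrative):

-- Disable JIT compilation and parallel workers for the current session only.
SET jit = off;
SET max_parallel_workers_per_gather = 0;

-- Alternatively, leave jit on but raise its activation threshold so the
-- inflated cost estimate no longer triggers compilation (default 100000).
-- SET jit_above_cost = 10000000;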

Thanks for your time on this issue.


Christian

Re: BUG #16183: PREPARED STATEMENT slowed down by jit

From: Ryan Lambert
Date:

Hi Jeff,

I tried applying your original patch and the later patch from that thread to REL_12_STABLE; unfortunately, no luck with either.  The original patch errors with:

$ git apply -p1 < array_type_analyze_MCE_V001.patch
error: src/backend/utils/adt/array_typanalyze.c: already exists in working directory

The other with:

$ git apply -p1 < array_typanalyze_0_mce.patch
error: src/backend/utils/adt/array_typanalyze.c: already exists in working directory
error: src/backend/utils/cache/lsyscache.c: already exists in working directory

I am happy to review and test patches that address this particular issue.  Thanks,

Ryan Lambert



Re: BUG #16183: PREPARED STATEMENT slowed down by jit

From: Jeff Janes
Date:
On Sun, Jan 5, 2020 at 7:21 PM Ryan Lambert <ryan@rustprooflabs.com> wrote:

> I tried applying your original patch and the later patch from that thread to REL_12_STABLE; unfortunately, no luck with either.  The original patch errors with:
>
> $ git apply -p1 < array_type_analyze_MCE_V001.patch
> error: src/backend/utils/adt/array_typanalyze.c: already exists in working directory

I just use the GNU "patch -p1" command.  I don't know what "git apply" is up to here, and the error message isn't very informative.

By the way, the patch says "However, there is no histogram to fall back on when the MCE array is NULL", which is a bit of nonsense.  It is ndistinct, not a histogram, that the scalar case falls back on when the MCV list is NULL.  But there isn't an ndistinct for MCE either, so the conclusion is still sound.
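To illustrate the distinction, a sketch of what pg_stats exposes for this table: scalar columns have n_distinct and possibly histogram_bounds to fall back on when the MCV list is NULL, while the array column only has the most-common-element list (column names are the standard pg_stats ones; the table name is from this thread):

-- Scalar columns: MCV list, histogram, n_distinct.
-- Array column (nodes): most_common_elems only, with no per-element
-- ndistinct to fall back on when it is NULL.
SELECT attname,
       n_distinct,
       most_common_vals  IS NULL AS no_mcv,
       histogram_bounds  IS NULL AS no_histogram,
       most_common_elems IS NULL AS no_mce
FROM pg_stats
WHERE tablename = 'planet_osm_ways';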

Cheers,

Jeff

Re: BUG #16183: PREPARED STATEMENT slowed down by jit

From: Ryan Lambert
Date:

Thank you; using patch instead of git apply works.  Your patch [1] applies cleanly and passes make installcheck-world against REL_12_STABLE.  It greatly improves the estimated row counts in the prepared statements [2] that seem to be causing the issue.  I have not yet verified that this patch fixes the originally reported performance issue regarding updating data with osm2pgsql [3].

The following query uses an ID from the Colorado region's ways table. A gzipped plain pg_dump (97 MB) is available [4] to make this easier to replicate, if helpful.

PREPARE mark_ways_by_node(BIGINT) AS
SELECT id FROM public.planet_osm_ways
WHERE nodes && ARRAY[$1];
EXPLAIN(ANALYZE)
EXECUTE mark_ways_by_node(736973985);

The EXPLAIN results from unpatched PostgreSQL 12.1 show an estimated 10,222 rows when the actual row count is 1.


                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on planet_osm_ways  (cost=119.22..28648.09 rows=10222 width=8) (actual time=0.045..0.047 rows=1 loops=1)
   Recheck Cond: (nodes && '{736973985}'::bigint[])
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on planet_osm_ways_nodes  (cost=0.00..116.67 rows=10222 width=0) (actual time=0.032..0.033 rows=1 loops=1)
         Index Cond: (nodes && '{736973985}'::bigint[])
 Planning Time: 0.288 ms
 Execution Time: 0.102 ms


The same data with the patch [1] applied is much closer, with an estimated 68 rows. The OP's query plan showed an estimated row count > 300k when the actual row count was 2.  It's logical to suspect that the costly JIT / parallel query would not be chosen if the row count estimates improve by similar margins; a quick check is sketched after the plan below.


                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on planet_osm_ways  (cost=28.53..295.41 rows=68 width=8) (actual time=0.045..0.046 rows=1 loops=1)
   Recheck Cond: (nodes && '{736973985}'::bigint[])
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on planet_osm_ways_nodes  (cost=0.00..28.51 rows=68 width=0) (actual time=0.034..0.035 rows=1 loops=1)
         Index Cond: (nodes && '{736973985}'::bigint[])
 Planning Time: 0.209 ms
 Execution Time: 0.092 ms

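Given the patched estimate, the plan's total cost (295.41) sits far below the default jit_above_cost of 100000, so a quick way to confirm that JIT no longer kicks in for this statement would be (a sketch, assuming default cost settings):

-- With jit left on, EXPLAIN (ANALYZE) should no longer report a "JIT:"
-- section for this prepared statement once the estimate is this low.
SET jit = on;
EXPLAIN (ANALYZE) EXECUTE mark_ways_by_node(736973985);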

I'll try to do more testing in a few days to verify if this patch fixes the issue with osm2pgsql updates. 

Thanks,
Ryan

[1] https://www.postgresql.org/message-id/attachment/39315/array_type_analyze_MCE_V001.patch
[2] https://github.com/openstreetmap/osm2pgsql/blob/master/src/middle-pgsql.cpp#L786
[3] https://github.com/openstreetmap/osm2pgsql/issues/1045