Обсуждение: BUG #16183: PREPARED STATEMENT slowed down by jit
The following bug has been logged on the website:

Bug reference:      16183
Logged by:          Christian Quest
Email address:      github@cquest.org
PostgreSQL version: 12.1
Operating system:   Ubuntu 18.04

Description:

I'm running several OpenStreetMap map rendering servers, all based on
PG/PostGIS. After an upgrade from v10 to 12, the usual update process took
10x more time than it used to. After digging for a few days, it appears that
the default jit=on is the cause. The other side of the issue is described
here: https://github.com/openstreetmap/osm2pgsql/issues/1045
The update tool (osm2pgsql) makes heavy use of PREPARED statements. Could
there be a bug in jit+PREPARED causing the statement to be recompiled on
each execute ?
The queries are very simple, like:
PREPARE mark_ways_by_node(" POSTGRES_OSMID_TYPE ") AS select id from planet_osm_ways WHERE nodes && ARRAY[$1];
They are all located here in osm2pgsql source code:
https://github.com/openstreetmap/osm2pgsql/blob/master/src/middle-pgsql.cpp#L786
Here are the EXPLAIN ANALYZE outputs without jit and with jit:
osm=# explain analyze execute mark_ways_by_node(1836953770);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on planet_osm_ways (cost=2468.37..305182.32 rows=301467 width=8) (actual time=0.039..0.042 rows=2 loops=1)
Recheck Cond: (nodes && '{1836953770}'::bigint[])
Heap Blocks: exact=2
-> Bitmap Index Scan on planet_osm_ways_nodes (cost=0.00..2393.00 rows=301467 width=0) (actual time=0.029..0.029 rows=2 loops=1)
Index Cond: (nodes && '{1836953770}'::bigint[])
Planning Time: 0.171 ms
Execution Time: 0.077 ms
(7 rows)
osm=# PREPARE mark_ways_by_node(bigint) AS select id from planet_osm_ways WHERE nodes && ARRAY[$1];
PREPARE
osm=# explain analyze execute mark_ways_by_node(1836953770);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on planet_osm_ways (cost=2468.37..305182.32 rows=301467 width=8) (actual time=5.775..5.905 rows=2 loops=1)
Recheck Cond: (nodes && '{1836953770}'::bigint[])
Heap Blocks: exact=2
-> Bitmap Index Scan on planet_osm_ways_nodes (cost=0.00..2393.00 rows=301467 width=0) (actual time=0.512..0.512 rows=2 loops=1)
Index Cond: (nodes && '{1836953770}'::bigint[])
Planning Time: 3.667 ms
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 0.466 ms, Inlining 0.000 ms, Optimization 0.354 ms, Emission 4.634 ms, Total 5.454 ms
Execution Time: 30.393 ms
(11 rows)
> The update tool (osm2pgsql) makes heavy use of PREPARED statements. Could
> there be a bug in jit+PREPARED causing the statement to be recompiled on
> each execute ?

I don't think that is a bug; rather, it is just how JIT works at the moment
(the query is recompiled for each execution). That might be improved in a
future version.

A 10x slowdown, though, does seem rather extreme. Could you isolate the
query and post an execution with "EXPLAIN (ANALYZE)"? I suspect it is a very
cheap query (e.g. a single-row update), but for some reason the planner
thinks it is expensive, and so JIT kicks in when it shouldn't. Because of
the high overhead, JIT is only supposed to activate for expensive queries;
see jit_above_cost.

Cheers,

Jeff
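The jit_above_cost knob mentioned above can be inspected and overridden per
session. A minimal sketch (the values below are illustrative, not a
recommendation; the defaults shown are from the PostgreSQL 12 docs):

```sql
-- Inspect the JIT-related settings for the current session.
SHOW jit;             -- on by default in PostgreSQL 12
SHOW jit_above_cost;  -- default 100000: plans estimated cheaper than this skip JIT

-- Raising the threshold avoids compilation overhead for queries the
-- planner mistakenly believes are expensive.
SET jit_above_cost = 500000;  -- illustrative value

-- Or, more bluntly, disable JIT for the session entirely.
SET jit = off;
```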
Hi,

On 2020-01-02 22:29:31 +0100, Christian Quest wrote:
> osm=# PREPARE mark_ways_by_node(bigint) AS select id from planet_osm_ways
> WHERE nodes && ARRAY[$1];
> PREPARE
> osm=# explain analyze execute mark_ways_by_node(1836953770);
>                               QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on planet_osm_ways  (cost=2468.37..305182.32 rows=301467 width=8) (actual time=5.775..5.905 rows=2 loops=1)
>    Recheck Cond: (nodes && '{1836953770}'::bigint[])
>    Heap Blocks: exact=2
>    ->  Bitmap Index Scan on planet_osm_ways_nodes  (cost=0.00..2393.00 rows=301467 width=0) (actual time=0.512..0.512 rows=2 loops=1)
>          Index Cond: (nodes && '{1836953770}'::bigint[])
>  Planning Time: 3.667 ms
>  JIT:
>    Functions: 4
>    Options: Inlining false, Optimization false, Expressions true, Deforming true
>    Timing: Generation 0.466 ms, Inlining 0.000 ms, Optimization 0.354 ms, Emission 4.634 ms, Total 5.454 ms
>  Execution Time: 30.393 ms
> (11 rows)

I'm not too surprised the heuristics don't work if the row estimate is off
by ~5 orders of magnitude... Caching could "fix" this by making the JIT cost
less noticeable, but you're very liable to get bad plan decisions as long as
you're that far off with estimated plan costs.

Greetings,

Andres Freund
On Thu, Jan 2, 2020 at 5:03 PM Christian Quest <cquest@cquest.org> wrote:
> osm=# explain analyze execute mark_ways_by_node(1836953770);
>                               QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on planet_osm_ways  (cost=2468.37..305182.32 rows=301467 width=8) (actual time=0.039..0.042 rows=2 loops=1)
>    Recheck Cond: (nodes && '{1836953770}'::bigint[])

I think your estimation here is falling victim to a deficiency in how stats
are computed on array types when all values in the array (across all rows)
are rare. See the discussion of this at
https://www.postgresql.org/message-id/flat/CAMkU%3D1x2W1gpEP3AQsrSA30uxQk1Sau5VDOLL4LkhWLwrOY8Lw%40mail.gmail.com

(My quick and dirty patch posted there still compiles and works, if you
would like to test whether it fixes the problem for you.)

Because the number of rows is vastly overestimated, so is the cost, which
then causes JIT to kick in counter-productively, since the deranged cost
exceeds jit_above_cost.

Cheers,

Jeff
This wrong cost may have other side effects, like launching parallel workers.
Another person hit the same problem, but my simple fix of disabling jit did
not work for him. My tests were done on a smaller database (an OpenStreetMap
extract covering only France); his was on a full planet dataset, where the
estimated row counts were 10x higher.
We found a workaround (disabling jit and parallel workers for the session),
but a more general fix for this wrong row estimation should be considered
for other cases ;)
Thanks for your time on this issue.
Christian
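For reference, the session-level workaround described above can be spelled
out like this (both are standard PostgreSQL settings; the values are simply
the "off" switches):

```sql
-- Disable JIT compilation for this session.
SET jit = off;

-- Disable parallel query for this session.
SET max_parallel_workers_per_gather = 0;

-- A client such as osm2pgsql could also pick these up via the environment:
-- PGOPTIONS='-c jit=off -c max_parallel_workers_per_gather=0'
```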
I tried applying your original patch and the later patch from that thread to
REL_12_STABLE, unfortunately no luck on either. The original patch errors
with:

$ git apply -p1 < array_type_analyze_MCE_V001.patch
error: src/backend/utils/adt/array_typanalyze.c: already exists in working directory
error: src/backend/utils/cache/lsyscache.c: already exists in working directory
On Sun, Jan 5, 2020 at 7:21 PM Ryan Lambert <ryan@rustprooflabs.com> wrote:
> I tried applying your original patch and the later patch from that thread
> to REL_12_STABLE, unfortunately no luck on either. The original patch
> errors with:
>
> $ git apply -p1 < array_type_analyze_MCE_V001.patch
> error: src/backend/utils/adt/array_typanalyze.c: already exists in working directory

I just use the GNU "patch -p1" command. I don't know what "git apply" is up
to here, and the error message isn't very informative.

By the way, the patch says "However, there is no histogram to fall back on
when the MCE array is NULL", which is a bit of nonsense. It is ndistinct,
not the histogram, which the scalar case falls back on when the MCV list is
NULL. But there isn't an ndistinct for MCE either, so the conclusion is
still sound.

Cheers,

Jeff
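As an aside, the -p1 path-stripping behaviour of GNU patch that Jeff relies
on can be demonstrated self-containedly (demo/ and demo.patch below are
throwaway names for illustration, not files from the thread):

```shell
# Create a tiny tree and a unified diff using the a/ and b/ prefixes
# that git-style patches carry, then apply it with GNU patch,
# stripping one leading path component (-p1).
mkdir -p demo/src
printf 'old line\n' > demo/src/file.txt
cat > demo.patch <<'EOF'
--- a/src/file.txt
+++ b/src/file.txt
@@ -1 +1 @@
-old line
+new line
EOF
patch -d demo -p1 < demo.patch
cat demo/src/file.txt   # prints "new line"
```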
The following query uses an ID from the Colorado region's ways table. A
gzipped plain pg_dump (97 MB) is available [4] to make this easier to
replicate if helpful.
PREPARE mark_ways_by_node(BIGINT) AS
SELECT id FROM public.planet_osm_ways
WHERE nodes && ARRAY[$1];
EXPLAIN (ANALYZE)
EXECUTE mark_ways_by_node(736973985);
The explain results from unpatched PostgreSQL 12.1 show an estimated 10,222
rows when the actual row count is 1.
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on planet_osm_ways  (cost=119.22..28648.09 rows=10222 width=8) (actual time=0.045..0.047 rows=1 loops=1)
   Recheck Cond: (nodes && '{736973985}'::bigint[])
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on planet_osm_ways_nodes  (cost=0.00..116.67 rows=10222 width=0) (actual time=0.032..0.033 rows=1 loops=1)
         Index Cond: (nodes && '{736973985}'::bigint[])
 Planning Time: 0.288 ms
 Execution Time: 0.102 ms
The same data with the patch [1] applied comes much closer, with an
estimated 68 rows. The OP's query plan showed an estimated row count of over
300k when the actual count was 2. It is reasonable to suspect that the
costly JIT / parallel query paths would not be chosen if those row count
estimates improved by similar margins.
                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on planet_osm_ways  (cost=28.53..295.41 rows=68 width=8) (actual time=0.045..0.046 rows=1 loops=1)
   Recheck Cond: (nodes && '{736973985}'::bigint[])
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on planet_osm_ways_nodes  (cost=0.00..28.51 rows=68 width=0) (actual time=0.034..0.035 rows=1 loops=1)
         Index Cond: (nodes && '{736973985}'::bigint[])
 Planning Time: 0.209 ms
 Execution Time: 0.092 ms
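Putting the two plans side by side (numbers copied from the outputs above),
the patch shrinks the overestimation factor dramatically:

```python
actual = 1             # actual rows returned in both runs
unpatched_est = 10222  # planner estimate, unpatched 12.1
patched_est = 68       # planner estimate, patched

print(unpatched_est // actual)        # 10222  -- unpatched overestimate
print(patched_est // actual)          # 68     -- patched overestimate
print(unpatched_est // patched_est)   # 150    -- roughly 150x improvement
```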
I'll try to do more testing in a few days to verify whether this patch fixes
the issue with osm2pgsql updates.
Thanks,
Ryan
[1] https://www.postgresql.org/message-id/attachment/39315/array_type_analyze_MCE_V001.patch
[2] https://github.com/openstreetmap/osm2pgsql/blob/master/src/middle-pgsql.cpp#L786
[3] https://github.com/openstreetmap/osm2pgsql/issues/1045