Обсуждение: Unaccounted regression from postgresql 11 in later versions

Поиск
Список
Период
Сортировка

Unaccounted regression from postgresql 11 in later versions

От
Bob Jolliffe
Дата:
Hello

We have an application (https://dhis2.org) which has been using postgresql as a backend for the past 15 years or so.  Gradually moving through pg versions 8,9,10 etc as the years went by.  At the moment a large number of our implementations are using versions 13, 14 and 15.  Unfortunately we have recently discovered that, despite most operations performing considerably better on later versions, there is a particular type of query that is very much slower (up to 100x) than it was on postgresql 11.  We have seen this regression in 13, 14 and 15.  Unfortunately I dont have stats on version 12 yet.

The query is not beautifully crafted.  It is automatically generated from a custom expression language.  We know that it can probably be improved, but at the moment we would really like to know if there is anything we can configure with the SQL as-is to get performance like we had back on pg11.

The example below is a typical such query.  I've attached below that, links to the results of EXPLAIN (ANALYZE, BUFFERS). for pg11 and pg15 on the same physical environment loaded with the same database.  I would appreciate some help trying to understand what we are seeing with the EXPLAIN output and whether there is anything to be done.

EXPLAIN ANALYZE  
select
  count(pi) as value,
  '2022W21' as Weekly
from
  analytics_enrollment_gr3uwzvzpqt as ax
where
  cast(
    (
      select
        "IEMtgZapP2s"
      from
        analytics_event_gr3uWZVzPQT
      where
        analytics_event_gr3uWZVzPQT.pi = ax.pi
        and "IEMtgZapP2s" is not null
        and ps = 'XO45JBGJcXJ'
      order by
        executiondate desc
      limit
        1
    ) as date
  ) < cast('2022-05-30' as date)
  and cast(
    (
      select
        "IEMtgZapP2s"
      from
        analytics_event_gr3uWZVzPQT
      where
        analytics_event_gr3uWZVzPQT.pi = ax.pi
        and "IEMtgZapP2s" is not null
        and ps = 'XO45JBGJcXJ'
      order by
        executiondate desc
      limit
        1
    ) as date
  ) >= cast('2022-05-23' as date)
  and (uidlevel1 = 'Plmg8ikyfrK')
  and (
    coalesce(
      (
        select
          "QEbYS2QOXLf"
        from
          analytics_event_gr3uWZVzPQT
        where
          analytics_event_gr3uWZVzPQT.pi = ax.pi
          and "QEbYS2QOXLf" is not null
          and ps = 'XO45JBGJcXJ'
        order by
          executiondate desc
        limit
          1
      ):: text,
      ''
    ) = 'FIN_CASE_CLASS_CONFIRM_LAB'
  )
limit 1;
-- limit 200001;

The EXPLAIN result for postgresql 11 is here:  https://explain.depesz.com/s/3QfC

The same query on postgresql 15 is here:  https://explain.depesz.com/s/BzpA#html

Whereas the first example takes 23s, the pg15 one takes 243s (this time difference is even more stark when you remove BUFFERS from the explain).  During execution the pg15 query consumes 100% of a CPU core throughout indicating it is probably cpu bound rather than IO.

The plan selected in both cases seems to be exactly the same.  But pg15 seems to make a lot of work of the final aggregation step.  Anecdotally I understand that the same difference is there with pg13 and 14.  The only significant factor I could think of relating to new behaviour in pg13 is the new hash_mem_multiplier configuration and it its relation to work_mem availbale for hash tables.  I have attempted to turn up both hash_mem_multilier and work_mem to ridiculous values and I see no change whatsoever on pg15. 

I also removed the LIMIT and tested again with no significant difference:  https://explain.depesz.com/s/K9Lq

Does anyone have a theory of why pg15 should behave so differently to pg11 here?  Better still, any suggestions for configuration that might make pg15 behave more like pg10.  I am really dreading the prospect of stepping our many live implementations back to pg11 :-(.

Regards
Bob

Re: Unaccounted regression from postgresql 11 in later versions

От
Michael Christofides
Дата:
Does anyone have a theory of why pg15 should behave so differently to pg11 here?  Better still, any suggestions for configuration that might make pg15 behave more like pg10.  I am really dreading the prospect of stepping our many live implementations back to pg11 :-(.

One major factor here appears to be JIT compilation, which is off by default in pg11, but on by default in pg12+.

You can see at the bottom of your slowest query plan that about 233s of the 240s are JIT related.

There is good info in the docs about tuning, or turning off, JIT: https://www.postgresql.org/docs/current/jit-decision.html 

Re: Unaccounted regression from postgresql 11 in later versions

От
Bob Jolliffe
Дата:
Wow Michael you are absolutely right.  Turning jit off results in a query execution about twice as fast as pg11.  That is a huge relief.  I will read the jit related docs and see if there is anything smarter I should be doing other than disabling jit entirely, but it works a treat for this query.

Regards
Bob

On Wed, 31 May 2023 at 11:11, Michael Christofides <michael@pgmustard.com> wrote:
Does anyone have a theory of why pg15 should behave so differently to pg11 here?  Better still, any suggestions for configuration that might make pg15 behave more like pg10.  I am really dreading the prospect of stepping our many live implementations back to pg11 :-(.

One major factor here appears to be JIT compilation, which is off by default in pg11, but on by default in pg12+.

You can see at the bottom of your slowest query plan that about 233s of the 240s are JIT related.

There is good info in the docs about tuning, or turning off, JIT: https://www.postgresql.org/docs/current/jit-decision.html