Обсуждение: increased duration of stats_ext tests with -DCLOBBER_CACHE_ALWAYS

Поиск
Список
Период
Сортировка

increased duration of stats_ext tests with -DCLOBBER_CACHE_ALWAYS

От
Tomas Vondra
Дата:
Hi,

I have two buildfarm animals running tests with -DCLOBBER_CACHE_ALWAYS,
and both of them started having issues with timeouts

  Last file mtime in snapshot: Sat Nov 29 21:41:49 2025 GMT
  ===================================================
  timed out after 14400 secs

The animals are avocet and trilobite, and it's actually a single VM,
with one animal for gcc or clang.

It's fine that these tests take quite a bit of time, but I'm puzzled why
it started to fail. And it seems the stats_ext regression test got much
slower at the end of September.

The last couple runs from avocet have these durations:

11:44:42 ago      timedout [713d9a8] (01:54:23)     Details
7 days 23:33:36 ago      OK [bc32a12] (34:19:40)     Config
16 days 12:19:09 ago      OK [* 23792] (34:32:51)     Config
63 days 03:55:34 ago      OK [91df046] (31:14:46)     Config
71 days 10:57:06 ago      OK [9fc7f6a] (31:51:02)     Config
79 days 05:46:56 ago      OK [454c046] (19:35:25)     Config
87 days 00:36:55 ago      OK [06473f5] (19:31:16)     Config
94 days 19:50:32 ago      OK [f727b63] (19:05:05)     Config
102 days 14:43:13 ago      OK [ef03ea0] (19:09:05)     Config
...

So it was ~19h for a while (started at ~14h about 4y back). And then
between September 14 and 22 it jumped to ~32h. Which seems like a lot.

And that seems to be due to stats_ext changing from

  ok 157       + stats_ext                              855865 ms

to

  ok 157       + stats_ext                            15158426 ms

That's ~14300 seconds, i.e. ~4h. Pretty close to the timeout limit. The
machine is running the test with two locales, so there are multiple runs
of this regression test.

Any ideas what might be causing this? I see exactly the same issue on
the other machine (trilobite). There were no upgrades or other changes
around that time, and no other regression test has this issue.


regards

-- 
Tomas Vondra




Re: increased duration of stats_ext tests with -DCLOBBER_CACHE_ALWAYS

От
Tomas Vondra
Дата:
On 12/2/25 17:05, Tomas Vondra wrote:
> ...
> 
> So it was ~19h for a while (started at ~14h about 4y back). And then
> between September 14 and 22 it jumped to ~32h. Which seems like a lot.
> 
> And that seems to be due to stats_ext changing from
> 
>   ok 157       + stats_ext                              855865 ms
> 
> to
> 
>   ok 157       + stats_ext                            15158426 ms
> 
> That's ~14300 seconds, i.e. ~4h. Pretty close to the timeout limit. The
> machine is running the test with two locales, so there are multiple runs
> of this regression test.
> 

After bisecting this, it seems to have changed in this commit:

commit 1eccb93150707acfcc8f24556a15742a6313c8ac (HEAD -> stats-ext)
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Sat Sep 20 12:44:52 2025 -0400

    Re-allow using statistics for bool-valued functions in WHERE.

    Commit a391ff3c3, which added the ability for a function's support
    function to provide a custom selectivity estimate for "WHERE ...

And it seems to be "stuck" on the last few queries doing this:

  SELECT * FROM check_estimated_rows('SELECT * FROM sb_2 WHERE
  extstat_small(y)');

Attached is a perf-report for this, but the interesting part is:

  --99.51%--plpgsql_exec_function
        exec_toplevel_block.constprop.0
        exec_stmt_block
        exec_stmts
        exec_stmt_return (inlined)
        exec_eval_expr
        exec_eval_simple_expr (inlined)
        |
         --99.37%--SPI_plan_get_cached_plan
               GetCachedPlan
               |
               |--96.58%--RevalidateCachedQuery
               |      |
               |      |--96.00%--pg_analyze_and_rewrite_withcb
               |      |      parse_analyze_withcb
               |      |      |
               |      |       --96.00%--transformTopLevelStmt (inlined)

I'm not sure if this a problem in the code, or whether it's just that
the test case happens to be this expensive with cache clobbering. Maybe
I simply need to increase the timeout for this machine, so that it does
not fail after 4h? It'd be unfortunate, as the runs are already long,
and this pushes them over 24h ... :-(

regards

-- 
Tomas Vondra

Вложения

Re: increased duration of stats_ext tests with -DCLOBBER_CACHE_ALWAYS

От
Tom Lane
Дата:
Tomas Vondra <tomas@vondra.me> writes:
> On 12/2/25 17:05, Tomas Vondra wrote:
>> So it was ~19h for a while (started at ~14h about 4y back). And then
>> between September 14 and 22 it jumped to ~32h. Which seems like a lot.

> After bisecting this, it seems to have changed in this commit:

> commit 1eccb93150707acfcc8f24556a15742a6313c8ac (HEAD -> stats-ext)
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Date:   Sat Sep 20 12:44:52 2025 -0400
>     Re-allow using statistics for bool-valued functions in WHERE.

Ugh.

> Attached is a perf-report for this, but the interesting part is:
>   --99.51%--plpgsql_exec_function

Yeah, I can imagine that constantly flushing the cached plan for
that plpgsql function would be bad.  Let me see if I can reformulate
that test without using a plpgsql function --- right offhand, it's
not obvious why a built-in function wouldn't serve the purpose
just as well.

            regards, tom lane



Re: increased duration of stats_ext tests with -DCLOBBER_CACHE_ALWAYS

От
Tom Lane
Дата:
I wrote:
> Yeah, I can imagine that constantly flushing the cached plan for
> that plpgsql function would be bad.  Let me see if I can reformulate
> that test without using a plpgsql function --- right offhand, it's
> not obvious why a built-in function wouldn't serve the purpose
> just as well.

I pushed a change for this.  On my Mac laptop, it brings the time
for stats_ext with -DCLOBBER_CACHE_ALWAYS down to ~8 minutes, from
I-didn't-have-the-patience-to-wait-but-it-would-have-been-hours.

BTW, I noticed that neither avocet nor trilobite seem to have
'use_installcheck_parallel' enabled in their BF config files.
That results in the installcheck steps taking longer than the
check step, when they should be the same time or shorter.
You could shave several hours off the animals' runtime by
enabling that.

(I'm not really sure why that's not on by default ... if the
check steps are automatically parallelized, why not installcheck?)

            regards, tom lane



Re: increased duration of stats_ext tests with -DCLOBBER_CACHE_ALWAYS

От
Tomas Vondra
Дата:

On 12/3/25 19:33, Tom Lane wrote:
> I wrote:
>> Yeah, I can imagine that constantly flushing the cached plan for
>> that plpgsql function would be bad.  Let me see if I can reformulate
>> that test without using a plpgsql function --- right offhand, it's
>> not obvious why a built-in function wouldn't serve the purpose
>> just as well.
> 
> I pushed a change for this.  On my Mac laptop, it brings the time
> for stats_ext with -DCLOBBER_CACHE_ALWAYS down to ~8 minutes, from
> I-didn't-have-the-patience-to-wait-but-it-would-have-been-hours.
> 

Thanks!

> BTW, I noticed that neither avocet nor trilobite seem to have
> 'use_installcheck_parallel' enabled in their BF config files.
> That results in the installcheck steps taking longer than the
> check step, when they should be the same time or shorter.
> You could shave several hours off the animals' runtime by
> enabling that.
> 

Good idea, I enabled that for both animals.


-- 
Tomas Vondra