On Thu, Apr 13, 2023 at 09:14:01PM -0700, Peter Geoghegan wrote:
> I find that if I run the following test against a standard debug build
> on HEAD, my local installation reliably segfaults:
>
> $ meson test --setup running --suite test_rls_hooks-running
>
> Attached is a "bt full" run from gdb against a core dump. The query
> "EXPLAIN (costs off) SELECT * FROM rls_test_permissive;" runs when the
> backend segfaults.
>
> The top frame of the back trace is suggestive of a use-after-free:
>
> #0 copyObjectImpl (from=0x7f7f7f7f7f7f7f7e) at copyfuncs.c:187
> 187 switch (nodeTag(from))
> ...
>
> "git bisect" suggests that the problem began at commit 6ee30209,
> "SQL/JSON: support the IS JSON predicate".
>
> It's a bit surprising that the bug reproduces when I run a standard
> test, and yet we appear to have a bug that's about 2 weeks old. There
> may be something unusual about my system that will turn out to be
> relevant -- though there is nothing particularly exotic about this
> machine. My repro doesn't rely on concurrent execution, or timing, or
> anything like that -- it's quite reliable.
I was able to reproduce this yesterday but not today.
I think what happened is that you (and I) are in the habbit of running
"meson test tmp_install" to compile new binaries and install them into
./tmp_install, and then run a server out from there. But nowadays
there's also "meson test install_test_files". I'm not sure what
combination of things are out of sync, but I suspect you forgot one of
0) compile *and* install the new binaries; or 1) restart the running
postmaster; or, 2) install the new shared library ("test files").
I saw the crash again when I did this:
time ninja
time meson test tmp_install install_test_files regress/regress # does not recompile, BTW
./tmp_install/usr/local/pgsql/bin/postgres -D ./testrun/regress/regress/tmp_check/data -p 5678 -c autovacuum=no&
git checkout HEAD~222
time meson test tmp_install install_test_files
time PGPORT=5678 meson test --setup running test_rls_hooks-running/regress
In this case, I'm not sure if there's anything to blame meson for; the
issue is running server, which surely has different structures since
last month.
--
Justin