Re: Latches vs lwlock contention

From Heikki Linnakangas
Subject Re: Latches vs lwlock contention
Date
Msg-id 4961142a-7958-4229-8329-8777b4b72690@iki.fi
In reply to Re: Latches vs lwlock contention (Alexander Lakhin <exclusion@gmail.com>)
List pgsql-hackers
On 27/03/2025 07:00, Alexander Lakhin wrote:
> I've discovered that the following script:
> export PGOPTIONS='-c lock_timeout=1s'
> createdb regression
> for i in {1..100}; do
> echo "ITERATION: $i"
> psql -c "CREATE TABLE t(i int);"
> cat << 'EOF' | psql &
> DO $$
> DECLARE
>      i int;
> BEGIN
>     FOR i IN 1 .. 5000000 LOOP
>      INSERT INTO t VALUES (1);
>    END LOOP;
> END;
> $$;
> EOF
> sleep 1
> psql -c "DROP TABLE t" &
> cat << 'EOF' | psql &
> COPY t FROM STDIN;
> 0
> \.
> EOF
> wait
> 
> psql -c "DROP TABLE t" || break;
> done
> 
> causes a segmentation fault on master (it fails on iterations 5, 4, 26 
> for me):
> ITERATION: 26
> CREATE TABLE
> ERROR:  canceling statement due to lock timeout
> ERROR:  canceling statement due to lock timeout
> invalid command \.
> ERROR:  syntax error at or near "0"
> LINE 1: 0
>          ^
> server closed the connection unexpectedly
> 
> Core was generated by `postgres: law regression [local] 
> idle                                         '.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  GrantLockLocal (locallock=0x5a1d75c35ba8, owner=0x5a1d75c18630) at 
> lock.c:1805
> 1805            lockOwners[i].owner = owner;
> (gdb) bt
> #0  GrantLockLocal (locallock=0x5a1d75c35ba8, owner=0x5a1d75c18630) at 
> lock.c:1805
> #1  0x00005a1d51e93ee7 in GrantAwaitedLock () at lock.c:1887
> #2  0x00005a1d51ea1e58 in LockErrorCleanup () at proc.c:814
> #3  0x00005a1d51b9a1a7 in AbortTransaction () at xact.c:2853
> #4  0x00005a1d51b9abc6 in AbortCurrentTransactionInternal () at xact.c:3579
> #5  AbortCurrentTransaction () at xact.c:3457
> #6  0x00005a1d51eafeda in PostgresMain (dbname=<optimized out>, 
> username=0x5a1d75c139b8 "law") at postgres.c:4440
> 
> (gdb) p lockOwners
> $1 = (LOCALLOCKOWNER *) 0x0
> 
> git bisect led me to 3c0fd64fe.
> Could you please take a look?

Great, thanks for the repro! With that, I was able to capture the 
failure with 'rr' and understand what happens: Commit 3c0fd64fe removed 
"lockAwaited = NULL;" from LockErrorCleanup(). Because of that, if the 
lock had been granted to us, and if LockErrorCleanup() was called twice, 
the second call would call GrantAwaitedLock() even if the lock was 
already released and cleaned up.

I've pushed a fix to put that back.
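
To illustrate the failure mode described above, here is a toy model in C. It is not the PostgreSQL code: ToyLock, toy_grant_awaited(), toy_release(), toy_error_cleanup() and toy_run() are hypothetical names made up for this sketch, and the real LockErrorCleanup() does considerably more. The only thing it tries to show is why a cleanup routine that can run twice needs to clear its "awaited" pointer after the first pass.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a backend-local lock entry (not LOCALLOCK). */
typedef struct ToyLock
{
    int *owners;   /* NULL once the lock has been released and cleaned up */
    int  nowners;
} ToyLock;

static ToyLock *awaited = NULL;  /* plays the role of "lockAwaited" */
static int      stale_grants = 0; /* grants attempted on a released lock */

/* Record ownership of the awaited lock in its owners array. */
static void
toy_grant_awaited(void)
{
    if (awaited->owners == NULL)
    {
        /* In this model we only count the event; code that wrote to the
         * array unconditionally would dereference NULL here. */
        stale_grants++;
        return;
    }
    awaited->owners[awaited->nowners++] = 1;
}

/* Release the lock: its owners array is gone afterwards. */
static void
toy_release(ToyLock *lock)
{
    lock->owners = NULL;
    lock->nowners = 0;
}

/* Error cleanup that may be called more than once. */
static void
toy_error_cleanup(bool reset_pointer)
{
    if (awaited == NULL)
        return;               /* nothing being waited for */

    toy_grant_awaited();      /* assume the lock turned out to be granted */

    if (reset_pointer)
        awaited = NULL;       /* makes a second call a no-op */
}

/* Run cleanup, release the lock, run cleanup again; return stale grants. */
static int
toy_run(bool reset_pointer)
{
    int     storage[4];
    ToyLock lock = { storage, 0 };

    stale_grants = 0;
    awaited = &lock;

    toy_error_cleanup(reset_pointer);  /* first call: lock is granted */
    toy_release(&lock);                /* lock released and cleaned up */
    toy_error_cleanup(reset_pointer);  /* second call */

    awaited = NULL;
    return stale_grants;
}
```

In this model, toy_run(false) reports one grant attempted against an already-released lock, which corresponds to the second call reaching GrantAwaitedLock() in the backtrace, while toy_run(true) reports none because the second call returns early.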

-- 
Heikki Linnakangas
Neon (https://neon.tech)


