Re: Expose lock group leader pid in pg_stat_activity

Поиск
Список
Период
Сортировка
От Justin Pryzby
Тема Re: Expose lock group leader pid in pg_stat_activity
Дата
Msg-id 20200316042752.GE26184@telsasoft.com
обсуждение исходный текст
Ответ на Re: Expose lock group leader pid in pg_stat_activity  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: Expose lock group leader pid in pg_stat_activity  (Justin Pryzby <pryzby@telsasoft.com>)
Список pgsql-hackers
On Tue, Jan 28, 2020 at 12:36:41PM +0100, Julien Rouhaud wrote:
> So, AFAICT the LockHashPartitionLockByProc is required when
> iterating/modifying lockGroupMembers or lockGroupLink, but just
> getting the leader pid should be safe.

This still seems unsafe:

git show -U11 -w --patience b025f32e0b src/backend/utils/adt/pgstatfuncs.c
+                       /*
+                        * If a PGPROC entry was retrieved, display wait events and lock
+                        * group leader information if any.  To avoid extra overhead, no
+                        * extra lock is being held, so there is no guarantee of
+                        * consistency across multiple rows.
+                        */
...
+                               PGPROC     *leader;
...
+                               leader = proc->lockGroupLeader;
+                               if (leader)
+                               {
# does something guarantee that leader doesn't change ?
+                                       values[29] = Int32GetDatum(leader->pid);
+                                       nulls[29] = false;
                                }

Michael seems to have raised the issue:

On Thu, Jan 16, 2020 at 04:49:12PM +0900, Michael Paquier wrote:
> And actually, the way you are looking at the leader's PID is visibly
> incorrect and inconsistent because the patch takes no shared LWLock on
> the leader using LockHashPartitionLockByProc() followed by
> LWLockAcquire(), no?  That's incorrect because it could be perfectly
> possible to crash with this code between the moment you check if 
> lockGroupLeader is NULL and the moment you look at
> lockGroupLeader->pid if a process is being stopped in-between and
> removes itself from a lock group in ProcKill().  

But I don't see how it was addressed ?

I read this:

src/backend/storage/lmgr/lock.c:         * completely valid.  We cannot safely dereference another backend's
src/backend/storage/lmgr/lock.c-         * lockGroupLeader field without holding all lock partition locks, and
src/backend/storage/lmgr/lock.c-         * it's not worth that.)

I think if you do:
|LWLockAcquire(&proc->backendLock, LW_SHARED);
..then it's at least *safe* to access leader->pid, but it may be inconsistent
unless you also call LockHashPartitionLockByProc.

I wasn't able to produce a crash, so maybe I missed something.

-- 
Justin



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Re: effective_io_concurrency's steampunk spindle maths
Следующее
От: Justin Pryzby
Дата:
Сообщение: Re: Berserk Autovacuum (let's save next Mandrill)