Re: group locking: incomplete patch, just for discussion

From	Greg Stark
Subject	Re: group locking: incomplete patch, just for discussion
Date	November 3, 2014, 15:19:15
Msg-id	CAM-w4HOGY9SpAJS5v0PpKw3En7U-DGa=zUPCuGLbEFVy1PPtKw@mail.gmail.com
In reply to	Re: group locking: incomplete patch, just for discussion (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: group locking: incomplete patch, just for discussion Re: group locking: incomplete patch, just for discussion
List	pgsql-hackers

On Sat, Nov 1, 2014 at 9:09 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> 1. Any non-trivial piece of PostgreSQL code is likely to contain
> syscache lookups.
> 2. Syscache lookups had better work in parallel workers, or they'll be
> all but useless.

I've been using parallel sorts and index builds in my mental model of
how this will be used. I note that sorts go out of their way to look
up all the syscache entries in advance precisely so that tuplesort
doesn't start doing catalog lookups in the middle of the sort. In
general I think what people are imagining is that the parallel workers
will be running low-level code like tuplesort that has all the
databasey stuff like catalog lookups done in advance and just operates
on C data structures like function pointers. And I think that's a
valuable coding discipline to enforce: it avoids having low-level
infrastructure call up to higher-level abstractions, which quickly
becomes hard to reason about.
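
To make that discipline concrete, here is a minimal sketch in plain C. It is not PostgreSQL code: SortPlan, resolve_sort_plan and worker_sort are hypothetical names, and the "int4"/"text" strings are only labels for a toy lookup table. The point is that the name-based lookup happens once, up front, and the part a worker would run touches nothing but a pre-resolved function pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical comparator type; stands in for whatever the lookup resolves. */
typedef int (*compare_fn)(const void *a, const void *b);

/* Everything the low-level sort needs, resolved before any worker starts. */
typedef struct SortPlan {
    compare_fn cmp;    /* comparator chosen by the up-front lookup */
    size_t     width;  /* element size in bytes */
} SortPlan;

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);
}

static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/*
 * "High-level" step: the only place a lookup by name happens.
 * Returns 0 on success, -1 if the type name is unknown.
 */
static int resolve_sort_plan(const char *type_name, SortPlan *plan)
{
    if (strcmp(type_name, "int4") == 0) {
        plan->cmp = cmp_int;
        plan->width = sizeof(int);
        return 0;
    }
    if (strcmp(type_name, "text") == 0) {
        plan->cmp = cmp_str;
        plan->width = sizeof(char *);
        return 0;
    }
    return -1;
}

/*
 * "Low-level" step: what a worker would run. It performs no lookups and
 * only calls through the pointer stored in the plan.
 */
static void worker_sort(const SortPlan *plan, void *data, size_t n)
{
    assert(plan->cmp != NULL);
    qsort(data, n, plan->width, plan->cmp);
}
```

A caller would invoke resolve_sort_plan() once in the leader, then hand the filled-in SortPlan to each worker_sort() call.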

However in practice I think you're actually right -- but not for the
reasons you've been saying. I think the parallel workers *should* be
written as low level infrastructure and not be directly doing syscache
lookups or tuple locking etc. However there are a million ways in
which Postgres is extensible, and they cause loops in the call graph that
aren't apparent in the direct code structure. For instance, what
happens if the index you're building is an expression index or partial
index? Worse, what happens if those expressions have a plpython
function that does queries using SPI....

But those kinds of user code exploiting extensibility are the
situations where we need a deadlock detector and where you might need
this infrastructure. We wouldn't and shouldn't need a deadlock
detector for our own core server code. In an ideal world there would be
some sort of compromise: enforce careful locking rules in the core code,
where all locks are acquired in advance and parallel workers are
prohibited from obtaining locks, while still allowing users a
free-for-all and detecting deadlocks at runtime for them. But I'm
not sure there's any real middle ground here.
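
For readers less familiar with what "detecting deadlocks at runtime" involves, the following is a generic sketch of a wait-for graph cycle check in plain C. It is not how PostgreSQL's deadlock detector is implemented; WaitGraph, has_deadlock and MAX_PROCS are hypothetical names chosen for illustration. An edge i -> j means process i is waiting for a lock held by process j, and a cycle in that graph means a deadlock.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PROCS 8  /* arbitrary bound for this sketch */

/* waits_for[i][j] is true when process i waits for a lock held by j. */
typedef struct WaitGraph {
    int  nprocs;
    bool waits_for[MAX_PROCS][MAX_PROCS];
} WaitGraph;

/*
 * Depth-first search with three states per node:
 * 0 = unvisited, 1 = on the current path, 2 = fully explored.
 * Reaching a node that is on the current path means a cycle.
 */
static bool visit(const WaitGraph *g, int node, int *state)
{
    assert(node >= 0 && node < g->nprocs);
    state[node] = 1;
    for (int next = 0; next < g->nprocs; next++) {
        if (!g->waits_for[node][next])
            continue;
        if (state[next] == 1)
            return true;
        if (state[next] == 0 && visit(g, next, state))
            return true;
    }
    state[node] = 2;
    return false;
}

/* Returns true if any set of processes waits on each other in a cycle. */
static bool has_deadlock(const WaitGraph *g)
{
    int state[MAX_PROCS] = {0};

    for (int i = 0; i < g->nprocs; i++) {
        if (state[i] == 0 && visit(g, i, state))
            return true;
    }
    return false;
}
```

If every participant acquired all of its locks in advance and in an agreed order, no cycle could form and a check like this would never be needed, which is the contrast drawn above between core code and user code.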

-- 
greg
