Re: Let's make PostgreSQL multi-threaded

From	Robert Haas
Subject	Re: Let's make PostgreSQL multi-threaded
Date	June 6, 2023, 14:13:47
Msg-id	CA+TgmoY=hioNYW124e0CZ6Lbo_pVyeMw_rK+9fzzH6Aa85RYgw@mail.gmail.com
In reply to	Re: Let's make PostgreSQL multi-threaded (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Let's make PostgreSQL multi-threaded
List	pgsql-hackers

On Tue, Jun 6, 2023 at 9:40 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I'm not sure that there's a strong consensus, but I do think it's a good idea.

Let me elaborate on this a bit.

I think one of PostgreSQL's bigger problems right now is that it
doesn't scale as far as users would like. Beyond a couple of hundred
connections, everything goes to heck. Back in the day, the big
scalability problems were around locking, but we've done a pretty good
job cleaning that stuff up over the years. Now, the problem when you
run a ton of PostgreSQL connections isn't so much that PostgreSQL
stops working as it is that the OS stops working. PostgreSQL backends
use a lot of memory, even if they're idle. Some of that is for stuff
that we could optimize but haven't, like catcache and relcache
entries, and some of it is for stuff that we can't do anything about,
like per-process page tables. But the problem isn't just RAM, either.
I've seen machines running >1000 PostgreSQL backends where kill -9
took many *minutes* to work because the OS was overwhelmed. I don't
know exactly what goes wrong inside the kernel, but clearly something
does.

Not all databases have this problem, and PostgreSQL isn't going to be
able to stop having it without some kind of major architectural
change. Changing from a process model to a threaded model might be
insufficient, because while I think that threads consume fewer OS
resources than processes, what is really needed, in all likelihood, is
the ability to have idle connections have neither a process nor a
thread associated with them until they cease being idle. That's a huge
project and I'm not volunteering to do it, but if we want to have the
same kind of scalability as some competing products, that is probably
a place to which we ultimately need to go. Getting out of the current
model where every backend has an arbitrarily large amount of state
hanging off of random global variables, not all of which are even
known to any central system, is a critical step in that journey.
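
To make the shape of that idea concrete, here is a toy sketch in Python, not PostgreSQL code and not a proposal for how it would actually be done. It assumes two things that the paragraph above argues for: all per-connection state lives in one explicit object (the hypothetical `Session` class) rather than in globals, and idle connections are merely parked in a readiness selector, so a worker thread is borrowed only while a connection has something to do. The names `Session`, `handle_request`, and `serve_once` are made up for illustration.

```python
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Session:
    """All per-connection state lives here, not in global variables."""
    conn: socket.socket
    settings: dict = field(default_factory=dict)
    queries_run: int = 0


def handle_request(session):
    """Runs on a pool thread, and only while the connection is active."""
    data = session.conn.recv(4096)
    if not data:
        return False  # peer closed the connection
    session.queries_run += 1
    session.conn.sendall(b"ok:" + data)
    return True


def serve_once(selector, pool, timeout=1.0):
    """Dispatch each readable connection to the pool; idle ones use no thread."""
    dispatched = []
    for key, _events in selector.select(timeout):
        session = key.data
        # Unregister while a worker owns the session, so it is not
        # handed to a second worker at the same time.
        selector.unregister(session.conn)
        dispatched.append((session, pool.submit(handle_request, session)))
    for session, future in dispatched:
        if future.result():
            # Request finished: park the connection again.
            selector.register(session.conn, selectors.EVENT_READ, session)
        else:
            session.conn.close()
    return len(dispatched)
```

With, say, five registered connections and a two-thread pool, a call to `serve_once` touches only the connections that actually sent something; the rest cost one selector entry and one `Session` object each. Because any worker can pick up any `Session`, this only works if nothing the request needs is hidden in per-thread or per-process globals, which is the point made above.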

Also, programming with DSA and shm_mq sucks. It's doable (proof by
example) but it's awkward and it takes a long time and the performance
isn't great. Here again, threads instead of processes is no panacea.
For as long as we support a process model - and my guess is that we're
talking about a very long time - new features are going to have to
work with those systems or else be optional. But the amount of sheer
mental energy that is required to deal with DSA means we're unlikely
to ever have a rich library of parallel primitives. Maybe we wouldn't
anyway; volunteer efforts are hard to predict, but this is certainly
not helping. I do think that there's some danger that if sharing
memory becomes as easy as calling palloc(), we'll end up with memory
leaks that could eventually take the whole system down. We need to
give some thought to how to avoid or manage that danger.

Consider even something like the main lock table. That's a fixed-size
hash table, so lock exhaustion is a real possibility. If we
weren't limited to a fixed-size shared memory segment, we could let
that thing grow without a server restart. We might not want to let it
grow infinitely, but we could raise the maximum size by 100x and
allocate as required and I think we'd just be better off. Doing that
as things stand would require nailing down that amount of memory
forever whether it's ever needed or not, which doesn't seem like a
good idea. But doing something where the memory can be allocated only
if it's needed would avoid user-facing errors with relatively little
cost.
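
As a rough illustration of "raise the maximum and allocate as required", here is a toy model in Python. It is only a sketch under simplifying assumptions: locks are exclusive, a dict stands in for the hash table, and `capacity` stands in for memory actually allocated. The class and method names (`GrowableLockTable`, `LockTableFull`, `acquire`, `release`) are hypothetical and do not correspond to anything in the PostgreSQL source.

```python
class LockTableFull(Exception):
    """Raised only when the table has hit its hard ceiling."""


class GrowableLockTable:
    """Toy lock table that pays for slots on demand, up to a maximum."""

    def __init__(self, initial_slots, max_slots):
        if not 0 < initial_slots <= max_slots:
            raise ValueError("need 0 < initial_slots <= max_slots")
        self.capacity = initial_slots  # slots allocated so far
        self.max_slots = max_slots     # ceiling we never grow past
        self.locks = {}                # lock tag -> holder id

    def acquire(self, tag, holder):
        """Return True if holder now owns tag, False if someone else does."""
        if tag in self.locks:
            return self.locks[tag] == holder
        if len(self.locks) >= self.capacity:
            self._grow()
        self.locks[tag] = holder
        return True

    def release(self, tag, holder):
        if self.locks.get(tag) == holder:
            del self.locks[tag]

    def _grow(self):
        # Double the allocation, but never past the configured ceiling.
        if self.capacity >= self.max_slots:
            raise LockTableFull(f"all {self.max_slots} lock slots in use")
        self.capacity = min(self.capacity * 2, self.max_slots)
```

A table created with `initial_slots=64` and `max_slots=6400` starts out costing what today's fixed table would, and a workload that needs the 65th lock gets it instead of an error; only a workload that genuinely needs more than the 100x ceiling ever sees `LockTableFull`.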

I think doing something like this is going to be a huge effort, and
frankly, there's probably no point in anybody other than a handful of
people (Heikki, Andres, a few others) even trying. There are too
many ways to go wrong, and this has to be done really well to be worth
doing at all. But if somebody with the requisite expertise wants to
have a go at it, I don't think we should tell them "no, we don't want
that" on principle. Let's talk about whether a specific proposal is
good or bad, and why it's good or bad, rather than falling back on an
essentially religious argument. It's not an article of faith that
PostgreSQL should not use threads: it's a technology decision. The
difficulty of reversing the decision made long ago should weigh
heavily in evaluating any proposal to do so, but the potential
benefits of such a change should be considered, too.

--
Robert Haas
EDB: http://www.enterprisedb.com
