[multithreading] extension compatibility

From: Robert Haas
Subject: [multithreading] extension compatibility
Date:
Msg-id: CA+TgmoYsKJnVjj94HJTSOeS1=TmAgV5L=DVTd_TYL=XUKCbfEA@mail.gmail.com
Replies: Re: [multithreading] extension compatibility
Re: [multithreading] extension compatibility
List: pgsql-hackers
Hi,

At 2024.pgconf.dev, Heikki did a session on multithreading PostgreSQL
which I was unfortunately unable to attend due to my involvement with
another session, and then we had an unconference discussion which I
was able to attend and at which I volunteered to have a look at a
couple of tasks, including "Extension Marking System (marking
extensions as thread-safe)". So in this email I'd like to (1) say a
few things about multithreading for PostgreSQL in general, (2) spell
out my understanding of the extension compatibility problem
specifically, and then (3) discuss possible solutions to that problem.
See also https://wiki.postgresql.org/wiki/Multithreading

== Multithreading Generally ==

I believe there is a consensus in the PostgreSQL developer community,
or at least among committers, that a multi-threaded programming model
would be superior to a multi-process programming model as we have now.
I won't be surprised if a few people disagree with that as a general
statement, and others may view it as better in theory but so difficult
in practice as to be not worth doing, but I believe that the consensus
is otherwise. I do understand that switching to threads introduces
some new stability risks, which are not to be taken lightly, but it
also opens the door to various performance improvements, and even
functionality, that are not feasible today. I do not believe that it
would be necessary, as has been alleged previously, to halt all other
development for a lengthy period of time while such a conversion is
undertaken, nor do I believe that the community would or should accept
such a solution were someone to propose it. I do believe that there
are some difficult problems to be solved in order to make it work at
all, and I believe even more strongly that a good deal of follow-up
work will be necessary to reap the potential benefits of such a
change. I also believe that it's absolutely necessary that both models
coexist side by side for a period of time. I think we will eventually
want to abandon the multi-process model, because I think over time the
benefits of using threads will accumulate until they are overwhelming
and the process model will end up appearing to be an obstacle to
progress. However, I don't think we'll be able to do that particularly
soon, because I think it's going to take a while to fully stabilize
the thread model even as far as the core code is concerned, and
extensions will take even longer to catch up. I realize Heikki in
particular is hoping for a quick transition; I don't see that as
feasible, but like everything else about this, opinions are going to
vary.

Obligatory disclaimer: Everything above (and below) is just a
statement of what I believe, and everyone is free to dispute it. As
always, I cannot speak to objective truth, but I can tell you what I
think.

== The Extension Compatibility Problem ==

I don't know yet whether we're going to end up with a system where the
same build of PostgreSQL can produce processes or threads depending on
configuration or whether it's going to be a build option, but I'm
guessing the latter is more likely. Certainly, if an extension is
assuming that its global variables are session-local and they suddenly
become global to the cluster, chaos will ensue. The same is true for
the core code, and will need to be solved by annotating global
variables so that the appropriate ones can be made thread-local and
the others can be given whatever treatment is appropriate considering
how they are used. The details of how this annotation system will work
are TBD, but the point for this email is that extension global
variables, including file-level globals, will need the same kinds of
annotations that we use in the core code in order to work. Other
adjustments may also be needed.
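As a rough illustration of the kind of annotation meant here, the sketch below assumes a hypothetical PG_SESSION_LOCAL macro (the real annotation scheme is still TBD) that expands to C11 _Thread_local in a threaded build and to nothing in a process-per-session build. The PG_MULTITHREADING define is simulated locally for this standalone example. With the annotation, a file-level counter stays per-session even when each session runs on its own thread.

```c
#include <pthread.h>

/* Assumption: simulate a threaded build. In the patch this symbol would
 * come from pg_config_manual.h, not be defined by the extension. */
#define PG_MULTITHREADING 1

/* Hypothetical annotation macro; the real scheme is TBD. */
#ifdef PG_MULTITHREADING
#define PG_SESSION_LOCAL _Thread_local
#else
#define PG_SESSION_LOCAL		/* one process per session: plain static suffices */
#endif

/* A file-level global that the extension assumes is per-session. */
static PG_SESSION_LOCAL int calls_this_session = 0;

/* Increment and return this session's (this thread's) call counter. */
int
count_call(void)
{
	return ++calls_this_session;
}

/* Thread body standing in for a new session: record its first count. */
static void *
session_main(void *arg)
{
	int		   *result = arg;

	*result = count_call();
	return NULL;
}

/* Run one simulated session on a fresh thread and return its counter. */
int
count_call_in_new_session(void)
{
	pthread_t	thread;
	int			result = -1;

	if (pthread_create(&thread, NULL, session_main, &result) != 0)
		return -1;
	pthread_join(thread, NULL);
	return result;
}
```

Without the annotation, the new thread would see and bump the same counter as the calling thread, which is exactly the "chaos" described above.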

I think there are two severable problems here. One is that, if an
extension is built for use with a non-threaded PostgreSQL, we
shouldn't permit it to be used with a threaded PostgreSQL, even if the
major version and other details are compatible. Hence, threading or
the lack of it must become part of the data set up by PG_MODULE_MAGIC.
Maybe this problem goes away if we decide that threads-vs-processes is
a configuration option rather than a build-time option, but even then,
we might still end up with a build-time option indicating whether
threads are even a possibility, so I think it's pretty likely we need
this in some form. If or when the process model eventually dies, then
we can take this out again.
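A simplified sketch of the first problem: if the threading choice is recorded in the magic data, the loader can refuse a module whose flag differs from the server's. The ModuleMagic struct, its field names, and the version value 17 below are hypothetical, illustrative stand-ins, not PostgreSQL's actual Pg_magic_struct.

```c
#include <stdbool.h>

/* Hypothetical, simplified magic block (not the real Pg_magic_struct). */
typedef struct ModuleMagic
{
	int			major_version;	/* illustrative value only */
	bool		threaded;		/* built against a threaded PostgreSQL? */
} ModuleMagic;

/* The server's own magic data (assumed: a threaded build). */
static const ModuleMagic server_magic = {17, true};

/*
 * Accept a module only if every field matches, including the threading
 * flag, so a module built for a non-threaded server is rejected even when
 * the major version agrees.
 */
bool
module_magic_matches(const ModuleMagic *module)
{
	return module->major_version == server_magic.major_version &&
		module->threaded == server_magic.threaded;
}
```

Fields are compared one by one here rather than with memcmp, so struct padding cannot affect the result.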

The other problem is that we probably want a way for extensions to
signal that they are believed to work with threading. It's a little
bit debatable whether this is a good idea, because (1) some people are
going to blindly state that their extension works fine with threading
even if they haven't actually made the necessary changes, (2) one
could simply declare that making an extension thread-ready is part of
supporting whatever PostgreSQL release adds threading as an option, and
(3) one could also declare that extension authors should just document
what they do or don't support rather than doing anything in code.
However, I think it makes sense to try to make extensions fail to
compile against a threaded PostgreSQL unless the extension declares
that it supports such builds of PostgreSQL. I think that by doing
this, we'll make it a LOT easier for packagers to find out what
extensions still need updating. A packager could possibly do light
testing of an extension and miss the fact that the extension
doesn't actually work properly against a threaded PostgreSQL, but you
can't fail to notice a compile failure. There's still going to be some
chaos because of (1), but I think we can mitigate that with good
messaging: documentation, wiki pages, and blog posts explaining that
this is coming and how to adapt to it can help a lot, IMHO.

== Extension Compatibility Solutions ==

The attached patch is a sketch of one possible approach: PostgreSQL
signals whether it is multithreaded by defining or not defining
PG_MULTITHREADING in pg_config_manual.h, and an extension signals
thread-readiness by defining PG_THREADSAFE_EXTENSION before including
any PostgreSQL headers other than postgres.h. If PostgreSQL is built
multithreaded and the extension does not signal thread-safety, you get
something like this:

../pgsql/src/test/modules/dummy_seclabel/dummy_seclabel.c:20:1: error:
static assertion failed due to requirement '1 == 0': must define
PG_THREADSAFE_EXTENSION or use unthreaded PostgreSQL
PG_MODULE_MAGIC;
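A standalone approximation of this compile-time check, assuming (hypothetically) that it is a preprocessor test feeding a C11 _Static_assert; the patch's actual code may differ. Both defines are simulated here: in the real scheme PG_MULTITHREADING comes from pg_config_manual.h and the extension defines PG_THREADSAFE_EXTENSION itself. PG_EXTENSION_THREAD_OK and extension_is_thread_ready are illustrative names.

```c
/* Simulated threaded build (assumption for this sketch). */
#define PG_MULTITHREADING 1

/* The extension opts in before the check below is reached. */
#define PG_THREADSAFE_EXTENSION 1

/* Hypothetical check of the kind a PostgreSQL header could perform. */
#if defined(PG_MULTITHREADING) && !defined(PG_THREADSAFE_EXTENSION)
#define PG_EXTENSION_THREAD_OK 0
#else
#define PG_EXTENSION_THREAD_OK 1
#endif

/* Compilation stops here if a threaded build meets an unmarked extension. */
_Static_assert(PG_EXTENSION_THREAD_OK == 1,
			   "must define PG_THREADSAFE_EXTENSION or use unthreaded PostgreSQL");

/* Lets callers confirm at run time that the opt-in was seen. */
int
extension_is_thread_ready(void)
{
	return PG_EXTENSION_THREAD_OK;
}
```

Deleting the PG_THREADSAFE_EXTENSION line makes the file fail to compile with the assertion message, which is the hard-to-miss signal for packagers argued for above.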

I'm not entirely happy with this solution because the results are
confusing if PG_THREADSAFE_EXTENSION is declared after including
fmgr.h. Perhaps this can be adequately handled by documenting and
demonstrating the right pattern, or maybe somebody has a better idea.
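One plausible reason the late definition is confusing, assuming (hypothetically) that the check is a preprocessor conditional inside fmgr.h: the conditional is evaluated where the header is included, so a later define is invisible to it even though the extension's own code sees the symbol. The block below inlines a simulated fmgr.h; FMGR_SAW_THREADSAFE and both functions are illustrative names.

```c
/* Simulated threaded build (assumption). */
#define PG_MULTITHREADING 1

/* ---- imagine this block is the contents of fmgr.h ---- */
#if defined(PG_MULTITHREADING) && !defined(PG_THREADSAFE_EXTENSION)
#define FMGR_SAW_THREADSAFE 0
#else
#define FMGR_SAW_THREADSAFE 1
#endif
/* ---- end of simulated fmgr.h ---- */

/* Defined too late: the #if above has already been evaluated. */
#define PG_THREADSAFE_EXTENSION 1

/* What the header concluded when it was included. */
int
header_saw_opt_in(void)
{
	return FMGR_SAW_THREADSAFE;
}

/* What the extension's own later code believes. */
int
extension_opted_in(void)
{
#ifdef PG_THREADSAFE_EXTENSION
	return 1;
#else
	return 0;
#endif
}
```

The two functions disagree, which is the confusing situation: the extension appears to have opted in, yet the header treated it as unmarked.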

Another idea I considered was to replace the PG_MODULE_MAGIC;
declaration with something that allows for arguments, like
PG_MODULE_MAGIC(.process_model = false, .thread_model = true). But on
further reflection, that seems like the wrong thing. AFAICS, that's
going to tell you at runtime about something that you really want to
know at compile time. But this kind of idea might need more thought if
we decide that the *same* build of PostgreSQL can either launch
processes or threads per session, because then we'd need to know which
extensions were available in whichever mode applied to the current
session.
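To make the runtime nature of the argument-taking variant concrete, here is a sketch assuming a hypothetical PG_MODULE_MAGIC_WITH(...) macro that expands designated initializers into a static struct returned by a function, loosely in the spirit of how the existing PG_MODULE_MAGIC exposes its data through a function. ModuleModelInfo, module_model_info, and module_usable are illustrative names, not PostgreSQL API.

```c
#include <stdbool.h>

/* Hypothetical per-module record of supported execution models. */
typedef struct ModuleModelInfo
{
	bool		process_model;
	bool		thread_model;
} ModuleModelInfo;

/* Hypothetical macro: the arguments become designated initializers. */
#define PG_MODULE_MAGIC_WITH(...) \
	const ModuleModelInfo *module_model_info(void) \
	{ \
		static const ModuleModelInfo info = { __VA_ARGS__ }; \
		return &info; \
	}

/* What an extension's source file would contain. */
PG_MODULE_MAGIC_WITH(.process_model = false, .thread_model = true)

/*
 * Loader side: this check can only happen after the library is built and
 * loaded, i.e. at run time, for the mode of the current session.
 */
bool
module_usable(const ModuleModelInfo *info, bool session_is_threaded)
{
	return session_is_threaded ? info->thread_model : info->process_model;
}
```

The extension compiles cleanly either way; a mismatch is only discovered when a session tries to load it, which is the drawback noted above, although a per-session check like this is what a single build supporting both modes would need.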

That's all I've got for today.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments
