Re: One process per session lack of sharing

From: Craig Ringer
Subject: Re: One process per session lack of sharing
Msg-id: CAMsr+YGZiHdKHrWHurhp3841Rngcrz=+-mhWK6pz618jAHqw2w@mail.gmail.com
In reply to: Re: One process per session lack of sharing  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: One process per session lack of sharing  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers


On 18 July 2016 at 02:27, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jul 15, 2016 at 4:28 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> I don't think anyone's considering moving from multi-processing to
>> multi-threading in PostgreSQL. I really, really like the protection that the
>> shared-nothing-by-default process model gives us, among other things.
>
> We get some very important protection by having the postmaster in a
> separate address space from the user processes, but separating the
> other backends from each other has no value.  If one of the backends
> dies, we take provisions to make sure they all die, which is little or
> no different from what would happen if we had the postmaster as one
> process and all of the other backends as threads within a second
> process.  As far as I can see, running each and every backend in a
> separate process has downsides but no upsides.  It slows down the
> system and makes it difficult to share data between processes without
> much in the way of benefits.

That's a good point, the random memory overwrites that Tom mentioned aside. 

I think that memory leaks we currently ignore as insignificant one-time losses that'll get cleaned up on backend exit will become relevant, so some cleanup work would be needed there, but it's not technically difficult and valgrind is a wonderful thing.

One minor thing to be aware of is that CPython has a horrible threading design with a single massive lock for the whole interpreter. PL/Python will perform appallingly in a threaded context. It looks like it has more recently gained support for separate interpreters (each with their own GIL) within a single process though, if I'm reading the docs correctly:


so maybe it'd just require some plpython tweaks to switch to the right interpreter for the current backend.

I don't think plpython issues are a huge cause for hand-wringing anyway, really. TBH, if it really is practical to move Pg to a threaded model and not as hard as I thought, I do see the advantages. Mainly because I'd _love_ efficient embedded Java and C# runtimes; it's downright embarrassing to tell people they should write procs in Perl or Python (or TCL!) if they can't do it in plpgsql.

If we can stand the macro code pollution, it'd be interesting to do a minimal conversion as an experiment and let some buildfarm members start digesting it once it runs. Find issues slowly over time, make it an experimental build option. We could do things like transparently use thread IDs wherever we currently expose a PID in the UI, change the bgworker backend to spawn threads rather than procs, etc. (One nice consequence would be the possibility of getting rid of most of EXEC_BACKEND since the postmaster launching the backend proc would be a one-time thing, once it was stable enough to make the thread model the only option on Windows).
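
To make the "experimental build option" idea concrete, here is a minimal sketch of a worker launcher that compiles to either fork() or pthread_create(). It is not PostgreSQL code: the macro USE_THREADED_BACKENDS and the names WorkerHandle, start_worker, wait_worker and bump_counter are all hypothetical, invented for illustration only.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#ifdef USE_THREADED_BACKENDS        /* hypothetical experimental build option */
#include <pthread.h>
#endif

typedef void (*worker_main_fn)(void *arg);

/* Hypothetical handle: wraps a thread or a child process, depending on build. */
typedef struct WorkerHandle
{
#ifdef USE_THREADED_BACKENDS
    pthread_t       thread;
    worker_main_fn  fn;
    void           *arg;
#else
    pid_t           pid;
#endif
} WorkerHandle;

#ifdef USE_THREADED_BACKENDS
/* Adapts our void-returning entry point to the pthread start signature. */
static void *
worker_trampoline(void *p)
{
    WorkerHandle *h = p;

    h->fn(h->arg);
    return NULL;
}
#endif

/* Start a worker running fn(arg). Returns 0 on success, -1 on failure. */
int
start_worker(WorkerHandle *h, worker_main_fn fn, void *arg)
{
    assert(h != NULL && fn != NULL);
#ifdef USE_THREADED_BACKENDS
    h->fn = fn;
    h->arg = arg;
    return pthread_create(&h->thread, NULL, worker_trampoline, h) == 0 ? 0 : -1;
#else
    h->pid = fork();
    if (h->pid < 0)
        return -1;
    if (h->pid == 0)
    {
        fn(arg);                    /* child: run the worker, then exit */
        _exit(0);
    }
    return 0;
#endif
}

/* Wait for the worker to finish. Returns 0 if it ended cleanly, else -1. */
int
wait_worker(WorkerHandle *h)
{
#ifdef USE_THREADED_BACKENDS
    return pthread_join(h->thread, NULL) == 0 ? 0 : -1;
#else
    int status;

    if (waitpid(h->pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
#endif
}

/* Demo worker: modifies an ordinary global variable. */
int shared_counter = 0;

void
bump_counter(void *arg)
{
    (void) arg;
    shared_counter++;
}
```

Running bump_counter through start_worker and wait_worker leaves shared_counter at 0 in the parent in the default (process) build, because the child increments its own copy. Built with -DUSE_THREADED_BACKENDS (and linked against pthreads where required), the increment is visible to the caller. That difference is the sharing question this thread is about, and the #ifdef blocks give a small taste of the macro pollution.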

It'd be very helpful to find a nice portable library that abstracts platform threading specifics and has a less horrid API than pthreads, rather than having to DIY. (See e.g.: https://software.intel.com/en-us/blogs/2006/10/19/why-windows-threads-are-better-than-posix-threads ). Or use C++11 <thread> :p [dives for fireproof suit]

>> Where I agreed with you, and where I think Robert sounded like he was
>> agreeing, was that our current design where we have one executor per user
>> sessions and can't suspend/resume sessions is problematic.
>
> The problems are very closely related.  The problem with suspending
> and resuming sessions is that you need to keep all of the session's
> global variable contents (except for any caches that are safe to
> rebuild) until the session is resumed; and we have no way of
> discovering all of the global variables a process is using and no
> general mechanism that can be used to serialize and deserialize them.

Right. In our per-process model we'd have to provide a subsystem that serializes/deserializes session state and lets extensions register their own save/restore callbacks. We'd probably want it even if the improbable happened and Pg moved to threads (which until this thread I would've considered the same as saying "pigs fly" or "the US gets universal health care"), since we'd want to be able to essentially page out idle sessions, though in that case it'd be more of a nice-to-have than a necessity.

Individual Pg subsystems would probably register callbacks with the save/restore subsystem, rather than trying to have the save/restore subsystem have the knowledge to reach into all the other subsystems. Since we'd block save/restore when there's an active query of course, it might not actually be that bad. Especially if we started with save/restore only on idle state, not idle-in-transaction, i.e. start with transaction pooling.
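
A rough sketch of how such a registry might look, purely as an illustration: nothing here exists in PostgreSQL, and every name (register_state_handler, save_session_state, restore_session_state, demo_setting and so on) is hypothetical. Each subsystem registers a save/restore pair; the save path refuses to run unless the session is idle and writes one length-prefixed record per subsystem.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STATE_HANDLERS 16       /* arbitrary limit for the sketch */

/* save: write up to avail bytes, return bytes written, or (size_t) -1 on failure */
typedef size_t (*state_save_fn)(unsigned char *buf, size_t avail);
/* restore: consume exactly len bytes, return 0 on success */
typedef int (*state_restore_fn)(const unsigned char *buf, size_t len);

typedef struct StateHandler
{
    const char       *name;
    state_save_fn     save;
    state_restore_fn  restore;
} StateHandler;

static StateHandler handlers[MAX_STATE_HANDLERS];
static int n_handlers = 0;

/* Called by each subsystem or extension at load time. */
int
register_state_handler(const char *name, state_save_fn save, state_restore_fn restore)
{
    assert(save != NULL && restore != NULL);
    if (n_handlers >= MAX_STATE_HANDLERS)
        return -1;
    handlers[n_handlers].name = name;
    handlers[n_handlers].save = save;
    handlers[n_handlers].restore = restore;
    n_handlers++;
    return 0;
}

/* Serialize all registered state as [length][payload] records.
 * Returns bytes written, or -1 if the session is busy or buf is too small. */
long
save_session_state(int session_is_idle, unsigned char *buf, size_t buflen)
{
    size_t off = 0;

    if (!session_is_idle)
        return -1;                  /* never save in the middle of a query */
    for (int i = 0; i < n_handlers; i++)
    {
        size_t avail, len;

        if (buflen - off < sizeof(len))
            return -1;
        avail = buflen - off - sizeof(len);
        len = handlers[i].save(buf + off + sizeof(len), avail);
        if (len > avail)            /* also catches the (size_t) -1 failure value */
            return -1;
        memcpy(buf + off, &len, sizeof(len));
        off += sizeof(len) + len;
    }
    return (long) off;
}

/* Feed each record back to its handler, in registration order. */
int
restore_session_state(const unsigned char *buf, size_t buflen)
{
    size_t off = 0;

    for (int i = 0; i < n_handlers; i++)
    {
        size_t len;

        if (buflen - off < sizeof(len))
            return -1;
        memcpy(&len, buf + off, sizeof(len));
        off += sizeof(len);
        if (len > buflen - off)
            return -1;
        if (handlers[i].restore(buf + off, len) != 0)
            return -1;
        off += len;
    }
    return 0;
}

/* Toy subsystem with one piece of per-session state. */
int demo_setting = 0;

size_t
demo_save(unsigned char *buf, size_t avail)
{
    if (avail < sizeof(demo_setting))
        return (size_t) -1;
    memcpy(buf, &demo_setting, sizeof(demo_setting));
    return sizeof(demo_setting);
}

int
demo_restore(const unsigned char *buf, size_t len)
{
    if (len != sizeof(demo_setting))
        return -1;
    memcpy(&demo_setting, buf, sizeof(demo_setting));
    return 0;
}
```

The sketch assumes handlers are registered in the same order on both sides and ignores versioning, memory contexts and anything transactional; a real design would at least need to tag records by name so that a missing extension could be detected on restore.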

Since I got started with Pg, I've taken it as given that PostgreSQL Will Never Use Threads, Don't Even Talk About It. As taboo as query hints or more so. Is this actually a serious option?

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
