Обсуждение: One process per session lack of sharing

Поиск

Список

Период

Сортировка

One process per session lack of sharing

От

AMatveev@bitec.ru

Дата:

12 июля 2016 г., 16:59:53

Hi

Is  there  any  plan  to  implement  "session  per  thread" or "shared
sessions between thread"?
We  have analyzed  the  ability to contribute  pgSql to jvm bytecode compiler but with
current   thread   model  this  idea  is  far  from optimal.(Vm can be different of course.
But currently we use oracle and jvm is important for us)

We have faced with some lack of sharing resources.
So in our test memory usage per session:
Oracle: about 5M
MSSqlServer: about 4M
postgreSql: about 160М

It's discussed on pgsql-general@postgresql.org:
http://www.mail-archive.com/pgsql-general@postgresql.org/msg206452.html


>I think the "problem" that he is having is fixable only by changing how
>PostgreSQL itself works. His problem is a PL/pgSQL function which is 11K
>lines in length. When invoked, this function is "compiled" into a large
>tokenized parse tree. This parse tree is only usable in the session which
>invoked the the function. Apparently this parse tree takes a lot of memory.
>And "n" concurrent users of this, highly used, function will therefore
>require "n" times as much memory because the parse tree is _not_
>shareable.  This is explained in:
>https://www.postgresql.org/docs/9.5/static/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING

Next  interesting  answer(from  Karl  Czajkowski  <karlcz@isi.edu>  in
private):
>  But, I search the
> archives of the mailing list, and when others have previously
> suggested such caching or reuse, it was immediately shot down by core
> developers.

Re: One process per session lack of sharing

От

Tom Lane

Дата:

12 июля 2016 г., 17:19:07

AMatveev@bitec.ru writes:
> Is  there  any  plan  to  implement  "session  per  thread" or "shared
> sessions between thread"?

No, not really.  The amount of overhead that would add --- eg, the need
for locking on what used to be single-use caches --- makes the benefit
highly questionable.  Also, most people who need this find that sticking
a connection pooler in front of the database solves their problem, so
there's not that much motivation to do a ton of work inside the database
to solve it there.
        regards, tom lane

Re: One process per session lack of sharing

От

AMatveev@bitec.ru

Дата:

12 июля 2016 г., 17:41:05

Hi

> AMatveev@bitec.ru writes:
>> Is  there  any  plan  to  implement  "session  per  thread" or "shared
>> sessions between thread"?

> No, not really.  The amount of overhead that would add --- eg, the need
> for locking on what used to be single-use caches --- makes the benefit
> highly questionable.
A two-layer cache is the best answer.
>  Also, most people who need this find that sticking
> a connection pooler in front of the database solves their problem
It has some disadvantages. Lack of temporary table for example
Practical  usage  of  that  table  with  connection  poller is  highly
questionable.
And so on.
> , so
> there's not that much motivation to do a ton of work inside the database
> to solve it there.
It is clear. Thank you.


--

Re: One process per session lack of sharing

От

Robert Haas

Дата:

13 июля 2016 г., 22:59:35

On Tue, Jul 12, 2016 at 9:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> AMatveev@bitec.ru writes:
>> Is  there  any  plan  to  implement  "session  per  thread" or "shared
>> sessions between thread"?
>
> No, not really.  The amount of overhead that would add --- eg, the need
> for locking on what used to be single-use caches --- makes the benefit
> highly questionable.  Also, most people who need this find that sticking
> a connection pooler in front of the database solves their problem, so
> there's not that much motivation to do a ton of work inside the database
> to solve it there.

I agree that there's not really a plan to implement this, but I don't
agree that connection pooling solves the whole problem.  Most people
can't get by with statement pooling, so in practice you are looking at
transaction pooling or session pooling.  And that means that you can't
really keep the pool size as small as you'd like because backends can
be idle in transaction for long enough to force the pool size to be
pretty large.  Also, pooling causes the same backends to get reused
for different sessions which touch different relations and different
functions so that, for example, the relcache and the PL/pgsql function
caches grow until every one of those sessions has everything cached
that any client needs.  That can cause big problems.

So, I actually think it would be a good idea to think about this.  The
problem, of course, is that as long as we allow arbitrary parts of the
code - including extension code - to declare global variables and
store arbitrary stuff in them without any coordination, it's
impossible to imagine hibernating and resuming a session without a
risk of things going severely awry.  This was a major issue for
parallel query, but we've solved it, mostly, by designating the things
that rely on global variables as parallel-restricted, and there
actually aren't a ton of those.  So I think it's imaginable that we
can get to a point where we can, at least in some circumstances, let a
backend exit and reconstitute its state at a later time.  It's not an
easy project, but I think it is one we will eventually need to do.
Insisting that the current model is working is just sticking our head
in the sand.  It's mostly working, but there are workloads where it
fails badly - and competing database products survive a number of
scenarios where we just fall on our face.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: One process per session lack of sharing

От

Craig Ringer

Дата:

14 июля 2016 г., 08:34:43

On 14 July 2016 at 03:59, Robert Haas <robertmhaas@gmail.com> wrote:

I agree that there's not really a plan to implement this, but I don't
agree that connection pooling solves the whole problem. Most people
can't get by with statement pooling, so in practice you are looking at
transaction pooling or session pooling. And that means that you can't
really keep the pool size as small as you'd like because backends can
be idle in transaction for long enough to force the pool size to be
pretty large. Also, pooling causes the same backends to get reused
for different sessions which touch different relations and different
functions so that, for example, the relcache and the PL/pgsql function
caches grow until every one of those sessions has everything cached
that any client needs. That can cause big problems.

So, I actually think it would be a good idea to think about this.

I agree. It's been on my mind for a while, but I've been assuming it's likely to involve such architectural upheaval as to be impractical.

Right now PostgreSQL conflates "user session state" and "execution engine" into one process. This means we need an external connection pooler to handle things if we want more user connections than we can efficiently handle in terms of number of executors. Current poolers don't do much to keep track of user state, they just arbitrate access to executors and expect applications to re-establish any needed state (SET vars, LISTEN, etc) or not use features that require persistence across the current pooling level.

This leaves users in the hard position of using very high, inefficient max_connections values to keep track of application<->DB state or jump through awkward hoops to use transaction pooling, either at the application level (Java appserver pools, etc) or through a proxy.

If using the high max_connections approach the user must also ensure that they don't have all those max_connections actually doing work at the same time using some kind of external coordination. Otherwise they'll thrash the server and face out of memory issues (especially with our rather simplistic work_mem management, etc) and poor performance.

The solution, to me, is to separate "user state" and "executor". Sounds nice, but we use global variables _everywhere_ and it's assumed throughout the code that we have one user session for the life of a backend, though with some exceptions for SET SESSION AUTHORIZATION. It's not likely to be fun.

The
problem, of course, is that as long as we allow arbitrary parts of the
code - including extension code - to declare global variables and
store arbitrary stuff in them without any coordination, it's
impossible to imagine hibernating and resuming a session without a
risk of things going severely awry.

Yeah. We'd definitely need a session state management mechanism with save and restore functionality.

There's also stuff like:

* LISTEN

* advistory locking at the session level

* WITH HOLD cursors

that isn't simple to just save and restore. Those are some of the same things that are painful with transaction pooling right now.

This was a major issue for
parallel query, but we've solved it, mostly, by designating the things
that rely on global variables as parallel-restricted, and there
actually aren't a ton of those. So I think it's imaginable that we
can get to a point where we can, at least in some circumstances, let a
backend exit and reconstitute its state at a later time. It's not an
easy project, but I think it is one we will eventually need to do.

I agree on both points, but I think "not easy" is rather an understatement.

Starting with a narrow scope would help. Save/restore GUCs and the other easy stuff, and disallow sessions that are actively LISTENing, hold advisory locks, have open cursors, etc from being saved and restored.

BTW, I think this would also give us a useful path toward allowing connection poolers to change the active user and re-authenticate on an existing backend. Right now you have to use SET ROLE or SET SESSION AUTHORIZATION (ugh) and can't stop the client you hand the connection to from just RESETing back to the pooler's user and doing whatever it wants.

Insisting that the current model is working is just sticking our head
in the sand. It's mostly working, but there are workloads where it
fails badly - and competing database products survive a number of
scenarios where we just fall on our face.

Yep, and like parallel query it's a long path, but it's one we've got to face sooner or later.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: One process per session lack of sharing

От

Craig Ringer

Дата:

14 июля 2016 г., 08:48:44

On 12 July 2016 at 21:57, <AMatveev@bitec.ru> wrote:

Hi

Is there any plan to implement "session per thread" or "shared
sessions between thread"?

As has been noted by others, there isn't any such plan right now.

PostgreSQL isn't threaded. It uses a multi-processing shared-nothing-by-default memory with explicit shared memory, plus copy-on-write memory forking from the postmaster for initial backend state (except on Windows, where we emulate that). It relies on processes being cheap and light-weight.

This is a very poor fit with Java's thread-based shared-by-default model with expensive heavyweight process startup and cheap threads. Process forking doesn't clone all threads and the JVM has lots of worker threads, so we can't start the JVM once in the postmaster then clone it with each forked postgres backend. Plus the JVM just isn't designed to cope with that and would surely get thoroughly confused when its file handles are cloned, its process ID changes, etc. This is one of the reasons PL/Java has never really taken off. We can mitigate the JVM startup costs a bit by preloading the JVM libraries into the postmaster and using the JVM's base class library preloading, but unless you're running trivial Java code you still do a lot of work at each JVM start after the postgres backend forks.

We have analyzed the ability to contribute pgSql to jvm bytecode compiler but with
current thread model this idea is far from optimal.(Vm can be different of course.
But currently we use oracle and jvm is important for us)

Yep, that's a real sticking point for a number of people.

The usual solution at this point is to move most of the work into an application-server mid-layer. That moves work further away from the DB, which has its own costs, and isn't something you're likely to be happy with if you're looking at things like optimising PL/PgSQL with a bytecode compiler. But it's the best we have right now.

We have faced with some lack of sharing resources.
So in our test memory usage per session:
Oracle: about 5M
MSSqlServer: about 4M
postgreSql: about 160М

Yep, that sounds about right. Unfortunately.

You may be able to greatly reduce that cost if you can store your cached compiled data in a shared memory segment created by your extension. This will get a bit easier with the new dynamic shared memory infrastructure, but it's going to be no fun at all to make that play with the JVM. You'll probably need a lot of JNI.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: One process per session lack of sharing

От

Vladimir Sitnikov

Дата:

14 июля 2016 г., 09:28:50

Craig>That moves work further away from the DB, which has its own costs, and isn't something you're likely to be happy with if you're looking at things like optimising PL/PgSQL with a bytecode compiler. But it's the best we have right now.

What if JVM was started within a background worker?

Then JVM can spawn several threads that serve PL requests on a "thread per backend" basis.

Craig>You may be able to greatly reduce that cost if you can store your cached compiled data in a shared memory segment created by your extension.

Craig>This will get a bit easier with the new dynamic shared memory infrastructure, but it's going to be no fun at all to make that play with the JVM. You'll probably need a lot of JNI.

There's https://github.com/jnr/jnr-ffi that enables to call C functions without resorting to writing JNI wrappers.

Vladimir

Re: One process per session lack of sharing

От

Craig Ringer

Дата:

14 июля 2016 г., 09:35:35

On 14 July 2016 at 14:28, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:

Craig>That moves work further away from the DB, which has its own costs, and isn't something you're likely to be happy with if you're looking at things like optimising PL/PgSQL with a bytecode compiler. But it's the best we have right now.

What if JVM was started within a background worker?
Then JVM can spawn several threads that serve PL requests on a "thread per backend" basis.

You can't really execute the plpgsql-compiled-to-bytecode outside the user session, so you need a JVM in it anyway.

You probably could have a bgworker or pool of bgworkers doing your plpgsql compilation and caching. But because your plpgsql might reference uncommitted catalog entries in the local backend, you'd the bgworker to join an exported snapshot from the backend you're compiling the plpgsql for. If it doesn't let you avoid having a jvm in each backend it's not likely to be too useful.

Craig>You may be able to greatly reduce that cost if you can store your cached compiled data in a shared memory segment created by your extension.
Craig>This will get a bit easier with the new dynamic shared memory infrastructure, but it's going to be no fun at all to make that play with the JVM. You'll probably need a lot of JNI.

There's https://github.com/jnr/jnr-ffi that enables to call C functions without resorting to writing JNI wrappers.

Yes, and JNA as well.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: One process per session lack of sharing

От

AMatveev@bitec.ru

Дата:

14 июля 2016 г., 11:43:14

Hi

> On Tue, Jul 12, 2016 at 9:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> AMatveev@bitec.ru writes:
>>> Is  there  any  plan  to  implement  "session  per  thread" or "shared
>>> sessions between thread"?
>>...
>> so
>> there's not that much motivation to do a ton of work inside the database
>> to solve it there.

> I agree that there's not really a plan to implement this, but I don't
> ...

> So, I actually think it would be a good idea to think about this.

I just want to note that converting global variables to  thread-specific variables.
It's large work offcourse.
But it's not seemed to be a ton of work.
And it's the most part of refactoring for  "session  per  thread".
Offcourse that's not all.
But it  seemed to be the most valuable reason not to do that work.

Re: One process per session lack of sharing

От

AMatveev@bitec.ru

Дата:

14 июля 2016 г., 15:16:52

Hi

>  It's mostly working, but there are workloads where it
> fails badly - and competing database products survive a number of
> scenarios where we just fall on our face.

> So, I actually think it would be a good idea to think about this.

Just to think.

http://www.tiobe.com/tiobe_index

The pl/sql has 18-th position.
Where is pgSql.

I've  looked  up the Ide for pgSql. If compare with oracle I can say
there  is  not  tools  which  can  compete  with "PlSql developer" for
example. (functinality / price)

https://www.allroundautomations.com/plsqldev.html?gclid=CjwKEAjw8Jy8BRCE0pOC9qzRhkMSJABC1pvJepfRpWeyMJ7CTZzlQE_PojlBO0vqGIZvVSW4jiQxShoC4PLw_wcB
Why?
May it because choosing another database is more profitable?

I can't say for others, but for us:
Offcourse We can implement some of our task in postgreSql.
But when I think on full migration, it's just not real.
We can contribute something but we can't work against postgreSql architecture.
Our  calculation  shows  that  it is cheaper to implement "Session per
thread"   themselfs for example.   But  it's more cheaper to buy Oracle(Even if we
would write from scratch).
And there is just no customers which want to pay for that.
Note, we don't have enough skill at postgreSql and the think that postgresql core team may do for a month, we can do
foryears.
 
So  in  our  layer  we just can't attract resource for that task.

At  other side there is people who have infrastructure, skills and experience but they
fill comfortable as is ""

> there's not that much motivation to do a ton of work inside the database
> to solve it there.
It's clear, they work on there task. We all work on our task.
But it's just a wall.
It's sad.

There is proverbial in russia: "It's shine and poverty of open source"

May be it is this case :)

Re: One process per session lack of sharing

От

Craig Ringer

Дата:

15 июля 2016 г., 11:28:22

On 14 July 2016 at 16:41, <AMatveev@bitec.ru> wrote:

Hi

> On Tue, Jul 12, 2016 at 9:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> AMatveev@bitec.ru writes:
>>> Is there any plan to implement "session per thread" or "shared
>>> sessions between thread"?
>>...
>> so
>> there's not that much motivation to do a ton of work inside the database
>> to solve it there.

> I agree that there's not really a plan to implement this, but I don't
> ...

> So, I actually think it would be a good idea to think about this.

I just want to note that converting global variables to thread-specific variables.

I don't think anyone's considering moving from multi-processing to multi-threading in PostgreSQL. I really, really like the protection that the shared-nothing-by-default process model gives us, among other things.

I'm personally not absolutely opposed to threading, but you'll find it hard to convince anyone it's worth the huge work required to ensure that everything in PostgreSQL is done thread-safely, adapt all our logic to handle thread IDs where we use process IDs, etc. It'd be a massive amount of work for no practical gain for most users, and a huge reliability loss in the short to medium term as we ironed out all the bugs.

Where I agreed with you, and where I think Robert sounded like he was agreeing, was that our current design where we have one executor per user sessions and can't suspend/resume sessions is problematic.

It's large work offcourse.
But it's not seemed to be a ton of work.

Er.... yeah, it really is. It's not just the mechanical changes. It's verifying that everything's correct on all the supported platforms. Ensuring that all the C library stuff we do is thread-safe, all the SSL stuff, etc. Getting rid of all the function-static variable use. Lots more.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: One process per session lack of sharing

От

AMatveev@bitec.ru

Дата:

15 июля 2016 г., 12:31:28

Hi


>>>> Is  there  any  plan  to  implement  "session  per  thread" or "shared
>>>> sessions between thread"?


> I'm personally not absolutely opposed to threading, but you'll find
> it hard to convince anyone it's worth the huge work required to
> ensure that everything in PostgreSQL is done thread-safely
It's  clear  for  me, I understand that organizing that work is really very
hard. It's work for new segment of market in long perspective.
For   most  open  source  project this is very difficult. In some case
it may be not possible at all.

But  in the most cases there is proverb: "We make the road by walking on it"

It's very important just to start.

And may be the right start is to fix the Faq
https://wiki.postgresql.org/wiki/FAQ#Why_does_PostgreSQL_use_so_much_memory.3F
>Why does PostgreSQL use so much memory?
>Despite appearances, this is absolutely normal
It's not normal. It's "as is". You should use pgBouncer. See "Re: [HACKERS] One process per session lack of sharing"
And it is why
>there are workloads where it
>fails badly - and competing database products survive a number of
>scenarios where we just fall on our face


> Er.... yeah, it really is. It's not just the mechanical changes.
> It's verifying that everything's correct on all the supported
> platforms. Ensuring that all the C library stuff we do is
> thread-safe, all the SSL stuff, etc. Getting rid of all the
> function-static variable use. Lots more.
In the most cases the work can be done part by part.
May be there is such parts. It's not necessary to do everything at once.

Re: One process per session lack of sharing

От

Pavel Stehule

Дата:

15 июля 2016 г., 13:06:29

2016-07-15 11:29 GMT+02:00 <AMatveev@bitec.ru>:

Hi

>>>> Is there any plan to implement "session per thread" or "shared
>>>> sessions between thread"?

> I'm personally not absolutely opposed to threading, but you'll find
> it hard to convince anyone it's worth the huge work required to
> ensure that everything in PostgreSQL is done thread-safely
It's clear for me, I understand that organizing that work is really very
hard. It's work for new segment of market in long perspective.
For most open source project this is very difficult. In some case
it may be not possible at all.

But in the most cases there is proverb: "We make the road by walking on it"

It's very important just to start.

I disagree - there is lot of possible targets with much higher benefits - columns storage, effective execution - compiled execution, implementation of temporal databases, better support for dynamic structures, better support for XML, JSON, integration of connection pooling, ...

There is only few use cases - mostly related to Oracle emulation when multi threading is necessary - and few can be solved better - PLpgSQL to C compilation and similar techniques.

The organization of work is hard, but pretty harder is doing this work - and doing it without impact on current code base, current users. MySQL is thread based database - is better than Postgres, or there is more users migrated from Orace? Not.

Regards

Pavel

And may be the right start is to fix the Faq
https://wiki.postgresql.org/wiki/FAQ#Why_does_PostgreSQL_use_so_much_memory.3F
>Why does PostgreSQL use so much memory?
>Despite appearances, this is absolutely normal
It's not normal. It's "as is". You should use pgBouncer. See "Re: [HACKERS] One process per session lack of sharing"
And it is why
>there are workloads where it
>fails badly - and competing database products survive a number of
>scenarios where we just fall on our face

> Er.... yeah, it really is. It's not just the mechanical changes.
> It's verifying that everything's correct on all the supported
> platforms. Ensuring that all the C library stuff we do is
> thread-safe, all the SSL stuff, etc. Getting rid of all the
> function-static variable use. Lots more.
In the most cases the work can be done part by part.
May be there is such parts. It's not necessary to do everything at once.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: One process per session lack of sharing

От

AMatveev@bitec.ru

Дата:

15 июля 2016 г., 13:22:38

Hi


> I disagree - there is lot of possible targets with much higher
> benefits - columns storage, effective execution - compiled
> execution, implementation of temporal databases, better support for
> dynamic structures, better support for XML, JSON, integration of connection pooling, ...
Off course  the  task is different so optimal configuration is different too.
So the best balance between process per thread can change.
But now he is in one extreme point.


> There is only few use cases - mostly related to Oracle emulation
It's few cases for one and it's most cases for others.
> when multi threading is necessary - and few can be solved better -
> PLpgSQL to C compilation and similar techniques.
It's few cases for one and it's most cases for others.
In our cases we just buy oracle and it's would be cheeper.
Off  course  if  our customers for some reason would agree to pay  for that
technique. We have nothing against.

> The organization of work is hard, but pretty harder is doing this
> work - and doing it without impact on current code base, current
> users. MySQL is thread based database - is better than Postgres, or
> there is more users migrated from Orace? Not.

We want to decide our task by PostgreSql as easy as by Oracle.
So you can say  You should buy oracle and You will be right.

I'm just interested if this is the position of the majority.

Re: One process per session lack of sharing

От

Pavel Stehule

Дата:

15 июля 2016 г., 13:39:17

2016-07-15 12:20 GMT+02:00 <AMatveev@bitec.ru>:

Hi

> I disagree - there is lot of possible targets with much higher
> benefits - columns storage, effective execution - compiled
> execution, implementation of temporal databases, better support for
> dynamic structures, better support for XML, JSON, integration of connection pooling, ...
Off course the task is different so optimal configuration is different too.
So the best balance between process per thread can change.
But now he is in one extreme point.

> There is only few use cases - mostly related to Oracle emulation
It's few cases for one and it's most cases for others.
> when multi threading is necessary - and few can be solved better -
> PLpgSQL to C compilation and similar techniques.
It's few cases for one and it's most cases for others.
In our cases we just buy oracle and it's would be cheeper.
Off course if our customers for some reason would agree to pay for that
technique. We have nothing against.

> The organization of work is hard, but pretty harder is doing this
> work - and doing it without impact on current code base, current
> users. MySQL is thread based database - is better than Postgres, or
> there is more users migrated from Orace? Not.

We want to decide our task by PostgreSql as easy as by Oracle.
So you can say You should buy oracle and You will be right.

Can be nice, if we can help to all Oracle users - but it is not possible in this world :( - there is lot of barriers - threading is only one, second should be different design of PL/SQL - it is based on out processed, next can be libraries, JAVA integration, and lot of others. I believe so lot of users can be simple migrated, NTT has statistics - 60% is migrated just with using Orafce. But still there will be 10% where migration is not possible without significant refactoring. I don't believe so is cheaper to modify Postgres to support threads than modify some Oracle applications.

The threading for Postgres is not small projects - it can require hundreds man days.

I'm just interested if this is the position of the majority.

sure - it is my personal opinion.

Regards

Pavel

Re: One process per session lack of sharing

От

Craig Ringer

Дата:

15 июля 2016 г., 14:03:18

On 15 July 2016 at 18:05, Pavel Stehule <pavel.stehule@gmail.com> wrote:

There is only few use cases - mostly related to Oracle emulation when multi threading is necessary - and few can be solved better - PLpgSQL to C compilation and similar techniques.

Right.

If amatveev (username, unsure of full name) wants to improve PL/PgSQL performance and the ability of a JVM to share resources between backends, then it would be more productive to focus on that than on threading.

As for "fixing" the FAQ... for the great majority of people the FAQ entry on memory use is accurate. Sure, if you load a JVM into each backend and load a bunch of cached data in it, you'll get bad memory use. So don't do that. You're not measuring PostgreSQL, you're measuring PostgreSQL-plus-my-JVM-extension. Why does it use so much memory? 'cos it loads a whole bunch of stuff into each backend.

Now, there are other cases where individual PostgreSQL backends use lots of memory. But that FAQ entry refers to the common misconception that each PostgreSQL process's reported memory use is the actual system memory it uses. That isn't the case because most systems account badly for shared memory, and it confuses a lot of people. The FAQ entry doesn't need fixing.

Maybe the FAQ entry needs rewording to qualify it so it says that "in most cases" it's just shared memory mis-accounting. But that's about it.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: One process per session lack of sharing

От

AMatveev@bitec.ru

Дата:

15 июля 2016 г., 14:27:19

Hi


> Can be nice, if we can help to all Oracle users - but it is not
> possible in this world :( - there is lot of barriers - threading is
> only one, second should be different design of PL/SQL - it is based
> on out processed, next can be libraries, JAVA integration, and lot
> of others. I believe so lot of users can be simple migrated, NTT has
> statistics - 60% is migrated just with using Orafce. But still there
> will be 10% where migration is not possible without significant
> refactoring.

The most of our customers now use oracle enterprise edition.
You can know better how important this is.

But I agree with you that in other cases we can use PostgreSql.
We  can  use  postgreSql  with some disadvantages of pgBouncer anywhare
where  the  scalability  is not main risk.(Such customers usually don't
buy Enterprise)

>I don't believe so is cheaper to modify Postgres to
> support threads than modify some Oracle applications.

The key is Scaling.
Some parallels processing just can not be divorced from data without reducing performance.
It  very  difficult  question  would  be  it  possible  at  all to get
comparable performance at application server for such cases.
If we "inject" applications server to postgreSql for that scalability and functionality we need multithreading.

If customization for every project is not big.
It's may be tuned. But from some point the tuning is not profitable.
(The database works in 24x7 and we need the ability to fix bugs on the fly)
So If for some reason we would start to use postgresql.
There is always a question what to choose funcionality or scalability.
And usually our customers need both.

>I don't believe so is cheaper
For us it's may be not cheaper. It's just imposible.

Re: One process per session lack of sharing

От

Pavel Stehule

Дата:

15 июля 2016 г., 14:56:18

2016-07-15 13:25 GMT+02:00 <AMatveev@bitec.ru>:

Hi

> Can be nice, if we can help to all Oracle users - but it is not
> possible in this world :( - there is lot of barriers - threading is
> only one, second should be different design of PL/SQL - it is based
> on out processed, next can be libraries, JAVA integration, and lot
> of others. I believe so lot of users can be simple migrated, NTT has
> statistics - 60% is migrated just with using Orafce. But still there
> will be 10% where migration is not possible without significant
> refactoring.

The most of our customers now use oracle enterprise edition.
You can know better how important this is.

But I agree with you that in other cases we can use PostgreSql.
We can use postgreSql with some disadvantages of pgBouncer anywhare
where the scalability is not main risk.(Such customers usually don't
buy Enterprise)

>I don't believe so is cheaper to modify Postgres to
> support threads than modify some Oracle applications.

The key is Scaling.
Some parallels processing just can not be divorced from data without reducing performance.
It very difficult question would be it possible at all to get
comparable performance at application server for such cases.
If we "inject" applications server to postgreSql for that scalability and functionality we need multithreading.

but parallel processing doesn't requires threading support - see PostgreSQL 9.6 features.

I am not sure, but I am thinking so PL/SQL is based on processed and not on threads too. So maybe this discussion is little bit out, because we use different terms.

Regards

Pavel

If customization for every project is not big.
It's may be tuned. But from some point the tuning is not profitable.
(The database works in 24x7 and we need the ability to fix bugs on the fly)
So If for some reason we would start to use postgresql.
There is always a question what to choose funcionality or scalability.
And usually our customers need both.

>I don't believe so is cheaper
For us it's may be not cheaper. It's just imposible.

Re: One process per session lack of sharing

От

AMatveev@bitec.ru

Дата:

15 июля 2016 г., 14:57:18

Hi


> If amatveev (username, unsure of full name) wants to improve
> PL/PgSQL performance and the ability of a JVM to share resources
> between backends, then it would be more productive to focus on that than on threading.

Note, I've statred this post with
https://www.postgresql.org/message-id/flat/409604420.20160711111532%40bitec.ru#409604420.20160711111532@bitec.ru

Oracle: about 5M
MSSqlServer: about 4M
postgreSql: about 160М


It's 11K loc of pgSql.

And our code base is more than 4000k(for pgSql) lines of code.

Re: One process per session lack of sharing

От

AMatveev@bitec.ru

Дата:

15 июля 2016 г., 15:56:52

Hi


> but parallel processing doesn't requires threading support - see PostgreSQL 9.6 features.

To   share  dynamic  execution  code between threads much more easy(If sharing this code between process is possible).
There  is  many  other  interaction techniques  between threads which is
absence between process.

Re: One process per session lack of sharing

От

Pavel Stehule

Дата:

15 июля 2016 г., 16:07:57

2016-07-15 14:54 GMT+02:00 <AMatveev@bitec.ru>:

Hi

> but parallel processing doesn't requires threading support - see PostgreSQL 9.6 features.

To share dynamic execution code between threads much more easy(If sharing this code between process is possible).
There is many other interaction techniques between threads which is
absence between process.

This is true, only when data are immutable and in memory. Elsewhere it is false idea.

Regards

Pavel

Re: One process per session lack of sharing

От

AMatveev@bitec.ru

Дата:

15 июля 2016 г., 16:16:14

Hi


> This is true, only when data are immutable and in memory. Elsewhere it is false idea.

For   case   when    the  server  works  24x7  and you need ability to
fix bugs(or update) on the fly in any app code. It's usual.

Re: One process per session lack of sharing

От

james

Дата:

15 июля 2016 г., 19:43:40

On 15/07/2016 09:28, Craig Ringer wrote:
> I don't think anyone's considering moving from multi-processing to 
> multi-threading in PostgreSQL. I really, really like the protection 
> that the shared-nothing-by-default process model gives us, among other 
> things.
>
As I understand it, the main issue is that it is hard to integrate 
extensions that use heavyweight runtimes and are focussed on isolation 
within a virtual machine.  Its not just

Perhaps it would be possible for the postmaster (or a delegate process) 
to host such a runtime, and find a way for a user process that wants to 
use such a runtime to communicate with it, whether by copying function 
parameters over RPC or by sharing some of its address space explicitly 
to the runtime to operate on directly.

Such a host delegate process could be explicitly built with multithread 
support and not 'infect' the rest of the code with its requirements.

Using granular RPC is nice for isolation but I am concerned that the 
latencies might be high.

Re: One process per session lack of sharing

От

Pavel Stehule

Дата:

15 июля 2016 г., 20:02:27

2016-07-15 18:43 GMT+02:00 james <james@mansionfamily.plus.com>:

On 15/07/2016 09:28, Craig Ringer wrote:
I don't think anyone's considering moving from multi-processing to multi-threading in PostgreSQL. I really, really like the protection that the shared-nothing-by-default process model gives us, among other things.

As I understand it, the main issue is that it is hard to integrate extensions that use heavyweight runtimes and are focussed on isolation within a virtual machine. Its not just

Perhaps it would be possible for the postmaster (or a delegate process) to host such a runtime, and find a way for a user process that wants to use such a runtime to communicate with it, whether by copying function parameters over RPC or by sharing some of its address space explicitly to the runtime to operate on directly.

Such a host delegate process could be explicitly built with multithread support and not 'infect' the rest of the code with its requirements.

Using granular RPC is nice for isolation but I am concerned that the latencies might be high.

What I know about Oracle, PL/SQL, Java - all is executed as outprocess calls. I am sure, so PL doesn't share process with SQL engine there

Regards

Pavel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: One process per session lack of sharing

От

Pavel Stehule

Дата:

15 июля 2016 г., 21:01:38

2016-07-15 19:57 GMT+02:00 Матвеев Алексей <amatveev@global-system.ru>:

Hi

>What I know about Oracle, PL/SQL, Java - all is executed as outprocess calls. I am sure, so PL doesn't share process with SQL engine there

You are highly not correct.
Sorry, I will answer more when i come to work.

https://sqlmdx.net/2014/09/30/measuring-context-switches/

Regards

Pavel

Re: One process per session lack of sharing

От

Матвеев Алексей

Дата:

15 июля 2016 г., 21:04:28

Hi

>What I know about Oracle, PL/SQL, Java - all is executed as outprocess calls. I am sure, so PL doesn't share process
withSQL engine there 

You are highly not correct.
Sorry, I will answer more when i come to work.

Re: One process per session lack of sharing

От

Craig Ringer

Дата:

17 июля 2016 г., 10:21:28

On 16 July 2016 at 00:43, james <james@mansionfamily.plus.com> wrote:

On 15/07/2016 09:28, Craig Ringer wrote:
I don't think anyone's considering moving from multi-processing to multi-threading in PostgreSQL. I really, really like the protection that the shared-nothing-by-default process model gives us, among other things.

As I understand it, the main issue is that it is hard to integrate extensions that use heavyweight runtimes and are focussed on isolation within a virtual machine. Its not just

Perhaps it would be possible for the postmaster (or a delegate process) to host such a runtime, and find a way for a user process that wants to use such a runtime to communicate with it, whether by copying function parameters over RPC or by sharing some of its address space explicitly to the runtime to operate on directly.

It is, and the JVM supports that, but it's so costly that it would eliminate most of the benefits this user is seeking from the kind of sharing they want. It also needs at least a minimal JVM running in the target backends.

The issue here is an architectural mismatch between PostgreSQL and the JVM, made worse by the user's very stored-proc-heavy code. Some other runtime that's designed to co-operate with a multiprocessing environment could well be fine, but the JVM isn't. At least, the Sun/Oracle/OpenJDK JVM isn't.

They could explore doing their bytecode compilation for another runtime that's more friendly toward multiprocessing (maybe Mono? Haven't tried) and/or look at using PostgreSQL's shared memory facilities within their target runtime. It's not like this is insoluible without completely rebuilding PostgreSQL.

For them it'd be great if PostgreSQL used multi-threading instead of multi-processing, but it doesn't, and it's not likely to. So they've got to find other solutions to their difficulties within PostgreSQL or use another product.

Such a host delegate process could be explicitly built with multithread support and not 'infect' the rest of the code with its requirements.

Not if it lives in the postmaster. It'd have to be forked and communicated with at one remove.

Using granular RPC is nice for isolation but I am concerned that the latencies might be high.

Yep. When your goal is performance and you're trying to move stuff closer to the DB and out of an appserver, it's likely counterproductive.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: One process per session lack of sharing

От

Craig Ringer

Дата:

17 июля 2016 г., 10:23:47

On 15 July 2016 at 20:54, <AMatveev@bitec.ru> wrote:

Hi

> but parallel processing doesn't requires threading support - see PostgreSQL 9.6 features.

To share dynamic execution code between threads much more easy(If sharing this code between process is possible).
There is many other interaction techniques between threads which is
absence between process.

We have shared memory.

How do you think the buffer cache works?

Lots more could be shared, too. Cached plans, for example.

It's possible. You just have to change how you think about it, and you might not be able to do it within the framework of the JVM. So it's quite possibly not worth it for you. Nonetheless, don't assume it can't be done just because you can't do it the way you're used to thinking of doing it.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: One process per session lack of sharing

От

Jan Wieck

Дата:

17 июля 2016 г., 16:30:06

On Sun, Jul 17, 2016 at 3:23 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

Lots more could be shared, too. Cached plans, for example.

But the fact that PostgreSQL has transactional DDL complicates things like

a shared plan cache and shared PL/pgSQL execution trees. Those things are

much easier in a trivial database implementation, where an ALTER TABLE is

just trampling over a running transaction.

Regards, Jan

Jan Wieck
Senior Postgres Architect

http://pgblog.wi3ck.info

Re: One process per session lack of sharing

От

Simon Riggs

Дата:

17 июля 2016 г., 16:44:17

On 12 July 2016 at 09:57, <AMatveev@bitec.ru> wrote:

We have faced with some lack of sharing resources.
So in our test memory usage per session:
Oracle: about 5M
MSSqlServer: about 4M
postgreSql: about 160М

Using shared resources also has significant problems, so care must be taken.

I think its clear that threading is out, but it is far from being the only solution to reducing the memory overhead of sharing.

Analysing the overhead and suggesting practical ways forward may help.

Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: One process per session lack of sharing

От

Robert Haas

Дата:

17 июля 2016 г., 21:27:36

On Fri, Jul 15, 2016 at 4:28 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> I don't think anyone's considering moving from multi-processing to
> multi-threading in PostgreSQL. I really, really like the protection that the
> shared-nothing-by-default process model gives us, among other things.

We get some very important protection by having the postmaster in a
separate address space from the user processes, but separating the
other backends from each other has no value.  If one of the backends
dies, we take provisions to make sure they all die, which is little or
no different from what would happen if we had the postmaster as one
process and all of the other backends as threads within a second
process.  As far as I can see, running each and every backend in a
separate process has downsides but no upsides.  It slows down the
system and makes it difficult to share data between processes without
much in the way of benefits.

> I'm personally not absolutely opposed to threading, but you'll find it hard
> to convince anyone it's worth the huge work required to ensure that
> everything in PostgreSQL is done thread-safely, adapt all our logic to
> handle thread IDs where we use process IDs, etc. It'd be a massive amount of
> work for no practical gain for most users, and a huge reliability loss in
> the short to medium term as we ironed out all the bugs.

It would actually be pretty simple to allow PostgreSQL to be compiled
to use either processes or threads, provided that you don't mind using
something like GCC's __thread keyword.  When compiling with threads,
slap __thread on every global variable we have (using some kind of
macro trick, no doubt), spawn threads instead of processes wherever
you like, and I think you're more or less done.  There could be some
problems with third-party libraries we use, but I bet there probably
wouldn't be all that many problems.  Of course, there's not
necessarily a whole lot of benefit to such a minimal transformation,
but you could certainly do useful things on top of it.  For example,
the parallel query code could arrange to pass pointers to existing
data structures in some cases instead of copying those data structures
as we do currently.  Spinning up a new thread and giving it pointers
to some of the old thread's data structures is probably a lot faster
than spinning up a new process and serializing and deserializing those
data structures, so you wouldn't necessarily have to do all that much
work before the "thread model" compile started to have noticeable
advantages over the "process model" compile.  You could pass tuples
around directly rather than by copying them, too.  A lot of things
that we might want to do in this area would expose us to the risk of
server-lifespan memory leaks, and we'd need to spend time and energy
figuring out how to minimize those risks, but enough other people have
written complex, long-running multithreaded programs that I think it
is probably possible to do so without unduly compromising reliability.

> Where I agreed with you, and where I think Robert sounded like he was
> agreeing, was that our current design where we have one executor per user
> sessions and can't suspend/resume sessions is problematic.

The problems are very closely related.  The problem with suspending
and resuming sessions is that you need to keep all of the session's
global variable contents (except for any caches that are safe to
rebuild) until the session is resumed; and we have no way of
discovering all of the global variables a process is using and no
general mechanism that can be used to serialize and deserialize them.
The problem with using threads is that code which uses global
variables will not be thread-safe unless all of those variables are
thread-local.  Getting our hands around the uncontrolled use of global
variables - doubtless at the risk of breaking third-party code - seems
crucial.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: One process per session lack of sharing

От

Jim Nasby

Дата:

17 июля 2016 г., 22:04:57

On 7/14/16 12:34 AM, Craig Ringer wrote:
> Starting with a narrow scope would help. Save/restore GUCs and the other
> easy stuff, and disallow sessions that are actively LISTENing, hold
> advisory locks, have open cursors, etc from being saved and restored.

Along the lines of narrow scope... I wonder about allowing functions to 
execute in a separate process that communicates back to the main 
backend. That would allow unsafe languages to operate under a different 
OS user that was tightly restricted (ie: nobody/nogroup), but it could 
also allow for a pool of "function executors". Depending on how it was 
structured, it might also insulate the database from having to panic if 
a function crashed it's process.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

Re: One process per session lack of sharing

От

Tom Lane

Дата:

17 июля 2016 г., 22:13:03

Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Jul 15, 2016 at 4:28 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> I don't think anyone's considering moving from multi-processing to
>> multi-threading in PostgreSQL. I really, really like the protection that the
>> shared-nothing-by-default process model gives us, among other things.

> We get some very important protection by having the postmaster in a
> separate address space from the user processes, but separating the
> other backends from each other has no value.

I do not accept that proposition in the least.  For one thing, debugging
becomes an order of magnitude harder when you've got multiple threads
in the same address space: you have essentially zero guarantees about
what one thread might have done to the supposedly-private state of
another one.

> ... enough other people have
> written complex, long-running multithreaded programs that I think it
> is probably possible to do so without unduly compromising reliability.

I would bet that every single successful project of that sort has been
written with threading in mind from the get-go.  Trying to retro-fit
threading onto thirty years' worth of single-threaded coding is a recipe
for breaking your project; even if you had control of all the code running
in the address space, which we assuredly do not.
        regards, tom lane

Re: One process per session lack of sharing

От

Robert Haas

Дата:

18 июля 2016 г., 04:34:07

On Sun, Jul 17, 2016 at 3:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Jul 15, 2016 at 4:28 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>>> I don't think anyone's considering moving from multi-processing to
>>> multi-threading in PostgreSQL. I really, really like the protection that the
>>> shared-nothing-by-default process model gives us, among other things.
>
>> We get some very important protection by having the postmaster in a
>> separate address space from the user processes, but separating the
>> other backends from each other has no value.
>
> I do not accept that proposition in the least.  For one thing, debugging
> becomes an order of magnitude harder when you've got multiple threads
> in the same address space: you have essentially zero guarantees about
> what one thread might have done to the supposedly-private state of
> another one.

Well, that's true, in theory.  In practice, random memory clobbers are
a pretty rare type of bug.  The chances that thread A crashed because
thread B overwrote its supposedly-private state are just not very
high.  Also, such bugs are extremely hard to troubleshoot even when
there is only process and one thread involved, so it's not like things
are a rose garden today.  I don't buy the argument that it's worth
giving up arbitrary amounts of performance and functionality for this.

>> ... enough other people have
>> written complex, long-running multithreaded programs that I think it
>> is probably possible to do so without unduly compromising reliability.
>
> I would bet that every single successful project of that sort has been
> written with threading in mind from the get-go.  Trying to retro-fit
> threading onto thirty years' worth of single-threaded coding is a recipe
> for breaking your project; even if you had control of all the code running
> in the address space, which we assuredly do not.

I admit that it is risky, but I think there are things that could be
done to limit the risk.  I don't believe we can indefinitely continue
to ignore the potential performance benefits of making a switch like
this.  Breaking a thirty-year old code base irretrievably would be
sad, but letting it fade into irrelevance because we're not willing to
make the architecture changes that are needed to remain relevant would
be sad, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: One process per session lack of sharing

От

Robert Haas

Дата:

18 июля 2016 г., 04:34:26

On Sun, Jul 17, 2016 at 3:04 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 7/14/16 12:34 AM, Craig Ringer wrote:
>> Starting with a narrow scope would help. Save/restore GUCs and the other
>> easy stuff, and disallow sessions that are actively LISTENing, hold
>> advisory locks, have open cursors, etc from being saved and restored.
>
> Along the lines of narrow scope... I wonder about allowing functions to
> execute in a separate process that communicates back to the main backend.
> That would allow unsafe languages to operate under a different OS user that
> was tightly restricted (ie: nobody/nogroup), but it could also allow for a
> pool of "function executors". Depending on how it was structured, it might
> also insulate the database from having to panic if a function crashed it's
> process.

You can do this sort of thing with background workers today.  Just
create a parallel context and set the entrypoint to code that will
execute the guts of the function.  Retrieve the results using a
shm_mq.  With somewhat more work, you could have persistent background
workers that get reused for one function call after another instead of
being continually spun up and torn down.  However, I suspect the IPC
costs would make this rather slow.  Thomas Munro mentioned to me an
IPC facility called "doors" a while back which, if I understood him
correctly, is supposed to let you do a remote procedure call which
also transfers the current processes' time slice to the process on the
other end of the door.  Maybe that would be faster, and fast enough;
or maybe not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: One process per session lack of sharing

От

Jan Wieck

Дата:

18 июля 2016 г., 05:00:21

On Sun, Jul 17, 2016 at 9:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Jul 17, 2016 at 3:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Jul 15, 2016 at 4:28 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>>> I don't think anyone's considering moving from multi-processing to
>>> multi-threading in PostgreSQL. I really, really like the protection that the
>>> shared-nothing-by-default process model gives us, among other things.
>
>> We get some very important protection by having the postmaster in a
>> separate address space from the user processes, but separating the
>> other backends from each other has no value.
>
> I do not accept that proposition in the least. For one thing, debugging
> becomes an order of magnitude harder when you've got multiple threads
> in the same address space: you have essentially zero guarantees about
> what one thread might have done to the supposedly-private state of
> another one.

Well, that's true, in theory. In practice, random memory clobbers are
a pretty rare type of bug. The chances that thread A crashed because
thread B overwrote its supposedly-private state are just not very
high. Also, such bugs are extremely hard to troubleshoot even when
there is only process and one thread involved, so it's not like things
are a rose garden today. I don't buy the argument that it's worth
giving up arbitrary amounts of performance and functionality for this.

The random memory clobbers are partially rare because they often aren't

triggered. Many of them are of the dangling pointer type and in a single

threaded process, there is less of a chance to allocate and overwrite the

free'd and then used memory.

But you are right, all of them are tricky to hunt and I remember using

hardware watch points and what not in the past. They make your day

though when you finally find them. So you will be sorry when the last

one is hunted down.

>> ... enough other people have
>> written complex, long-running multithreaded programs that I think it
>> is probably possible to do so without unduly compromising reliability.
>
> I would bet that every single successful project of that sort has been
> written with threading in mind from the get-go. Trying to retro-fit
> threading onto thirty years' worth of single-threaded coding is a recipe
> for breaking your project; even if you had control of all the code running
> in the address space, which we assuredly do not.

I admit that it is risky, but I think there are things that could be
done to limit the risk. I don't believe we can indefinitely continue
to ignore the potential performance benefits of making a switch like
this. Breaking a thirty-year old code base irretrievably would be
sad, but letting it fade into irrelevance because we're not willing to
make the architecture changes that are needed to remain relevant would
be sad, too.

I have to agree with Robert on that one. We have been "thinking" about

multi-threading some 16 years ago already. We were aware of the dangers

and yet at least considered doing it some day for things like a parallel

executor. And that would probably be our best bang for the buck still.

The risks of jamming all sessions into a single, multi-threaded process are

huge. Think of snapshot visibility together with catalog cache invalidations.

I'd say no to that one as a first step.

But multi-threading the executor or even certain utility commands at first

should not be rejected purely on the notion that "we don't have multithreading

today."

Regards, Jan

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Jan Wieck
Senior Postgres Architect

http://pgblog.wi3ck.info

Re: One process per session lack of sharing

От

AMatveev@bitec.ru

Дата:

18 июля 2016 г., 12:19:57

Hi


>  Such a host delegate process could be explicitly built with
> multithread support and not 'infect' the rest of the code with its requirements.
>  
>  Using granular RPC is nice for isolation but I am concerned that the latencies might be high.
I agree with you.
Moreover I think that some decision have not sense with this "thread model" in any way.
For example Embedded oracle XML DB applications:
http://docs.oracle.com/cd/B28359_01/appdev.111/b28369/xdb23jv1.htm#i1043708
"""
You can run a Java servlet.
Servlets work better as the top-level entry point into Oracle Database, and require using HTTP(S) as the protocol to
accessOracle Database.
 
"""

> What I know about Oracle, PL/SQL, Java - all is executed as
> outprocess calls. I am sure, so PL doesn't share process with SQL engine there

It's better to say that java is executed like outprocess calls.
It's done for wide libraries support.
But it is not separate  process.
"""
The JDBC server-side internal driver,
the Oracle JVM, the database, and the SQL engine all run within the same address space, and therefore, the issue of
networkround-trips is irrelevant
 
"""
http://docs.oracle.com/cd/E11882_01/java.112/e16548/overvw.htm#JJDBC28026

Re: One process per session lack of sharing

От

AMatveev@bitec.ru

Дата:

18 июля 2016 г., 12:27:10

Hi


> The issue here is an architectural mismatch between PostgreSQL and
> the JVM, made worse by the user's very stored-proc-heavy code. Some
> other runtime that's designed to co-operate with a multiprocessing
> environment could well be fine, but the JVM isn't. At least, the Sun/Oracle/OpenJDK JVM isn't.

Actually  the  lack of threads make any vm quite limit in some aspects of scalability.
The  desire  to  use  jvm  is  the  result  that there is no desire to
reinvent the wheel.

Re: One process per session lack of sharing

От

AMatveev@bitec.ru

Дата:

18 июля 2016 г., 12:40:41

Hi


> I admit that it is risky, but I think there are things that could be
> done to limit the risk.  I don't believe we can indefinitely continue
> to ignore the potential performance benefits of making a switch like
> this.  Breaking a thirty-year old code base irretrievably would be
> sad, but letting it fade into irrelevance because we're not willing to
> make the architecture changes that are needed to remain relevant would
> be sad, too.

I can add, that nowadays it seems
that the paralleling processing is the only way to scale.
They  can't  wait  that  CPU  Clock  Speeds Increased in in the coming
years.

I understand that use of thread has some difficulties.
I can not understand why use of thread can have disadvantages.
Actually  I  think  that  parallelling  using threads is much easy than
parallelling  using processes.

Re: One process per session lack of sharing

От

AMatveev@bitec.ru

Дата:

18 июля 2016 г., 13:06:25

Hi

> There's https://github.com/jnr/jnr-ffi that enables to call C
> functions without resorting to writing JNI wrappers.
I have not said that you are wrong.
It's the dark side of "like seprate process"
They can cheaply call sql from jvm.
And they can't cheaply call jvm from sql.

Jvm in oracle  appeared a long time ago.
May by when java thread model had many faults.(befor jvm 1.6 for example)
Nowadays it seems to have sense only for compatibility.

Re: One process per session lack of sharing

От

Dave Cramer

Дата:

18 июля 2016 г., 15:10:20

On 18 July 2016 at 06:04, <AMatveev@bitec.ru> wrote:

Hi

> There's https://github.com/jnr/jnr-ffi that enables to call C
> functions without resorting to writing JNI wrappers.
I have not said that you are wrong.
It's the dark side of "like seprate process"
They can cheaply call sql from jvm.
And they can't cheaply call jvm from sql.

This https://github.com/davecramer/plj-new is a very old project that did work at one time which attempted to do RPC calls to the jvm to address exactly this problem.

However "cheaply" calling jvm from sql or vice-versa is not really possible.

I do like the idea of the background worker and shared memory though.

Dave Cramer

davec@postgresintl.com

www.postgresintl.com

Re: One process per session lack of sharing

От

AMatveev@bitec.ru

Дата:

18 июля 2016 г., 16:43:56

Hi

> This https://github.com/davecramer/plj-new is a very old project
> that did work at one time which attempted to do RPC calls to the jvm to address exactly this problem.

> However "cheaply" calling jvm from sql or vice-versa is not really possible.
> I do like the idea of the background worker and shared memory though.

It's not opposite concepts. It's like two level cache.
Something    is   best   with   shared memory.
When "a sharing of upper layer" is best with shared process.
And there is something that should not sharing at all.
Any deviation is always overhead.

But to be honest I  really do not like "sharing".
It is against human nature. 
And  I  will  be  really happy when there are processors with infinite
performance and memory with infinite size.
:)))

Re: One process per session lack of sharing

От

Greg Stark

Дата:

18 июля 2016 г., 17:04:33

On Mon, Jul 18, 2016 at 2:41 PM,  <AMatveev@bitec.ru> wrote:
> And  I  will  be  really happy when there are processors with infinite
> performance and memory with infinite size.
>:)))

Well for what it's worth there's no theoretical difference between
multi-process and multi-threaded. They're just two different APIs to
create shared memory and other kernel data structures and they both
allow all the sharing you want.  In the API Postgres uses everything
starts out non-shared and we explicitly set up the parts we want
shared. In the other nearly everything starts shared though it's
possible to unshare parts. Once they're set up the CPU and MMU don't
really care what kernel API was used to set them up.

In other words, there's no theoretical reason you couldn't have adapt
a JVM to create a large shared memory segment using mmap or SysV
shared memory and live entirely within that including the Java stacks
and object allocations and so on. The net result of what's running on
the CPU can actually end up being exactly equivalent (though I suspect
a natural implementation would end up being slightly different) and
there are pros and cons to each API.

-- 
greg

Re: One process per session lack of sharing

От

AMatveev@bitec.ru

Дата:

18 июля 2016 г., 18:23:21

Hi


> In other words, there's no theoretical reason you couldn't have adapt
> a JVM to create a large shared memory segment using mmap or SysV
I  think  even  if  I  was  the  leader in OS development, I could not
correctly answer your question.
So just let discuss.
Ok, I agree with you that there is no " theoretical reason "
But  in  practice  I  think  the  main  reason  that OS(VM) developers
implement this things differently.

>there's no theoretical reason you couldn't
Why does Os developers make threads?
If there is no reason the use of thread just waste?
Why do most(any) common web server or balancer use thread?
May be they are bad theoretical?
and so on

But to be more constructive.
I  just  don't know  how to make between process things that we can easily do
between threads, in most os.

I don't know how to share mutable execution code.
so i just cant imagine how to implement
http://docs.oracle.com/javase/7/docs/technotes/guides/vm/multiple-language-support.html#invokedynamic
in optimal way.

I  don't understand why mutex has overhead compare to critical section
for windows.
http://stackoverflow.com/questions/800383/what-is-the-difference-between-mutex-and-critical-section

And so on.

So  I  think  as long as  process and thread have different function in OS,
use process like thread will have overhead in practice.

Re: One process per session lack of sharing

От

AMatveev@bitec.ru

Дата:

18 июля 2016 г., 19:28:35

Hi

>So  I  think  as long as  process and thread have different function in OS,
>use process like thread will have overhead in practice.

But  There  are  other  negative  things.  I think parallel oriented
library usually do not work with process.
So Jvm integration is not exception. It is regularity.

Re: One process per session lack of sharing

От

Robert Haas

Дата:

18 июля 2016 г., 22:44:42

On Mon, Jul 18, 2016 at 10:03 AM, Greg Stark <stark@mit.edu> wrote:
> On Mon, Jul 18, 2016 at 2:41 PM,  <AMatveev@bitec.ru> wrote:
>> And  I  will  be  really happy when there are processors with infinite
>> performance and memory with infinite size.
>>:)))
>
> Well for what it's worth there's no theoretical difference between
> multi-process and multi-threaded. They're just two different APIs to
> create shared memory and other kernel data structures and they both
> allow all the sharing you want.  In the API Postgres uses everything
> starts out non-shared and we explicitly set up the parts we want
> shared. In the other nearly everything starts shared though it's
> possible to unshare parts. Once they're set up the CPU and MMU don't
> really care what kernel API was used to set them up.

That's totally true, but allocating additional shared memory after
system startup is a pain.  It's true that we now have DSM, but the
fact that DSM segments can be mapped in different addresses in
different processes is a nuisance.  We will still get a lot of benefit
out of having DSM, but it's not right to imply that threading wouldn't
make things easier.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: One process per session lack of sharing

От

Robert Haas

Дата:

18 июля 2016 г., 22:48:11

On Sun, Jul 17, 2016 at 10:00 PM, Jan Wieck <jan@wi3ck.info> wrote:
>> I admit that it is risky, but I think there are things that could be
>> done to limit the risk.  I don't believe we can indefinitely continue
>> to ignore the potential performance benefits of making a switch like
>> this.  Breaking a thirty-year old code base irretrievably would be
>> sad, but letting it fade into irrelevance because we're not willing to
>> make the architecture changes that are needed to remain relevant would
>> be sad, too
>
> I have to agree with Robert on that one. We have been "thinking" about
> multi-threading some 16 years ago already. We were aware of the dangers
> and yet at least considered doing it some day for things like a parallel
> executor. And that would probably be our best bang for the buck still.
>
> The risks of jamming all sessions into a single, multi-threaded process are
> huge. Think of snapshot visibility together with catalog cache
> invalidations.
> I'd say no to that one as a first step.
>
> But multi-threading the executor or even certain utility commands at first
> should not be rejected purely on the notion that "we don't have
> multithreading
> today."

I think the risk profile is exactly the opposite of what you are
suggesting here.  If we provide an option to compile the server with
all global variables converted to thread-local variables, there's
really not a whole lot that can break, AFAICS.  We'll technically be
multi-threaded but the code need not know or care about the other
threads; only in the event of a memory clobber can they affect each
other.

On the other hand, if we start trying to create multiple threads per
processes just for certain purposes - be it the executor or certain
utility commands - any C code that is reachable within the secondary
threads needs to have specific provisions for thread-safety.  That
would create all kinds of problems, no doubt.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: One process per session lack of sharing

От

Andres Freund

Дата:

18 июля 2016 г., 22:53:16

On 2016-07-18 15:47:58 -0400, Robert Haas wrote:
> I think the risk profile is exactly the opposite of what you are
> suggesting here.  If we provide an option to compile the server with
> all global variables converted to thread-local variables, there's
> really not a whole lot that can break, AFAICS.

Using TLS will slow down things noticeably though. So if we were to go
there, we'd have to make up for some constant slowdown.

> We'll technically be multi-threaded but the code need not know or care
> about the other threads; only in the event of a memory clobber can
> they affect each other.

But that'll make it pretty hard to take advantage of multi-threading to
a meaningful degree. Except for being able to create shared memory after
the fact - quite useful! - there'd not be much point.

- Andres

Re: One process per session lack of sharing

От

Craig Ringer

Дата:

19 июля 2016 г., 03:33:28

On 19 July 2016 at 03:53, Andres Freund <andres@anarazel.de> wrote:

On 2016-07-18 15:47:58 -0400, Robert Haas wrote:
> I think the risk profile is exactly the opposite of what you are
> suggesting here. If we provide an option to compile the server with
> all global variables converted to thread-local variables, there's
> really not a whole lot that can break, AFAICS.

Using TLS will slow down things noticeably though. So if we were to go
there, we'd have to make up for some constant slowdown.

Does TLS really have more of a performance impact than process context switches?

Genuine question, I'm clueless in the area.

> We'll technically be multi-threaded but the code need not know or care
> about the other threads; only in the event of a memory clobber can
> they affect each other.

But that'll make it pretty hard to take advantage of multi-threading to
a meaningful degree. Except for being able to create shared memory after
the fact - quite useful! - there'd not be much point.

Yeah. TLS is too simplistic. To do any useful parallel work without serialization/deserialization, some threads need to be in groups with other threads, sharing some data structures but having private copies of others.

That said, TLS can be used as a reasonable default state, then individual data structures extracted and shared more formally when needed. So it's a sane starting point.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: One process per session lack of sharing

От

Andres Freund

Дата:

19 июля 2016 г., 03:39:24

On 2016-07-19 08:33:20 +0800, Craig Ringer wrote:
> On 19 July 2016 at 03:53, Andres Freund <andres@anarazel.de> wrote:
> 
> > On 2016-07-18 15:47:58 -0400, Robert Haas wrote:
> > > I think the risk profile is exactly the opposite of what you are
> > > suggesting here.  If we provide an option to compile the server with
> > > all global variables converted to thread-local variables, there's
> > > really not a whole lot that can break, AFAICS.
> >
> > Using TLS will slow down things noticeably though. So if we were to go
> > there, we'd have to make up for some constant slowdown.
> >
> >
> Does TLS really have more of a performance impact than process context
> switches?

Those aren't really alternatives, unless I'm missing something. You have
context switches between threads as well. They're a bit cheaper (due to
less TLB invalidations), but generally not a *whole* lot.  What TLS
requires is basically for every thread local variable to go through one
(IIRC sometimes two) additional layer of indirection.  For workloads
which are bottlenecked on single core performance (i.e. most of pg,
regardless of parallel query), that can be painful.

Andres

Re: One process per session lack of sharing

От

Craig Ringer

Дата:

19 июля 2016 г., 03:56:36

On 18 July 2016 at 02:27, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Jul 15, 2016 at 4:28 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> I don't think anyone's considering moving from multi-processing to
> multi-threading in PostgreSQL. I really, really like the protection that the
> shared-nothing-by-default process model gives us, among other things.

We get some very important protection by having the postmaster in a
separate address space from the user processes, but separating the
other backends from each other has no value. If one of the backends
dies, we take provisions to make sure they all die, which is little or
no different from what would happen if we had the postmaster as one
process and all of the other backends as threads within a second
process. As far as I can see, running each and every backend in a
separate process has downsides but no upsides. It slows down the
system and makes it difficult to share data between processes without
much in the way of benefits.

That's a good point, the random memory overwrites that Tom mentioned aside.

I think that memory leaks we currently ignore as insignificant one-time losses that'll get cleaned up on backend exit will become relvant so some cleanup work would be needed there, but it's not technically difficult and valgrind is a wonderful thing.

One minor thing to be aware of is that cPython has a horrible threading design with a single massive lock for the whole interpreter. PL/Python will perform apallingly in a threaded context. It looks like it has more recently gained support for separate interpreters (each with their own GIL) within a single process though, if I'm reading the docs correctly:

https://pl.python.org/docs/api/threads.html

so maybe it'd just require some plpython tweaks to switch to the right interpreter for the current backend.

I don't think plpython issues are a huge cause for hand-wringing anyway, really. TBH, if it really is practical to move Pg to a threaded model and not as hard as I thought, I do see the advantages. Mainly because I'd _love_ efficient embedded Java and C# runtime, it's downright embarrassing to tell people they should write procs in Perl or Python (or TCL!) if they can't do it in plpgsql.

If we can stand the macro code pollution, it'd be interesting to do a minimal conversion as an experiment and let some buildfarm members start digesting it once it runs. Find issues slowly over time, make it an experimental build option. We could do things like transparently use threadids whereever we currently expose a PID in the UI, change the bgworker backend to spawn threads rather than procs, etc. (One nice consequence would be the possibility of getting rid of most of EXEC_BACKEND since the postmaster launching the backend proc would be a one-time thing, once it was stable enough to make the thread model the only option on Windows).

It'd be very helpful to find a nice portable library that abstracts platform threading specifics and has a less horrid API than pthreads, rather than having to DIY. (See e.g.: https://software.intel.com/en-us/blogs/2006/10/19/why-windows-threads-are-better-than-posix-threads ). Or use C++11 <thread> :p [dives for fireproof suit]

> Where I agreed with you, and where I think Robert sounded like he was

> agreeing, was that our current design where we have one executor per user
> sessions and can't suspend/resume sessions is problematic.

The problems are very closely related. The problem with suspending
and resuming sessions is that you need to keep all of the session's
global variable contents (except for any caches that are safe to
rebuild) until the session is resumed; and we have no way of
discovering all of the global variables a process is using and no
general mechanism that can be used to serialize and deserialize them.

Right. In our per-process model we'd have to provide a subsystem that serializes/deserializes session state and has callbacks for extensions to register their own save/restore callbacks. We'd probably want it even if the improbable happened and Pg moved to threads - which until this thread I would've considered the same as saying "pigs fly" or "the US gets universal health care". Since we'd want to be able to essentially page out idle sessons, though it'd be more of a nice-to-have than a necessity.

Individual Pg subsystems would probably register callbacks with the save/restore subsystem, rather than trying to have the save/restore subsystem have the knowledge to reach into all the other subsystems. Since we'd block save/restore when there's an active query of course, it might not actually be that bad. Especially if we started with save/restore only on idle state, not idle-in-transaction, i.e. start with transaction pooling.

Since I got started with Pg, I've taken it as given that PostgreSQL Will Never Use Threads, Don't Even Talk About It. As taboo as query hints or more so. Is this actually a serious option?

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: One process per session lack of sharing

От

AMatveev@bitec.ru

Дата:

19 июля 2016 г., 14:20:28

Hi


> Using TLS will slow down things noticeably though. So if we were to go
> there, we'd have to make up for some constant slowdown.
I can not understand why?

I've read
https://msdn.microsoft.com/en-us/library/windows/desktop/ms686749(v=vs.85).aspx
and
http://david-grs.github.io/tls_performance_overhead_cost_linux/
"""
The results are quite straightforward: no overhead at all.
"""
0x0000000000404f40 <+0>:     inc    DWORD PTR [rip+0x202382]vs0x0000000000404f50 <+0>:     inc    DWORD PTR
fs:0xfffffffffffffffc

It's clear.

Re: One process per session lack of sharing

От

Robert Haas

Дата:

19 июля 2016 г., 20:42:53

On Mon, Jul 18, 2016 at 8:56 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Since I got started with Pg, I've taken it as given that PostgreSQL Will
> Never Use Threads, Don't Even Talk About It. As taboo as query hints or more
> so. Is this actually a serious option?

I'm sure that depends on who you ask.  But for myself, yes, I think
it's a serious option.  I think that the sort of minimal conversion
that I discussed above could be done and made stable enough to label
as "we have this experimental option..." by one competent developer in
the course of one release cycle without otherwise unduly disrupting
development.  From that base, we could consider patches to optimize
the thread model case, and maybe after gaining some experience and
letting it shake out for a release or two we'd decide that the thread
model is ready to be officially supported.  I bet there would be a lot
of interest in the thread model from the user and developer
communities.  It would probably be a significant win on Windows -
where I understand that the ratio of process creation cost : thread
creation cost is much worse than it is on Linux - and it would
probably open up numerous possible optimizations for parallel query.
It would probably also have some downsides and likely some horrible
bugs, but that's why you start it out as an experimental feature.

In short, I believe the conventional wisdom on this topic is
misguided.  Most of the previous discussions of using threading have
assumed that we'd go through all of the backend-private stuff and make
it thread-safe.  That's a bad plan, first because it's not very
well-defined, second because it could slow down parts of the system
that rely on the absence of synchronization primitives in certain code
paths, and third because it requires a single mammoth act of
development on a scale that would be extremely hard to make
successful.  However, the method that I'm proposing is a completely
different kettle of fish.  It is not, as Tom points out, entirely
without danger, but it allows the first patch to be a mostly
mechanical transformation and then allows incremental development on
top of that framework.  For that reason, I believe it's at least an
order of magnitude less impractical than the "go through and make
everything thread-safe" approach.  Unlike that approach, it also makes
it realistically possible to support both models.

None of that means that it will necessarily work out, but I'm bullish.
Even though parallel query has already had more bug reports than I
would have liked, it's good evidence that you can make architectural
changes that touch almost every part of the system without necessarily
breaking everything.  You just have to go slow, be methodical, and
budget time to fix the bugs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: One process per session lack of sharing

От

Andres Freund

Дата:

19 июля 2016 г., 21:20:11

On 2016-07-19 14:18:22 +0300, AMatveev@bitec.ru wrote:
> Hi
> 
> 
> > Using TLS will slow down things noticeably though. So if we were to go
> > there, we'd have to make up for some constant slowdown.
> I can not understand why?
> 
> I've read
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms686749(v=vs.85).aspx
> and
> http://david-grs.github.io/tls_performance_overhead_cost_linux/
> """
> The results are quite straightforward: no overhead at all.
> """
> 
>  0x0000000000404f40 <+0>:     inc    DWORD PTR [rip+0x202382]
>  vs
>  0x0000000000404f50 <+0>:     inc    DWORD PTR fs:0xfffffffffffffffc

Not really true IIRC. For one segment offset stuff is encoded more
widely, and for another, it'll generate more uops in many
microarchitectures.  Also, we actually *do* qualify for the exception in
the blog you linked above: We have a fair amount of dynamically linked
code.

Re: One process per session lack of sharing

От

Teodor Sigaev

Дата:

20 июля 2016 г., 19:53:25

> On 12 July 2016 at 09:57, <AMatveev@bitec.ru <mailto:AMatveev@bitec.ru>> wrote:
>
>     We have faced with some lack of sharing resources.
>     So in our test memory usage per session:
>     Oracle: about 5M
>     MSSqlServer: about 4M
>     postgreSql: about 160М
>
>
> Using shared resources also has significant problems, so care must be taken.

To see memory allocation by opinion of pgsql it's possible to use
https://github.com/postgrespro/memstat

This module (9.6+) could collect information about MemoryContexts allocation.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: One process per session lack of sharing