Postgres with pthread

From Konstantin Knizhnik
Subject Postgres with pthread
Date
Msg-id 9defcb14-a918-13fe-4b80-a0b02ff85527@postgrespro.ru
Responses Re: Postgres with pthread  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Postgres with pthread  (Andres Freund <andres@anarazel.de>)
Re: Postgres with pthread  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
Hi hackers,

As far as I remember, several years ago, when the implementation of intra-query parallelism had just started, there was a discussion about whether to use threads or keep the traditional Postgres process architecture. The decision was made to keep processes. So now we have bgworkers, shared message queues, DSM, ...
The main argument for this decision was that switching to threads would require rewriting most of the Postgres code.
It seems to be quite a reasonable argument, and until now I agreed with it.

But recently I wanted to check it myself.
The first problem with porting Postgres to pthreads is the static variables widely used in Postgres code.
Most modern compilers support thread-local variables; for example, GCC provides the __thread keyword.
Such variables are placed in a separate segment which is addressed through a segment register (on Intel).
So access time to such variables is the same as to normal static variables.
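
To illustrate the idea, here is a minimal sketch (not code from the prototype) of a static variable declared with GCC's __thread keyword. The names session_counter, worker and run_tls_demo are hypothetical and chosen only for this example. Each thread increments its own copy, so no locking is needed and the launching thread's copy stays at zero.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a backend-local static variable. */
static __thread int session_counter = 0;

/* Each thread increments its own private copy and reports the result. */
static void *worker(void *arg)
{
    int *result = arg;

    for (int i = 0; i < 1000; i++)
        session_counter++;
    *result = session_counter;
    return NULL;
}

/*
 * Starts two threads, stores the value each one saw in out[], and
 * returns the calling thread's own copy of session_counter.
 */
int run_tls_demo(int out[2])
{
    pthread_t threads[2];

    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, &out[i]);
        assert(rc == 0);    /* demo-only error handling */
    }
    for (int i = 0; i < 2; i++)
        pthread_join(threads[i], NULL);

    return session_counter;
}
```

Calling run_tls_demo() from a thread that never touched session_counter should return 0 and leave 1000 in both slots of out[], since the two workers never share the variable. Build with -pthread.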

Certainly, maybe not all compilers have built-in support for TLS, and maybe it is not implemented as efficiently on all hardware platforms as on Intel.
So certainly such an approach decreases the portability of Postgres. But IMHO it is not so critical.

What I have done:
1. Added session_local (defined as __thread) to the definitions of most static and global variables.
I left some variables that point to shared memory as plain static. I also had to change the initialization of some static variables,
because the address of a TLS variable cannot be used in static initializers.
2. Changed the implementation of GUCs to make them thread-specific.
3. Replaced fork() with pthread_create().
4. Rewrote the file descriptor cache to be global (shared by all threads).
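
Steps 1 and 3 can be sketched roughly as follows. This is an illustration under assumptions, not code from the prototype: only the session_local macro name comes from the description above, while work_buffer, cursor, get_cursor, backend_main and start_backend are hypothetical names. The sketch shows why a static initializer that takes the address of a TLS variable has to become lazy initialization, and how a backend started with pthread_create() gets its own copy of the variables.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define session_local __thread  /* as described in step 1 */

static session_local int work_buffer[4];

/*
 * static session_local int *cursor = work_buffer;
 * is rejected by the compiler: the address of a TLS variable is not a
 * constant expression. Hence the lazy initialization below.
 */
static session_local int *cursor = NULL;

static int *get_cursor(void)
{
    if (cursor == NULL)
        cursor = work_buffer;   /* first use in this thread */
    return cursor;
}

struct backend_args {
    int *parent_cursor;     /* cursor of the launching thread */
    int  distinct;          /* set to 1 if the backend has its own buffer */
};

/* Thread entry point, playing the role of the forked child. */
static void *backend_main(void *arg)
{
    struct backend_args *args = arg;

    args->distinct = (get_cursor() != args->parent_cursor);
    return NULL;
}

/* Starts one "backend" thread; returns 1 if its TLS copy is separate. */
int start_backend(void)
{
    struct backend_args args = { get_cursor(), 0 };
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, backend_main, &args);

    assert(rc == 0);        /* demo-only error handling */
    pthread_join(tid, NULL);
    return args.distinct;
}
```

start_backend() is expected to return 1, because the new thread resolves cursor to its own work_buffer rather than to the launcher's.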

I have left all Postgres synchronization primitives and shared memory unchanged.
It took me about one week of work.

What is not done yet:
1. Handling of signals (I expect that the Win32 code can somehow be reused here).
2. Deallocation of memory and closing of files on backend (thread) termination.
3. Interaction of the postmaster and backends with PostgreSQL auxiliary processes (threads), such as autovacuum, bgwriter, checkpointer, stats collector, ...

What are the advantages of using threads instead of processes?

1. No need to use shared memory. So there is no static limit on the amount of memory which can be used by Postgres. No need for dynamic shared memory and other stuff designed to share memory between backends and bgworkers.
2. Threads significantly simplify the implementation of parallel algorithms: interaction and data transfer between threads can be done easily and more efficiently.
3. It is possible to use more efficient/lightweight synchronization primitives. Postgres now mostly relies on its own low-level sync primitives, whose user-level implementation
uses spinlocks and atomics and then falls back to OS semaphores/poll. I am not sure how much we can gain by replacing these primitives with ones optimized for threads.
My colleague from the Firebird community told me that just replacing processes with threads can give a 20% increase in performance, but that it is just the first step, and replacing sync primitives
can give a much greater advantage. But maybe for Postgres with its low-level primitives this is not true.
4. Threads are more lightweight entities than processes. A context switch between threads takes less time than between processes. And they consume less memory. It is usually possible to spawn more threads than processes.
5. More efficient access to virtual memory. Since all threads share the same address space, the TLB is used much more efficiently in this case.
6. Faster backend startup. Certainly, starting a backend at each user's request is a bad thing in any case, and some kind of connection pooling should be used to provide acceptable performance. But still, starting a new backend process in Postgres causes a lot of page faults, which have a dramatic impact on performance. And there is no such problem with threads.
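
As an illustration of points 1 and 2, here is a minimal sketch of a parallel sum in which worker threads read the input and write their partial results through ordinary pointers, without any shared memory segment or message queue. It is not taken from the prototype; NWORKERS, struct slice, sum_slice and parallel_sum are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NWORKERS 4

/* One worker's share of the input plus a slot for its partial result. */
struct slice {
    const int *data;
    size_t     len;
    long       sum;
};

static void *sum_slice(void *arg)
{
    struct slice *s = arg;

    s->sum = 0;
    for (size_t i = 0; i < s->len; i++)
        s->sum += s->data[i];
    return NULL;
}

/* Sums data[0..len) using NWORKERS threads that share the caller's memory. */
long parallel_sum(const int *data, size_t len)
{
    pthread_t    tids[NWORKERS];
    struct slice slices[NWORKERS];
    size_t       chunk = len / NWORKERS;
    long         total = 0;

    for (int i = 0; i < NWORKERS; i++) {
        slices[i].data = data + i * chunk;
        /* the last worker also takes the remainder */
        slices[i].len = (i == NWORKERS - 1) ? len - i * chunk : chunk;
        int rc = pthread_create(&tids[i], NULL, sum_slice, &slices[i]);
        assert(rc == 0);    /* demo-only error handling */
    }
    for (int i = 0; i < NWORKERS; i++) {
        pthread_join(tids[i], NULL);    /* join makes slices[i].sum visible */
        total += slices[i].sum;
    }
    return total;
}
```

The workers never write to the same location, so pthread_join() is the only synchronization needed here. With processes, the input and the result slots would have to live in a shared memory segment set up in advance.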

Certainly, processes also have some advantages compared with threads:
1. Better isolation and error protection
2. Easier error handling
3. Easier control of used resources

But this is theory. The main idea of this prototype was to prove or disprove these expectations in practice.
I didn't expect large differences in performance, because synchronization primitives are not changed and I performed my experiments on Linux, where threads and processes are implemented in a similar way.

Below are some results (in thousands of TPS) of select-only (-S) pgbench with scale 100 on my desktop with a quad-core i7-4770 3.40GHz and 16 GB of RAM:

Connections   Vanilla/default   Vanilla/prepared   pthreads/default   pthreads/prepared
10            100               191                106                207
100           67                131                105                168
1000          41                65                 55                 102

As you can see, for a small number of connections the results are almost the same. But for a large number of connections pthreads degrade less.
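
To put numbers on this, the relative gain of pthreads over vanilla can be computed directly from the table. The helper below is only a sketch for that arithmetic; gain_percent is a hypothetical name, and the inputs are the table values in thousands of TPS.

```c
#include <assert.h>

/*
 * Relative gain of the pthreads build over vanilla, in percent,
 * rounded to the nearest integer. Both arguments are throughput
 * figures in the same unit; pthreads_tps is assumed >= vanilla_tps.
 */
int gain_percent(int vanilla_tps, int pthreads_tps)
{
    assert(vanilla_tps > 0);
    return ((pthreads_tps - vanilla_tps) * 100 + vanilla_tps / 2) / vanilla_tps;
}
```

For the default protocol this gives gain_percent(100, 106) = 6 at 10 connections and gain_percent(41, 55) = 34 at 1000 connections. For prepared statements it gives gain_percent(191, 207) = 8 and gain_percent(65, 102) = 57 respectively.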

You can look at my prototype here:
https://github.com/postgrespro/postgresql.pthreads.git

But please note that it is a very raw prototype. A lot of stuff is not working yet. And supporting all of the existing Postgres functionality requires
much more effort (and even more effort is needed to optimize Postgres for this architecture).

I just want to receive some feedback and learn whether the community is interested in any further work in this direction.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
