Thread: AW: Using Threads?

AW: Using Threads?

From
Zeugswetter Andreas SB
Date:
> And using the following program for timing thread creation 
> and cleanup:
> 
> #include <pthread.h>
> 
> threadfn() { pthread_exit(0); }

I think you would mainly need to test how the system behaves when
the threads and processes actually do some work in parallel, like:

void *threadfn(void *arg) { volatile int i; for (i = 0; i < 10000000; i++) {} pthread_exit(0); }

In a good thread implementation, 10000 parallel processes tend to get far less
CPU than 10000 parallel threads, which makes threads the better choice for the
very-many-clients case (more than about 3000).
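
A minimal sketch of the kind of test meant here, assuming POSIX threads. The
names run_busy_threads and struct work are made up for illustration and are not
part of the original benchmark. Each worker spins through a counting loop, so
all workers compete for CPU at the same time:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread work descriptor (not from the original post). */
struct work {
    long iters;   /* loop iterations to run */
    long done;    /* iterations actually completed */
};

static void *threadfn(void *arg)
{
    struct work *w = arg;
    volatile long i;   /* volatile keeps the compiler from deleting the loop */

    for (i = 0; i < w->iters; i++)
        ;
    w->done = i;
    return NULL;       /* returning is equivalent to pthread_exit(NULL) */
}

/*
 * Start nthreads workers in parallel and wait for all of them.
 * Returns the number of workers that completed their loop,
 * or -1 if allocation or thread creation failed.
 */
int run_busy_threads(int nthreads, long iters)
{
    pthread_t *tids;
    struct work *w;
    int started = 0, finished = 0, i;

    assert(nthreads > 0 && iters > 0);
    tids = malloc((size_t)nthreads * sizeof *tids);
    w = malloc((size_t)nthreads * sizeof *w);
    if (tids == NULL || w == NULL) {
        free(tids);
        free(w);
        return -1;
    }

    for (i = 0; i < nthreads; i++) {
        w[i].iters = iters;
        w[i].done = 0;
        if (pthread_create(&tids[i], NULL, threadfn, &w[i]) != 0)
            break;
        started++;
    }
    /* Join whatever was started, even if creation stopped early. */
    for (i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        if (w[i].done == iters)
            finished++;
    }
    free(tids);
    free(w);
    return started == nthreads ? finished : -1;
}
```

Timing a call such as run_busy_threads(10000, 10000000) against a fork()-based
equivalent would give the comparison; the numbers will depend heavily on the
platform and its scheduler.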

Andreas


Re: Using Threads?

From
Bruce Guenter
Date:
On Tue, Dec 05, 2000 at 10:07:37AM +0100, Zeugswetter Andreas SB wrote:
> > And using the following program for timing thread creation
> > and cleanup:
> >
> > #include <pthread.h>
> >
> > threadfn() { pthread_exit(0); }
>
> I think you would mainly need to test how the system behaves when
> the threads and processes actually do some work in parallel, like:
>
> void *threadfn(void *arg) { volatile int i; for (i = 0; i < 10000000; i++) {} pthread_exit(0); }

The purpose of the benchmark was to time how long it took to create and
destroy a process or thread, nothing more.  It was not creating
processes in parallel for precisely that reason.  The point in dispute
was whether threads take much less time to create than processes.
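
For reference, a sequential create-and-destroy timing of this kind could be
sketched as below. This is not the original benchmark program, only an
approximation under stated assumptions: time_threads_us, time_forks_us and
now_us are illustrative names, and the clock is CLOCK_MONOTONIC from POSIX
clock_gettime.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Thread body that does nothing, so only creation and cleanup are timed. */
static void *threadfn(void *arg)
{
    (void)arg;
    return NULL;
}

/* Current monotonic time in microseconds. */
static double now_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

/* Average microseconds to create and join one thread over n sequential
 * rounds, or -1.0 if a thread could not be created. */
double time_threads_us(int n)
{
    double start;
    int i;

    assert(n > 0);
    start = now_us();
    for (i = 0; i < n; i++) {
        pthread_t tid;

        if (pthread_create(&tid, NULL, threadfn, NULL) != 0)
            return -1.0;
        pthread_join(tid, NULL);
    }
    return (now_us() - start) / n;
}

/* Average microseconds to fork and reap one child over n sequential
 * rounds, or -1.0 if fork failed. */
double time_forks_us(int n)
{
    double start;
    int i;

    assert(n > 0);
    start = now_us();
    for (i = 0; i < n; i++) {
        pid_t pid = fork();

        if (pid < 0)
            return -1.0;
        if (pid == 0)
            _exit(0);          /* child exits at once, without doing work */
        waitpid(pid, NULL, 0);
    }
    return (now_us() - start) / n;
}
```

Because each round finishes before the next begins, nothing runs in parallel
and only the creation and teardown cost is measured.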

> In a good thread implementation, 10000 parallel processes tend to get far less
> CPU than 10000 parallel threads, which makes threads the better choice for the
> very-many-clients case (more than about 3000).

Why do you believe this?  In the "classical" thread implementation, each
process would get the same amount of CPU, no matter how many threads were
running in it.  That would mean that many parallel processes would get
more CPU in total than many threads in one process.
--
Bruce Guenter <bruceg@em.ca>                       http://em.ca/~bruceg/

Re: Using Threads?

From
markw@mohawksoft.com
Date:
I have been watching this threaded vs. non-threaded discussion and am completely with the
process-only crew for a couple of reasons, but let's look at a few things:

The process vs. threads benchmark, which showed 160us vs. 120us, only measured process
creation, not the delayed hit of the "copy on write" pages in the new process. Forking
is not as cheap as the fork call alone suggests: once the forked process starts to work,
memory that is not explicitly shared is copied for the new process as soon as it is
modified. So this is a hit, possibly a big hit. Threads are far more efficient; that is
really hard to dispute.
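
To make the copy-on-write behaviour concrete, here is a small sketch assuming a
POSIX system; cow_isolates_child_writes is a made-up name. The child writes to
every page of a buffer inherited across fork(), which forces a private copy of
each page, and the parent then checks that its own copy is unchanged. The sketch
shows the mechanism only; it does not measure how large the hit is.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the parent's buffer is unchanged after the child has
 * written to its inherited copy, 0 if it changed, -1 on error.
 */
int cow_isolates_child_writes(size_t npages)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len, off;
    char *buf;
    pid_t pid;
    int status, intact = 1;

    assert(npages > 0);
    if (page <= 0)
        return -1;
    len = npages * (size_t)page;
    buf = malloc(len);
    if (buf == NULL)
        return -1;
    memset(buf, 'P', len);   /* touch every page in the parent before forking */

    pid = fork();
    if (pid < 0) {
        free(buf);
        return -1;
    }
    if (pid == 0) {
        /* The first write to each page faults and gives the child its own copy. */
        for (off = 0; off < len; off += (size_t)page)
            buf[off] = 'C';
        _exit(buf[0] == 'C' ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) < 0) {
        free(buf);
        return -1;
    }
    /* The parent still sees its original data on every page. */
    for (off = 0; off < len; off += (size_t)page)
        if (buf[off] != 'P')
            intact = 0;
    free(buf);
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return intact;
}
```

Timing the child's write loop for a realistic number of pages would show how
much the deferred copying costs on a given system.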

I can see a number of reasons why a multithreaded version of a database would be good:
asynchronous I/O perhaps, or even parallel joins. That being said, I think stability and
work are by far the governing factors. Introducing multiple threads into a
non-multithreaded code base invariably breaks everything.

So, we want to weigh the possible performance gains of multithreading against all the work
and effort needed to make it work reliably. The fundamental question is: where are we
spending our time? If we are spending it in context switches, then multithreading may be a
way of reducing that. However, in all the applications I have built with postgres, it is
always (like most databases) I/O bound or bound by computation.

I think the benefits of rewriting code to be multithreaded are seldom worth the work and
the risks unless there is a clear advantage to doing so. I think most would agree that any
increase in performance gained by going multithreaded would be minimal, and the amount of
work to do so would be great.



Re: Using Threads?

From
Tom Lane
Date:
markw@mohawksoft.com writes:
> The process vs. threads benchmark, which showed 160us vs. 120us, only
> measured process creation, not the delayed hit of the "copy on write"
> pages in the new process. Forking is not as cheap as the fork call alone
> suggests: once the forked process starts to work, memory that is not
> explicitly shared is copied for the new process as soon as it is
> modified. So this is a hit, possibly a big hit.

There aren't going to be all that many data pages needing the COW
treatment, because the postmaster uses very little data space of its
own.  I think this would become an issue if we tried to have the
postmaster pre-cache catalog information for backends, however (see
my post elsewhere in this thread).
        regards, tom lane


Re: Using Threads?

From
Bruce Guenter
Date:
On Tue, Dec 05, 2000 at 02:52:48PM -0500, Tom Lane wrote:
> There aren't going to be all that many data pages needing the COW
> treatment, because the postmaster uses very little data space of its
> own.  I think this would become an issue if we tried to have the
> postmaster pre-cache catalog information for backends, however (see
> my post elsewhere in this thread).

Would that pre-cached data not be placed in a SHM segment?  Such
segments don't do COW, so this would be a non-issue.
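
A rough sketch of that point, assuming System V shared memory (shmget/shmat);
shm_write_visible_to_parent is an illustrative name. A write made by the child
lands in the same physical pages the parent has attached, so no private copy is
made and the parent sees the new value:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * The child stores `value` into a shared segment; the parent returns what
 * it reads back afterwards. Returns -1 on error. `value` must be nonzero
 * and not -1 so it can be told apart from the initial 0 and the error code.
 */
int shm_write_visible_to_parent(int value)
{
    int id, seen;
    int *shared;
    pid_t pid;

    assert(value != 0 && value != -1);
    id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id < 0)
        return -1;
    shared = shmat(id, NULL, 0);
    if (shared == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    *shared = 0;

    pid = fork();
    if (pid < 0) {
        shmdt(shared);
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    if (pid == 0) {
        *shared = value;   /* goes to the shared pages, not to a private copy */
        _exit(0);
    }
    waitpid(pid, NULL, 0);

    seen = *shared;        /* the parent observes the child's write */
    shmdt(shared);
    shmctl(id, IPC_RMID, NULL);   /* remove the segment so it does not leak */
    return seen;
}
```

With ordinary malloc'ed memory the same experiment would leave the parent's
value at 0, since the child's write would go to its own copied page.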
--
Bruce Guenter <bruceg@em.ca>                       http://em.ca/~bruceg/