Discussion: AW: Using Threads?
> And using the following program for timing thread creation
> and cleanup:
>
> #include <pthread.h>
>
> threadfn() { pthread_exit(0); }

I think you would mainly need to test how the system behaves when the
threads and processes actually do some work in parallel, like:

threadfn() { int i; for (i = 0; i < 10000000; i++) ; pthread_exit(0); }

In a good thread implementation, 10000 parallel processes tend to get
far less CPU than 10000 parallel threads, making threads optimal for
the very-many-clients case (like > 3000).

Andreas
On Tue, Dec 05, 2000 at 10:07:37AM +0100, Zeugswetter Andreas SB wrote:
> > And using the following program for timing thread creation
> > and cleanup:
> >
> > #include <pthread.h>
> >
> > threadfn() { pthread_exit(0); }
>
> I think you would mainly need to test how the system behaves when the
> threads and processes actually do some work in parallel, like:
>
> threadfn() { int i; for (i = 0; i < 10000000; i++) ; pthread_exit(0); }

The purpose of the benchmark was to time how long it took to create and
destroy a process or thread, nothing more. It was not creating processes
in parallel, for precisely that reason. The point in dispute was that
threads take much less time to create than processes.

> In a good thread implementation, 10000 parallel processes tend to get
> far less CPU than 10000 parallel threads, making threads optimal for
> the very-many-clients case (like > 3000).

Why do you believe this? In the "classical" thread implementation, each
process gets the same amount of CPU no matter how many threads are
running in it. That would mean that many parallel processes get more
CPU in total than many threads in one process.

--
Bruce Guenter <bruceg@em.ca>                       http://em.ca/~bruceg/
I have been watching this threaded-vs-non-threaded discussion and am
completely with the process-only crew for a couple of reasons, but let's
look at a few things.

The process-vs-threads benchmark which showed 160us vs 120us only timed
process creation, not the delayed hit of the "copy on write" pages in
the new process. Forking itself is not the whole cost: once the forked
process starts to work, any memory that is not explicitly shared is
copied to the new process as it is modified. So this is a hit, possibly
a big hit. Threads are far more efficient here; that really is hard to
debate.

I can see a number of reasons why a multithreaded version of a database
would be good (asynchronous I/O perhaps, or even parallel joins), but
with that said, I think stability and work are by far the governing
factors. Introducing multiple threads into a non-multithreaded code base
invariably breaks everything. So we want to weigh the possible
performance gains of multiple threads against all the work and effort to
make them work reliably.

The question is, fundamentally: where are we spending our time? If we
are spending it in context switches, then multithreading may be a way of
reducing that; however, in all the applications I have built with
Postgres, it is always (like most databases) I/O bound or bound by
computation.

I think the benefits of rewriting code to be multithreaded are seldom
worth the work and the risks unless there is a clear advantage to doing
so. I think most would agree that any performance increase gained by
going multithreaded would be minimal, and the amount of work to do so
would be great.
markw@mohawksoft.com writes:
> The process-vs-threads benchmark which showed 160us vs 120us only timed
> process creation, not the delayed hit of the "copy on write" pages in
> the new process. Forking itself is not the whole cost: once the forked
> process starts to work, any memory that is not explicitly shared is
> copied to the new process as it is modified. So this is a hit, possibly
> a big hit.

There aren't going to be all that many data pages needing the COW
treatment, because the postmaster uses very little data space of its
own. I think this would become an issue if we tried to have the
postmaster pre-cache catalog information for backends, however (see my
post elsewhere in this thread).

			regards, tom lane
On Tue, Dec 05, 2000 at 02:52:48PM -0500, Tom Lane wrote:
> There aren't going to be all that many data pages needing the COW
> treatment, because the postmaster uses very little data space of its
> own. I think this would become an issue if we tried to have the
> postmaster pre-cache catalog information for backends, however (see
> my post elsewhere in this thread).

Would that pre-cached data not be placed in a SHM segment? Such segments
don't do COW, so this would be a non-issue.

--
Bruce Guenter <bruceg@em.ca>                       http://em.ca/~bruceg/