Re: Urgent: 10K or more connections
From: Gianni Mariani
Subject: Re: Urgent: 10K or more connections
Date:
Msg-id: 3F1974D9.2050108@mariani.ws
In reply to: Re: Urgent: 10K or more connections (Sean Chittenden <sean@chittenden.org>)
List: pgsql-general
Sean Chittenden wrote:

>>> PostgreSQL will never be single proc, multi-threaded, and I don't
>>> think it should be for reliability's sake. See my above post,
>>> however, as I think I may have a better way to handle "lots of
>>> connections" without using threads. -sc
>>
>> never is a VERY long time ... Also, the single proc/multiple proc
>> thing does not have to be exclusive. Meaning you could "tune" the
>> system so that it could do either.
>
> True. This topic has come up a zillion times in the past though. The
> memory segmentation and reliability that independent processes give
> you is huge and the biggest reason why _if_ PostgreSQL does
> spontaneously wedge itself (like MySQL does all too often), you're
> only having to cope with a single DB connection being corrupt,
> invalid, etc. Imagine a threaded model where the process was horked
> and you lose 1000 connections' worth of data in a SEGV. *shudder*
> Unix is reliable at the cost of memory segmentation... something that
> I dearly believe in. If that weren't worth anything, then I'd run
> everything in kernel and avoid the context switching, which is pretty
> expensive.

Yep, but if you design it right, you can have both. A rare occasion
where you can have your cake and eat it too.

>> I have developed a single process server that handled thousands of
>> connections. I've also developed a single process database (a while
>> back) that handled multiple connections, but I'm not sure I would do
>> it the "hard" way again, as the cost of writing the code for keeping
>> context was not insignificant, although there are much better ways
>> of doing it than how I did it 15 years ago.
>
> Not saying it's not possible, just that at this point, reliability is
> more important than handling additional connections. With copy-on-write
> VMs being abundant these days, a lot of the size that you see with
> PostgreSQL is shared. Memory profiling and increasing the number of
> read-only pages would be an extremely interesting exercise that could
> yield some slick results in terms of reducing the memory footprint of
> PG's children.

Context switching and cache thrashing are the killers in a
multiple-process model. There is a 6-10x performance penalty for
running in separate processes vs running in a single process (and
single thread), which I observed when doing benchmarking on a
streaming server. Perhaps a better scheduler (like the O(1) scheduler
in Linux 2.6.*) would improve that, but I just don't know.

>> What you talk about is very fundamental and I would love to have
>> another go at it .... However, you're right that this won't happen
>> any time soon. Connection pooling is a fundamentally flawed way of
>> overcoming this problem. A different design could render a
>> significantly higher feasible connection count.
>
> Surprisingly, it's not that complex, at least handling a large number
> of FDs and figuring out which ones have data on them and need to be
> passed to a backend. I'm actually using the model for monitoring FDs
> from thttpd and reapplying bits where appropriate. Its abstraction
> of kqueue()/poll()/select() is nice enough to not want to reinvent
> the wheel (same with its license). Hopefully ripping through the
> incoming data and figuring out which backend pool to send a
> connection to won't be that bad, but I have next to no experience
> with writing that kind of code, and my Stevens is hidden away in one
> of 23 boxes from a move earlier this month. I only know that Apache
> 1.3 does this with obviously huge success on basically every *nix,
> so it can't be too hard.

No epoll?