Re: Progress on fast path sorting, btree index creation time
From | Robert Haas |
---|---|
Subject | Re: Progress on fast path sorting, btree index creation time |
Date | |
Msg-id | CA+TgmoZO1xSz+YiqZ2mRoKMcMqtb+JiR0Lz43CNe6de7--QDAA@mail.gmail.com |
In response to | Re: Progress on fast path sorting, btree index creation time (Peter Geoghegan <peter@2ndquadrant.com>) |
List | pgsql-hackers |

On Wed, Feb 8, 2012 at 8:33 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> It doesn't necessarily matter if we increase the size of the postgres
> binary by 10%, precisely because most of that is not going to be in
> play from one instant to the next.

As Tom says, that doesn't jibe with my experience. If you add on enough binary bloat, you will have more page faults. It's true (as I think you pointed out upthread) that packing all the copies of quicksort into the binary one after the other minimizes the effect on other things, but it doesn't eliminate it entirely. If you've got this:

<random other stuff> ... <a zillion copies of quicksort> .... <more other stuff>

...then a branch from the "random other stuff" section of the binary to the "more other stuff" section of the binary may cost more. For example, suppose the OS does X amount of readahead. By stuffing all those copies of quicksort into the middle there, you increase the chances that the page you need was beyond the readahead window. Or, if it wasn't going to be in the readahead window either way, then you increase the distance that the disk head needs to move to find the required block.

These costs are very small, no question about it. They are almost impossible to measure individually, in much the same way that the cost of pollution or greenhouse gas emissions is difficult to measure. But it's an error to assume that because the costs are individually small they will never add up to anything material. As long as you keep insisting on that, it's hard to have a rational conversation. We can reasonably debate the magnitude of the costs, but to assert that they don't exist gets us nowhere.

Suppose we could get a 10% speedup on sin(numeric) by adding 40GB to the binary size. Would you be in favor of that? Do you think it would hurt performance on any other workloads? Would our packagers complain at all? Surely your same argument would apply to that case in spades: anyone who is not using the gigantic hard-coded lookup table will not pay any portion of the cost of it.

> It would be difficult for me to measure such things objectively, but
> I'd speculate that the proprietary databases have much larger binaries
> than ours, while having far fewer features, precisely because they
> started applying tricks like this a long time ago. You could counter
> that their code bases probably look terrible, and you'd have a point,
> but so do I.

That might be true; I have no idea. There are probably lots of reasons why their code bases look terrible, including a long history of backward compatibility with now-defunct versions, a variety of commercial pressures, and the fact that they don't have to take flak in the public square for what their code looks like.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company