Re: Linux max on shared buffers?
From | Curt Sampson |
---|---|
Subject | Re: Linux max on shared buffers? |
Date | |
Msg-id | Pine.NEB.4.44.0207201818160.553-100000@angelic.cynic.net |
In reply to | Re: Linux max on shared buffers? (Jan Wieck <JanWieck@Yahoo.com>) |
List | pgsql-general |
On Fri, 19 Jul 2002, Jan Wieck wrote:

> I still don't completely understand what you are proposing. What I
> understood so far is that you want to avoid double buffering (OS buffer
> plus SHMEM). Wouldn't that require that the access to a block in the
> file (table, index, sequence, ...) has to go directly through a mmapped
> region of that file?

Right.

> Let's create a little test case to discuss. I have two tables, 2
> Gigabyte in size each (making 4 segments of 1 GB total) plus a 512 MB
> index for each. Now I join them in a query, and that results in a
> nestloop doing index scans.
>
> On a 32 bit system you cannot mmap both tables plus the indexes at the
> same time completely. But the execution plan's access pattern is to read
> one table's index, fetch the heap tuples from it by random access, and
> inside of that loop do the same for the second table. So chances are
> that this plan randomly peeks around in the entire 5 Gigabytes; at
> least you cannot predict which blocks it will need.

Well, you can certainly predict the index blocks. So after some initial
reads to get to the bottom level of the index, you might map a few
megabytes of it contiguously, because you know you'll need it. While
you're at it, you can inform the OS that you're using it sequentially
(so it can do read-ahead, even though it otherwise looks to the OS like
the process is doing random reads) by calling madvise() with
MADV_SEQUENTIAL.

> So far so good. Now what do you map when? Can you map multiple
> noncontiguous 8K blocks out of each file?

Sure. You can just map one 8K block at a time, and when you've got lots
of mappings, start dropping the ones you've not used for a while,
LRU-style. How many mappings you want to keep "cached" for your process
would depend on the overhead of doing the map versus the overhead of
having a lot of system calls. Personally, I think that the overhead of
having tons of mappings is pretty low, but I'll have to read through
some kernel code to make sure.
At any rate, it's no problem to change the figure depending on any
factor you like.

> If so, how do you coordinate that all backends in summary use at
> maximum the number of blocks you want PostgreSQL to use....

You don't. Just map as much as you like; the operating system takes
care of which blocks remain in memory or get written out to disk (or
dropped, if they're clean), bringing in a block from disk when you
reference one that's not currently in physical memory, and so on.

> And if a backend needs a block and the max is reached already, how
> does it tell the other backends to unmap something?

You don't. The mappings are completely separate for every process.

> I assume I am missing something very important here....

Yeah, you're missing that the OS does all of the work for you. :-)

Of course, this only works on systems with a POSIX mmap, which those
particular HP systems Tom mentioned obviously don't have. For those
systems, though, I expect running as a 64-bit program fixes the problem
(because you've got a couple billion times as much address space). But
if postgres runs on 32-bit systems with the same restrictions, we'd
probably just have to keep the option of using read/write instead, and
take the performance hit that we do now.

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light. --XTC