Discussion: concurrent Postgres on NUMA - howto ?
Folks:

I'm planning a port of Postgres to a multiprocessor architecture in which all nodes have both local memory and fast access to a shared memory. Shared memory is more expensive than local memory.

My intent is to put the shmem & lock structures in shared memory, but use a copy-in / copy-out approach to maintain coherence in the buffer cache:
- copy buffer from shared memory on buffer allocate
- write back buffer to shared memory when it is dirtied.

Is that enough?

The idea sketch is as follows (mostly, changes contained to storage/buffer/bufmgr.c):
- change BufferAlloc, etc, to create a node-local copy of the buffer (from shared memory). Copy both the BufferDesc entry and the buffer->data array.
- change WriteBuffer to copy the (locally changed) buffer to shared memory (this is the point at which the BM_DIRTY bit is set). [I am assuming the buffer is locked & this is a safe time to make the buffer visible to other backends.]

[Assume, for this discussion, that the sem / locks structs in shared memory have been ported & work.] Ditto for the hash access.

My concern is whether that is enough to maintain consistency in the buffer cache (i.e., are there other places in the code where a backend might have a leftover pointer to somewhere in the buffer cache?). Because, in the scheme above, the buffer cache is not directly accessible to the backend except via this copy-in / copy-out approach.

[BTW, I think this might be a way of providing a 'cluster' version of Postgres, by using some global communication module to obtain/post the 'buffer cache' values.]

thanks
regards
Mauricio
mbjsql@hotmail.com
"Mauricio Breternitz" <mbjsql@hotmail.com> writes: > My concern is whether that is enough to maintain consistency > in the buffer cache No, it isn't --- for one thing, WriteBuffer wouldn't cause other backends to update their copies of the page. At the very least you'd need to synchronize where the LockBuffer calls are, not where WriteBuffer is called. I really question whether you want to do anything like this at all. Seems like accessing the shared buffers right where they are will be fastest; your approach will entail a huge amount of extra data copying. Considering that a backend doesn't normally touch every byte on a page that it accesses, I wouldn't be surprised if full-page copying would net out to being more shared-memory traffic, rather than less. regards, tom lane
Tom:

Notice that WriteBuffer would just put the fresh copy of the page out in the shared space. Other backends would get the latest copy of the page when THEY execute BufferAlloc() afterwards. [Remember, backends would not have a local buffer cache, only (temporary) copies of one buffer per BufferAlloc()/release pair.]

[Granted about the bandwidth needs. In my target arch, access to shmem is costlier than local mem, and cannot be done via pointers (so a lot of code that might have pointers inside the shmem buffer may need to be tracked down & changed).]

My idea is to use high-bandwidth access via the copy-in/copy-out approach (hopefully paying that round-trip cost only once per BufferAlloc -> make-buffer-dirty pair).

[My reasoning for this is that a backend needs to have exclusive access to a buffer when it writes to it. And I think it 'advertises' the new buffer contents to the world when it sets the BM_DIRTY flag.]

About your suggestion of LockBuffer as synchronization points, a simple protocol might be:
- copy 'in' the buffer on a READ or SHARE lock acquire (may have to be careful on an upgrade of a READ to a WRITE lock)
- copy 'out' the buffer on a WRITE lock release

I would appreciate comments and input on this approach, as I foresee putting a lot of effort into it soon.

regards
Mauricio

>From: Tom Lane <tgl@sss.pgh.pa.us>
>To: "Mauricio Breternitz" <mbjsql@hotmail.com>
>CC: pgsql-hackers@postgresql.org
>Subject: Re: [HACKERS] concurrent Postgres on NUMA - howto ?
>Date: Mon, 23 Apr 2001 19:43:05 -0400
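The lock-based protocol proposed in this message might look roughly like the sketch below. Everything here is an assumption made for illustration: `SharedBuf`, `LocalBuf`, `LockMode`, `numa_lock_buffer` and `numa_unlock_buffer` are hypothetical names, the lock state is a pair of plain integers standing in for the ported shared-memory locks, and the model is single-process (a real port would block instead of returning -1). The sketch also copies in on an exclusive acquire, on the assumption that a writer must start from the current page contents.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 8192          /* illustrative page size */

typedef enum { LOCK_NONE, LOCK_SHARE, LOCK_EXCLUSIVE } LockMode;  /* hypothetical */

typedef struct {
    char data[PAGE_SIZE];       /* page as it lives in shared memory */
    int  readers;               /* stand-in lock state, not a real lock */
    int  writer;
} SharedBuf;

typedef struct {
    SharedBuf *shared;          /* which shared buffer this copy mirrors */
    LockMode   held;            /* lock mode currently held by this backend */
    char       data[PAGE_SIZE]; /* node-local copy */
} LocalBuf;

/* Acquire the lock, then copy in. Returns 0 on success, -1 on conflict. */
int numa_lock_buffer(LocalBuf *lb, LockMode mode)
{
    SharedBuf *sb = lb->shared;

    assert(lb->held == LOCK_NONE && mode != LOCK_NONE);
    if (sb->writer || (mode == LOCK_EXCLUSIVE && sb->readers > 0))
        return -1;
    if (mode == LOCK_EXCLUSIVE)
        sb->writer = 1;
    else
        sb->readers++;
    memcpy(lb->data, sb->data, PAGE_SIZE);      /* copy in under the lock */
    lb->held = mode;
    return 0;
}

/* Copy out (exclusive holders only), then release the lock. */
void numa_unlock_buffer(LocalBuf *lb)
{
    SharedBuf *sb = lb->shared;

    assert(lb->held != LOCK_NONE);
    if (lb->held == LOCK_EXCLUSIVE) {
        memcpy(sb->data, lb->data, PAGE_SIZE);  /* publish before release */
        sb->writer = 0;
    } else {
        sb->readers--;
    }
    lb->held = LOCK_NONE;
}

/* Returns 1 if a reader that locks after the writer sees the change. */
int lock_sync_demo(void)
{
    static SharedBuf shared;
    static LocalBuf a, b;
    int seen;

    memset(&shared, 0, sizeof shared);
    memset(&a, 0, sizeof a);
    memset(&b, 0, sizeof b);
    a.shared = b.shared = &shared;

    if (numa_lock_buffer(&b, LOCK_EXCLUSIVE) != 0)
        return -1;
    b.data[0] = 'x';
    numa_unlock_buffer(&b);

    if (numa_lock_buffer(&a, LOCK_SHARE) != 0)
        return -1;
    seen = (a.data[0] == 'x');
    numa_unlock_buffer(&a);
    return seen;
}
```

In this model `lock_sync_demo()` returns 1, because the copy happens inside the lock rather than at allocation time. A READ-to-WRITE upgrade is deliberately left out: the sketch requires releasing and reacquiring, which forces a fresh copy-in.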
"Mauricio Breternitz" <mbjsql@hotmail.com> writes: > Notice that WriteBuffer would just put the fresh copy of the page > out in the shared space. > Other backends would get the latest copy of the page when > THEY execute BufferAlloc() afterwards. You seem to be assuming that BufferAlloc is mutually exclusive across backends --- it's not. As I said, you'd have to look at transferring data at LockBuffer time to make this work. > [Granted about the bandwidth needs. In my target arch, > access to shmem is costlier and local mem, and cannot be done > via pointers What? How do you manage to memcpy out of shmem then? > (so a lot of code that might have pointers inside the > shmem buffer may need to be tracked down & changed)]. You're correct, Postgres assumes it can have pointers to data inside the page buffers. I don't think changing that is feasible. I find it hard to believe that you can't have pointers to shmem though; IMHO it's not shmem if it can't be pointed at. > [Mhy reasoning for this is that a backend needs to have exclusive > access to a buffer when it writes to it. And I think it 'advertises' > the new buffer contents to the world when it sets the BM_DIRTY flag.] No. BM_DIRTY only advises the buffer manager that the page must eventually be written back to disk; it does not have anything to do with when/whether other backends see data changes within the page. One more time: LockBuffer is what you need to be looking at. regards, tom lane