Discussion: Anyone understand shared-memory space usage?

Anyone understand shared-memory space usage?

From
Tom Lane
Date:
It used to be that Postgres' shared memory was sized on the basis of
the hard-wired MaxBackendId constant.  I have altered things so that
it is sized on the basis of the actual -N switch given to the postmaster
at postmaster start time.  This makes it a lot easier to stress the
algorithm ;-), and what I find is that it ain't too robust.

In particular, using current cvs sources try to start the postmaster
with "-N 1" (only one backend allowed).  The backend can be started
all right, but as soon as you try to do much of anything, it falls over:

$ startpg.debug -N 1
$ psql regression
Welcome to the POSTGRESQL interactive sql monitor:
  Please read the file COPYRIGHT for copyright terms of POSTGRESQL

   type \? for help on slash commands
   type \q to quit
   type \g or terminate with semicolon to execute query
 You are currently connected to the database: regression

regression=> \d
NOTICE:  ShmemAlloc: out of memory
pqReadData() -- backend closed the channel unexpectedly.


I conclude from this that the model of shared memory usage embodied
in LockShmemSize() (in src/backend/storage/lmgr/lock.c) isn't very
accurate: at small N it's not allocating enough memory.

Does anyone understand the data structures that are allocated in
shared memory well enough to fix LockShmemSize() properly?
Or should I just kluge it, say by making LockShmemSize() work from
something like MAX(maxBackends,10) ?
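For concreteness, a minimal sketch of that kluge --- the floor constant and
helper name below are hypothetical, nothing that exists in lock.c:

/* Hypothetical sketch only, not actual lock.c code: clamp the backend
 * count used for sizing, so that very small -N settings still get a
 * workable amount of shared memory.
 */
#define MIN_SIZING_BACKENDS 10

static int
sizingBackends(int maxBackends)
{
    return (maxBackends > MIN_SIZING_BACKENDS) ? maxBackends
                                               : MIN_SIZING_BACKENDS;
}

LockShmemSize() would then do its arithmetic from sizingBackends(maxBackends)
instead of the raw maxBackends.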
        regards, tom lane


Re: [HACKERS] Anyone understand shared-memory space usage?

From
Bruce Momjian
Date:
Does the bottom of the backend flowchart help?


> It used to be that Postgres' shared memory was sized on the basis of
> the hard-wired MaxBackendId constant.  I have altered things so that
> it is sized on the basis of the actual -N switch given to the postmaster
> at postmaster start time.  This makes it a lot easier to stress the
> algorithm ;-), and what I find is that it ain't too robust.


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Anyone understand shared-memory space usage?

From
Bruce Momjian
Date:
I would look in:
CreateSharedMemoryAndSemaphores(IPCKey key, int maxBackends)
{
    ...
    size = BufferShmemSize() + LockShmemSize(maxBackends);

LockShmemSize looks like a terrible mess, but my assumption is that the
problem is in there.


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: Anyone understand shared-memory space usage?

From
Tom Lane
Date:
I wrote:
> Does anyone understand the data structures that are allocated in
> shared memory well enough to fix LockShmemSize() properly?

No one volunteered, so I dug into the code and think I have it fixed
now.  Leastwise you can run the regression tests even at -N 1 (but
you have to put a "sleep" into regress.sh --- it seems that when you
quit psql, it takes a second or two before the postmaster will accept
another connection.  Should backend shutdown take that long??)

It turned out that there were really, really serious problems both in
shared-memory space estimation and in dynahash.c itself.  I'm simply
amazed we have not seen more bug reports traceable to running out
of shared memory and/or hashtable errors.  Some lowlights:

* One out of every thirty records allocated in a hashtable was simply
being wasted, because the allocator failed to include it in the table's
freelist.

* The routine for expanding a hashtable's top-level directory could
never have worked; I conclude that it's never been executed.  (At
default settings it would not be called until the table has exceeded
64K entries, so I can believe we've never seen it run...)

* I think the routine for deleting a hashtable is also broken, because
it individually frees records that it did not allocate individually.
I don't understand why this isn't making the memory management stuff
coredump.  Maybe we never free a hashtable?  (A stand-alone illustration
of this allocate/free mismatch appears after this list.)

* Setup of fixed-directory hashtables (ShmemInitHash) was sadly broken;
it's really incredible that it worked at all, because it was (a)
misestimating the size of the space it needed to allocate and then
(b) miscalculating where the directory should be within that space.
As near as I can tell, we have been running with hashtable directories
sitting in space not actually allocated to them.  Compared to this,
the fact that the routine also forgot to tell dynahash.c what size
directory it had made hardly matters.

* Several places were estimating the sizes of hashtables using code
that was not quite right (and assumed far more than it should've
about the inner structure of hashtables anyway).  Also, having
(mis)calculated the sizes of the major tables in shared memory,
we were requesting a total shared memory block exactly equal to
their sum, with no allowance for smaller data structures (like the
shmem index table) nor any safety factor for estimation error.
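On that last point, a rough sketch of what a padded request could look like
--- the wrapper name and padding numbers are made-up illustrations, not the
committed fix:

/* Illustrative sketch, not committed code: pad the total request so the
 * shmem index table and any estimation error in the per-structure sizes
 * do not exhaust the segment.
 */
static int
totalShmemSize(int maxBackends)
{
    int     size = BufferShmemSize() + LockShmemSize(maxBackends);

    size += 10000;          /* allowance for small structures such as the
                             * shmem index table (number is a guess) */
    size += size / 10;      /* 10% slop for estimation error (a guess) */
    return size;
}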
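And to spell out the hashtable-deletion hazard noted above, here is a
stripped-down stand-alone example of the same allocate/free mismatch, using
plain malloc/free instead of the shared-memory allocator (the record type is
made up):

#include <stdlib.h>

typedef struct Record { long key; long datum; } Record;

int
main(void)
{
    /* one allocation yields a block of 30 records ... */
    Record *block = (Record *) malloc(30 * sizeof(Record));
    int     i;

    /* ... so freeing the records one at a time is undefined behavior;
     * only the block as a whole may be handed back.  This is the same
     * shape of bug as a table-deletion routine individually freeing
     * records that were allocated as part of a larger chunk.
     */
    for (i = 0; i < 30; i++)
        free(&block[i]);        /* WRONG */

    /* free(block); is the only legal release */
    return 0;
}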


I would like someone to check my work; if the code was really as
broken as I think it was, we should have been seeing more problems
than we were.  See my changes committed last night in
    src/include/utils/hsearch.h
    src/backend/utils/hash/dynahash.c
    src/backend/storage/ipc/shmem.c
    src/backend/storage/ipc/ipci.c
    src/backend/storage/buffer/buf_init.c
    src/backend/storage/lmgr/lock.c
    src/backend/storage/smgr/mm.c
        regards, tom lane

PS: I am now wondering whether Daryl Dunbar's problems might not be
due to the shared-memory hash table for locks getting larger than other
people have seen it get.  Because of the errors in ShmemInitHash, I
would not be at all surprised to see the system fall over once that
table exceeds 256 entries (or some small multiple thereof).


Re: Anyone understand shared-memory space usage?

From
Tom Lane
Date:
I wrote:
> I would like someone to check my work; if the code was really as
> broken as I think it was, we should have been seeing more problems
> than we were.

I spent an hour tracing through startup of 6.4.x, and I now understand
why the thing doesn't crash despite the horrible bugs in ShmemInitHash.
Read on, if you have a strong stomach.

First off, ShmemInitHash allocates too small a chunk of space for
the hash header + directory (because it computes the size of the
directory as log2(max_size) *bytes* not longwords).  Then, it computes
the wrong address for the directory --- the expression
    infoP->dir = (long *) (location + sizeof(HHDR));
looks good until you remember that location is a pointer to long not
a pointer to char.  Upshot: the address computed for "dir" is typically
168 bytes past the end of the space actually allocated for it.
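To spell out the pointer-arithmetic trap: adding a byte count to a pointer
to long advances it by that many *longs*, so the result lands sizeof(long)
times too far along.  A self-contained illustration --- the header struct is
only a stand-in, and the cast-through-char repair shown is the obvious one,
not necessarily what got committed:

#include <stdio.h>

typedef struct FakeHHDR { long filler[16]; } FakeHHDR;  /* stand-in header */

int
main(void)
{
    long    region[8192];           /* pretend shared-memory chunk */
    long   *location = region;      /* points at the hash header */

    /* Wrong: 'location' is a long *, so this advances by
     * sizeof(FakeHHDR) longwords, not sizeof(FakeHHDR) bytes.
     */
    long   *dir_wrong = (long *) (location + sizeof(FakeHHDR));

    /* Right: do the arithmetic in bytes via a char pointer. */
    long   *dir_right = (long *) ((char *) location + sizeof(FakeHHDR));

    printf("wrong offset %ld bytes, right offset %ld bytes\n",
           (long) ((char *) dir_wrong - (char *) region),
           (long) ((char *) dir_right - (char *) region));
    return 0;
}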

Why is this not fatal?  Well, the very next ShmemAlloc call is always
to create the first "segment" of the hashtable; this is always for 1024
bytes, so the dir pointer is no longer pointing to nowhere.  It is in
fact pointing at the 42'nd entry of its own first segment.  (HHGTTG fans
can find deep significance in this.)  In other words entry 42 of the
hash segment points back at the segment itself.

When you work through the logic in dynahash.c, you discover that the
upshot of this is that (a) the segment appears to be the first item on
its own 42'nd hash-bucket chain, and (b) the 0'th and 42'nd hash-bucket
chains are therefore the same list, or more accurately the 0'th chain is
the cdr of the 42'nd chain since it doesn't appear to contain the
segment itself.

As long as no searched-for hash key with a hash value of 0 or 42
happens to match whatever the first few words of the segment are,
things pretty much work.  The only way you'd really notice is that
hash_seq() will report some of the hashtable records twice, and will
also report one completely bogus "record" that is the hash segment.
Our uses of hash_seq() are apparently robust enough not to be bothered.

Things don't go to hell in a handbasket until and unless the hashtable
is expanded past 256 entries.  At that point another segment is allocated
and its pointer is stored in slot 43 of the old segment, causing all the
table entries that were in hashbucket 43 to instantly disappear from
view --- they can't be found by searching the table anymore.  Also,
hashchain 43 now appears to be the same as hashchain 256 (the first 
of the new segment), but that's not going to bother anyone any worse
than the first duplicated chain did.

I think it's entirely likely that this set of bugs can account for flaky
behavior seen in installations with more than 256 shared-memory buffers
(postmaster -B > 256), more than 256 simultaneously held locks (have no
idea how to translate that into user terms), or more than 256 concurrent
backends.  I'm still wondering whether that might describe Daryl
Dunbar's problem with locks not getting released, for example.
        regards, tom lane


Re: [HACKERS] Re: Anyone understand shared-memory space usage?

From
Bruce Momjian
Date:
> I think it's entirely likely that this set of bugs can account for flaky
> behavior seen in installations with more than 256 shared-memory buffers
> (postmaster -B > 256), more than 256 simultaneously held locks (have no
> idea how to translate that into user terms), or more than 256 concurrent
> backends.  I'm still wondering whether that might describe Daryl
> Dunbar's problem with locks not getting released, for example.

People have reported slowness/bugs with hash index lookups.  Does this
relate to that?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Re: Anyone understand shared-memory space usage?

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> I think it's entirely likely that this set of bugs can account for flaky
>> behavior seen in installations with more than 256 shared-memory buffers
>> (postmaster -B > 256), more than 256 simultaneously held locks (have no
>> idea how to translate that into user terms), or more than 256 concurrent
>> backends.  I'm still wondering whether that might describe Daryl
>> Dunbar's problem with locks not getting released, for example.

> People have reported sloness/bugs with hash index lookups.  Does this
> relate to that?

It looks like the routines in src/backend/access/hash/ don't use the
code in src/backend/utils/hash/ at all, so my guess is that whatever
bugs might lurk in hash indexes are unrelated.
        regards, tom lane