Thread: Fwd: 8.0 Beta3 worked, RC1 didn't!

Fwd: 8.0 Beta3 worked, RC1 didn't!

From

Tom Lane

Date:

December 24, 2004, 21:00:56

Forwarding the attached in case anyone missed it on -general.

The shmem attach address shown in his messages (00DC0000) seems mighty
low.  What I am suspecting is:
   1. Postmaster boots, creates shmem, and for some idiotic reason
      2003 Server creates the shmem segment just above the end of
      regular memory.
   2. When subprocesses launch and re-read GUC settings, for one
      reason or another they use up a little more RAM than the
      postmaster did.
   3. Subprocesses fail to attach to shmem because the target
      address is now in their regular RAM range.

I don't know why 2003 Server has such a brain-dead choice of shmem
address assignment, nor why listen_addresses might prompt a little extra
growth of RAM usage.  But the theory seems to fit the available facts.
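
A minimal sketch of the three-step theory above, written as plain address-range arithmetic. It does not touch real shared memory, and it is not PostgreSQL code: the names `AddrRange`, `ranges_overlap` and `child_can_attach` are hypothetical, and every address other than 00DC0000 is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A half-open address range [start, end). */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} AddrRange;

/* True if the two ranges share at least one address. */
bool ranges_overlap(AddrRange a, AddrRange b)
{
    assert(a.start <= a.end && b.start <= b.end);
    return a.start < b.end && b.start < a.end;
}

/*
 * Hypothetical model of step 3: can a subprocess attach the segment at
 * the address the postmaster used, given how far the subprocess's own
 * private memory already extends?
 */
bool child_can_attach(uintptr_t shmem_addr, uintptr_t shmem_size,
                      AddrRange child_private)
{
    AddrRange shmem = { shmem_addr, shmem_addr + shmem_size };
    return !ranges_overlap(shmem, child_private);
}
```

With an invented private range of [00400000, 00DC0000) the segment at 00DC0000 fits exactly. If the subprocess's private memory grows by a single 4 KB page, to [00400000, 00DC1000), the same call reports that the attach cannot succeed, which is the failure mode the theory predicts.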

If this is correct then we have to do something to force a smarter
choice of shmem address on Windows.  One brute-force way to do it
might be to malloc a couple hundred K just before the postmaster
attaches to shmem, and then release?

Theory B is that somehow UsedShmemSegAddr is not being passed down
accurately in this case, but that seems a mite improbable.

            regards, tom lane

------- Forwarded Message

Date:    23 Dec 2004 08:33:12 -0800
From:    nico@def2shoot.com (Nicolas COUSSEMACQ)
To:      pgsql-general@postgresql.org
Subject: [GENERAL] 8.0 Beta3 worked, RC1 didn't!

I have the same problem!

When I set up Postgres 8.0 Beta 4 on Windows XP or 2003 Server, it works
perfectly with the listen_addresses parameter set to '*' or localhost.
I have been testing Beta5, RC1 and RC2 on my XP workstation and there is no
problem, even if I accept external connections (listen_addresses = '*').
Then I tried to set up Beta5, RC1 or RC2 on a machine with 2003 Server, and I
can only access the database when listen_addresses = localhost. If I set
listen_addresses = '*', I get a connection problem in PgAdmin saying "Could
not receive server response to SSL negotiation packet: Connection reset by
peer (0x00002746/10054)". It happens when I launch PgAdmin while logged on
directly to the machine, when I'm connected through remote access, and even
from my XP workstation.
The log file contains many lines such as these:
2004-12-23 16:55:17 FATAL:  could not attach to proper memory at fixed address: shmget(key=5432001, addr=00DC0000) failed: Invalid argument
2004-12-23 16:55:17 FATAL:  could not attach to proper memory at fixed address: shmget(key=5432001, addr=00DC0000) failed: Invalid argument
2004-12-23 16:55:17 LOG:  background writer process (PID 680) exited with exit code 0
2004-12-23 16:55:17 LOG:  terminating any other active server processes
2004-12-23 16:55:17 LOG:  all server processes terminated; reinitializing

If I switch the listen_addresses parameter back to localhost, I can connect
to the DB in PgAdmin from the server screen or through remote access.


Does this information help you?


"A. Mous" <a.mous@shaw.ca> wrote in message
news:000801c4e7d1$058c5300$6500a8c0@PETER...
> Hi all,
>
> I'm using psql 8.0.0 on a client's site who's running win server 2003.
> We've had him on beta 3 for some time, and no problems at all (yes, in a
> sense, he is a beta tester as well, but doesn't know it!).  Today I tried
> to upgrade the db to RC1 and had some problems.
>
> Remote clients connect to this database, so I have to set
> listen_addresses = '*' in the postgresql.conf file.  This is the only
> change to the config file.  Doing this with RC1 and trying to connect
> locally through psql resulted in the following error message:
>
> "could not receive server response to SSL negotiation packet; connection
> reset by peer (0x00002746/10054)"
>
> Removing the modified line in the config file resolved the problem
> (locally), however, no clients can connect!  Beta 3 does not seem to have
> this issue, so we had to revert back to it for now.
>
> I would appreciate any ideas that some of you may have.  Much thanks,
>
> -Peter

------- End of Forwarded Message

Re: Fwd: 8.0 Beta3 worked, RC1 didn't!

From

Gary Doades

Date:

December 24, 2004, 21:50:17

AFAIK Win32 does not care where in private process address space the
"shared memory" segment is. It can be mapped to different addresses in
different processes and still share the same physical address space.
This is why Win32 puts the private shared address anywhere in its own
address space, because it doesn't matter.

All that is needed is to create a *named* memory mapped segment of a
particular size and get the other processes to map the same name with the
same segment size, and it automagically works.

If you try to force it to any particular private process address you may
fail as you don't always know where program code (DLLs etc.) may be loaded.

Cheers,
Gary.

Re: Fwd: 8.0 Beta3 worked, RC1 didn't!

From

Tom Lane

Date:

December 24, 2004, 22:03:31

Gary Doades <gpd@gpdnet.co.uk> writes:
> AFAIK Win32 does not care where in private process address space the
> "shared memory" segment is. It can be mapped to different addresses in
> different processes and still share the same physical address space.
> This is why Win32 puts the private shared address anywhere in its own
> address space, because it doesn't matter.

Win32 may not care, but we do.  The shared memory segment must be mapped
at the same address in every backend.

> If you try to force it to any particular private process address you may
> fail as you don't always know where program code (DLLs etc.) may be loaded.

This is (or ought to be) irrelevant, because we are only talking about
instances of a single executable.

            regards, tom lane

Re: Fwd: 8.0 Beta3 worked, RC1 didn't!

From

Gary Doades

Date:

December 24, 2004, 22:15:30

Tom Lane wrote:
> Gary Doades <gpd@gpdnet.co.uk> writes:
>
>>AFAIK Win32 does not care where in private process address space the
>>"shared memory" segment is. It can be mapped to different addresses in
>>different processes and still share the same physical address space.
>>This is why Win32 puts the private shared address anywhere in its own
>>address space, because it doesn't matter.
>
>
> Win32 may not care, but we do.  The shared memory segment must be mapped
> at the same address in every backend.

Forgive me for not knowing the internals of postgres, but why? As long
as all the shared memory is accessed from the same relative offsets from
the private starting address it will refer to the same physical shared
memory address and should work.

Is this to maintain compatibility with the other platforms' way of doing
things, or is it required by the postgres internal architecture?

If this is the case then your suggestion may be the only one, to
artificially bump up the first free address and hope that it is enough.
Seems a bit hit and miss though (probably more hit than miss) since it's
not easily known what the extra allocation for the subsequent backends
may be.

>>If you try to force it to any particular private process address you may
>>fail as you don't always know where program code (DLLs etc.) may be loaded.
>
>
> This is (or ought to be) irrelevant, because we are only talking about
> instances of a single executable.
>
Agreed, as long as you can't have code dynamically linked from one
backend, but not another.

Cheers,
Gary.

Re: Fwd: 8.0 Beta3 worked, RC1 didn't!

From

Tom Lane

Date:

December 24, 2004, 22:30:17

Gary Doades <gpd@gpdnet.co.uk> writes:
> Tom Lane wrote:
>> Win32 may not care, but we do.  The shared memory segment must be mapped
>> at the same address in every backend.

> Forgive me for not knowing the internals of postgres, but why? As long
> as all the shared memory is accessed from the same relative offsets from
> the private starting address it will refer to the same physical shared
> memory address and should work.

Because we use absolute addresses in many cases.  There was once a
convention of making everything relative to ShmemBase, but we've
abandoned that for reasons of code simplicity (and to a lesser extent
performance).  There are still some places using relative offsets but
they are gradually going away.  We are not reversing that decision
just because some flavors of Windows have stupid algorithms for
assigning default shmem addresses.
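
The difference between the two conventions can be seen in a small POSIX sketch (not Win32 and not PostgreSQL code). It maps one temporary file twice within a single process, as a stand-in for two processes that received different base addresses. `SegHeader`, `RemapResult` and `check_remap` are hypothetical names invented for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define SEG_SIZE 4096

/* Header at the start of the segment, holding both kinds of reference. */
typedef struct {
    int   *abs_ptr;   /* absolute address of 'value' in the creator's view */
    size_t rel_off;   /* offset of 'value' from the start of the segment */
    int    value;
} SegHeader;

typedef struct {
    int mapped;           /* 1 if both views were created */
    int offset_ok;        /* 1 if base + rel_off finds 'value' in view b */
    int abs_ptr_in_view;  /* 1 if abs_ptr falls inside view b */
} RemapResult;

/* True if p lies inside the SEG_SIZE bytes starting at view. */
static int inside(const void *p, const void *view)
{
    uintptr_t a = (uintptr_t) p, lo = (uintptr_t) view;
    return a >= lo && a < lo + SEG_SIZE;
}

RemapResult check_remap(void)
{
    RemapResult r = {0, 0, 0};
    FILE *f = tmpfile();
    if (f == NULL)
        return r;
    int fd = fileno(f);
    if (ftruncate(fd, SEG_SIZE) != 0) {
        fclose(f);
        return r;
    }
    /* Two views of the same bytes, standing in for two processes. */
    void *a = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *b = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    fclose(f);  /* the mappings stay valid after the descriptor is closed */
    if (a == MAP_FAILED || b == MAP_FAILED)
        return r;
    assert(a != b);  /* two live mappings cannot share a start address */
    r.mapped = 1;

    /* The "postmaster" fills in the header through view a. */
    SegHeader *ha = a;
    ha->value = 42;
    ha->abs_ptr = &ha->value;
    ha->rel_off = offsetof(SegHeader, value);

    /* The "backend" reads the same header through view b. */
    SegHeader *hb = b;
    int *via_offset = (int *) ((char *) b + hb->rel_off);
    r.offset_ok = (*via_offset == 42) && inside(via_offset, b);
    r.abs_ptr_in_view = inside(hb->abs_ptr, b);

    munmap(a, SEG_SIZE);
    munmap(b, SEG_SIZE);
    return r;
}
```

The offset resolves correctly through either view, while the stored absolute pointer refers to the first view only. In a genuinely separate process that address would point at whatever happens to live there, which is why code that stores absolute addresses needs the segment at the same address everywhere.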

> If this is the case then your suggestion may be the only one, to
> artificially bump up the first free address and hope that it is enough.
> Seems a bit hit and miss though (probably more hit than miss) since it's
> not easily known what the extra allocation for the subsequent backends
> may be.

The needed extra allocation should really be *zero*.  Keep in mind that
the intention of the EXEC_BACKEND code is to emulate the Unix case where
backends are spawned by fork().  Therefore the state of the backend at
the point where it needs to attach to shmem should really be hardly at
all different from the state of the postmaster.  I'm moderately
interested to find out why changing listen_addresses seems to affect
this, but on the strength of the available evidence I'd suspect it's a
matter of just a few bytes that happens to exceed an allocation
boundary.

It might be that we could solve the problem by rethinking the order of
operations --- maybe we should reattach to shared memory during
restore_backend_variables, before the exec'd backend has had a chance to
do much of anything.
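
Why the order of operations could matter is sketched below with POSIX `mmap` as an analogue; the Windows mapping calls are not used here, and `attach_at` and `attach_refused_when_occupied` are hypothetical helpers rather than anything in the PostgreSQL tree. The sketch assumes only that a mapping request without `MAP_FIXED` never replaces an existing mapping.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Try to map 'len' bytes exactly at 'want'. Without MAP_FIXED the kernel
 * treats 'want' as a hint and never replaces an existing mapping, so a
 * different return address means the range was not available.
 */
void *attach_at(void *want, size_t len)
{
    assert(want != NULL && len > 0);
    void *got = mmap(want, len, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (got == MAP_FAILED)
        return NULL;
    if (got != want) {
        munmap(got, len);
        return NULL;
    }
    return got;
}

/*
 * Stand-in for "the exec'd backend allocated something first": occupy a
 * region, then try to attach at that same address.
 * Returns 1 if the attach was refused, 0 if it succeeded, -1 if the
 * setup itself failed.
 */
int attach_refused_when_occupied(size_t len)
{
    void *occupied = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (occupied == MAP_FAILED)
        return -1;
    void *seg = attach_at(occupied, len);
    int refused = (seg == NULL);
    if (seg != NULL)
        munmap(seg, len);
    munmap(occupied, len);
    return refused;
}
```

Once anything else occupies the target range, the attach at that address is refused. Attaching before the process has allocated much leaves fewer chances for that to happen, which is the motivation for moving the reattach earlier.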

            regards, tom lane

Re: Fwd: 8.0 Beta3 worked, RC1 didn't!

From

Bruce Momjian

Date:

December 24, 2004, 23:37:49

I am confused.  I thought we used a hard-coded location for shared
memory on Win32.

I thought it was 00xDB0000 something but I can't find any mention of
that.  Was it removed?  Are we now starting the postgres.exe binary and
assuming we can map to the same shared memory address as postmaster.exe?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Fwd: 8.0 Beta3 worked, RC1 didn't!

From

Tom Lane

Date:

December 25, 2004, 00:28:03

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I thought it was 00xDB0000 something but I can't find any mention of
> that.  Was it removed?  Are we now starting the postgres.exe binary and
> assuming we can map to the same shared memory address as postmaster.exe?

Looks that way to me; and I think it considerably safer than using any
hard-wired address.  My current feeling is that the problem stems from
waiting too long to reattach to shared memory, and that we ought to do
that as soon as we can read the shmem address info from the temp file.

Just had a thought ... is it possible that this problem was introduced
by the recent changes to pass backend variables in shared memory instead
of in a temp file?  ISTM fairly possible that mapping that memory is
going to interfere with where we need to map the main shared memory
block.  I see that it gets unmapped after being read, but maybe the
damage is already done.

            regards, tom lane

Re: Fwd: 8.0 Beta3 worked, RC1 didn't!

From

"Magnus Hagander"

Date:

December 28, 2004, 00:53:19

I have tried to, and am unable to reproduce this on any of my 2003 machines.
I have tried with both RC1 and RC2.

For those who reported the problem:
1) To reproduce, I installed from the MSI installer and just changed the
listen_addresses parameter. Did you change anything *else* in your
configuration? In postgresql.conf or anywhere else in pg?

2) Does this happen in a freshly initdb:ed database, or only when there is
data? Does this happen directly after server (service) startup, or does it
require the database to be running for a while with
connections/disconnections before it happens?

3) Do you have any non-OS software installed on the machine(s) that are showing this problem?

4) What's the value of shared_buffers in postgresql.conf?


Tom,
why is 00DC0000 so low? That's still 10Mb into the process, right? Granted,
it's not high, but it's not *that* low. (A simple test program with all
parameters at default gets its first address allocated at 003D2438 for me. A
freshly MapViewOfFile()d memory ends up at 003f0000. If I go for a larger
test block (such as 50Mb), the mapped memory is moved up to 004d0000.) I get
very similar results on XP and 2003.
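
For reference, the addresses in this message and the 00DC0000 from the log can be turned into sizes with a few lines of C. This is plain arithmetic; `kib_below` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Size in KiB of the address space below a 32-bit address. */
uint32_t kib_below(uint64_t addr)
{
    assert(addr <= UINT32_MAX);   /* the thread concerns 32-bit Windows */
    return (uint32_t) (addr / 1024u);
}
```

`kib_below(0x00DC0000)` gives 14080 KiB, that is 13.75 MiB. The two test mappings come out at 4032 KiB (003f0000) and 4928 KiB (004d0000), so the larger test block moved the mapping up by 896 KiB.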


There are unfortunately several places in the shmem code that will return
EINVAL. So there is currently no way to detect exactly where the problem is.
What do you think of adding a couple of elog()s at each place to help
identify them?
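
A rough sketch of that idea, under stated assumptions: the checks below are invented examples rather than the actual conditions in the shmem code, `fprintf` stands in for elog(), and `AttachCheck`, `reject` and `check_attach_args` are hypothetical names. The 64 KiB alignment value is likewise an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define ALLOC_GRANULARITY 0x10000u  /* assumed 64 KiB mapping granularity */

/* One distinct code per place that would otherwise just yield EINVAL. */
typedef enum {
    ATTACH_OK = 0,
    ATTACH_BAD_HANDLE,
    ATTACH_ZERO_SIZE,
    ATTACH_UNALIGNED_ADDR
} AttachCheck;

/* Stand-in for elog(): say which check failed, then set errno. */
static AttachCheck reject(AttachCheck why, const char *msg)
{
    assert(msg != NULL);
    fprintf(stderr, "shmem attach rejected: %s\n", msg);
    errno = EINVAL;   /* set last, so the logging call cannot clobber it */
    return why;
}

/* Validate attach arguments, reporting each failure separately. */
AttachCheck check_attach_args(int handle, size_t size, uintptr_t addr)
{
    if (handle < 0)
        return reject(ATTACH_BAD_HANDLE, "invalid segment handle");
    if (size == 0)
        return reject(ATTACH_ZERO_SIZE, "segment size is zero");
    if (addr % ALLOC_GRANULARITY != 0)
        return reject(ATTACH_UNALIGNED_ADDR, "address not 64 KiB aligned");
    return ATTACH_OK;
}
```

Every failing path still ends in EINVAL, so callers behave as before, but the log now says which check fired.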


//Magnus


Re: Fwd: 8.0 Beta3 worked, RC1 didn't!

From

"Nicolas COUSSEMACQ"

Date:

December 28, 2004, 20:13:27

1) I checked the option in the setup program that allows connections from all
client workstations, and added one line in pg_hba.conf
('host    all    all    10.0.0.0/8    password').
    When I set up postgres without checking this option, it runs perfectly
from localhost, but when I activate 'external connections', it fails...

2) I tried to set up with and without data from a previously installed
postgres. I think that the problem is immediate because I get a message
during the installation explaining that the setup program cannot contact the
database server (I think that it happens when installing PL/pgSQL...).

3) I tried to set up Postgres Beta5, RC1 and RC2 on two servers: one was
clean, it had just been running Beta4 for a few days, and the other was
hosting my old MySQL database. I got the same problem in all cases.

4) shared_buffers = 1000


Re: Fwd: 8.0 Beta3 worked, RC1 didn't!

From

Tom Lane

Date:

December 28, 2004, 22:01:27

"Magnus Hagander" <mha@sollentuna.net> writes:
> Tom,
> why is 00DC0000 so low? That's still 10Mb into the process, right? Granted,
> it's not high, but it's not *that* low. (A simple test program with all
> parameters at default gets its first address allocated at 003D2438 for me. A
> freshly MapViewOfFile()d memory ends up at 003f0000. If I go for a larger
> test block (such as 50Mb), the mapped memory is moved up to 004d0000.) I get
> very similar results on XP and 2003.

The question is not whether it's "low", it's whether there's any
daylight between the end of memory in a postmaster/backend image and
where the shmem segment gets placed.

On Unix, shmat() is supposed to leave a lot of room between the data
break address and where it puts shmem, so that malloc still has room to
play in.  I suspect that Windows is willing to malloc() memory above the
shmem segment and so thinks that it doesn't need to leave any daylight
there, other than rounding off to a page boundary for hardware reasons.
If the backend process malloc's a bit more space than the postmaster did
before trying to attach, we got trouble.

It's not clear to me exactly *why* the backend would allocate any more
space than the postmaster did, but that's my working hypothesis at the
moment.

            regards, tom lane
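
The hypothesis above can be modeled as a plain range-overlap check. This is only a sketch of the reasoning, not code from the PostgreSQL tree: the type addr_range and both function names are hypothetical, and the heap figures suggested in the note after the block are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [start, start + size). */
typedef struct {
    uintptr_t start;
    size_t    size;
} addr_range;

/* True when the two ranges share at least one byte. */
static bool ranges_overlap(addr_range a, addr_range b)
{
    return a.start < b.start + b.size && b.start < a.start + a.size;
}

/*
 * Hypothetical model: a child can map the segment at the address the
 * postmaster got only if none of the child's already-used memory
 * overlaps that range.
 */
static bool can_attach_at_same_address(addr_range shmem, addr_range child_heap)
{
    assert(shmem.size > 0);   /* a zero-sized segment makes no sense here */
    return !ranges_overlap(shmem, child_heap);
}
```

With a segment at 00DC0000, a heap that ends at 00DBF000 leaves the range free, while a heap that has grown to 00DC1000 does not, which is the failure mode suspected here.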

Re: Fwd: 8.0 Beta3 worked, RC1 didn't!

From

Bruce Momjian

Date:

December 28, 2004, 22:15:50

Tom Lane wrote:
> "Magnus Hagander" <mha@sollentuna.net> writes:
> > Tom,
> > why is DC000000 so low? That's still 10Mb into the process, right? Granted, it's not high, but it's not *that* low.
> > (A simple test program with all parameters at default gets its first address allocated at 003D2438 for me. A freshly
> > MapViewOfFile()d memory ends up at 003f0000. If I go for a larger test block (such as 50Mb), the mapped memory is moved
> > up to 004d0000. I get very similar results on XP and 2003.
>
> The question is not whether it's "low", it's whether there's any
> daylight between the end of memory in a postmaster/backend image and
> where the shmem segment gets placed.
>
> On Unix, shmat() is supposed to leave a lot of room between the data
> break address and where it puts shmem, so that malloc still has room to
> play in.  I suspect that Windows is willing to malloc() memory above the
> shmem segment and so thinks that it doesn't need to leave any daylight
> there, other than rounding off to a page boundary for hardware reasons.
> If the backend process malloc's a bit more space than the postmaster did
> before trying to attach, we got trouble.
>
> It's not clear to me exactly *why* the backend would allocate any more
> space than the postmaster did, but that's my working hypothesis at the
> moment.

What if we malloc 100k just before we create the postmaster segment and
then free it and see if that fixes the postgres.exe problem?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Fwd: 8.0 Beta3 worked, RC1 didn't!

From

Tom Lane

Date:

December 28, 2004, 22:19:50

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> What if we malloc 100k just before we create the postmaster segment and
> then free it and see if that fixes the postgres.exe problem?

That was suggested already.  As a permanent fix it's certainly
unspeakably ugly, but it would be useful to try it just to prove
(or disprove) that we understand the problem.

It would probably be a good idea to make the padding at least 256K,
since the numbers that have been tossed around seem to indicate that
Windows may be aligning things on 128K boundaries.

My inclination for a permanent fix would be to try to do the shmat()
much earlier, but I don't think we should go to the effort of doing
that code rearrangement until we've proven that this is indeed the
issue.

            regards, tom lane
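
To see why the padding size matters relative to the alignment, here is a small sketch under a simplified placement model: the segment is assumed to land on the first 128K-aligned address at or above the end of the heap plus any temporary padding. That model, the 128K figure (taken from the guess above), and the names align_up, modeled_shmem_base and daylight are all assumptions for illustration, not documented Windows behavior.

```c
#include <assert.h>
#include <stdint.h>

#define KB(n) ((uintptr_t)(n) * 1024u)

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical placement model: the segment lands on the first aligned
 * address at or above the end of the heap plus any temporary padding.
 */
static uintptr_t modeled_shmem_base(uintptr_t heap_end, uintptr_t padding)
{
    return align_up(heap_end + padding, KB(128));
}

/* Free space between heap end and segment once the padding is released. */
static uintptr_t daylight(uintptr_t heap_end, uintptr_t padding)
{
    return modeled_shmem_base(heap_end, padding) - heap_end;
}
```

In this model a 100K pad can round to the same base as no pad at all, whereas a pad at least as large as the alignment always moves the base and leaves at least that much room for the backend to grow.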

Re: Fwd: 8.0 Beta3 worked, RC1 didn't!

From

Bruce Momjian

Date:

December 28, 2004, 22:27:01

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > What if we malloc 100k just before we create the postmaster segment and
> > then free it and see if that fixes the postgres.exe problem?
>
> That was suggested already.  As a permanent fix it's certainly
> unspeakably ugly, but it would be useful to try it just to prove
> (or disprove) that we understand the problem.
>
> It would probably be a good idea to make the padding at least 256K,
> since the numbers that have been tossed around seem to indicate that
> Windows may be aligning things on 128K boundaries.
>
> My inclination for a permanent fix would be to try to do the shmat()
> much earlier, but I don't think we should go to the effort of doing
> that code rearrangement until we've proven that this is indeed the
> issue.

Right.  Merlin, I added you to this email.  Can you test that?  Do you
need us to send you a patch for testing?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Fwd: 8.0 Beta3 worked, RC1 didn't!

From

"Magnus Hagander"

Date:

December 29, 2004, 00:26:08

>Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> What if we malloc 100k just before we create the postmaster segment and
>> then free it and see if that fixes the postgres.exe problem?
>
>That was suggested already.  As a permanent fix it's certainly
>unspeakably ugly, but it would be useful to try it just to prove
>(or disprove) that we understand the problem.
>
>It would probably be a good idea to make the padding at least 256K,
>since the numbers that have been tossed around seem to indicate that
>Windows may be aligning things on 128K boundaries.
>
>My inclination for a permanent fix would be to try to do the shmat()
>much earlier, but I don't think we should go to the effort of doing
>that code rearrangement until we've proven that this is indeed the
>issue.


Still unable to reproduce this, even with the more detailed steps in
Nicolas's mail. However, I've created a postgres.exe based on
cvs-as-of-yesterday plus the attached patch for testing.

The file is available on
http://www.hagander.net/pgsql/postgres_shmem.zip


Nicolas and Merlin - can you test with this .exe, please? You need to
replace *both* postmaster.exe *and* postgres.exe with the new one.


//Magnus

Attachments

shmem.patch

Re: Fwd: 8.0 Beta3 worked, RC1 didn't!

From

"Magnus Hagander"

Date:

December 29, 2004, 15:42:00

> >> What if we malloc 100k just before we create the postmaster segment and
> >> then free it and see if that fixes the postgres.exe problem?
> >
> >That was suggested already.  As a permanent fix it's certainly
> >unspeakably ugly, but it would be useful to try it just to prove (or
> >disprove) that we understand the problem.
> >
> >It would probably be a good idea to make the padding at least 256K,
> >since the numbers that have been tossed around seem to indicate that
> >Windows may be aligning things on 128K boundaries.
> >
> >My inclination for a permanent fix would be to try to do the shmat()
> >much earlier, but I don't think we should go to the effort of doing
> >that code rearrangement until we've proven that this is indeed the
> >issue.
>
>
> Still unable to reproduce this, even with the more detailed
> steps in Nicolas's mail. However, I've created a postgres.exe
> based on cvs-as-of-yesterday plus the attached patch for testing.
>
> The file is available on
> http://www.hagander.net/pgsql/postgres_shmem.zip
>
>
> Nicolas and Merlin - can you test with this .exe, please? You
> need to replace *both* postmaster.exe *and* postgres.exe with
> the new one.

I've now had confirmation from one person (Edgars) that this solves his
problem. I'd like confirmation from at least one more, but things point
towards this being the reason.

Tom - what's next? Do we want to roll RC3 with this ugly fix, or do we
want to look at a better fix right away?

One thought - what if we hard-code the address to somewhere at the 1Gb
limit? That would limit us to 1Gb of shared buffers (or 2Gb if started
with the /3G switch to give user programs 3Gb in windows), but I don't
see *anybody* needing 1Gb shared buffers... Or is that a bad idea?

//Magnus
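
The arithmetic behind the 1Gb and 2Gb figures, as a rough sketch. It assumes a flat 32-bit user address space of 2Gb by default and 3Gb with the /3G switch, and ignores anything else mapped above the fixed base; the function name max_segment_size is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define GB(n) ((uint64_t)(n) << 30)

/*
 * Largest segment that fits between a hard-coded base address and the
 * top of user address space (assumes nothing else is mapped in between).
 */
static uint64_t max_segment_size(uint64_t user_space_top, uint64_t fixed_base)
{
    assert(fixed_base <= user_space_top);
    return user_space_top - fixed_base;
}
```

With the base at 1Gb this gives 1Gb of room under the default split and 2Gb with the /3G switch, matching the limits mentioned above.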

Re: Fwd: 8.0 Beta3 worked, RC1 didn't!

From

"Nicolas COUSSEMACQ"

Date:

December 29, 2004, 16:06:42

It works for me too.

----- Original Message -----
From: "Magnus Hagander" <mha@sollentuna.net>
To: "Tom Lane" <tgl@sss.pgh.pa.us>; "Bruce Momjian" <pgman@candle.pha.pa.us>
Cc: <pgsql-hackers-win32@postgresql.org>; <nico@def2shoot.com>; "Merlin
Moncure" <merlin.moncure@rcsonline.com>; "Edgars Diebelis"
<edgars.diebelis@divi.lv>
Sent: Wednesday, December 29, 2004 10:42 AM
Subject: RE: [pgsql-hackers-win32] Fwd: 8.0 Beta3 worked, RC1 didn't!


> >> What if we malloc 100k just before we create the postmaster segment and
> >> then free it and see if that fixes the postgres.exe problem?
> >
> >That was suggested already.  As a permanent fix it's certainly
> >unspeakably ugly, but it would be useful to try it just to prove (or
> >disprove) that we understand the problem.
> >
> >It would probably be a good idea to make the padding at least 256K,
> >since the numbers that have been tossed around seem to indicate that
> >Windows may be aligning things on 128K boundaries.
> >
> >My inclination for a permanent fix would be to try to do the shmat()
> >much earlier, but I don't think we should go to the effort of doing
> >that code rearrangement until we've proven that this is indeed the
> >issue.
>
>
> Still unable to reproduce this, even with the more detailed
> steps in Nicolas's mail. However, I've created a postgres.exe
> based on cvs-as-of-yesterday plus the attached patch for testing.
>
> The file is available on
> http://www.hagander.net/pgsql/postgres_shmem.zip
>
>
> Nicolas and Merlin - can you test with this .exe, please? You
> need to replace *both* postmaster.exe *and* postgres.exe with
> the new one.

I've now had confirmation from one person (Edgars) that this solves his
problem. I'd like confirmation from at least one more, but things point
towards this being the reason.

Tom - what's next? Do we want to roll RC3 with this ugly fix, or do we
want to look at a better fix right away?

One thought - what if we hard-code the address to somewhere at the 1Gb
limit? That would limit us to 1Gb of shared buffers (or 2Gb if started
with the /3G switch to give user programs 3Gb in windows), but I don't
see *anybody* needing 1Gb shared buffers... Or is that a bad idea?

//Magnus

Re: Fwd: 8.0 Beta3 worked, RC1 didn't!

From

"Merlin Moncure"

Date:

December 29, 2004, 19:49:42

> I've now had confirmation from one person (Edgars) that this solves his
> problem. I'd like confirmation from at least one more, but things point
> towards this being the reason.
>
> Tom - what's next? Do we want to roll RC3 with this ugly fix, or do we
> want to look at a better fix right away?
>
> One thought - what if we hard-code the address to somewhere at the 1Gb
> limit? That would limit us to 1Gb of shared buffers (or 2Gb if started
> with the /3G switch to give user programs 3Gb in windows), but I don't
> see *anybody* needing 1Gb shared buffers... Or is that a bad idea?
>
> //Magnus

I can confirm the patched version fixes my busted win2k box.  I was
unable to get Magnus's compiled binary to work, maybe because I'm using
gcc 3.4.1.

Merlin

Re: Fwd: 8.0 Beta3 worked, RC1 didn't!

From

Tom Lane

Date:

December 29, 2004, 23:00:32

"Magnus Hagander" <mha@sollentuna.net> writes:
> Tom - what's next? Do we want to roll RC3 with this ugly fix, or do we
> want to look at a better fix right away?

I think we want to look at a better fix right away; mainly because we
need to test it to be sure that it really fixes the problem ;-).
I will work on this today.

            regards, tom lane

Re: Fwd: 8.0 Beta3 worked, RC1 didn't!

From

Tom Lane

Date:

December 30, 2004, 03:38:09

> "Magnus Hagander" <mha@sollentuna.net> writes:
>> Tom - what's next? Do we want to roll RC3 with this ugly fix, or do we
>> want to look at a better fix right away?

> I think we want to look at a better fix right away; mainly because we
> need to test it to be sure that it really fixes the problem ;-).
> I will work on this today.

I have committed fixes that rearrange the code as I was envisioning.
Things still seem to work when building with -DEXEC_BACKEND on Unix,
but I'm not in a position to verify the Windows-specific code.  Please
give it a try ASAP.

            regards, tom lane

Re: Fwd: 8.0 Beta3 worked, RC1 didn't!

From

"Magnus Hagander"

Date:

December 31, 2004, 03:26:09

>>> Tom - what's next? Do we want to roll RC3 with this ugly fix, or do we
>>> want to look at a better fix right away?
>
>> I think we want to look at a better fix right away; mainly because we
>> need to test it to be sure that it really fixes the problem ;-).
>> I will work on this today.
>
>I have committed fixes that rearrange the code as I was envisioning.
>Things still seem to work when building with -DEXEC_BACKEND on Unix,
>but I'm not in a position to verify the Windows-specific code.  Please
>give it a try ASAP.

It passes all regression tests for me. But since I didn't see the
original problem, I can't confirm if it solves them.

I have put up a new binary from cvs today on
http://www.hagander.net/pgsql/postgres_shmem2.zip. For those of you who
have the problem, as before, copy this file and overwrite both
postgres.exe and postmaster.exe, then test and let us know if this one
also fixes things.

//Magnus
