Streaming Replication on win32
From | Magnus Hagander |
---|---|
Subject | Streaming Replication on win32 |
Date | |
Msg-id | 9837222c1001170558r338847b4h460a98115ab98d5b@mail.gmail.com |
Replies | Re: Streaming Replication on win32; Re: Streaming Replication on win32 |
List | pgsql-hackers |
I'm trying to figure out why streaming replication doesn't work on win32. Here is what I have so far:

It starts up fine, and outputs:

    LOG: starting archive recovery
    LOG: standby_mode = 'on'
    LOG: primary_conninfo = 'host=localhost port=5432'
    LOG: starting streaming recovery at 0/2000000

After this, *nothing* happens, and it never reaches a consistent state or anything.

Looking at stacktraces, I notice two things. The walreceiver process is in:

    ntdll!ZwWaitForSingleObject+0xa
    mswsock+0x4f65
    WS2_32!select+0x105
    LIBPQ!pqSocketPoll(int sock = 4936, int forRead = 1, int forWrite = 0, int64 end_time = -1)+0x2bb
    LIBPQ!pqSocketCheck(struct pg_conn * conn = 0x00000000`00830160, int forRead = 1, int forWrite = 0, int64 end_time = -1)+0xa1
    LIBPQ!pqWaitTimed(int forRead = 1, int forWrite = 0, struct pg_conn * conn = 0x00000000`00830160, int64 finish_time = -1)+0x2e
    LIBPQ!pqWait(int forRead = 1, int forWrite = 0, struct pg_conn * conn = 0x00000000`00830160)+0x2a
    LIBPQ!PQgetResult(struct pg_conn * conn = 0x00000000`00830160)+0x82
    LIBPQ!PQexecFinish(struct pg_conn * conn = 0x00000000`00830160)+0x1c
    LIBPQ!PQexec(struct pg_conn * conn = 0x00000000`00830160, char * query = 0x00000000`0042f600 "START_REPLICATION 0/2000000")+0x44
    walreceiver!WalRcvConnect(void)+0x457
    walreceiver!WalReceiverMain(struct FunctionCallInfoData * fcinfo = 0x00000000`00000000)+0x20e
    postgres!AuxiliaryProcessMain(int argc = 2, char ** argv = 0x00000000`0081f080)+0x600
    postgres!SubPostmasterMain(int argc = 4, char ** argv = 0x00000000`0081f070)+0x2d7
    postgres!main(int argc = 4, char ** argv = 0x00000000`0081f070)+0x1e4
    postgres!__tmainCRTStartup(void)+0x192
    postgres!mainCRTStartup(void)+0xe
    kernel32!BaseProcessStart+0x2c

Which shows one potentially big problem: since we're calling select() from inside libpq, it's not calling our "signal emulation layer compatible select()". This means that at this point, walreceiver is not interruptible.
This also shows up if I shut down the system: the walreceiver stays around and won't terminate properly.

Do we need to invent a way for libpq to call back into backend code to do this select? We certainly can't have libpq use our version directly, since that would break all non-postmaster/postgres processes.

The second thing I note is that the walsender is in:

    ntdll!ZwWaitForMultipleObjects+0xa
    kernel32!ReleaseSemaphore+0x6b
    postgres!pgwin32_waitforsinglesocket(unsigned int64 s = 0x13fc, int what = 41, int timeout = -1)+0x275
    postgres!pgwin32_recv(unsigned int64 s = 0x13fc, char * buf = 0x00000000`0042f990 "???", int len = 1, int f = 0)+0xf5
    postgres!secure_read(struct Port * port = 0x00000000`0042fcf0, void * ptr = 0x00000000`0042f990, unsigned int64 len = 1)+0x32
    postgres!pq_getbyte_if_available(unsigned char * c = 0x00000000`0042f990 "???")+0x106
    postgres!CheckClosedConnection(void)+0x10
    postgres!WalSndLoop(void)+0xdf
    postgres!WalSenderMain(void)+0xb9
    postgres!PostgresMain(int argc = 2, char ** argv = 0x00000000`0084d520, char * username = 0x00000000`0082e218 "Administrator")+0x3b5
    postgres!BackendRun(struct Port * port = 0x00000000`0042fcf0)+0x235
    postgres!SubPostmasterMain(int argc = 3, char ** argv = 0x00000000`0081f080)+0x278
    postgres!main(int argc = 3, char ** argv = 0x00000000`0081f080)+0x1e4
    postgres!__tmainCRTStartup(void)+0x192
    postgres!mainCRTStartup(void)+0xe
    kernel32!BaseProcessStart+0x2c

From what I can tell, this indicates that pq_getbyte_if_available() is not working, because it's supposed to never block, right? This could be because the win32 socket emulation layer simply wasn't designed to deal with non-blocking sockets. Specifically, it actually *always* sets the socket to non-blocking mode, and then uses that to properly emulate how sockets work under unix.
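The question of whether libpq needs a way to call back into backend code for its select() can be made concrete with a small POSIX sketch. Everything in it is hypothetical: `lib_set_socket_wait_hook`, `lib_wait`, `backend_socket_wait` and the self-pipe are illustrative names, not libpq or PostgreSQL APIs, and a self-pipe stands in for the win32 signal emulation layer. The idea shown is that the library keeps a plain select() as its default, so ordinary client programs are unaffected, while a backend process installs an override that also wakes up when an interrupt (such as a shutdown request) is pending.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Result codes of the hypothetical wait hook. */
enum { WAIT_ERROR = -1, WAIT_READY = 1, WAIT_INTERRUPTED = 2 };

/* Hypothetical hook type: block until sock is readable and/or writable. */
typedef int (*socket_wait_hook)(int sock, int for_read, int for_write);

/* Library default for ordinary client programs: plain select() on the socket only. */
static int default_socket_wait(int sock, int for_read, int for_write)
{
    if (!for_read && !for_write)
        return WAIT_READY;              /* nothing to wait for */
    for (;;) {
        fd_set rfds, wfds;              /* rebuilt each pass: select() modifies them */
        FD_ZERO(&rfds);
        FD_ZERO(&wfds);
        if (for_read) FD_SET(sock, &rfds);
        if (for_write) FD_SET(sock, &wfds);
        int rc = select(sock + 1, &rfds, &wfds, NULL, NULL);
        if (rc > 0) return WAIT_READY;
        if (rc < 0 && errno != EINTR) return WAIT_ERROR;
    }
}

static socket_wait_hook wait_hook = default_socket_wait;

/* Hypothetical registration call; NULL restores the default. */
static void lib_set_socket_wait_hook(socket_wait_hook hook)
{
    wait_hook = hook ? hook : default_socket_wait;
}

/* Stand-in for the library's internal wait (the pqSocketCheck/pqSocketPoll frames). */
static int lib_wait(int sock, int for_read, int for_write)
{
    return wait_hook(sock, for_read, for_write);
}

/* Backend side: a self-pipe that a signal handler writes to on shutdown. */
static int interrupt_pipe[2] = {-1, -1};

static int backend_init_interrupts(void)
{
    return pipe(interrupt_pipe);
}

/* write() is async-signal-safe, so this may be called from a signal handler. */
static void backend_request_interrupt(void)
{
    ssize_t n = write(interrupt_pipe[1], "x", 1);
    (void) n;
}

/* Backend override: waits on the socket AND on the interrupt pipe. */
static int backend_socket_wait(int sock, int for_read, int for_write)
{
    assert(interrupt_pipe[0] >= 0);     /* backend_init_interrupts() must run first */
    for (;;) {
        fd_set rfds, wfds;
        int intr = interrupt_pipe[0];
        int maxfd = sock > intr ? sock : intr;
        FD_ZERO(&rfds);
        FD_ZERO(&wfds);
        if (for_read) FD_SET(sock, &rfds);
        if (for_write) FD_SET(sock, &wfds);
        FD_SET(intr, &rfds);
        int rc = select(maxfd + 1, &rfds, &wfds, NULL, NULL);
        if (rc < 0) {
            if (errno == EINTR) continue;
            return WAIT_ERROR;
        }
        if (FD_ISSET(intr, &rfds)) return WAIT_INTERRUPTED;
        return WAIT_READY;
    }
}
```

A backend process would register its wait function once at startup; a program that never calls the setter keeps the plain select(), which is one way to avoid breaking non-postmaster/postgres processes.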
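To make the expected behaviour of a never-blocking read concrete, here is a minimal POSIX sketch of an emulation layer that, like the one described, always keeps the socket in non-blocking mode and emulates blocking by waiting and retrying. The names `emulated_recv`, `getbyte_if_available` and the `caller_nonblocking` flag are hypothetical and not PostgreSQL's actual code. The point it illustrates is that unless the wrapper is told the caller wants non-blocking semantics, a "read a byte if available" call ends up waiting for data, which would be consistent with the walsender trace.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put the socket in non-blocking mode, as the described emulation layer always does. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Block until fd is readable, retrying on EINTR. */
static int wait_readable(int fd)
{
    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        int rc = select(fd + 1, &rfds, NULL, NULL, NULL);
        if (rc > 0) return 0;
        if (rc < 0 && errno != EINTR) return -1;
    }
}

/*
 * Hypothetical emulated recv(): the socket itself is non-blocking, so
 * "blocking" behaviour is emulated by waiting and retrying. Without the
 * caller_nonblocking flag, a caller that wants a never-blocking read
 * would still end up waiting here.
 */
static ssize_t emulated_recv(int fd, void *buf, size_t len, int caller_nonblocking)
{
    for (;;) {
        ssize_t n = recv(fd, buf, len, 0);
        if (n >= 0) return n;
        if (errno == EINTR) continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK) return -1;
        if (caller_nonblocking) return -1;    /* errno is still EAGAIN/EWOULDBLOCK */
        if (wait_readable(fd) < 0) return -1;
    }
}

/* Returns 1 and stores the byte if one is available, 0 if none is, -1 on EOF or error. */
static int getbyte_if_available(int fd, unsigned char *c)
{
    assert(c != NULL);
    ssize_t n = emulated_recv(fd, c, 1, 1);
    if (n == 1) return 1;
    if (n == 0) return -1;                    /* peer closed the connection */
    if (errno == EAGAIN || errno == EWOULDBLOCK) return 0;
    return -1;
}
```

With `caller_nonblocking` set, an idle connection makes `getbyte_if_available` return 0 immediately instead of sitting in select().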
Oh, and the walsender process says:

    \Sessions\1\BaseNamedObjects\pgident(2196): postgres: wal sender process Administrator 127.0.0.1(1398) startup

the walreceiver says:

    \Sessions\1\BaseNamedObjects\pgident(2264): postgres: wal receiver process

and the startup process says:

    \Sessions\1\BaseNamedObjects\pgident(2764): postgres: startup process waiting for 000000010000000000000002

-- 
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/