Re: WAL recycling, Linux 2.4.18

Поиск
Список
Период
Сортировка
От Doug Fields
Тема Re: WAL recycling, Linux 2.4.18
Дата
Msg-id 5.1.0.14.2.20020708134721.02f23578@pop.pexicom.com
обсуждение исходный текст
Ответ на WAL recycling, ext3, Linux 2.4.18  (Doug Fields <dfields-pg-general@pexicom.com>)
Ответы Re: WAL recycling, Linux 2.4.18  (Tom Lane <tgl@sss.pgh.pa.us>)
Hardware for PG  (Steve Lane <slane@fmpro.com>)
Список pgsql-general
Hi Tom, all,

>Also, could you do the checkpoint manually and get a stack trace from
>that backend while others are hung up?

Yes, see below.

>directly try to acquire ControlFileLock.  In any case it's hard to
>credit that the recycling process could take 90 seconds to rename a
>dozen or so files.  If you have a gdb attached to a process doing a
>manual checkpoint, it would be fairly easy to see how long
>MoveOfflineLogs() runs.  (Set a breakpoint at its start, when control
>reaches the breakpoint issue "fin" and see how long it takes to come
>back.)

Two things:

1) Remounted my ext3 filesystems as ext2 to rule out an ext3fs related
problem; the problems persist, so it's probably not an ext3 thing

2) Doing as you suggest, attaching to a /usr/lib/postgresql/bin/postgres
process from which I run a manual checkpoint results in the following behavior:

I was able to set a breakpoint at CreateCheckPoint - gdb never found a
MoveOfflineLogs for me to set a breakpoint.

Doing a single step from the breakpoint after the CreateCheckPoint takes
quite a few moments (perhaps 10-30 seconds, but I didn't use a stopwatch).
During this time control-C has no effect, and when it does take effect, it
leaves me at the displayed location below.

Single stepping from mdsync() to smgrsync() also seems to take a few
seconds (perhaps 3).

My GDB output is below. The checkpoint seems to take 42 seconds to complete:

pexicast_lg=# select now(); checkpoint; select now();
               now
-------------------------------
  2002-07-08 13:57:57.918766-04
(1 row)

CHECKPOINT
               now
-------------------------------
  2002-07-08 13:58:39.790787-04
(1 row)

I was wondering if it could be the sync function call, but I sit there on
the server and type sync until I'm blue in the face and they seem to run so
fast I don't even notice any delay.

Also, I can't seem to recompile PostgreSQL because Debian can't find a
tclConfig.sh.

Thanks,

Doug


(gdb) c
Continuing.

Breakpoint 1, 0x08087ae5 in CreateCheckPoint ()
(gdb) where
#0  0x08087ae5 in CreateCheckPoint ()
#1  0x08111066 in ProcessUtility ()
#2  0x0810ecc5 in pg_exec_query_string ()
#3  0x0810fd5e in PostgresMain ()
#4  0x080f6d4e in ClosePostmasterPorts ()
#5  0x080f669f in ClosePostmasterPorts ()
#6  0x080f5882 in PostmasterMain ()
#7  0x080f5391 in PostmasterMain ()
#8  0x080d4e18 in main ()
#9  0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) n
Single stepping until exit from function CreateCheckPoint,
which has no line number information.

Program received signal SIGINT, Interrupt.
0x40282967 in sync () from /lib/libc.so.6
(gdb) where
#0  0x40282967 in sync () from /lib/libc.so.6
#1  0x0810d167 in mdsync ()
#2  0x0810de5f in smgrsync ()
#3  0x081036d8 in FlushBufferPool ()
#4  0x08087d13 in CreateCheckPoint ()
#5  0x08111066 in ProcessUtility ()
#6  0x0810ecc5 in pg_exec_query_string ()
#7  0x0810fd5e in PostgresMain ()
#8  0x080f6d4e in ClosePostmasterPorts ()
#9  0x080f669f in ClosePostmasterPorts ()
#10 0x080f5882 in PostmasterMain ()
#11 0x080f5391 in PostmasterMain ()
#12 0x080d4e18 in main ()
#13 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) break MoveOfflineLogs
Function "MoveOfflineLogs" not defined.
(gdb) break MoveOfflineLog
Function "MoveOfflineLog" not defined.
(gdb) break MoveOfflineLogs()
Function "MoveOfflineLogs()" not defined.
(gdb) s
Single stepping until exit from function sync,
which has no line number information.
0x0810d167 in mdsync ()
(gdb) s
Single stepping until exit from function mdsync,
which has no line number information.
0x0810de5f in smgrsync ()
(gdb) s
Single stepping until exit from function smgrsync,
which has no line number information.
0x081036d8 in FlushBufferPool ()
(gdb) where
#0  0x081036d8 in FlushBufferPool ()
#1  0x08087d13 in CreateCheckPoint ()
#2  0x08111066 in ProcessUtility ()
#3  0x0810ecc5 in pg_exec_query_string ()
#4  0x0810fd5e in PostgresMain ()
#5  0x080f6d4e in ClosePostmasterPorts ()
#6  0x080f669f in ClosePostmasterPorts ()
#7  0x080f5882 in PostmasterMain ()
#8  0x080f5391 in PostmasterMain ()
#9  0x080d4e18 in main ()
#10 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) s
Single stepping until exit from function FlushBufferPool,
which has no line number information.
0x08087d13 in CreateCheckPoint ()
(gdb) where
#0  0x08087d13 in CreateCheckPoint ()
#1  0x08111066 in ProcessUtility ()
#2  0x0810ecc5 in pg_exec_query_string ()
#3  0x0810fd5e in PostgresMain ()
#4  0x080f6d4e in ClosePostmasterPorts ()
#5  0x080f669f in ClosePostmasterPorts ()
#6  0x080f5882 in PostmasterMain ()
#7  0x080f5391 in PostmasterMain ()
#8  0x080d4e18 in main ()
#9  0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) s
Single stepping until exit from function CreateCheckPoint,
which has no line number information.
0x08085ac4 in XLogFlush ()
(gdb) where
#0  0x08085ac4 in XLogFlush ()
#1  0x08111066 in ProcessUtility ()
#2  0x0810ecc5 in pg_exec_query_string ()
#3  0x0810fd5e in PostgresMain ()
#4  0x080f6d4e in ClosePostmasterPorts ()
#5  0x080f669f in ClosePostmasterPorts ()
#6  0x080f5882 in PostmasterMain ()
#7  0x080f5391 in PostmasterMain ()
#8  0x080d4e18 in main ()
#9  0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) s
Single stepping until exit from function XLogFlush,
which has no line number information.

Program received signal SIGINT, Interrupt.
0x402887f2 in recv () from /lib/libc.so.6
(gdb) where
#0  0x402887f2 in recv () from /lib/libc.so.6
#1  0x080d42bc in StreamClose ()
#2  0x080d430d in pq_getbyte ()
#3  0x0810e7c8 in HandleFunctionRequest ()
#4  0x0810e837 in HandleFunctionRequest ()
#5  0x0810fc5e in PostgresMain ()
#6  0x080f6d4e in ClosePostmasterPorts ()
#7  0x080f669f in ClosePostmasterPorts ()
#8  0x080f5882 in PostmasterMain ()
#9  0x080f5391 in PostmasterMain ()
#10 0x080d4e18 in main ()
#11 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) c
Continuing.




В списке pgsql-general по дате отправления:

Предыдущее
От: Peter Schuller
Дата:
Сообщение: Frontend/backend protocol: authentication
Следующее
От: Doug Fields
Дата:
Сообщение: Re: WAL recycling, ext3, Linux 2.4.18