Re: WAL recycling, ext3, Linux 2.4.18

Поиск
Список
Период
Сортировка
От Doug Fields
Тема Re: WAL recycling, ext3, Linux 2.4.18
Дата
Msg-id 5.1.0.14.2.20020708120620.01f39538@pop.pexicom.com
обсуждение исходный текст
Ответ на Re: WAL recycling, ext3, Linux 2.4.18  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: WAL recycling, ext3, Linux 2.4.18  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: WAL recycling, ext3, Linux 2.4.18  (Curt Sampson <cjs@cynic.net>)
Список pgsql-general
At 11:57 AM 7/8/2002, Tom Lane wrote:
>Doug Fields <dfields-pg-general@pexicom.com> writes:
> > I'm still trying to track down my very odd periodic pauses/hangs in
> > PostgreSQL 7.2.1.
> > I've localized it to what seems to be the "recycled transaction log file"
> > lines in the log file. Whenever this happens, a whole bunch of queries
> > which were "on hold" (just sitting there, as can be seen in
> > pg_stat_activity, when they usually execute in fractions of a second) come
> > back to life and finish very quickly.
>
>Hm.  That would be from the tail end of a checkpoint operation.  If you
>do a CHECKPOINT command manually, do you get a similar hangup?

In fact, yes. (I never did a CHECKPOINT ever before now.) It took less time
(probably because I did it around half way through the 6.5-minute WAL
recycling period), but had almost identical effects. pg_stat_activity
showed a whole bunch of queries stacked up, and vmstat exhibited the same
pathology: the other postgres processes blocked during the checkpoint (it's
pretty plain in the vmstat output below).

    procs                      memory    swap          io     system
  cpu
  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us
sy  id
  0  0  0      0 3851828 121732
3913552   0   0     2    79   56    62  49   2  49
  1  0  0      0 3850724 121732
3913576   0   0     0  1171  214  1023   4   0  95
  1  0  0      0 3851544 121732
3913576   0   0     0   266  179   148   1   0  99
  0  0  0      0 3851780 121732
3913600   0   0     0  1407  266   364   4   1  95
  0  2  1      0 3843020 121732
3913616   0   0     0  4182  195   186   2   4  94
  0  2  3      0 3841608 121732
3913616   0   0     0  2722  197   191   0   1  99
  0  2  3      0 3840984 121732
3913616   0   0     0  2774  232   298   1   1  99
  0  2  3      0 3840056 121732
3913616   0   0     0  2739  200   204   1   1  99
  0  2  3      0 3839280 121732
3913616   0   0     0  1634  183   201   1   0  99
  0  2  4      0 3839424 121732
3913616   0   0     0  4098  202   143   0   1  99
  0  2  4      0 3837672 121732
3913616   0   0     0  2710  194   170   1   1  99
  0  2  3      0 3836992 121732
3913616   0   0     0  2139  156    82   0   0  99
  0  2  4      0 3827844 121732
3913632   0   0     0  5362  170  3260   3   1  95
  0  2  4      0 3828592 121732
3913632   0   0     0  2134  143    53   0   1  99
  1  1  0      0 3830892 121744
3913668   0   0     0  4926  152  6087   8   3  89
  0  0  0      0 3850360 121744 3913772   0   0     0  7851  610
16106  61   5  34
  0  0  0      0 3850360 121744
3913772   0   0     0    20  189   284   1   0  99
  0  0  0      0 3850360 121744
3913772   0   0     0    14  147   157   0   0 100

>What are the stuck queries doing exactly?  Can you attach to a few of the
>stuck backends with gdb and get stack traces?

The stuck queries are of the following nature, typically either:

1) UPDATE x SET somedate=now() WHERE primarykey IN (1,2,3,4...300 to 450 IDs)
2) DELETE FROM x WHERE key1 = 1 AND key2 IN (1,2,3,4...300 to 450 IDs)

In my pg_stats display below you can see some of them.

Here is a stack trace. I did "where" about every second during the "pause"
and received the same stack trace. This is on PID 3456 per the
pg_stat_activity listing below. After things clear up, I also did a stack
trace; it's blocked on recv, presumably waiting for more commands to come
down the socket. (I tried a few other PIDs with similar stack traces, all
stuck on the semop call.)

(gdb) c
Continuing.

Program received signal SIGINT, Interrupt.
0x4028913f in semop () from /lib/libc.so.6
(gdb) where
#0  0x4028913f in semop () from /lib/libc.so.6
#1  0x08106dec in IpcSemaphoreLock ()
#2  0x0810bce1 in LWLockAcquire ()
#3  0x0808433b in XLogInsert ()
#4  0x08079033 in _bt_doinsert ()
#5  0x08078ea9 in _bt_doinsert ()
#6  0x080786ed in _bt_doinsert ()
#7  0x0807be9d in btinsert ()
#8  0x08152822 in OidFunctionCall5 ()
#9  0x08077cf8 in index_insert ()
#10 0x080c674a in ExecInsertIndexTuples ()
#11 0x080c373e in ExecutorEnd ()
#12 0x080c3374 in ExecutorEnd ()
#13 0x080c2797 in ExecutorRun ()
#14 0x081104de in ProcessQuery ()
#15 0x0810ed70 in pg_exec_query_string ()
#16 0x0810fd5e in PostgresMain ()
#17 0x080f6d4e in ClosePostmasterPorts ()
#18 0x080f669f in ClosePostmasterPorts ()
#19 0x080f5882 in PostmasterMain ()
#20 0x080f5391 in PostmasterMain ()
#21 0x080d4e18 in main ()
#22 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) c
Continuing.

And then later...

(gdb) c
Continuing.

Program received signal SIGINT, Interrupt.
0x402887f2 in recv () from /lib/libc.so.6
(gdb) where
#0  0x402887f2 in recv () from /lib/libc.so.6
#1  0x080d42bc in StreamClose ()
#2  0x080d430d in pq_getbyte ()
#3  0x0810e7c8 in HandleFunctionRequest ()
#4  0x0810e837 in HandleFunctionRequest ()
#5  0x0810fc5e in PostgresMain ()
#6  0x080f6d4e in ClosePostmasterPorts ()
#7  0x080f669f in ClosePostmasterPorts ()
#8  0x080f5882 in PostmasterMain ()
#9  0x080f5391 in PostmasterMain ()
#10 0x080d4e18 in main ()
#11 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) c
Continuing.

> > 1) Is there any known bad interactions with ext3fs and PostgreSQL?
>
>Dunno.  The CHECKPOINT would probably create a significant number of
>disk write requests, followed by a sync() request.  If that could
>monopolize an ext3 filesystem for a long time, perhaps that would
>explain your problem.  But I haven't heard similar complaints before.
>
>What do you have shared_buffers set to?

pexicast_lg=# show shared_buffers;
NOTICE:  shared_buffers is 65536
SHOW VARIABLE

I believe this is about 512 megs. I have 8 gigs RAM on this server and
tried 65536 and before it 32768. Either one seems to work fine - I didn't
notice any significant performance difference yet between them.

See below for pg_stat_activity...

I appreciate the help!

Cheers,

Doug

pexicast_lg=# select * from pg_stat_activity where datname='pexicast_lg';
select now();
   datid  |   datname   | procpid | usesysid | usename
|
current_query


---------+-------------+---------+----------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  3926366 | pexicast_lg |    3393 |      100 | tomcat  | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN

(463196,463216,463236,463256,463276,463296,461176,461196,461216,461236,461256,461276,461296,461316,461336,461356,461376,461396,461416,461436,461456,461476,461496,461516,461536,461556,461576,46
  3926366 | pexicast_lg |    3412 |      100 | tomcat  | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN

(425066,425086,425106,425126,425146,425166,425186,425206,425226,425246,425266,425286,425306,425326,425346,425366,425386,425406,425426,425446,425466,425486,425506,425526,425546,425566,425586,42
  3926366 | pexicast_lg |    3443 |      100 | tomcat  | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN

(442093,442113,442133,442153,442173,442193,442213,442233,442253,442273,442293,442313,442333,442353,442373,442393,442413,442433,442453,442473,442493,442513,442533,442553,442573,442593,442613,44
  3926366 | pexicast_lg |    3444 |      100 | tomcat  | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN

(438107,438127,438147,438167,438187,438207,438227,438247,436147,436167,436187,436207,436227,436247,436267,436287,436307,436327,436347,436367,436387,436407,436427,436447,436467,436487,436507,43
  3926366 | pexicast_lg |    3446 |      100 | tomcat  | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN

(429470,429490,429510,429530,429550,429570,429590,429610,429630,429650,429670,429690,429710,429730,429750,429770,429790,429810,429830,429850,429870,429890,429910,429930,427830,427850,427870,42
  3926366 | pexicast_lg |    3447 |      100 | tomcat  | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN

(463658,463678,463698,463718,463738,463758,463778,463798,463818,463838,463858,463878,463898,463918,460378,461838,461858,461878,461898,461918,461938,461958,461978,461998,462018,462038,462058,46
  3926366 | pexicast_lg |    3448 |      100 | tomcat  | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN

(407502,407522,407542,407562,407582,407602,407622,407642,407662,407682,407702,407722,407742,407762,407782,407802,407822,407842,407862,407882,405802,405822,405842,405862,405882,405902,405922,40
  3926366 | pexicast_lg |    3449 |      100 | tomcat  | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN

(475717,475737,475757,475777,475797,475817,475837,475857,475877,475897,475917,475937,475957,475977,475997,476017,476037,476057,476077,476097,476117,476137,476157,474077,474097,474117,474137,47
  3926366 | pexicast_lg |    3450 |      100 | tomcat  | <IDLE>
  3926366 | pexicast_lg |    3451 |      100 | tomcat  | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN

(438472,438492,438512,438532,438552,438572,438592,438612,438632,438652,438672,436572,436592,436612,436632,436652,436672,436692,436712,436732,436752,436772,436792,436812,436832,436852,436872,43
  3926366 | pexicast_lg |    3452 |      100 | tomcat  | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN

(415401,415421,415441,415461,415481,415501,415521,415541,415561,415581,415601,415621,415641,415661,415681,413581,413601,413621,413641,413661,413681,413701,413721,413741,413761,413781,413801,41
  3926366 | pexicast_lg |    3453 |      100 | tomcat  | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN

(442908,442928,442948,442968,440868,440888,440908,440928,440948,440968,440988,441008,441028,441048,441068,441088,441108,441128,441148,441168,441188,441208,441228,441248,441268,441288,441308,44
  3926366 | pexicast_lg |    3454 |      100 | tomcat  | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN

(410524,410544,410564,410584,410604,410624,410644,410664,410684,410704,410724,410744,408644,408664,408684,408704,408724,408744,408764,408784,408804,408824,408844,408864,408884,408904,408924,40
  3926366 | pexicast_lg |    3455 |      100 | tomcat  | <IDLE>
  3926366 | pexicast_lg |    3456 |      100 | tomcat  | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN

(451115,451135,451155,451175,451195,451215,451235,451255,451275,451295,451315,451335,451355,451375,451395,451415,451435,451455,451475,451495,451515,451535,451555,449475,449495,449515,449535,44
  3926366 | pexicast_lg |    3457 |      100 | tomcat  | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN

(408325,408345,408365,408385,408405,408425,408445,408465,408485,408505,408525,408545,406465,406485,406505,406525,406545,406565,406585,406605,406625,406645,406665,406685,406705,406725,406745,40
  3926366 | pexicast_lg |    3458 |      100 | tomcat  | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN

(397080,397100,397120,397140,397160,397180,397200,397220,397240,397260,397280,397300,397320,397340,397360,397380,397400,397420,397440,397460,397480,397500,397520,397540,397560,397580,397600,39
  3926366 | pexicast_lg |    3459 |      100 | tomcat  | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN

(476459,476479,476499,476519,476539,476559,476579,476599,476619,476639,476659,476679,476699,476719,476739,476759,474659,474679,474699,474719,474739,474759,474779,474799,474819,474839,474859,47
  3926366 | pexicast_lg |    3460 |      100 | tomcat  | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN

(411003,411023,411043,411063,411083,411103,411123,411143,411163,411183,411203,411223,411243,411263,411283,411303,411323,411343,411363,411383,411403,411423,411443,411463,411483,411503,411523,41
  3926366 | pexicast_lg |    3461 |      100 | tomcat  | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN

(432589,432609,432629,432649,432669,432689,432709,432729,432749,432769,432789,432809,432829,432849,432869,432889,432909,430809,430829,430849,430869,430889,430909,430929,430949,430969,430989,43
  3926366 | pexicast_lg |    3462 |      100 | tomcat  | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN

(470654,470674,470694,470714,470734,470754,470774,470794,470814,470834,470854,470874,470894,470914,470934,470954,470974,470994,471014,471034,471054,471074,471094,471114,471134,471154,471174,47
  3926366 | pexicast_lg |    3497 |      100 | tomcat  | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN

(440551,440571,440591,440611,440631,440651,440671,440691,440711,440731,440751,440771,440791,440811,440831,440851,440871,440891,440911,440931,440951,440971,440991,441011,441031,441051,441071,44
  3926366 | pexicast_lg |    4235 |      101 | dfields | <IDLE>
(23 rows)





В списке pgsql-general по дате отправления:

Предыдущее
От: "Peter Alberer"
Дата:
Сообщение: Speeding up subselect ?
Следующее
От: Manfred Koizar
Дата:
Сообщение: Re: PostgreSQL doesn't use indexes even is enable_seqscan = off