Possible causes of sometimes slow single-row UPDATE with trivial indexed condition?
From
"Vlad Romascanu"
Date:
Problem occurs when running (in production) Postgres 8.3.7 64-bit (from RPM)
on Ubuntu 8.04.2, on an Amazon EC2 (xen) "Large" instance (8GB RAM), with
the DB on a 50GB EC2 block device.
Problem does not occur when running (in staging/pre-production) Postgres
8.3.5 32-bit (from RPM) on Ubuntu 8.04.1, on a less beefy Amazon EC2 (xen)
"Small" instance, with the DB on a 5GB EC2 block device.
I am running with slow-query logging on, and somewhat inexplicably I have been
getting the following slow UPDATE query several times over the past few weeks
(I'm also including some context lines above and below):
2009-09-14 08:07:06.238 UTC user@database pid=26524 ip=127.0.0.1(58380)
sid=4aadf5ba.679c:1 LOG: duration: 103.905 ms
statement: COMMIT
2009-09-14 08:10:19.025 UTC user@database pid=26524 ip=127.0.0.1(58380)
sid=4aadf5ba.679c:2 LOG: duration: 124.341 ms
statement: COMMIT
2009-09-14 08:10:47.359 UTC user@database pid=26524 ip=127.0.0.1(58380)
sid=4aadf5ba.679c:3 LOG: duration: 126.896 ms
statement: COMMIT
>> 2009-09-14 08:12:30.363 UTC user@database pid=26474 ip=127.0.0.1(58364)
>> sid=4aadf58d.676a:1 LOG: duration: 13472.892 ms
>> statement: UPDATE "document_sets" SET "status" = E'rejected',
>> "updated_at" = '2009-09-14 08:12:16.890054' WHERE "id" = 288
2009-09-14 08:13:41.237 UTC user@database pid=26474 ip=127.0.0.1(58364)
sid=4aadf58d.676a:2 LOG: duration: 107.674 ms
statement: SELECT * FROM "tenders"
This is one of the "faster" occurrences; at times the query has been logged
as having taken 100+ seconds:
2009-07-21 06:05:23.035 UTC user@database pid=24834 ip=127.0.0.1(34505)
sid=4a6559e1.6102:1 LOG: duration: 156605.004 ms
statement: UPDATE "document_sets" SET "status" = E'rejected', "updated_at"
= '2009-07-21 06:02:46.430176' WHERE "id" = 318
...
2009-07-21 06:16:32.148 UTC user@database pid=23500 ip=127.0.0.1(38720)
sid=4a6552dd.5bcc:2 LOG: duration: 14833.439 ms
statement: UPDATE "document_sets" SET "status" = E'rejected', "updated_at"
= '2009-07-21 06:16:17.314905' WHERE "id" = 188
...
2009-08-11 07:31:35.867 UTC user@database pid=1227 ip=127.0.0.1(55630)
sid=4a811ded.4cb:1 LOG: duration: 29258.137 ms
statement: UPDATE "document_sets" SET "status" = E'rejected', "updated_at"
= '2009-08-11 07:31:06.609191' WHERE "id" = 251
...
2009-08-13 11:16:40.027 UTC user@database pid=13442 ip=127.0.0.1(41127)
sid=4a83f557.3482:1 LOG: duration: 10287.765 ms
statement: UPDATE "document_sets" SET "status" = E'rejected', "updated_at"
= '2009-08-13 11:16:29.738634' WHERE "id" = 273
...
2009-08-16 05:30:09.082 UTC user@database pid=3505 ip=127.0.0.1(36644)
sid=4a8798aa.db1:1 LOG: duration: 153523.612 ms
statement: UPDATE "document_sets" SET "status" = E'approved', "updated_at"
= '2009-08-16 05:27:35.558505' WHERE "id" = 369
...
2009-08-16 05:30:09.673 UTC user@database pid=3518 ip=127.0.0.1(36655)
sid=4a8798c8.dbe:1 LOG: duration: 114885.274 ms
statement: UPDATE "document_sets" SET "status" = E'approved', "updated_at"
= '2009-08-16 05:28:14.787689' WHERE "id" = 369
...
2009-08-16 05:30:10.318 UTC user@database pid=3580 ip=127.0.0.1(36707)
sid=4a879919.dfc:1 LOG: duration: 73107.179 ms
statement: UPDATE "document_sets" SET "status" = E'approved', "updated_at"
= '2009-08-16 05:28:57.210502' WHERE "id" = 369
...
2009-08-20 06:27:54.961 UTC user@database pid=8312 ip=127.0.0.1(38488)
sid=4a8cec7b.2078:1 LOG: duration: 18959.648 ms
statement: UPDATE "document_sets" SET "status" = E'rejected', "updated_at"
= '2009-08-20 06:27:36.001030' WHERE "id" = 255
...
2009-09-10 06:30:08.176 UTC user@database pid=25992 ip=127.0.0.1(59692)
sid=4aa89ac1.6588:1 LOG: duration: 27495.609 ms
statement: UPDATE "document_sets" SET "status" = E'rejected', "updated_at"
= '2009-09-10 06:29:40.680647' WHERE "id" = 346
...
2009-09-14 08:12:30.363 UTC user@database pid=26474 ip=127.0.0.1(58364)
sid=4aadf58d.676a:1 LOG: duration: 13472.892 ms
statement: UPDATE "document_sets" SET "status" = E'rejected', "updated_at"
= '2009-09-14 08:12:16.890054' WHERE "id" = 288
No other UPDATE or INSERT or DELETE operation has ever triggered the slow
query logging. No other query of any kind has ever taken more than 500ms.
When I analyze this UPDATE query (EXPLAIN VERBOSE output at the end of this
message), the plan is as expected:
> Index Scan using document_sets_pkey on document_sets (cost=0.00..8.27
> rows=1 width=47) (actual time=0.025..0.028 rows=1 loops=1)
> Index Cond: (id = 288)
> Total runtime: 0.137 ms
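(For anyone reproducing this: a plan like the above can be captured without
persisting the change by wrapping EXPLAIN ANALYZE in a rolled-back transaction,
along these lines - the now() value stands in for the literal timestamp and is
only illustrative:)

    BEGIN;
    EXPLAIN ANALYZE
        UPDATE "document_sets"
        SET "status" = E'rejected', "updated_at" = now()
        WHERE "id" = 288;
    ROLLBACK;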
Actual table rowcount: 423 rows. Table statistics:
Sequential Scans 12174674
Sequential Tuples Read 2764651442
Index Scans 813
Index Tuples Fetched 813
Tuples Inserted 424
Tuples Updated 625
Tuples Deleted 1
Heap Blocks Read 9
Heap Blocks Hit 35888949
Index Blocks Read 11
Index Blocks Hit 1858
Toast Blocks Read 0
Toast Blocks Hit 0
Toast Index Blocks Read 0
Toast Index Blocks Hit 0
Table Size 40 kB
Toast Table Size 8192 bytes
Indexes Size 32 kB
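(For completeness: the counters and sizes above are roughly what the 8.3
statistics views and size functions report for this table; queries along these
lines should reproduce them:)

    -- per-table activity counters
    SELECT seq_scan, seq_tup_read, idx_scan, idx_tup_fetch,
           n_tup_ins, n_tup_upd, n_tup_del
    FROM pg_stat_user_tables
    WHERE relname = 'document_sets';

    -- per-table block I/O counters (heap, index, toast)
    SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit,
           toast_blks_read, toast_blks_hit, tidx_blks_read, tidx_blks_hit
    FROM pg_statio_user_tables
    WHERE relname = 'document_sets';

    -- on-disk sizes
    SELECT pg_size_pretty(pg_relation_size('document_sets'))       AS table_size,
           pg_size_pretty(pg_total_relation_size('document_sets')) AS total_size;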
The only entries changed from the postgresql.conf defaults (mainly to reflect
running on an 8GB machine):
> shared_buffers = 256MB            # min 128kB or max_connections*16kB
> work_mem = 4MB                    # min 64kB
> checkpoint_segments = 8           # in logfile segments, min 1, 16MB each
> effective_cache_size = 1280MB
> log_min_duration_statement = 100  # -1 is disabled, 0 logs all statements
> log_line_prefix = '%m %u@%d pid=%p ip=%r sid=%c:%l '
sar data for the period in question (CPU, processes, paging, disk, etc.) shows
no obvious change in CPU usage, number of forks, load, or disk activity on the
database device (sdh).
# atopsar -b 7:00 -e 10:00 | grep all
ip-XXXXXX 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64 2009/09/14
-------------------------- analysis date: 2009/09/14 --------------------------
07:04:15 cpu %usr %nice %sys %irq %softirq %steal %wait %idle _cpu_
07:14:15 all 1 0 0 0 0 0 0 98
07:24:15 all 2 0 0 0 0 0 0 97
07:34:15 all 3 0 0 0 0 0 0 97
07:44:15 all 1 0 0 0 0 0 0 99
07:54:15 all 2 0 0 0 0 0 0 98
08:04:15 all 2 0 0 0 0 0 0 98
08:14:15 all 2 0 0 0 0 0 0 97
<--- slow query occurred at 8:12
08:24:15 all 6 0 1 0 0 0 0 94
08:34:16 all 8 0 0 0 0 0 0 92
08:44:16 all 2 0 0 0 0 0 0 98
08:54:16 all 1 0 0 0 0 0 0 99
09:04:16 all 3 0 0 0 0 0 0 97
09:14:16 all 6 0 1 0 0 0 0 94
09:24:16 all 1 0 1 0 0 0 0 98
09:34:16 all 1 0 0 0 0 0 0 98
09:44:16 all 2 0 0 0 0 0 0 98
09:54:17 all 2 0 0 0 0 0 0 98
# atopsar -b 7:00 -e 10:00 -s
ip-XXXXXX 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64 2009/09/14
-------------------------- analysis date: 2009/09/14 --------------------------
07:04:15 pagescan/s swapin/s swapout/s commitspc commitlim _swap_
07:14:15 0.00 0.00 0.00 1647M 3587M
07:24:15 0.00 0.00 0.00 1447M 3587M
07:34:15 0.00 0.00 0.00 2038M 3587M
07:44:15 0.00 0.00 0.00 1444M 3587M
07:54:15 0.00 0.00 0.00 2082M 3587M
08:04:15 0.00 0.00 0.00 1658M 3587M
08:14:15 0.00 0.00 0.00 1990M 3587M
<--- slow query occurred at 8:12
08:24:15 0.00 0.00 0.00 1709M 3587M
08:34:16 0.00 0.00 0.00 1507M 3587M
08:44:16 0.00 0.00 0.00 1894M 3587M
08:54:16 0.00 0.00 0.00 1640M 3587M
09:04:16 0.00 0.00 0.00 1775M 3587M
09:14:16 0.00 0.00 0.00 2209M 3587M
09:24:16 0.00 0.00 0.00 2035M 3587M
09:34:16 0.00 0.00 0.00 1887M 3587M
09:44:16 0.00 0.00 0.00 1922M 3587M
09:54:17 0.00 0.00 0.00 2138M 3587M
# atopsar -b 7:00 -e 10:00 -d
ip-XXXXXXX 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64 2009/09/14
-------------------------- analysis date: 2009/09/14 --------------------------
07:04:15 disk busy read/s KB/read write/s KB/writ avque avserv _disk_
07:24:15 sdh 0% 0.00 0.0 3.10 14.2 3.41 1.17 ms
07:34:15 sdh 0% 0.00 0.0 3.43 14.0 3.85 1.01 ms
07:44:15 sdh 0% 0.00 0.0 2.76 13.8 4.51 1.14 ms
07:54:15 sdh 0% 0.00 0.0 3.11 13.7 2.87 1.11 ms
08:04:15 sdh 0% 0.00 0.0 3.33 13.4 3.85 1.13 ms
08:14:15 sdh 0% 0.00 0.0 3.71 13.3 3.37 1.25 ms
<--- slow query occurred at 8:12
08:24:15 sdh 0% 0.00 0.0 2.95 13.4 5.04 1.13 ms
08:34:16 sdh 0% 0.00 0.0 3.27 13.4 4.17 1.05 ms
08:44:16 sdh 0% 0.00 0.0 2.73 13.7 3.31 1.17 ms
08:54:16 sdh 0% 0.00 0.0 1.57 13.9 4.02 1.14 ms
09:04:16 sdh 0% 0.00 0.0 2.05 14.2 4.75 0.96 ms
09:14:16 sdh 0% 0.00 0.0 3.87 14.0 4.19 1.10 ms
09:24:16 sdh 0% 0.00 0.0 3.26 13.9 4.17 1.11 ms
09:34:16 sdh 0% 0.00 0.0 2.17 14.1 3.67 1.18 ms
09:44:16 sdh 0% 0.00 0.0 2.72 15.0 3.63 0.99 ms
09:54:17 sdh 0% 0.00 0.0 3.15 15.0 4.40 1.21 ms
# atopsar -b 7:00 -e 10:00 -p
ip-XXXXXXX 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64 2009/09/14
-------------------------- analysis date: 2009/09/14 --------------------------
07:04:15 pswch/s devintr/s forks/s loadavg1 loadavg5 loadavg15 _load_
07:14:15 174 163 0.90 0.21 0.20 0.18
07:24:15 207 228 0.90 0.24 0.28 0.21
07:34:15 250 233 0.88 0.18 0.26 0.21
07:44:15 176 172 0.97 0.04 0.13 0.16
07:54:15 215 203 0.88 0.26 0.24 0.18
08:04:15 177 162 0.96 0.23 0.22 0.18
08:14:15 212 259 0.90 0.33 0.29 0.21
<--- slow query occurred at 8:12
08:24:15 220 266 1.08 1.29 0.75 0.40
08:34:16 207 290 0.84 0.25 0.51 0.49
08:44:16 178 175 0.95 0.09 0.21 0.33
08:54:16 159 156 0.82 0.12 0.12 0.20
09:04:16 185 324 0.78 0.44 0.33 0.24
09:14:16 279 505 0.92 0.49 0.56 0.38
09:24:16 231 222 0.87 0.18 0.33 0.34
09:34:16 164 166 0.74 0.21 0.15 0.21
09:44:16 210 191 0.78 0.78 0.34 0.23
09:54:17 240 224 1.03 0.32 0.27 0.23
Possible causes?
1) An I/O bottleneck? A bad block on the device? Yet "sar -d" shows no change
in I/O service time.
2) Lock contention (with autovacuum?)? But then why would this never affect
statements against other tables? (A quick check is sketched right after this
list.)
3) A clock change? Yet why only during this particular UPDATE query?
4) ???
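(Regarding cause #2: a quick check, if the UPDATE can be caught while it is
stuck, is to look for ungranted locks - a sketch using the 8.3 column names
procpid/current_query:)

    -- any row returned is a backend waiting on a lock; comparing it against the
    -- granted pg_locks entries for the same relation/transaction shows the holder
    SELECT l.pid, l.relation::regclass AS relation, l.mode, l.granted,
           a.current_query
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.procpid = l.pid
    WHERE NOT l.granted;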
Thank you,
V.
- - - -
Explain verbose:
> {INDEXSCAN
> :startup_cost 0.00
> :total_cost 8.27
> :plan_rows 1
> :plan_width 47
> :targetlist (
> {TARGETENTRY
> :expr
> {VAR
> :varno 1
> :varattno 1
> :vartype 23
> :vartypmod -1
> :varlevelsup 0
> :varnoold 1
> :varoattno 1
> }
> :resno 1
> :resname id
> :ressortgroupref 0
> :resorigtbl 0
> :resorigcol 0
> :resjunk false
> }
> {TARGETENTRY
> :expr
> {VAR
> :varno 1
> :varattno 2
> :vartype 23
> :vartypmod -1
> :varlevelsup 0
> :varnoold 1
> :varoattno 2
> }
> :resno 2
> :resname tender_id
> :ressortgroupref 0
> :resorigtbl 0
> :resorigcol 0
> :resjunk false
> }
> {TARGETENTRY
> :expr
> {VAR
> :varno 1
> :varattno 3
> :vartype 25
> :vartypmod -1
> :varlevelsup 0
> :varnoold 1
> :varoattno 3
> }
> :resno 3
> :resname note
> :ressortgroupref 0
> :resorigtbl 0
> :resorigcol 0
> :resjunk false
> }
> {TARGETENTRY
> :expr
> {CONST
> :consttype 1043
> :consttypmod 259
> :constlen -1
> :constbyval false
> :constisnull false
> :constvalue 12 [ 48 0 0 0 114 101 106 101 99 116 101 100 ]
> }
> :resno 4
> :resname status
> :ressortgroupref 0
> :resorigtbl 0
> :resorigcol 0
> :resjunk false
> }
> {TARGETENTRY
> :expr
> {VAR
> :varno 1
> :varattno 5
> :vartype 1114
> :vartypmod -1
> :varlevelsup 0
> :varnoold 1
> :varoattno 5
> }
> :resno 5
> :resname created_at
> :ressortgroupref 0
> :resorigtbl 0
> :resorigcol 0
> :resjunk false
> }
> {TARGETENTRY
> :expr
> {CONST
> :consttype 1114
> :consttypmod -1
> :constlen 8
> :constbyval false
> :constisnull false
> :constvalue 8 [ -58 44 34 -2 -125 22 1 0 ]
> }
> :resno 6
> :resname updated_at
> :ressortgroupref 0
> :resorigtbl 0
> :resorigcol 0
> :resjunk false
> }
> {TARGETENTRY
> :expr
> {VAR
> :varno 1
> :varattno -1
> :vartype 27
> :vartypmod -1
> :varlevelsup 0
> :varnoold 1
> :varoattno -1
> }
> :resno 7
> :resname ctid
> :ressortgroupref 0
> :resorigtbl 0
> :resorigcol 0
> :resjunk true
> }
> )
> :qual <>
> :lefttree <>
> :righttree <>
> :initPlan <>
> :extParam (b)
> :allParam (b)
> :scanrelid 1
> :indexid 18771
> :indexqual (
> {OPEXPR
> :opno 96
> :opfuncid 65
> :opresulttype 16
> :opretset false
> :args (
> {VAR
> :varno 1
> :varattno 1
> :vartype 23
> :vartypmod -1
> :varlevelsup 0
> :varnoold 1
> :varoattno 1
> }
> {CONST
> :consttype 23
> :consttypmod -1
> :constlen 4
> :constbyval true
> :constisnull false
> :constvalue 4 [ 32 1 0 0 0 0 0 0 ]
> }
> )
> }
> )
> :indexqualorig (
> {OPEXPR
> :opno 96
> :opfuncid 65
> :opresulttype 16
> :opretset false
> :args (
> {VAR
> :varno 1
> :varattno 1
> :vartype 23
> :vartypmod -1
> :varlevelsup 0
> :varnoold 1
> :varoattno 1
> }
> {CONST
> :consttype 23
> :consttypmod -1
> :constlen 4
> :constbyval true
> :constisnull false
> :constvalue 4 [ 32 1 0 0 0 0 0 0 ]
> }
> )
> }
> )
> :indexstrategy (i 3)
> :indexsubtype (o 23)
> :indexorderdir 1
> }
Re: Possible causes of sometimes slow single-row UPDATE with trivial indexed condition?
From
Richard Huxton
Date:
Vlad Romascanu wrote:
> Problem occurs when running (in production) Postgres 8.3.7 64-bit (from
> RPM) on Ubuntu 8.04.2, on an Amazon EC2 (xen) "Large" instance (8GB
> RAM), with the DB on a 50GB EC2 block device.

Hmm - don't know what the characteristics of running PG on EC2 are. This
might be something peculiar to that.

> Problem does not occur when running (in staging/pre-production) Postgres
> 8.3.5 32-bit (from RPM) on Ubuntu 8.04.1, on a less beefy Amazon EC2
> (xen) "Small" instance, with the DB on a 5GB EC2 block device.
>
> I am running with slow query logging on, and somewhat inexplicably I've
> been getting the following slow UPDATE query several times in the past
> weeks (I'm also including some context lines above and below):
>
>>> 2009-09-14 08:12:30.363 UTC user@database pid=26474
>>> ip=127.0.0.1(58364) sid=4aadf58d.676a:1 LOG: duration: 13472.892 ms
>>> statement: UPDATE "document_sets" SET "status" = E'rejected',
>
> This is one of the "faster" occurrences; at times the query has been
> logged as having taken 100+ seconds:

That's *very* slow indeed, and clearly the query itself is simple enough.

Typically in a situation like this you might suspect checkpointing was the
problem. Lots of dirty disk pages being flushed to disk before a checkpoint.
The stats for disk activity you give don't support that idea, although
10 minute intervals is quite far apart.

Your table-stats show this is a small table. If it's updated a lot then it
might be that your autovacuum settings aren't high enough for this table.
The log_autovacuum_min_duration setting might be worth enabling too - to see
if autovacuum is taking a long time over anything.

Another thing that can cause irregular slowdowns is if you have a trigger
with some custom code that takes an unexpectedly long time to run (takes
locks, runs a query that plans badly occasionally). I don't know if that's
the case here.

Oh, if you don't have indexes on "status" or "updated_at" then you might
want to read up on HOT and decrease your fill-factor on the table too.
That's unrelated to this though.

It looks like the problem is common enough that you could have a small
script check pg_stat_activity once every 10 seconds and dump a snapshot of
pg_locks, vmstat etc. If you can catch the problem happening that should
make it easy to diagnose.

--
Richard Huxton
Archonet Ltd
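(A sketch of what the pg_stat_activity half of such a snapshot could look like,
run every ~10 seconds from cron or a small loop alongside a dump of pg_locks and
vmstat - 8.3 column names; '<IDLE>' is what idle backends report as their
current_query:)

    -- non-idle backends, oldest query first; "waiting" flags a backend blocked on a lock
    SELECT procpid, usename, waiting, xact_start, query_start, current_query
    FROM pg_stat_activity
    WHERE current_query <> '<IDLE>'
    ORDER BY query_start;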
Re: Possible causes of sometimes slow single-row UPDATE with trivial indexed condition?
From
Andy Colson
Date:
Vlad Romascanu wrote:
> Problem occurs when running (in production) Postgres 8.3.7 64-bit (from
> RPM) on Ubuntu 8.04.2, on an Amazon EC2 (xen) "Large" instance (8GB
> RAM), with the DB on a 50GB EC2 block device.
> Problem does not occur when running (in staging/pre-production) Postgres
> 8.3.5 32-bit (from RPM) on Ubuntu 8.04.1, on a less beefy Amazon EC2
> (xen) "Small" instance, with the DB on a 5GB EC2 block device.
>
> I am running with slow query logging on, and somewhat inexplicably I've
> been getting the following slow UPDATE query several times in the past
> weeks (I'm also including some context lines above and below):
>

I'm not sure how Amazon vm's work, but are there multiple vm's on one box?
Just because your vm has zero cpu/disk does not mean the host isn't pegged
out of its mind.

Does Amazon give any sort of host stats?

-Andy
Re: Possible causes of sometimes slow single-row UPDATE with trivial indexed condition?
From
"Vlad Romascanu"
Date:
Hi, Andy,

Impact of other VMs is reflected as %steal time. And it's only this specific
UPDATE query on this specific table that ever exhibits the problem, as
measured over many days - a query that must represent something like <2% of
all queries run against the DB over the same period and <5% of all other
UPDATE queries running on the system across other tables in the same
tablespace over, again, the same period.

V.

----- Original Message -----
From: "Andy Colson" <andy@squeakycode.net>
To: "Vlad Romascanu" <vromascanu@monaqasat.com>; <pgsql-performance@postgresql.org>
Sent: Thursday, September 17, 2009 9:45 AM
Subject: Re: [PERFORM] Possible causes of sometimes slow single-row UPDATE with trivial indexed condition?

> Vlad Romascanu wrote:
>> Problem occurs when running (in production) Postgres 8.3.7 64-bit (from
>> RPM) on Ubuntu 8.04.2, on an Amazon EC2 (xen) "Large" instance (8GB RAM),
>> with the DB on a 50GB EC2 block device.
>> Problem does not occur when running (in staging/pre-production) Postgres
>> 8.3.5 32-bit (from RPM) on Ubuntu 8.04.1, on a less beefy Amazon EC2
>> (xen) "Small" instance, with the DB on a 5GB EC2 block device.
>>
>> I am running with slow query logging on, and somewhat inexplicably I've
>> been getting the following slow UPDATE query several times in the past
>> weeks (I'm also including some context lines above and below):
>>
>
> I'm not sure how Amazon vm's work, but are there multiple vm's on one box?
> Just because your vm has zero cpu/disk does not mean the host isn't pegged
> out of its mind.
>
> Does Amazon give any sort of host stats?
>
> -Andy
>