Обсуждение: Query performance inconsistant.

Поиск
Список
Период
Сортировка

Query performance inconsistant.

От
Matthew Schumacher
Дата:
I have been having performance problems with my DB so this morning I
added some config to log queries that take more than 250ms.  The result
is surprising because some queries will take as long as 10 seconds, but
then you do a explain analyze on them they show that indexes are being
used and they run very fast.  Here is an example:

2006-08-31 05:55:39.560 LOG:  duration: 3835.182 ms  statement: select
acctMessage( <params hidden to protect the innocent> )

But the same query returns this when I explain it:

> > explain analyze select acctMessage( <params hidden to protect the
innocent> );
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Result  (cost=0.00..0.03 rows=1 width=0) (actual time=26.797..26.799
rows=1 loops=1)
 Total runtime: 36.838 ms

So the question is, why do some queries take a very long time?  Sure,
the obvious answer is the machine was busy doing something else, but I
can't find any evidence of that:

05:40:01 AM       CPU     %user     %nice   %system   %iowait    %steal
%idle
05:54:01 AM       all      0.58      0.00      0.47      0.50      0.00
    98.45
05:55:01 AM       all      0.46      0.00      0.31      3.45      0.00
    95.79
05:56:01 AM       all      0.75      0.00      0.25      4.32      0.00
    94.69

05:40:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree
kbswpused  %swpused  kbswpcad
05:54:01 AM     96376   3019292     96.91      6724   2789348   1004000
       20      0.00        20
05:55:01 AM     92904   3022764     97.02      7844   2791424   1004000
       20      0.00        20
05:56:01 AM     98840   3016828     96.83      9056   2784160   1004000
       20      0.00        20

05:40:01 AM       tps      rtps      wtps   bread/s   bwrtn/s
05:54:01 AM     21.53      4.35     17.18     67.77    344.84
05:55:01 AM     71.61     59.11     12.50   1202.79    283.57
05:56:01 AM     29.22     13.94     15.29    264.18    316.59

Any thoughts on how to track this down?  I don't want to go buy a faster
server when I can't confirm that hardware performance is the problem.

Thanks,
schu




Re: Query performance inconsistant.

От
Tom Lane
Дата:
Matthew Schumacher <matt.s@aptalaska.net> writes:
> I have been having performance problems with my DB so this morning I
> added some config to log queries that take more than 250ms.  The result
> is surprising because some queries will take as long as 10 seconds, but
> then you do a explain analyze on them they show that indexes are being
> used and they run very fast.

Is it possible that it's not directly that query's fault?  For instance
it could be blocked by a lock held by some other transaction.  I can't
unfortunately think of any very nice way to deduce this from log entries
... you'd have to catch it in the act and look into pg_locks to find out
who's the perpetrator.

            regards, tom lane

Re: Query performance inconsistant.

От
Matthew Schumacher
Дата:
Tom Lane wrote:
> Matthew Schumacher <matt.s@aptalaska.net> writes:
>> I have been having performance problems with my DB so this morning I
>> added some config to log queries that take more than 250ms.  The result
>> is surprising because some queries will take as long as 10 seconds, but
>> then you do a explain analyze on them they show that indexes are being
>> used and they run very fast.
>
> Is it possible that it's not directly that query's fault?  For instance
> it could be blocked by a lock held by some other transaction.  I can't
> unfortunately think of any very nice way to deduce this from log entries
> ... you'd have to catch it in the act and look into pg_locks to find out
> who's the perpetrator.
>
>             regards, tom lane

This does help me try to figure out where the problem is.  The proc in
question inserts in a very large table, and updates another large table.
 Since postgres puts each proc in it's own transaction I'm thinking the
problem may be the database locking these large tables while this proc
is called concurrently.

In order to understand this better I need to know how postgres locking
works and when locks are used.  Do you know of any documentation that I
can read that explains this?

Thanks,
schu

Re: Query performance inconsistant.

От
Jeff Davis
Дата:
On Thu, 2006-08-31 at 11:04 -0800, Matthew Schumacher wrote:
> Tom Lane wrote:
> > Matthew Schumacher <matt.s@aptalaska.net> writes:
> >> I have been having performance problems with my DB so this morning I
> >> added some config to log queries that take more than 250ms.  The result
> >> is surprising because some queries will take as long as 10 seconds, but
> >> then you do a explain analyze on them they show that indexes are being
> >> used and they run very fast.
> >
> > Is it possible that it's not directly that query's fault?  For instance
> > it could be blocked by a lock held by some other transaction.  I can't
> > unfortunately think of any very nice way to deduce this from log entries
> > ... you'd have to catch it in the act and look into pg_locks to find out
> > who's the perpetrator.
> >
> >             regards, tom lane
>
> This does help me try to figure out where the problem is.  The proc in
> question inserts in a very large table, and updates another large table.
>  Since postgres puts each proc in it's own transaction I'm thinking the
> problem may be the database locking these large tables while this proc
> is called concurrently.
>
> In order to understand this better I need to know how postgres locking
> works and when locks are used.  Do you know of any documentation that I
> can read that explains this?

http://www.postgresql.org/docs/8.1/static/mvcc.html

In the "Explicit Locking" section it details the locks acquired by
UPDATE, etc.

From what you described, I would not expect many locking problems. Are
there any other types of queries you run that may cause a lock? Do you
run periodic "VACUUM FULL" or something? "VACUUM FULL" causes a full
table lock, and is usually not necessary. If so, try running just
"VACUUM" without "FULL".

Regards,
    Jeff Davis


Re: Query performance inconsistant.

От
Matthew Schumacher
Дата:
Jeff Davis wrote:
> http://www.postgresql.org/docs/8.1/static/mvcc.html
>
> In the "Explicit Locking" section it details the locks acquired by
> UPDATE, etc.
>
>From what you described, I would not expect many locking problems. Are
> there any other types of queries you run that may cause a lock? Do you
> run periodic "VACUUM FULL" or something? "VACUUM FULL" causes a full
> table lock, and is usually not necessary. If so, try running just
> "VACUUM" without "FULL".
>
> Regards,
>     Jeff Davis
>

Jeff,

I have "autovacuum = on" in the config file with a pretty frequent
autovacuum_naptime, but I'm unsure if that does a vacuum or vacuum full.

schu


Re: Query performance inconsistant.

От
Tom Lane
Дата:
Jeff Davis <pgsql@j-davis.com> writes:
> From what you described, I would not expect many locking problems. Are
> there any other types of queries you run that may cause a lock?

Row locks (SELECT FOR UPDATE/SHARE) are a possible problem, particularly
if this is a pre-8.1 Postgres where exclusive row locks were used for
foreign key constraints.

            regards, tom lane

Re: Query performance inconsistant.

От
Tom Lane
Дата:
Matthew Schumacher <matt.s@aptalaska.net> writes:
> I have "autovacuum = on" in the config file with a pretty frequent
> autovacuum_naptime, but I'm unsure if that does a vacuum or vacuum full.

autovacuum *never* does a vacuum full, because that would lead to
unexpected blockages of foreground queries.  Still though, autovac could
be contributing to the problem indirectly.  I'm assuming that most of
your transactions on the problem table are short.  It's possible that
one or more clients are grabbing quasi-exclusive table locks, and
normally you don't notice because they are able to get the lock quickly,
do their work, and get out.  But if autovac is working on the table then
the requestor of the exclusive lock blocks ... and everyone else queues
up behind him, until the vacuum command finishes with the table.

            regards, tom lane

Re: Query performance inconsistant.

От
Matthew Schumacher
Дата:
Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
>> From what you described, I would not expect many locking problems. Are
>> there any other types of queries you run that may cause a lock?
>
> Row locks (SELECT FOR UPDATE/SHARE) are a possible problem, particularly
> if this is a pre-8.1 Postgres where exclusive row locks were used for
> foreign key constraints.
>
>             regards, tom lane

Tom,

I'm still having issues with this so lemme provide more information....
perhaps there is something obvious....

Here is the proc that has very inconsistent (anywhere from 25ms to
8000ms) performance:

CREATE FUNCTION acctmessage(_accttype character varying, _username
character varying, _ipaddress character varying, _nastimestamp
 timestamp with time zone, _sessionid character varying, _nassessionid
character varying, _nasipaddress character varying, _input
octets bigint, _outputoctets bigint, _inputgigawords integer,
_outputgigawords integer, _sessionlength bigint, _termcause charact
er varying, _nasidentifier character varying, _clientipaddress character
varying, _nasport character varying, _framedprotocol cha
racter varying, _servicetype character varying, _connectinfo character
varying) RETURNS void
    AS $$
DECLARE
    session_rec RECORD;
BEGIN

IF _clientipaddress <> '127.0.0.1' THEN

  INSERT into accounting_tab (
    acctType,
    userName,
    ipAddress,
    nasTimestamp,
    sessionId,
    nasSessionId,
    nasIpAddress,
    inputOctets,
    outputOctets,
    inputGigaWords,
    outputGigaWords,
    sessionLength,
    termCause,
    nasIdentifier,
    clientIpAddress,
    nasPort,
    framedProtocol,
    serviceType,
    connectInfo
  ) values (
    _acctType,
    _userName,
    _ipAddress,
    _nasTimestamp,
    _sessionId,
    _nasSessionId,
    _nasIpAddress,
    _inputOctets,
    _outputOctets,
    _inputGigaWords,
    _outputGigaWords,
    _sessionLength,
    _termCause,
    _nasIdentifier,
   _clientIpAddress,
    _nasPort,
    _framedProtocol,
    _serviceType,
    _connectInfo
  );

END IF;

SELECT INTO session_rec sessionId FROM radutmp_tab WHERE sessionId =
_sessionId;

IF session_rec.sessionId IS NULL AND _acctType = 'start' THEN

  INSERT into radutmp_tab (
    lastAcctType,
    userName,
    ipAddress,
    nasStartTimestamp,
    sessionId,
    nasSessionId,
    nasIpAddress,
    inputOctets,
    outputOctets,
    inputGigaWords,
    outputGigaWords,
    sessionLength,
    termCause,
    nasIdentifier,
    clientIpAddress,
    nasPort,
    framedProtocol,
    serviceType
  ) values (
    _acctType,
    _userName,
    _ipAddress,
    _nasTimestamp,
    _sessionId,
    _nasSessionId,
    _nasIpAddress,
    _inputOctets,
    _outputOctets,
    _inputGigaWords,
    _outputGigaWords,
    _sessionLength,
    _termCause,
    _nasIdentifier,
    _clientIpAddress,
    _nasPort,
    _framedProtocol,
    _serviceType
  ) ;

ELSIF session_rec.sessionId IS NOT NULL AND _acctType = 'stop' THEN

UPDATE
  radutmp_tab
SET
  lastAcctType = _acctType,
  nasStopTimestamp = _nasTimestamp,
  ipAddress = _ipAddress,
  sessionlength = _sessionlength,
  inputOctets = _inputOctets,
  outputOctets = _outputOctets,
  inputgigawords = _inputgigawords,
  outputgigawords = _outputgigawords,
  nasSessionId = _nasSessionId,
  nasIPAddress = _nasIPAddress,
  clientIPAddress = _clientIPAddress,
  nasPort = _nasPort,
  framedProtocol = _framedProtocol,
  termCause = _termCause
WHERE
  sessionId = _sessionId
AND
  userName = _userName
AND
  serviceType = _serviceType;

END IF;

END;
$$
    LANGUAGE plpgsql;

It looks long, but it's really pretty simple, it inserts data into the
accounting_tab and then updates or inserts into the radutmp_tab table
based on whether the session ID is known or not.

Here are the tables:

               Table "public.accounting_tab"
     Column      |           Type           |   Modifiers
-----------------+--------------------------+---------------
 sessionid       | character varying(32)    | not null
 nassessionid    | character varying(32)    | not null
 accttype        | character varying(6)     | not null
 username        | character varying(20)    | not null
 nastimestamp    | timestamp with time zone |
 nasipaddress    | character varying(15)    | not null
 nasidentifier   | character varying(15)    |
 clientipaddress | character varying(15)    | not null
 servicetype     | character varying(6)     | not null
 sessionlength   | bigint                   | default 0
 inputoctets     | bigint                   | default 0
 outputoctets    | bigint                   | default 0
 inputgigawords  | integer                  | default 0
 outputgigawords | integer                  | default 0
 nasport         | character varying(32)    |
 ipaddress       | character varying(32)    |
 framedprotocol  | character varying(32)    |
 termcause       | character varying(32)    |
 timestamp       | timestamp with time zone | default now()
 connectinfo     | character varying(100)   |
Indexes:
    "accounting_nasidentifier_idx" btree (nasidentifier)
    "accounting_nastimestamp_idx" btree (nastimestamp)
    "accounting_sessionid_idx" btree (sessionid)
    "accounting_timestamp_idx" btree ("timestamp")
    "accounting_username_idx" btree (username)

                  Table "public.radutmp_tab"
      Column       |           Type           |   Modifiers
-------------------+--------------------------+---------------
 sessionid         | character varying(32)    | not null
 nassessionid      | character varying(32)    | not null
 lastaccttype      | character varying(6)     | not null
 username          | character varying(20)    | not null
 nasstarttimestamp | timestamp with time zone |
 nasstoptimestamp  | timestamp with time zone |
 nasipaddress      | character varying(15)    | not null
 nasidentifier     | character varying(15)    |
 clientipaddress   | character varying(15)    | not null
 servicetype       | character varying(6)     | not null
 sessionlength     | bigint                   | default 0
 inputoctets       | bigint                   | default 0
 outputoctets      | bigint                   | default 0
 inputgigawords    | integer                  | default 0
 outputgigawords   | integer                  | default 0
 nasport           | character varying(32)    |
 ipaddress         | character varying(32)    |
 framedprotocol    | character varying(32)    |
 termcause         | character varying(32)    |
 timestamp         | timestamp with time zone | default now()
Indexes:
    "radutmp_tab_pkey" PRIMARY KEY, btree (sessionid)
    "radutmp_nasstoptimestamp_idx" btree (nasstoptimestamp)
    "radutmp_servicetype_idx" btree (servicetype)
    "radutmp_username_idx" btree (username)

As you can see the tables are pretty simple and don't have any foreign
keys, but they are large.  accounting_tab is 1.8mill rows and
radutmp_tab is 200k rows.

I should note that This query gets run quite a bit as well:

SELECT COUNT(*) FROM radutmp_tab WHERE UserName='username' AND
nasStopTimestamp IS NULL and serviceType = 'servicetype'

It tells me if there is an active session or not.

This setup, with concurrency, is returning very inconsistent query
performance.  Sometimes its very fast, other times it's slow and waits.
 This makes me think I have a table locking issue, but I'm not sure
since pg_locks rarely reports more than this:

    locktype    | database | relation | page | tuple | transactionid |
classid | objid | objsubid | transaction | pid  |      mode       | granted

---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+------+-----------------+---------
 relation      |    16395 |    10342 |      |       |               |
      |       |          |    30641038 | 8720 | AccessShareLock | t
 transactionid |          |          |      |       |      30641038 |
      |       |          |    30641038 | 8720 | ExclusiveLock   | t

Do you see anything wrong with how I'm doing this?  Perhaps it's time to
get faster hardware, but it doesn't seem like the box is that loaded.

Thanks,
schu




Re: Query performance inconsistant.

От
Tom Lane
Дата:
Matthew Schumacher <matt.s@aptalaska.net> writes:
> Here is the proc that has very inconsistent (anywhere from 25ms to
> 8000ms) performance:
> ...
> This setup, with concurrency, is returning very inconsistent query
> performance.  Sometimes its very fast, other times it's slow and waits.
>  This makes me think I have a table locking issue, but I'm not sure
> since pg_locks rarely reports more than this:

Yeah, I suspect a locking issue too, but you won't find out what it is
unless you can capture the content of pg_locks at a time when the proc
is being delayed.

You could try making the proc do
    LOCK TABLE ... IN ROW EXCLUSIVE MODE NOWAIT
inside an exception block, catching the error, and logging the contents
of pg_locks to someplace.

            regards, tom lane