Обсуждение: Strange slowness

Поиск
Список
Период
Сортировка

Strange slowness

От
Russ Neufeld
Дата:
Hi all,

I'm investigating some occasional slowness in our API servers, and I've hit upon some strange behavior that I can't explain.  I'm hoping someone on this list can help.

Stack is django 1.3.1, Ubuntu 10.04, python 2.6.5, psycopg 2.4.5, postgresql 9.0, pgbouncer 1.4.1.  All of these run on the same machine with the exception of postgresql, which runs on a dedicated db machine.  So, when django creates a connection to the database, it is connecting to localhost and really talking to pgbouncer.

Here's the relevant code from django/db/backends/postgresql_psycopg2/base.py:

self.connection = Database.connect(**conn_params)
self.connection.set_client_encoding('UTF8')
self.connection.set_isolation_level(self.isolation_level)

cursor = self.connection.cursor()

cursor.execute("SET TIME ZONE %s", [settings_dict['TIME_ZONE']])

I put timing code between these lines, and found that calling set_client_encoding() was sometimes taking upwards of 6 seconds.  Very strange, but it led me to the fact that our db encoding was mistakenly set to sql_ascii, so I've fixed that.  Now, I'm seeing occasional 6 seconds delays on the next line - set_isolation_level.  We're using defaults, so this is setting the level to 1, or "read committed".  Theoretically that's a noop, but obviously the code needs to check that the value passed in equals the value already set, and then return.  Still, it shouldn't take upwards of 6 seconds to do this at times.  The machines are not swapping, and not doing anything else besides serving some static content.

It seems very strange that after I fixed the db encoding, the timing problem would jump to the next line.  Before I fixed the db encoding, we never saw long times spent in set_isolation_level.  This leads me to believe the issue is something else, perhaps network related.  For example, supposed Database.connect lazily creates a connection, and now that set_client_encoding can short circuit due to the encodings matching, the first time the connection is used is now in set_isolation_level.  This is just wild speculation, but I'm at a loss in explaining what I'm seeing.

Anyone have any ideas about what could be going on, or where to look further?  Thanks,

Russ

Re: Strange slowness

От
Daniele Varrazzo
Дата:
On Mon, Apr 30, 2012 at 10:29 PM, Russ Neufeld <russn@nasuni.com> wrote:
> Hi all,
>
> I'm investigating some occasional slowness in our API servers, and I've hit
> upon some strange behavior that I can't explain.  I'm hoping someone on this
> list can help.
>
> Stack is django 1.3.1, Ubuntu 10.04, python 2.6.5, psycopg 2.4.5, postgresql
> 9.0, pgbouncer 1.4.1.  All of these run on the same machine with the
> exception of postgresql, which runs on a dedicated db machine.  So, when
> django creates a connection to the database, it is connecting to localhost
> and really talking to pgbouncer.
>
> Here's the relevant code
> from django/db/backends/postgresql_psycopg2/base.py:
>
> self.connection = Database.connect(**conn_params)
> self.connection.set_client_encoding('UTF8')
> self.connection.set_isolation_level(self.isolation_level)
>
> cursor = self.connection.cursor()
>
> cursor.execute("SET TIME ZONE %s", [settings_dict['TIME_ZONE']])
>
> I put timing code between these lines, and found that calling
> set_client_encoding() was sometimes taking upwards of 6 seconds.  Very
> strange, but it led me to the fact that our db encoding was mistakenly set
> to sql_ascii, so I've fixed that.  Now, I'm seeing occasional 6 seconds
> delays on the next line - set_isolation_level.  We're using defaults, so
> this is setting the level to 1, or "read committed".  Theoretically that's a
> noop, but obviously the code needs to check that the value passed in equals
> the value already set, and then return.  Still, it shouldn't take upwards of
> 6 seconds to do this at times.  The machines are not swapping, and not doing
> anything else besides serving some static content.
>
> It seems very strange that after I fixed the db encoding, the timing problem
> would jump to the next line.  Before I fixed the db encoding, we never saw
> long times spent in set_isolation_level.  This leads me to believe the issue
> is something else, perhaps network related.  For example,
> supposed Database.connect lazily creates a connection, and now
> that set_client_encoding can short circuit due to the encodings matching,
> the first time the connection is used is now in set_isolation_level.  This
> is just wild speculation, but I'm at a loss in explaining what I'm seeing.
>
> Anyone have any ideas about what could be going on, or where to look
> further?  Thanks,

From the description I'd look into how pgbouncer connects to the db:
if it was a normal PG connection, the slowest of the commands would be
connect() (and looking at Django's code, Database is just an alias for
psycopg2, so it's really just it). For a test I would connect directly
to the db instead of pgbouncer and see if connect() becomes the slow
part. You can also enable logging in the server to get the duration of
each statement, which should show you that set time zone or whatever
is the first command is not the slow one (you could combine it with a
couple of prints, which I bet will show you that the delay happens
between the execute() statement and the moment the server receives the
command).

Could it be that pgbouncer is slow to connect because of some ssl
issue? If you rule it out of your setup and see the slowness at
connect time, then it would be an area to look into.

-- Daniele

Re: Strange slowness

От
Federico Di Gregorio
Дата:
On 30/04/12 23:29, Russ Neufeld wrote:
> Hi all,
>
> I'm investigating some occasional slowness in our API servers, and I've
> hit upon some strange behavior that I can't explain.  I'm hoping someone
> on this list can help.

Hi Russ,

you may want to configure PostgreSQL to log connections and queries,
with execution times. Then cross-check PostgreSQL, Django and pgbouncer
logs to understand exactly where that 6 seconds are spent. I bet on
pgbouncer connecting to PostgreSQL (SSL?) but we don't have enough data
to do a proper analysis here.

federico

--
Federico Di Gregorio                         federico.digregorio@dndg.it
Studio Associato Di Nunzio e Di Gregorio                  http://dndg.it
             Quis custodiet ipsos custodes? -- Juvenal, Satires, VI, 347

Re: Strange slowness

От
Russ Neufeld
Дата:
Thanks Federico.  I think you might be right - pgbouncer does not seem to be maintaining connections to postgresql for
verylong, and as a result when a load spike comes in I suspect there's a fair bit of time and contention in
reestablishingthem.  I'm looking there now. 

On May 1, 2012, at 5:54 AM, Federico Di Gregorio wrote:

> On 30/04/12 23:29, Russ Neufeld wrote:
>> Hi all,
>>
>> I'm investigating some occasional slowness in our API servers, and I've
>> hit upon some strange behavior that I can't explain.  I'm hoping someone
>> on this list can help.
>
> Hi Russ,
>
> you may want to configure PostgreSQL to log connections and queries,
> with execution times. Then cross-check PostgreSQL, Django and pgbouncer
> logs to understand exactly where that 6 seconds are spent. I bet on
> pgbouncer connecting to PostgreSQL (SSL?) but we don't have enough data
> to do a proper analysis here.
>
> federico
>
> --
> Federico Di Gregorio                         federico.digregorio@dndg.it
> Studio Associato Di Nunzio e Di Gregorio                  http://dndg.it
>             Quis custodiet ipsos custodes? -- Juvenal, Satires, VI, 347
>
> --
> Sent via psycopg mailing list (psycopg@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/psycopg


Re: Strange slowness

От
Russ Neufeld
Дата:
Thanks Daniele.  We're not using ssl, but I think you might be right to look at pgbouncer.  It does not seem to be
maintainingconnections to postgresql for very long, and as a result when a load spike comes in I suspect there's a fair
bitof time and contention in reestablishing them.  I'm looking there now. 

On Apr 30, 2012, at 5:46 PM, Daniele Varrazzo wrote:

> On Mon, Apr 30, 2012 at 10:29 PM, Russ Neufeld <russn@nasuni.com> wrote:
>> Hi all,
>>
>> I'm investigating some occasional slowness in our API servers, and I've hit
>> upon some strange behavior that I can't explain.  I'm hoping someone on this
>> list can help.
>>
>> Stack is django 1.3.1, Ubuntu 10.04, python 2.6.5, psycopg 2.4.5, postgresql
>> 9.0, pgbouncer 1.4.1.  All of these run on the same machine with the
>> exception of postgresql, which runs on a dedicated db machine.  So, when
>> django creates a connection to the database, it is connecting to localhost
>> and really talking to pgbouncer.
>>
>> Here's the relevant code
>> from django/db/backends/postgresql_psycopg2/base.py:
>>
>> self.connection = Database.connect(**conn_params)
>> self.connection.set_client_encoding('UTF8')
>> self.connection.set_isolation_level(self.isolation_level)
>>
>> cursor = self.connection.cursor()
>>
>> cursor.execute("SET TIME ZONE %s", [settings_dict['TIME_ZONE']])
>>
>> I put timing code between these lines, and found that calling
>> set_client_encoding() was sometimes taking upwards of 6 seconds.  Very
>> strange, but it led me to the fact that our db encoding was mistakenly set
>> to sql_ascii, so I've fixed that.  Now, I'm seeing occasional 6 seconds
>> delays on the next line - set_isolation_level.  We're using defaults, so
>> this is setting the level to 1, or "read committed".  Theoretically that's a
>> noop, but obviously the code needs to check that the value passed in equals
>> the value already set, and then return.  Still, it shouldn't take upwards of
>> 6 seconds to do this at times.  The machines are not swapping, and not doing
>> anything else besides serving some static content.
>>
>> It seems very strange that after I fixed the db encoding, the timing problem
>> would jump to the next line.  Before I fixed the db encoding, we never saw
>> long times spent in set_isolation_level.  This leads me to believe the issue
>> is something else, perhaps network related.  For example,
>> supposed Database.connect lazily creates a connection, and now
>> that set_client_encoding can short circuit due to the encodings matching,
>> the first time the connection is used is now in set_isolation_level.  This
>> is just wild speculation, but I'm at a loss in explaining what I'm seeing.
>>
>> Anyone have any ideas about what could be going on, or where to look
>> further?  Thanks,
>
> From the description I'd look into how pgbouncer connects to the db:
> if it was a normal PG connection, the slowest of the commands would be
> connect() (and looking at Django's code, Database is just an alias for
> psycopg2, so it's really just it). For a test I would connect directly
> to the db instead of pgbouncer and see if connect() becomes the slow
> part. You can also enable logging in the server to get the duration of
> each statement, which should show you that set time zone or whatever
> is the first command is not the slow one (you could combine it with a
> couple of prints, which I bet will show you that the delay happens
> between the execute() statement and the moment the server receives the
> command).
>
> Could it be that pgbouncer is slow to connect because of some ssl
> issue? If you rule it out of your setup and see the slowness at
> connect time, then it would be an area to look into.
>
> -- Daniele