Обсуждение: How to log 'user time' in postgres logs

Поиск
Список
Период
Сортировка

How to log 'user time' in postgres logs

От
francis picabia
Дата:

Recently had a problem where autovacuum was accidentally left off
and the database took 6 seconds for every task from PHP.
I had no clue the database was the issue because I
had the minimal 

log_duration = on 
log_line_prefix = '<%t>'

With those settings all queries seen were roughly 1ms

I need this log to show the true time it takes to get a result back.

In the Linux world we have the time command which shows the user
time reflecting all overhead added up.  I'd like postgres to show
times like that and then if there are problems I can look further,
change logging details, etc..

I checked docs, googled,  and didn't see anything obvious.


Re: How to log 'user time' in postgres logs

От
Adrian Klaver
Дата:
On 8/28/19 5:36 AM, francis picabia wrote:
> 
> Recently had a problem where autovacuum was accidentally left off
> and the database took 6 seconds for every task from PHP.
> I had no clue the database was the issue because I
> had the minimal
> 
> log_duration = on
> log_line_prefix = '<%t>'
> 
> With those settings all queries seen were roughly 1ms
> 
> I need this log to show the true time it takes to get a result back.
> 
> In the Linux world we have the time command which shows the user
> time reflecting all overhead added up.  I'd like postgres to show
> times like that and then if there are problems I can look further,
> change logging details, etc..
> 
> I checked docs, googled,  and didn't see anything obvious.

I'm having a hard time believing autovacuum was involved in this, given 
you say the queries took only 1ms on  average. That would have been the 
part that would have been impacted by bloated tables/out-of-date 
statistics.

Some questions:

1) How did you arrive at the 6 second figure?

2) Is the PHP application on the same server as the database?

> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: How to log 'user time' in postgres logs

От
Tom Lane
Дата:
Adrian Klaver <adrian.klaver@aklaver.com> writes:
> On 8/28/19 5:36 AM, francis picabia wrote:
>> I had no clue the database was the issue because I
>> had the minimal
>> log_duration = on
>> log_line_prefix = '<%t>'
>> With those settings all queries seen were roughly 1ms
>> 
>> I need this log to show the true time it takes to get a result back.

> I'm having a hard time believing autovacuum was involved in this, given 
> you say the queries took only 1ms on  average.

Also: the time logged by log_duration IS the time it took, so far as the
server can tell.

Perhaps there's monstrous network delays involved, but what I'm suspicious
of is inefficient client-side processing.

            regards, tom lane



Re: How to log 'user time' in postgres logs

От
francis picabia
Дата:

The server was running Moodle.  The slow load time was noticed when loading a quiz containing
multiple images.  All Apache log results showed a 6 seconds or a multiple of 6 for how long
it took to retrieve each image.

Interestingly, if I did a wget, on the server, to the image link (which was processed through a pluginfile.php URL)
even the HTML page returned of "please login first" took consistently 6 seconds.  Never 2, 3, 4, 5 or 7, 8, 9...
So whatever was wrong, there was a 6 second penalty for this.  We found the problem
outside of the quizzes as well, so it wasn't a single part of the PHP code.  In addition, our development
server with a clone of data, database, apps, etc., was fast at doing any of the tests.
Of course a database brought in with a dump will not include any of the cruft
DB structures, so the dev server was free of the problem (and it had fewer cores, memory, etc).

I was not suspecting PostgreSQL as the culprit, because all query times in the log showed
roughly 1 ms response.  I could tail -f the log file while loading a quiz to watch the times
reported.

Last night the vacuum was run (on a database of about 40GB if dumped), and since
then the quizzes and everything run as fast as would be expected.  It had run
for about 1.5 months without vacuum.

Apache, PHP and Postgres all on the same server.  Memory and CPUs not taxed, load kept level
while the problem quizzes were being tested.

Given this experience, I'd like something that reflected the true times Postgres was spending
on any work it was doing.

The other possibility was it just didn't log the slower times, but I have seen larger numbers
in the 10,000 ms range in the night when some backups and housekeeping happens.

All  log related settings:
checkpoint_segments = 12
logging_collector = on
log_destination = 'stderr'
log_directory = 'pg_log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1440
log_rotation_size = 0
log_min_messages = info
log_min_error_statement = debug1
log_duration = on
log_line_prefix = '<%t>'

I know it does sound strange, but this is what we battled with for 2.5 days until the light
came on that the vacuum had been set to off on the target system during server migration.


On Wed, Aug 28, 2019 at 11:07 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 8/28/19 5:36 AM, francis picabia wrote:
>
> Recently had a problem where autovacuum was accidentally left off
> and the database took 6 seconds for every task from PHP.
> I had no clue the database was the issue because I
> had the minimal
>
> log_duration = on
> log_line_prefix = '<%t>'
>
> With those settings all queries seen were roughly 1ms
>
> I need this log to show the true time it takes to get a result back.
>
> In the Linux world we have the time command which shows the user
> time reflecting all overhead added up.  I'd like postgres to show
> times like that and then if there are problems I can look further,
> change logging details, etc..
>
> I checked docs, googled,  and didn't see anything obvious.

I'm having a hard time believing autovacuum was involved in this, given
you say the queries took only 1ms on  average. That would have been the
part that would have been impacted by bloated tables/out-of-date
statistics.

Some questions:

1) How did you arrive at the 6 second figure?

2) Is the PHP application on the same server as the database?

>
>


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: How to log 'user time' in postgres logs

От
Tom Lane
Дата:
francis picabia <fpicabia@gmail.com> writes:
> The server was running Moodle.  The slow load time was noticed when loading
> a quiz containing
> multiple images.  All Apache log results showed a 6 seconds or a multiple
> of 6 for how long
> it took to retrieve each image.

> Interestingly, if I did a wget, on the server, to the image link (which was
> processed through a pluginfile.php URL)
> even the HTML page returned of "please login first" took consistently 6
> seconds.  Never 2, 3, 4, 5 or 7, 8, 9...
> So whatever was wrong, there was a 6 second penalty for this.

Hmm ... some weird DNS behavior, perhaps?  That is one way to explain
a pattern like this.  How long does it take to "dig" or "nslookup"
your server name?

            regards, tom lane



Re: How to log 'user time' in postgres logs

От
francis picabia
Дата:
On Wed, Aug 28, 2019 at 4:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
francis picabia <fpicabia@gmail.com> writes:
> The server was running Moodle.  The slow load time was noticed when loading
> a quiz containing
> multiple images.  All Apache log results showed a 6 seconds or a multiple
> of 6 for how long
> it took to retrieve each image.

> Interestingly, if I did a wget, on the server, to the image link (which was
> processed through a pluginfile.php URL)
> even the HTML page returned of "please login first" took consistently 6
> seconds.  Never 2, 3, 4, 5 or 7, 8, 9...
> So whatever was wrong, there was a 6 second penalty for this.

Hmm ... some weird DNS behavior, perhaps?  That is one way to explain
a pattern like this.  How long does it take to "dig" or "nslookup"
your server name?

                        regards, tom lane

Nope, it isn't DNS related.  The dev server uses same DNS service and
was very fast to do the same actions.  I also run the DNS server and would be aware
of any complaints about the performance.  I thought firewall for awhile but there was
nothing there related to it.  

It all cleared up minutes after the vacuum -a -v -z  was run,
and was reported slow several days before this.  I myself saw it consistently slow
for the quiz for the 2.5 days I fought the problem.  We're certain the vacuum
made the difference in performance.

If there is no way to log the "user time", I guess we'd need to log something
like the connection and disconnection to get another idea on the time elapsing.


Re: How to log 'user time' in postgres logs

От
Adrian Klaver
Дата:
On 8/28/19 11:36 AM, francis picabia wrote:
> 
> The server was running Moodle.  The slow load time was noticed when 
> loading a quiz containing
> multiple images.  All Apache log results showed a 6 seconds or a 
> multiple of 6 for how long
> it took to retrieve each image.
> 
> Interestingly, if I did a wget, on the server, to the image link (which 
> was processed through a pluginfile.php URL)
> even the HTML page returned of "please login first" took consistently 6 
> seconds.  Never 2, 3, 4, 5 or 7, 8, 9...
> So whatever was wrong, there was a 6 second penalty for this.  We found 
> the problem

Caching software in the mix?

More below.

> outside of the quizzes as well, so it wasn't a single part of the PHP 
> code.  In addition, our development
> server with a clone of data, database, apps, etc., was fast at doing any 
> of the tests.
> Of course a database brought in with a dump will not include any of the 
> cruft
> DB structures, so the dev server was free of the problem (and it had 
> fewer cores, memory, etc).
> 
> I was not suspecting PostgreSQL as the culprit, because all query times 
> in the log showed
> roughly 1 ms response.  I could tail -f the log file while loading a 
> quiz to watch the times
> reported.
> 
> Last night the vacuum was run (on a database of about 40GB if dumped), 
> and since
> then the quizzes and everything run as fast as would be expected.  It 
> had run
> for about 1.5 months without vacuum.
> 
> Apache, PHP and Postgres all on the same server.  Memory and CPUs not 
> taxed, load kept level
> while the problem quizzes were being tested.
> 
> Given this experience, I'd like something that reflected the true times 
> Postgres was spending
> on any work it was doing.

The log_duration will show that.
If you want to verify open psql and set \timing on and run your statements.


> 
> The other possibility was it just didn't log the slower times, but I 
> have seen larger numbers
> in the 10,000 ms range in the night when some backups and housekeeping 
> happens.
> 
> All  log related settings:
> checkpoint_segments = 12
> logging_collector = on
> log_destination = 'stderr'
> log_directory = 'pg_log'
> log_filename = 'postgresql-%a.log'
> log_truncate_on_rotation = on
> log_rotation_age = 1440
> log_rotation_size = 0
> log_min_messages = info
> log_min_error_statement = debug1
> log_duration = on
> log_line_prefix = '<%t>'
> 
> I know it does sound strange, but this is what we battled with for 2.5 
> days until the light
> came on that the vacuum had been set to off on the target system during 
> server migration.
> 



-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: How to log 'user time' in postgres logs

От
Adrian Klaver
Дата:
On 8/28/19 11:36 AM, francis picabia wrote:
> 

> 
> The other possibility was it just didn't log the slower times, but I 
> have seen larger numbers
> in the 10,000 ms range in the night when some backups and housekeeping 
> happens.
> 
> All  log related settings:
> checkpoint_segments = 12
> logging_collector = on
> log_destination = 'stderr'
> log_directory = 'pg_log'
> log_filename = 'postgresql-%a.log'
> log_truncate_on_rotation = on
> log_rotation_age = 1440
> log_rotation_size = 0
> log_min_messages = info
> log_min_error_statement = debug1
> log_duration = on
> log_line_prefix = '<%t>'

You might also want to log:

log_connections = on

log_disconnections = on

And add to log_line_prefix some mix of:

%u     User name

%p     Process ID

%s     Process start time stamp


> 
> I know it does sound strange, but this is what we battled with for 2.5 
> days until the light
> came on that the vacuum had been set to off on the target system during 
> server migration.
> 
> 



-- 
Adrian Klaver
adrian.klaver@aklaver.com