Обсуждение: Very long execution time of "select nextval('..');"

Поиск
Список
Период
Сортировка

Very long execution time of "select nextval('..');"

От
mljv@planwerk6.de
Дата:
Hi,

we run postgresql-8.1 on a dedicated debian box 64bit, dual-core CPU, 8GB RAM,
RAID-1.

We select our primary keys with select nextval before we actually insert a
record. In my logs i print every statement  which takes longer than 250ms

there are lots of values fetched each day with nextval, but i have about 20-50
statements each day which take up to 17 seconds(!):

LOG:  duration: 12636.746 ms  statement: EXECUTE <unnamed>  [PREPARE:  select
nextval ('member_id_seq')]

[it is always the same statement so i just show some numbers:
LOG:  duration: 4000.991 ms  ...
LOG:  duration: 2994.532 ms  ...
LOG:  duration: 611.167 ms  ...
LOG:  duration: 17072.196 ms
LOG:  duration: 16570.860 ms
LOG:  duration: 14816.153 ms
LOG:  duration: 265.855 ms
LOG:  duration: 1238.361 ms

...
there are about 50 commands per day like this taking longer than 250ms.

i have no idea why something like nextval() can take longer than 1ms.

and if i do it manually it shows up like this:
select nextval ('member_id_seq');
 nextval
---------
  569304
(1 row)

Time: 0.651 ms

Unfortunatly  i can not tell at which time this happens as the log doesn't
show the time of day.

As it is a web application everything above 250ms is not acceptable. Something
like 17seconds (!) is like not working at all.

i observe my server with munin and the load of the server is at the maximum
about 0.5.

Can this be related to the autovacuum process, which we run every 50 minutes?

It seems that something is blocked. But i thought that nextval is never
blocked by anyhing else.

now i looked at some stats
select blks_read, blks_hit from pg_statio_user_sequences;
( i dropped names and relids as they are not important.)

blks_read | blks_hit
-----------+----------
         2 |        0
        22 |      125
        14 |      142
         2 |        0
        14 |        0
        43 |      498
        27 |       24
        34 |        0
        25 |       12
        55 |       55
         8 |       17
         2 |        0
        14 |        0
        34 |        0
         2 |        0
        33 |      539
        58 |       25
        59 |       53
        34 |        0
         2 |        0
         2 |        0
        39 |      135
         2 |        0
        38 |        4
        34 |        0
         2 |        0
        14 |        0
        15 |      131
        23 |     1223
        22 |      147
         1 |     2015
        10 |        2
        13 |       67
        34 |        0
        34 |        0
        32 |     3610
        46 |      590

I dont have any clue what is happening but something runs rather suboptimal.

Any help is very appreciated.

kind regards,
Janning

Re: Very long execution time of "select nextval('..');"

От
Shane Ambler
Дата:
mljv@planwerk6.de wrote:

> Unfortunatly  i can not tell at which time this happens as the log doesn't
> show the time of day.
>

Start with your postgresql.conf -

http://www.postgresql.org/docs/8.2/interactive/runtime-config-logging.html

points of interest :-

log_min_messages - debug1 to get some more info in the log

log_line_prefix - %t will show the timestamp of the log entry

log_statement - can record the queries to the log so you can see just
what is being run as you go through the log.







--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz

Re: Very long execution time of "select nextval('..');"

От
Tom Lane
Дата:
mljv@planwerk6.de writes:
> we run postgresql-8.1 on a dedicated debian box 64bit, dual-core CPU, 8GB RAM,
> RAID-1.

8.1.what?

> LOG:  duration: 12636.746 ms  statement: EXECUTE <unnamed>  [PREPARE:  select
> nextval ('member_id_seq')]

That's just bizarre, especially if your system isn't showing any other
signs of stress.

> Unfortunatly  i can not tell at which time this happens as the log doesn't
> show the time of day.

See log_line_prefix.  I think what you need to do is gather some
evidence about what else is happening at the same time --- can you
afford to enable log_statement = all?  Also, you should try to correlate
this with spikes in I/O demand (try running "vmstat 1" or similar).

It could be that this is related to checkpointing, which you won't see
in a log_statement trace.  In 8.1 you'd have to crank up
log_min_messages to DEBUG2 to get log entries for checkpoint start and
end, which is going to result in a mighty verbose log, but you may have
to do that to confirm or disprove the idea.

            regards, tom lane

Re: Very long execution time of "select nextval('..');"

От
mljv@planwerk6.de
Дата:
Am Sonntag 27 Januar 2008 18:56:49 schrieb Tom Lane:
> mljv@planwerk6.de writes:
> > we run postgresql-8.1 on a dedicated debian box 64bit, dual-core CPU, 8GB
> > RAM, RAID-1.
>
> 8.1.what?

8.1.11-0etch1

> > LOG:  duration: 12636.746 ms  statement: EXECUTE <unnamed>  [PREPARE:
> > select nextval ('member_id_seq')]
>
> That's just bizarre, especially if your system isn't showing any other
> signs of stress.
>
> > Unfortunatly  i can not tell at which time this happens as the log
> > doesn't show the time of day.
>
> See log_line_prefix.  I think what you need to do is gather some
> evidence about what else is happening at the same time --- can you
> afford to enable log_statement = all?  Also, you should try to correlate
> this with spikes in I/O demand (try running "vmstat 1" or similar).

i guess log_statement=all will use all IO of my disks as there are lots of
queries

> It could be that this is related to checkpointing, which you won't see
> in a log_statement trace.  In 8.1 you'd have to crank up
> log_min_messages to DEBUG2 to get log entries for checkpoint start and
> end, which is going to result in a mighty verbose log, but you may have
> to do that to confirm or disprove the idea.

ok, at the moment i got some traffic and my load is at 1.5. But now with
logging the timestamp I have seen that the long durations are quite regular
at intervals of 10 minutes.

but what in hell can make nextval take so long? even if checkpointing is badly
configured. I always thought that nextval is one of the fastest operations.
So if it takes 500 ms, fine. things like this can always happen, but 20
seconds sounds more like a hardware failure. But i can't see any.

ok, i will do some research tomorrow as i 'll try to go to sleep now.

kind regards
Janning

Re: Very long execution time of "select nextval('..');"

От
Tom Lane
Дата:
mljv@planwerk6.de writes:
> ok, at the moment i got some traffic and my load is at 1.5. But now with
> logging the timestamp I have seen that the long durations are quite regular
> at intervals of 10 minutes.

Well, that's pretty suggestive.  Tell us about your checkpoint and
bgwriter settings.  Also, is there any other service running on the
machine that might have activity spikes every 10 minutes?

            regards, tom lane

Re: Very long execution time of "select nextval('..');"

От
Greg Smith
Дата:
On Sun, 27 Jan 2008, mljv@planwerk6.de wrote:

> ok, at the moment i got some traffic and my load is at 1.5. But now with
> logging the timestamp I have seen that the long durations are quite regular
> at intervals of 10 minutes.

Sure sounds like checkpoints.  You should turn on the checkpoint warning
feature so it always triggers and see if the long queries completely just
after the checkpoints finish.  Notes on that and what you can do to
possibly improve checkpoint behavior are at
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm The
early parts of that mostly refer to 8.2 but 8.1 is basically the same in
this area.

> but what in hell can make nextval take so long? even if checkpointing is badly
> configured.

You're in a situation where your amount of RAM far exceeds your disk I/O
capabilities.  Brutally bad checkpoints are easy to encounter in that
setup.  Linux by default will use 10% of RAM to hold writes.  At
checkpoint time, that entire Linux buffer cache has to be cleared of
database writes on top of what's written by the checkpoint itself.  How
long do you think it takes to write >800MB of database data with a
significant random-access component to it when your disk is a simple
RAID-1?  20 seconds is not out of the question.

You may want to significantly reduce the size of the Linux write buffer
and see if that helps.
http://www.westnet.com/~gsmith/content/linux-pdflush.htm goes over theory
and suggestions here.

> I always thought that nextval is one of the fastest operations.
> So if it takes 500 ms, fine. things like this can always happen, but 20
> seconds sounds more like a hardware failure. But i can't see any.

Just about everything gets blocked behind the worse checkpoint spikes.
The thing that kind of bothers me about your case is that I'd expect other
queries would also be blocked and you'd have a whole set of >250ms ones
lined up just after the checkpoint is done.  That you're only reporting
issues with nextval makes me wonder if there isn't some other locking
driving the main behavior, perhaps something that just gets worse at
checkpoint time rather than being directly caused by it.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Very long execution time of "select nextval('..');"

От
mljv@planwerk6.de
Дата:
Hi Greg,

first fo all: thanks a lot. i think i understood most of your comments, but -
of course - have now more questions than before :-)

Am Montag, 28. Januar 2008 01:07 schrieb Greg Smith:
> On Sun, 27 Jan 2008, mljv@planwerk6.de wrote:
> > ok, at the moment i got some traffic and my load is at 1.5. But now with
> > logging the timestamp I have seen that the long durations are quite
> > regular at intervals of 10 minutes.
>
> Sure sounds like checkpoints.  You should turn on the checkpoint warning
> feature so it always triggers and see if the long queries completely just
> after the checkpoints finish.

i read und tried to understand all about checkpoints.
i have set
   checkpoint_warning=3600
and restartet Postgresql but i dont see any checkpoint warnings inside my logs
with
   log_min_messages = info
   log_min_error_statement = warning

so i think a checkpoint warning should pop up frequently as i have
  checkpoint_segments = 10
  checkpoint_timeout = 300
so at least each five minutes a warning should be shown, but it isn't.

> Notes on that and what you can do to
> possibly improve checkpoint behavior are at
> http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm The
> early parts of that mostly refer to 8.2 but 8.1 is basically the same in
> this area.

so this what i have done to tune checkpoints. Please let me know if it sounds
reasonable. It list all parameters which in my opinion affect write
performance and which dont have their default value

fsync = on
wal_buffers = 16
bgwriter_delay = 200
bgwriter_lru_percent = 10.0
bgwriter_lru_maxpages = 100
bgwriter_all_percent = 5
bgwriter_all_maxpages = 200

checkpoint_timeout = 300
checkpoint_segments = 10
checkpoint_warning = 3600

shared_buffers = 30000
work_mem = 10240
maintenance_work_mem = 163840
max_fsm_pages = 500000
effective_cache_size = 180000
random_page_cost = 3

autovacuum = on
autovacuum_naptime = 3000

stats_start_collector = on
stats_command_string = off
stats_block_level = on
stats_row_level = on
stats_reset_on_server_start = on

but after a restart i still got lots of queries above 500ms (my new value for
log_min_duration) and still some of them are so simple as select nextval.
(Logs are shortend for your convinience):

11:53:48 duration: 14473.594 ms  EXECUTE <unnamed>  [PREPARE:  select nextval
11:53:48 duration: 12527.846 ms  EXECUTE <unnamed>  [PREPARE:  select nextval
11:53:48 duration: 12488.325 ms  EXECUTE <unnamed>  [PREPARE:  select nextval
11:53:48 duration: 10300.745 ms  EXECUTE <unnamed>  [PREPARE:  select nextval
11:53:48 duration: 4638.686 ms  EXECUTE <unnamed>  [PREPARE:  select nextval
11:53:48 duration: 4012.015 ms  EXECUTE <unnamed>  [PREPARE:  select nextval
11:53:48 duration: 2087.131 ms  EXECUTE <unnamed>  [PREPARE:  select nextval
11:53:48 duration: 11669.099 ms  EXECUTE <unnamed>  [PREPARE:  select nextval
11:54:14 duration: 2007.563 ms  EXECUTE <unnamed>  [PREPARE:  select nextval

> > but what in hell can make nextval take so long? even if checkpointing is
> > badly configured.
>
> You're in a situation where your amount of RAM far exceeds your disk I/O
> capabilities.  Brutally bad checkpoints are easy to encounter in that
> setup.  Linux by default will use 10% of RAM to hold writes.  At
> checkpoint time, that entire Linux buffer cache has to be cleared of
> database writes on top of what's written by the checkpoint itself.  How
> long do you think it takes to write >800MB of database data with a
> significant random-access component to it when your disk is a simple
> RAID-1?  20 seconds is not out of the question.

ok. But is it something you really recommend? I fear changing such values as i
do not understand their impact.

> You may want to significantly reduce the size of the Linux write buffer
> and see if that helps.
> http://www.westnet.com/~gsmith/content/linux-pdflush.htm goes over theory
> and suggestions here.
>
> > I always thought that nextval is one of the fastest operations.
> > So if it takes 500 ms, fine. things like this can always happen, but 20
> > seconds sounds more like a hardware failure. But i can't see any.
>
> Just about everything gets blocked behind the worse checkpoint spikes.
> The thing that kind of bothers me about your case is that I'd expect other
> queries would also be blocked and you'd have a whole set of >250ms ones
> lined up just after the checkpoint is done.  That you're only reporting
> issues with nextval makes me wonder if there isn't some other locking
> driving the main behavior, perhaps something that just gets worse at
> checkpoint time rather than being directly caused by it.

i have many queries far above  250ms. I just choosed to report nextval
statements only because other statements are rather complicated and i would
just get an answer about "explain analyze" which is not my problem.

I appreciate your help very much. thanks a lot.

kind regards,
janning


Re: Very long execution time of "select nextval('..');"

От
mljv@planwerk6.de
Дата:
Hi Greg, hi Tom,

Am Sonntag, 27. Januar 2008 22:44 schrieb Tom Lane:
> mljv@planwerk6.de writes:
> > ok, at the moment i got some traffic and my load is at 1.5. But now with
> > logging the timestamp I have seen that the long durations are quite
> > regular at intervals of 10 minutes.
>
> Well, that's pretty suggestive.  Tell us about your checkpoint and
> bgwriter settings.  Also, is there any other service running on the
> machine that might have activity spikes every 10 minutes?

thanks for your suggestions and the very detail explanation.

i pretty much solved my problem. i changed the checkpoint and bg_writer
settings, but what was much more important, i droped one Job who did some bad
stuff. This job did update some rows on certain situations. these situation
came to often so this job was updating all the time.

I never ment to blame postgresql for my trouble as i work with it for years
and i am pretty sure that postgresql is doing an excelent job. i 've got to
blame me for this bad programming.

thanks a lot for helping as this checkpoint discussion opened my mind for the
real problem which was not easy to see (at least for me).

But one more question to my problem: Before i solved it i saw some statements
which say "SELECT waiting" in the process table (ps ax)

i thought "waiting" means some kind of database lock (row or table lock). Is
it true? If not, what other conditions can occur if a process says "SELECT
waiting"? Can it mean "Disk/IO Waiting" or "Network IO waiting"?

kind regrads
Janning