Re: Odd blocking (or massively latent) issue - even with EXPLAIN

Поиск

Список

Период

Сортировка

От	Jim Vanns
Тема	Re: Odd blocking (or massively latent) issue - even with EXPLAIN
Дата	24 июля 2012 г. 05:48:39
Msg-id	1343119691.12579.151.camel@sys367.ldn.framestore.com обсуждение исходный текст
Ответ на	Re: Odd blocking (or massively latent) issue - even with EXPLAIN ("Martin French" <Martin.French@romaxtech.com>)
Ответы	Re: Odd blocking (or massively latent) issue - even with EXPLAIN
Список	pgsql-performance

Дерево обсуждения

> This looks like this is a pure IO issue. You mentioned that this was a
> software RAID system. I wonder if there's some complication there.
>
> Have you tried setting the disk queues to deadline?
>
> echo "deadline" > /sys/block/{DEVICE-NAME}/queue/scheduler
> That might help. But to be honest, it really does sound disk/software
> raid related with the CPU and IO being so high.
>
> Can you attempt to replicate the problem on another system without
> software RAID?
>
> Also, you might want to try a disk test on the machine, it's 24GB ram
> right?
>
> so, try the following tests on the Postgres data disk (you'll
> obviously need lots of space for this):
>
> Write Test:
> time sh -c "dd if=/dev/zero of=bigfile bs=8k count=6000000 && sync"
>
> Read Test:
> time dd if=bigfile of=/dev/null bs=8k

I've already tried something very similar using dd. No performance
penalties during a normal running of the system - or when this blocking
happens either actually. But I agree, it does indeed sound like some
sort of I/O problem. I just don't know what! I do have a few more tricks
up my sleeve that I'll try today. I'll post any results that I have.

That latter test - won't that pretty much just read from the page cache?
'sync' may well have forced dirty pages to disk but does it actually
evict them to?

Anyway, that is off topic... perhaps ;)

Thanks again,

Jim

> ( Tests taken from Greg Smiths page:
> http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm )
>
> >
> > > >
> > > > We regularly run vacuum analyze at quiet periods - generally
> 1-2s
> > > daily.
> >
> > (this is to answer to someone who didn't reply to the list)
> >
> > We run full scans using vacuumdb so don't just rely on autovacuum.
> The
> > small table is so small (<50 tuples) a sequence scan is always
> > performed.
> >
> > > > These sessions (that only read data) that are blocked can block
> from
> > > > anything from between only 5 minutes to 10s of hours then
> > > miraculously
> > > > complete successfully at once.
> > > >
> > >
> > > Are any "blockers" shown in pg_stat_activity?
> >
> > None. Ever. Nothing in pg_locks either.
> >
> > > >
> > > > checkpoint_segments = 128
> > > > maintenance_work_mem = 256MB
> > > > synchronous_commit = off
> > > > random_page_cost = 3.0
> > > > wal_buffers = 16MB
> > > > shared_buffers = 8192MB
> > > > checkpoint_completion_target = 0.9
> > > > effective_cache_size = 18432MB
> > > > work_mem = 32MB
> > > > effective_io_concurrency = 12
> > > > max_stack_depth = 8MB
> > > > log_autovacuum_min_duration = 0
> > > > log_lock_waits = on
> > > > autovacuum_vacuum_scale_factor = 0.1
> > > > autovacuum_naptime = 8
> > > > autovacuum_max_workers = 4
> > >
> > > Memory looks reasonably configured to me. effective_cache_size is
> only
> > > an indication to the planner and is not actually allocated.
> >
> > I realise that.
> >
> > > Is anything being written to the logfiles?
> >
> > Nothing obvious - and we log a fair amount. No tmp table creations,
> > no locks held.
> >
> > To add to this EXPLAIN reports it took only 0.23ms to run (for
> example)
> > whereas the wall clock time is more like 20-30 minutes (or up to n
> hours
> > as I said where everything appears to click back into place at the
> same
> > time).
> >
> > Thanks.
> >
>
> Something else you might want to try is running with a default
> Postgresql.conf, if the query/explain then runs fine, then that would
> lead me to believe that there is a configuration issue. Although I'm
> pretty convinced that it may be the disk set up.
>
> Cheers
> ============================================= Romax Technology Limited
> Rutherford House Nottingham Science & Technology Park Nottingham, NG7
> 2PZ England Telephone numbers: +44 (0)115 951 88 00 (main) For other
> office locations see: http://www.romaxtech.com/Contact
> ================================= =============== E-mail:
> info@romaxtech.com Website: www.romaxtech.com
> ================================= ================ Confidentiality
> Statement This transmission is for the addressee only and contains
> information that is confidential and privileged. Unless you are the
> named addressee, or authorised to receive it on behalf of the
> addressee you may not copy or use it, or disclose it to anyone else.
> If you have received this transmission in error please delete from
> your system and contact the sender. Thank you for your cooperation.
> =================================================
>

--
Jim Vanns
Systems Programmer
Framestore

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Odd blocking (or massively latent) issue - even with EXPLAIN