Discussion: pg 9.1 brings host machine down


pg 9.1 brings host machine down

From
Konstantin Mikhailov
Date:
I'm facing a problem running postgres 9.1.3 which nobody
else seems to have seen before. I tried to search and found only
one relevant post (about millions of files in pgsql_tmp).

Symptoms:

One postgres process gets abnormally big compared
to the other postgres processes. Top shows a 'normal' pg process
at about VIRT 120m, RES ~30m and SHR ~30m. That one
is at about 6500m, 3.4g and 30m respectively. Total RAM available - 8g.
When one more such process appears the host goes into
deep swap and only a pg restart helps (actually a stop
won't even terminate such a process - after shutdown it is still
alive and can only be killed).

base/pgsql_tmp contains millions of files. In this situation only a stop
and dirty restart is possible - a normal startup is impossible.
I read somewhere that on startup postgres tries to delete the millions
of files in that directory. I can't even imagine when it would finish
the deletion, so I simply move that folder outside base
- then the start can succeed.

On Ubuntu 11.10/12.04 x64. CPU Intel Core Q9650 3GHz.
8G RAM.

Has anybody seen that behaviour, or does anybody have a clue how to
handle it?

PS: my preliminary conclusion: some SQL produces
a lot of files in the temporary tablespace - very quickly.
When the SQL finishes, postgres tries to clean up the folder,
reading all of its contents and removing the files
one by one. It does the removal slowly (I watched the folder
with `find pgsql_tmp | wc -l`) but the process still consumes the
RAM. The next such SQL will be a killer :(
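
One way to find which SQL creates the files - a minimal sketch, assuming
superuser access (log_temp_files is a superuser setting and exists in 9.1):

  -- Sketch: log the creation of every temporary file together with the
  -- statement that caused it. 0 logs files of any size; a positive
  -- value logs only files larger than that many kilobytes.
  SET log_temp_files = 0;
  -- or server-wide in postgresql.conf:  log_temp_files = 0
  -- The server log then contains lines such as:
  --   LOG:  temporary file: path "base/pgsql_tmp/...", size ...
  --   STATEMENT:  <the offending query>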


Re: pg 9.1 brings host machine down

From
Vitalii Tymchyshyn
Date:
Hello.

Seen this already.
It looks like a cross join + sort. Badly configured ORM tools like
Hibernate, with multiple one-to-many relationships fetched with the 'join'
strategy, may produce such a result.
Unfortunately I don't know whether it's possible to protect against such
a case on the server side.
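
A hypothetical illustration of that pattern (schema and names invented here):

  -- One order has many items and many payments. Fetching both
  -- collections in a single joined query yields items x payments
  -- rows per order, and sorting that cross-product is what can
  -- fill pgsql_tmp.
  SELECT o.id, i.sku, p.amount
  FROM orders o
  LEFT JOIN order_items    i ON i.order_id = o.id
  LEFT JOIN order_payments p ON p.order_id = o.id
  ORDER BY o.id;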

Best regards, Vitalii Tymchyshyn



Re: pg 9.1 brings host machine down

From
Patric Bechtel
Date:

Hi,

Which fs with which settings are you using? What's your work_mem setting?
What size are the files?

Depending on the answers to the above questions I would suggest:
- RAM disk, SSD or separate disk for pgsql_tmp
- using xfs with noatime,nodiratime,delaylog,logbufs=8,logbsize=256k,nobarrier for the tmp area
- separating pg_xlog on yet another disk (xfs, too, but with barrier)
- using the deadline scheduler for all database disks
- increasing work_mem to at least the "common" file size +50%

There is more I could suggest if I knew more about the setup.
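
To apply the last point, the current temp file sizes can be measured
directly - a sketch, assuming superuser access (pg_ls_dir and pg_stat_file
are superuser-only, and pg_ls_dir raises an error if the directory is
currently absent):

  -- Sketch: the ten largest files currently in base/pgsql_tmp,
  -- as a starting point for sizing work_mem.
  SELECT name,
         (pg_stat_file('base/pgsql_tmp/' || name)).size AS bytes
  FROM pg_ls_dir('base/pgsql_tmp') AS name
  ORDER BY bytes DESC
  LIMIT 10;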

hth,

Patric


Re: pg 9.1 brings host machine down

From
Julien Cigar
Date:
If you have millions of files in data/pgsql_tmp it means that you're
using temporary tables (very) heavily, or that you have huge sorting
activity (on large tables) and the sorts happen on disk (you can
verify that with an EXPLAIN ANALYZE of the query: you'll see something
like "Sort Method: external merge  Disk: ...").
What you can do is either raise work_mem (be careful: it takes more
space to sort in memory than on disk), or add more RAM (and raise
work_mem too). By the way, it is generally a good idea to monitor that
directory to get an idea of how much concurrent sorting is happening on
your database.
In some extreme cases you can also create a dedicated tablespace and add
that tablespace to temp_tablespaces.
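
A sketch of both the check and the fix, using a throwaway table just to
force a spill (sizes and the tablespace path are illustrative):

  -- A sort that spills to disk under a small work_mem ...
  CREATE TEMP TABLE t AS SELECT generate_series(1, 100000) AS n;
  SET work_mem = '1MB';
  EXPLAIN ANALYZE SELECT * FROM t ORDER BY n;
  --   Sort Method: external merge  Disk: ...kB

  -- ... and the same sort done in memory once work_mem is big enough:
  SET work_mem = '64MB';
  EXPLAIN ANALYZE SELECT * FROM t ORDER BY n;
  --   Sort Method: quicksort  Memory: ...kB

  -- Extreme case: a dedicated tablespace for temp files (the directory
  -- must exist and be owned by the postgres user):
  CREATE TABLESPACE fast_tmp LOCATION '/mnt/fast_disk/pg_tmp';
  SET temp_tablespaces = 'fast_tmp';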


--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.



Re: pg 9.1 brings host machine down

From
Konstantin Mikhailov
Date:
Thanks a lot. I've tried playing with work_mem and after a few days
of production testing pg behaves much better. I see no more
files in the pgsql_tmp folder, pg processes consume reasonable
memory, and there is no swap activity any more. I've studied the official pg
docs about work_mem and still have no idea which optimal value
work_mem should have. 1MB is obviously too small. I've increased it
to 32MB, due to a lot of sorts and hash joins in the queries.
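
One way to sanity-check a chosen value, sketched with a generated query:
EXPLAIN (ANALYZE, BUFFERS) - available in 9.1 - reports temp buffer traffic
whenever a node still spills:

  -- If the output shows "Sort Method: external merge  Disk: ..." or
  -- "temp read=... written=...", this query still spills at 32MB.
  SET work_mem = '32MB';
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT n FROM generate_series(1, 5000000) AS g(n) ORDER BY n;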



Re: pg 9.1 brings host machine down

From
Craig Ringer
Date:
On 06/09/2012 01:52 AM, Konstantin Mikhailov wrote:
> Thanks a lot. I've tried playing with work_mem and after a few days
> of production testing pg behaves much better. I see no more
> files in the pgsql_tmp folder, pg processes consume reasonable
> memory, and there is no swap activity any more. I've studied the official pg
> docs about work_mem and still have no idea which optimal value
> work_mem should have. 1MB is obviously too small. I've increased it
> to 32MB, due to a lot of sorts and hash joins in the queries.
>
The trouble is that the optimal work_mem depends on your workload and
hardware. Or that's my understanding, anyway.

A workload with a few simple queries that sort lots of big data might
want work_mem to be really huge (but not so huge that it causes
thrashing or pushes indexes out of cache).

A workload with lots of really complicated queries full of CTEs,
subqueries, etc. might use several times work_mem per connection, and if
there are lots of connections at once it might use unexpectedly large
amounts of RAM and cause thrashing or cache competition even with quite
a small work_mem.

Right now, Pg doesn't have the diagnostic tools or automatic tuning to
make it possible to determine an ideal value in any simple way, so it's
mostly a matter of examining query plans, tuning, and monitoring.
Automatic tuning of work_mem would be great, but would also probably be
_really_ hard, and still wouldn't solve the problem where n sorts can
consume n times work_mem, so you can't give complicated_query a strict
enough work_mem limit without severely starving big_simple_query or
having to run a session-local "SET work_mem" before it.
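
That session-local override might look like this (the query is a stand-in):

  -- Sketch: give one known-heavy query a larger per-sort budget without
  -- raising work_mem server-wide. SET LOCAL lasts until COMMIT/ROLLBACK.
  BEGIN;
  SET LOCAL work_mem = '256MB';
  SELECT n FROM generate_series(1, 1000000) AS g(n) ORDER BY n DESC LIMIT 5;
  COMMIT;
  -- outside the transaction, work_mem is back to its previous value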

A system for auto-tuning Pg at runtime would be amazing, but also very
_very_ hard, so tweaking params based on benchmarking and examination of
runtime performance is your only real option for now.


--
Craig Ringer