Discussion: vacuumdb not letting me connect to db

vacuumdb not letting me connect to db

From: Atul Kumar
Hi,

I have 160 GB of RAM, and Postgres 9.6 is running on the server.

After an upgrade I ran the command below:

"/usr/edb/as9.6/bin/vacuumdb" -U 'enterprisedb' --verbose --all -j 300
--analyze-only

After running that command I was not able to connect to the database
using psql for a few minutes.

After 20-30 minutes I was able to connect to the db, and at that point I
checked pg_stat_activity; the number of active connections had dropped to
27.

My max_connections is set to 700.

I tried to find out why I could not connect to the db
(while the 300 jobs were running) but still have not found the answer, so
suggestions are welcome.



Re: vacuumdb not letting me connect to db

From: Laurenz Albe
On Thu, 2021-02-04 at 16:56 +0530, Atul Kumar wrote:
> I have 160 GB of RAM, and Postgres 9.6 is running on the server.
> 
> After an upgrade I ran the command below:
> 
> "/usr/edb/as9.6/bin/vacuumdb" -U 'enterprisedb' --verbose --all -j 300
> --analyze-only
> 
> After running that command I was not able to connect to the database
> using psql for a few minutes.

That is to be expected.

If you have 300 processes performing I/O and using CPU, your machine
will certainly be overloaded.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: vacuumdb not letting me connect to db

From: Atul Kumar
Hi,

The CPU and RAM usage are normal even with 300 jobs (only 1-4% consumption), but one thing I don't understand: if max_connections is set to 700, then why am I not able to connect to the db?

After all, the 300 running jobs are fewer than half of max_connections.


Regards 
Atul


On Thursday, February 4, 2021, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Thu, 2021-02-04 at 16:56 +0530, Atul Kumar wrote:
> I have 160 GB of RAM, and Postgres 9.6 is running on the server.
>
> After an upgrade I ran the command below:
>
> "/usr/edb/as9.6/bin/vacuumdb" -U 'enterprisedb' --verbose --all -j 300
> --analyze-only
>
> After running that command I was not able to connect to the database
> using psql for a few minutes.

That is to be expected.

If you have 300 processes performing I/O and using CPU, your machine
will certainly be overloaded.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

Re: vacuumdb not letting me connect to db

From: Ron
On 2/4/21 5:26 AM, Atul Kumar wrote:
Hi,

I have 160 GB of RAM, and Postgres 9.6 is running on the server.

After an upgrade I ran the command below:

"/usr/edb/as9.6/bin/vacuumdb" -U 'enterprisedb' --verbose --all -j 300
--analyze-only

After running that command I was not able to connect to the database
using psql for a few minutes.

What's the exact error message?

After 20-30 minutes I was able to connect to the db, and at that point I
checked pg_stat_activity; the number of active connections had dropped to
27.

My max_connections is set to 700.

I tried to find out why I could not connect to the db
(while the 300 jobs were running) but still have not found the answer.

Connect to the cluster before running "vacuumdb -j300", and start looking at pg_stat_activity while vacuumdb is running.
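
For example, something along these lines (a rough sketch; it assumes you
can authenticate as the enterprisedb user against the postgres database)
will show what the backends are doing and waiting on while it runs:

    # Poll backend states and wait events every 5 seconds while
    # vacuumdb is running.
    while true; do
        psql -U enterprisedb -d postgres -X -c "
            SELECT state, wait_event_type, wait_event, count(*)
            FROM pg_stat_activity
            GROUP BY 1, 2, 3
            ORDER BY 4 DESC;"
        sleep 5
    done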

--
Angular momentum makes the world go 'round.

Re: vacuumdb not letting me connect to db

From: Ron
What about disk IO?  That's what really gets saturated when running 300 threads.
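
For example (a sketch, assuming the sysstat package is installed and the
database sits on a local block device):

    # Watch per-device utilisation in 5-second intervals while the
    # vacuumdb run is in progress. %util pinned near 100 on the
    # database volume means the disks are saturated, however idle
    # the CPUs look.
    iostat -x 5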

On 2/4/21 11:00 AM, Atul Kumar wrote:
Hi,

The CPU and RAM usage are normal even with 300 jobs (only 1-4% consumption), but one thing I don't understand: if max_connections is set to 700, then why am I not able to connect to the db?

After all, the 300 running jobs are fewer than half of max_connections.


Regards 
Atul


On Thursday, February 4, 2021, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Thu, 2021-02-04 at 16:56 +0530, Atul Kumar wrote:
> I have 160 GB of RAM, and Postgres 9.6 is running on the server.
>
> After an upgrade I ran the command below:
>
> "/usr/edb/as9.6/bin/vacuumdb" -U 'enterprisedb' --verbose --all -j 300
> --analyze-only
>
> After running that command I was not able to connect to the database
> using psql for a few minutes.

That is to be expected.

If you have 300 processes performing I/O and using CPU, your machine
will certainly be overloaded.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com


--
Angular momentum makes the world go 'round.

Re: vacuumdb not letting me connect to db

From: Ravi Krishna
> The CPU and RAM usage are normal even with 300 jobs (only 1-4% consumption),
> but one thing I don't understand: if max_connections is set to 700, then
> why am I not able to connect to the db? The 300 running jobs are fewer than
> half of max_connections.

Please paste the error message




Re: vacuumdb not letting me connect to db

From: Atul Kumar
There is no error message; when I try to connect to the database while vacuumdb is running with 300 jobs, psql just gets stuck.

On Thursday, February 4, 2021, Ravi Krishna <rkrishna_pg@aol.com> wrote:
> The CPU and RAM usage are normal even with 300 jobs (only 1-4% consumption),
> but one thing I don't understand: if max_connections is set to 700, then
> why am I not able to connect to the db? The 300 running jobs are fewer than
> half of max_connections.

Please paste the error message

Re: vacuumdb not letting me connect to db

From: Atul Kumar
There is no error message; when I try to connect to the database while vacuumdb is running with 300 jobs, psql just gets stuck.

On Thursday, February 4, 2021, Ron <ronljohnsonjr@gmail.com> wrote:
On 2/4/21 5:26 AM, Atul Kumar wrote:
Hi,

I have 160 GB of RAM, and Postgres 9.6 is running on the server.

After an upgrade I ran the command below:

"/usr/edb/as9.6/bin/vacuumdb" -U 'enterprisedb' --verbose --all -j 300
--analyze-only

After running that command I was not able to connect to the database
using psql for a few minutes.

What's the exact error message?

After 20-30 minutes I was able to connect to the db, and at that point I
checked pg_stat_activity; the number of active connections had dropped to
27.

My max_connections is set to 700.

I tried to find out why I could not connect to the db
(while the 300 jobs were running) but still have not found the answer.

Connect to the cluster before running "vacuumdb -j300", and start looking at pg_stat_activity while vacuumdb is running.

--
Angular momentum makes the world go 'round.

Re: vacuumdb not letting me connect to db

From: Ron
Your problem screams "IO saturation".

On 2/4/21 12:07 PM, Atul Kumar wrote:
There is no error message; when I try to connect to the database while vacuumdb is running with 300 jobs, psql just gets stuck.

On Thursday, February 4, 2021, Ravi Krishna <rkrishna_pg@aol.com> wrote:
> The CPU and RAM usage are normal even with 300 jobs (only 1-4% consumption),
> but one thing I don't understand: if max_connections is set to 700, then
> why am I not able to connect to the db? The 300 running jobs are fewer than
> half of max_connections.

Please paste the error message


--
Angular momentum makes the world go 'round.

Re: vacuumdb not letting me connect to db

From: Ravi Krishna

> There is no error message; when I try to connect to the database while
> vacuumdb is running with 300 jobs, psql just gets stuck.

But you mentioned max_connections, which now seems to be a red herring.
Based on your description, the impression I got was that you were getting
a "sorry, too many clients already" error.

As others have pointed out, the db is probably saturated with I/O.
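
One way to tell a hang from a rejection (a sketch; it assumes a TCP
connection to the local server, so adjust host/user/dbname to your setup):

    # Fails immediately with "sorry, too many clients already" if the
    # connection slots are exhausted, but errors out after 5 seconds
    # instead of hanging forever if the server is merely too
    # overloaded to finish the handshake.
    psql "host=localhost user=enterprisedb dbname=postgres connect_timeout=5"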



Re: vacuumdb not letting me connect to db

From: Atul Kumar
OK, how do I resolve it?

Any suggestions?

On 2/5/21, Ron <ronljohnsonjr@gmail.com> wrote:
> Your problem screams "IO saturation".
>
> On 2/4/21 12:07 PM, Atul Kumar wrote:
>> There is no error message; when I try to connect to the database while
>> vacuumdb is running with 300 jobs, psql just gets stuck.
>>
>> On Thursday, February 4, 2021, Ravi Krishna <rkrishna_pg@aol.com> wrote:
>>
>>     > The CPU and RAM usage are normal even with 300 jobs (only 1-4%
>>     > consumption), but one thing I don't understand: if max_connections
>>     > is set to 700, then why am I not able to connect to the db? The 300
>>     > running jobs are fewer than half of max_connections.
>>
>>     Please paste the error message
>>
>
> --
> Angular momentum makes the world go 'round.
>



Re: vacuumdb not letting me connect to db

From: Ron
Obviously... don't use 300 threads.

On 2/5/21 2:15 AM, Atul Kumar wrote:
> OK, how do I resolve it?
>
> Any suggestions?
>
> On 2/5/21, Ron <ronljohnsonjr@gmail.com> wrote:
>> Your problem screams "IO saturation".
>>
>> On 2/4/21 12:07 PM, Atul Kumar wrote:
>>> There is no error message; when I try to connect to the database while
>>> vacuumdb is running with 300 jobs, psql just gets stuck.
>>>
>>> On Thursday, February 4, 2021, Ravi Krishna <rkrishna_pg@aol.com> wrote:
>>>
>>>      > The CPU and RAM usage are normal even with 300 jobs (only 1-4%
>>>      > consumption), but one thing I don't understand: if max_connections
>>>      > is set to 700, then why am I not able to connect to the db? The 300
>>>      > running jobs are fewer than half of max_connections.
>>>
>>>      Please paste the error message
>>>
>> --
>> Angular momentum makes the world go 'round.
>>

-- 
Angular momentum makes the world go 'round.



Re: vacuumdb not letting me connect to db

From: Rob Sargent

On 2/5/21 9:11 AM, Ron wrote:
> Obviously... don't use 300 threads.
> 
No, no Ron.  Clearly the answer is more CPUs



Re: vacuumdb not letting me connect to db

From: Ron
On 2/5/21 10:22 AM, Rob Sargent wrote:
>
>
> On 2/5/21 9:11 AM, Ron wrote:
>> Obviously... don't use 300 threads.
>>
> No, no Ron.  Clearly the answer is more CPUs

I hope you're being sarcastic.

-- 
Angular momentum makes the world go 'round.



Re: vacuumdb not letting me connect to db

From: "Gavan Schneider"
On 6 Feb 2021, at 3:37, Ron wrote:

> On 2/5/21 10:22 AM, Rob Sargent wrote:
>>
>>
>> On 2/5/21 9:11 AM, Ron wrote:
>>> Obviously... don't use 300 threads.
>>>
>> No, no Ron.  Clearly the answer is more CPUs
>
> I hope you're being sarcastic.
>
A reasonable conjecture… though there is the consideration that 300
CPU-intensive tasks spread across a given number of CPUs are going to
waste some resources on context switching, i.e., you need more CPUs :)

Basically if there is plenty of wait time for I/O completion then CPU 
task switching can get more total work done.  So far so obvious. In this 
thread I can see where it is disappointing to have a system considered 
capable of 700 connections getting saturated by a “mere” 300 
threads. But this is only a “problem” if connections are equated to 
threads. PG max connection count is about external users having access 
to resources needed to get a task done. Like all resource allocations 
this relies on estimated average usage, i.e., each connection only asks 
for a lot of CPU in brief bursts and then the result is transmitted with 
a time lag before the connection makes another CPU demand. The system 
designer should use estimations about usage and load to budget and 
configure the system, and, monitor it all against actual performance in 
the real world. Of course estimates are a standing request for outliers 
and the system will show stress under an unexpected load.

So far I have not seen an analysis of where the bottleneck has
occurred: CPU, RAM, disk, and/or the data bus connecting them. One or
more of these hardware resources must have maxed out to the extent that
the system would not immediately pick up an additional unit of work. As
I see it, the OP started 300 CPU-intensive tasks on hardware intended
for 700 connections. If the connection count was designed around, say,
50% CPU-intensive time per connection, you would expect this hardware to
be fully saturated by 300 CPU-intensive tasks. More than that, doing the
task with 300 threads would probably take longer than with (say) 200
threads, as the increased CPU context-switching time is just wasted
effort.

The OP now has a choice: decrease the thread count or (seriously) upgrade
the hardware. We in the gallery would love to see a plot of total time to
completion as a function of threads invoked (50-300 in increments of 50),
assuming the starting conditions are the same :)

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and 
wrong.
— H. L. Mencken, 1920



Re: vacuumdb not letting me connect to db

From: Atul Kumar
Hi Gavan,

Thanks for providing the details. I need more clarification on this: how should I determine the ideal number of connections to set, given the available hardware resources, so as to avoid I/O overhead?

How do I do this calculation?

Note: even with 300 threads, my RAM utilisation is completely normal.


Regards
Atul

On Saturday, February 6, 2021, Gavan Schneider <list.pg.gavan@pendari.org> wrote:
On 6 Feb 2021, at 3:37, Ron wrote:

On 2/5/21 10:22 AM, Rob Sargent wrote:


On 2/5/21 9:11 AM, Ron wrote:
Obviously... don't use 300 threads.

No, no Ron.  Clearly the answer is more CPUs

I hope you're being sarcastic.

A reasonable conjecture… though there is the consideration that 300 CPU-intensive tasks spread across a given number of CPUs are going to waste some resources on context switching, i.e., you need more CPUs :)

Basically if there is plenty of wait time for I/O completion then CPU task switching can get more total work done.  So far so obvious. In this thread I can see where it is disappointing to have a system considered capable of 700 connections getting saturated by a “mere” 300 threads. But this is only a “problem” if connections are equated to threads. PG max connection count is about external users having access to resources needed to get a task done. Like all resource allocations this relies on estimated average usage, i.e., each connection only asks for a lot of CPU in brief bursts and then the result is transmitted with a time lag before the connection makes another CPU demand. The system designer should use estimations about usage and load to budget and configure the system, and, monitor it all against actual performance in the real world. Of course estimates are a standing request for outliers and the system will show stress under an unexpected load.

So far I have not seen an analysis of where the bottleneck has occurred: CPU, RAM, disk, and/or the data bus connecting them. One or more of these hardware resources must have maxed out to the extent that the system would not immediately pick up an additional unit of work. As I see it, the OP started 300 CPU-intensive tasks on hardware intended for 700 connections. If the connection count was designed around, say, 50% CPU-intensive time per connection, you would expect this hardware to be fully saturated by 300 CPU-intensive tasks. More than that, doing the task with 300 threads would probably take longer than with (say) 200 threads, as the increased CPU context-switching time is just wasted effort.

The OP now has a choice: decrease the thread count or (seriously) upgrade the hardware. We in the gallery would love to see a plot of total time to completion as a function of threads invoked (50-300 in increments of 50), assuming the starting conditions are the same :)

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920

Re: vacuumdb not letting me connect to db

From: "Gavan Schneider"
On 6 Feb 2021, at 23:06, Atul Kumar wrote:

> Thanks for providing the details. I need more clarification on this:
> how should I determine the ideal number of connections to set, given
> the available hardware resources, so as to avoid I/O overhead?
>
How to do this analysis properly is outside my expertise and likely off
topic for this list. But you can get an overall idea by doing what I
suggested: measure the performance (i.e., total time to completion) when
using different numbers of threads. In an ideal world the time will get
less as you invoke more threads. But there will be a point where there
is a bottleneck (or some resource hits its limit) and the time will stay
more or less the same no matter how many threads you use.

> How do I do this calculation?
>
That’s likely too hard. Measurement will tell you what you need to
know. Once you know how many threads it takes to reach saturation, you
know you need to use a smaller number. This may, or may not, get you to
revisit the max 700 connections setting, but once you know there is a
performance limit, and you decide it’s not good enough, then you can
ask the hardware people for help.

> Note: even with 300 threads, my RAM utilisation is completely normal.
>
That’s not unexpected. The CPU activity should be using the data
held in RAM, not spending all those cycles allocating RAM.

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and 
wrong.
— H. L. Mencken, 1920



Re: vacuumdb not letting me connect to db

From: Gmail

> On Feb 5, 2021, at 9:37 AM, Ron <ronljohnsonjr@gmail.com> wrote:
>
>> On 2/5/21 10:22 AM, Rob Sargent wrote:
>>
>>
>>> On 2/5/21 9:11 AM, Ron wrote:
>>> Obviously... don't use 300 threads.
>>>
>> No, no Ron.  Clearly the answer is more CPUs
>
> I hope you're being sarcastic.
>
> --
> Angular momentum makes the world go 'round.
>
Ron, I was being sarcastic, but I’ve gone over the thread and I do not
see any mention of the number of cores on the server. If it has a single
quad-core chip, then the queue for each processor could be 75 deep on
average at the outset, all of them anxiously awaiting data.  The -j
connections are processes, correct? Not threads.
Atul, if you list the tools you know and have used for watching system
performance, then others can perhaps suggest alternatives (or
re-interpret results).
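
Common starting points on Linux (a sketch; assumes the sysstat package
for iostat, and that the server processes run as the enterprisedb OS user):

    vmstat 5              # run queue, CPU, and swap at a glance
    iostat -x 5           # per-device utilisation and latencies
    top -u enterprisedb   # per-process CPU and RAM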




Re: vacuumdb not letting me connect to db

From: Ron


On 2/6/21 6:06 AM, Atul Kumar wrote:
Hi Gavan,

Thanks for providing the details. I need more clarification on this: how should I determine the ideal number of connections to set, given the available hardware resources, so as to avoid I/O overhead?
How do I do this calculation?


Run "iotop -o -u postgres", and then compare that with the total bandwidth available to the system.  If it's (even almost) saturated, then everything else will be starved.

The "--jobs=" value should AT MOST be some *small* multiple of the number of CPUs (like 1x, 1.5x, or maybe 2x if the core count is low and nothing else is running on the system).
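
As a sketch (assuming Linux, where nproc reports the core count, and the
binary path quoted earlier in the thread):

    # Size --jobs from the core count instead of hard-coding 300;
    # 1x the cores is a safe starting point.
    JOBS=$(nproc)
    "/usr/edb/as9.6/bin/vacuumdb" -U enterprisedb --all --analyze-only \
        --verbose --jobs="$JOBS"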

Note: even with 300 threads, my RAM utilisation is completely normal.


Regards
Atul

On Saturday, February 6, 2021, Gavan Schneider <list.pg.gavan@pendari.org> wrote:
On 6 Feb 2021, at 3:37, Ron wrote:

On 2/5/21 10:22 AM, Rob Sargent wrote:


On 2/5/21 9:11 AM, Ron wrote:
Obviously... don't use 300 threads.

No, no Ron.  Clearly the answer is more CPUs

I hope you're being sarcastic.

A reasonable conjecture… though there is the consideration that 300 CPU-intensive tasks spread across a given number of CPUs are going to waste some resources on context switching, i.e., you need more CPUs :)

Basically if there is plenty of wait time for I/O completion then CPU task switching can get more total work done.  So far so obvious. In this thread I can see where it is disappointing to have a system considered capable of 700 connections getting saturated by a “mere” 300 threads. But this is only a “problem” if connections are equated to threads. PG max connection count is about external users having access to resources needed to get a task done. Like all resource allocations this relies on estimated average usage, i.e., each connection only asks for a lot of CPU in brief bursts and then the result is transmitted with a time lag before the connection makes another CPU demand. The system designer should use estimations about usage and load to budget and configure the system, and, monitor it all against actual performance in the real world. Of course estimates are a standing request for outliers and the system will show stress under an unexpected load.

So far I have not seen an analysis of where the bottleneck has occurred: CPU, RAM, disk, and/or the data bus connecting them. One or more of these hardware resources must have maxed out to the extent that the system would not immediately pick up an additional unit of work. As I see it, the OP started 300 CPU-intensive tasks on hardware intended for 700 connections. If the connection count was designed around, say, 50% CPU-intensive time per connection, you would expect this hardware to be fully saturated by 300 CPU-intensive tasks. More than that, doing the task with 300 threads would probably take longer than with (say) 200 threads, as the increased CPU context-switching time is just wasted effort.

The OP now has a choice: decrease the thread count or (seriously) upgrade the hardware. We in the gallery would love to see a plot of total time to completion as a function of threads invoked (50-300 in increments of 50), assuming the starting conditions are the same :)

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920

--
Angular momentum makes the world go 'round.

Re: vacuumdb not letting me connect to db

From: "Peter J. Holzer"
On 2021-02-06 12:54:11 -0600, Ron wrote:
> On 2/6/21 6:06 AM, Atul Kumar wrote:
>
>     Hi Gavan,
>
>     Thanks for providing the details. I need more clarification on this:
>     how should I determine the ideal number of connections to set, given
>     the available hardware resources, so as to avoid I/O overhead?
>     How do I do this calculation?
>
>
>
> Run "iotop -o -u postgres", and then compare that with the total bandwidth
> available to the system.  If it's (even almost) saturated, then everything else
> will be starved.
>
> The "--jobs=" value should AT MOST be some *small* multiple of the number of
> CPUs (like 1x, 1.5x, or maybe 2x if the core count is low and nothing else is
> running on the system).

While this is generally good advice, Atul reported earlier in the thread
that CPU consumption is only 1 to 4%. So the problem is clearly not
related to CPU usage, but very probably (as some have already noted) to
I/O. You could use a similar rule of thumb for disks ("a small multiple
of the number of disks"), but with SSDs this might not be very accurate.

I think the only method to get to the optimal number of vacuums
that can be run in parallel is to determine it experimentally:
300 is clearly too high, so try 150, 75, 37, etc. For each number, try to
do some other work - is the performance acceptable? If yes, note the
time until the vacuum is finished. Finally, among those runs where the
performance was acceptable, choose the value which was fastest.

(Note: if you do this on the same database, subsequent runs will benefit
from work already done, so take the results with a grain of salt.)
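
A rough sketch of such an experiment (using the same binary path and user
quoted earlier in the thread; as noted above, later runs are skewed by
work the earlier ones already did):

    # Time an --analyze-only pass at decreasing job counts and note
    # which ones still leave the system responsive for other sessions.
    for jobs in 300 150 75 37 18; do
        echo "--- jobs=$jobs ---"
        time "/usr/edb/as9.6/bin/vacuumdb" -U enterprisedb --all \
            --analyze-only --jobs="$jobs"
    done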

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
