Re: System load consideration before spawning parallel workers

From Haribabu Kommi
Subject Re: System load consideration before spawning parallel workers
Date
Msg-id CAJrrPGdDzDgU8XgDJW1BvgZj72DcHVy3PdH5Ya-z4_7TGLW_6A@mail.gmail.com
In response to Re: System load consideration before spawning parallel workers  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: System load consideration before spawning parallel workers  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-hackers


On Fri, Sep 2, 2016 at 3:01 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 8/16/16 3:39 AM, Haribabu Kommi wrote:
> Yes, we need to consider many parameters as system load, not just
> the CPU. Here I attached a POC patch that implements the CPU load
> calculation and decides the number of workers based on the available CPU
> load. The load calculation code is not optimized; there are many ways
> that can be used to calculate the system load. This is just an example.

I see a number of discussion points here:

We don't yet have enough field experience with the parallel query
facilities to know what kind of use patterns there are and what systems
for load management we need.  So I think building a highly specific
system like this seems premature.  We have settings to limit process
numbers, which seems OK as a start, and those knobs have worked
reasonably well in other areas (e.g., max connections, autovacuum).  We
might well want to enhance this area, but we'll need more experience and
information.

Yes, I agree that parallel query is a new feature and we cannot judge its
effect yet.

 
If we think that checking the CPU load is a useful way to manage process
resources, why not apply this to more kinds of processes?  I could
imagine that limiting connections by load could be useful.  Parallel
workers is only one specific niche of this problem.

Yes, I agree that parallel workers are only one part of the problem.

How about the postmaster calculating the CPU (and other) load on the system
and publishing it in a shared location where every backend can read it?
Using that, we can decide which operations to control. The postmaster would
refresh the system load at an interval specified by a GUC, so this will not
affect the performance of other backends.
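
A minimal sketch of that idea, to make the proposal concrete. This is not the attached POC patch: the struct, the function names, and the plain in-process struct standing in for shared memory are all hypothetical, and a real implementation would need proper locking or atomics around the shared slot.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a slot in shared memory. */
typedef struct SharedLoadInfo
{
    double  cpu_load;     /* last published load value */
    time_t  last_update;  /* time of last publication; 0 means "never" */
} SharedLoadInfo;

/*
 * Returns true when the updater process should take a new sample, i.e.
 * nothing has been published yet or at least interval_secs (the assumed
 * GUC value) have passed since the last publication.
 */
static bool
load_sample_due(const SharedLoadInfo *info, time_t now, int interval_secs)
{
    if (info->last_update == 0)
        return true;
    return (now - info->last_update) >= interval_secs;
}

/* Called only by the updater process after it has sampled the load. */
static void
publish_load(SharedLoadInfo *info, time_t now, double cpu_load)
{
    info->cpu_load = cpu_load;
    info->last_update = now;
}

/* Backends only read the published value; they never sample themselves. */
static double
read_published_load(const SharedLoadInfo *info)
{
    return info->cpu_load;
}
```

The point of the split is that the expensive sampling happens in one process at a bounded rate, while every backend pays only for a read of the published value.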

 
As I just wrote in another message in this thread, I don't trust system
load metrics very much as a gatekeeper.  They are reasonable for
long-term charting to discover trends, but there are numerous potential
problems for using them for this kind of resource control thing.

All of this seems very platform specific, too.  You have
Windows-specific code, but the rest seems very Linux-specific.  The
dstat tool I had never heard of before.  There is stuff with cgroups,
which I don't know how portable they are across different Linux
installations.  Something about Solaris was mentioned.  What about the
rest?  How can we maintain this in the long term?  How do we know that
these facilities actually work correctly and not cause mysterious problems?

The CPU load calculation patch is a POC patch; I didn't evaluate its behavior
on all platforms.

 
Maybe a couple of hooks could be useful to allow people to experiment
with this.  But the hooks should be more general, as described above.
But I think a few GUC settings that can be adjusted at run time could be
sufficient as well.

With the parallel GUC settings it is possible to tune the behavior so that
more parallel workers improve performance when there is very little load on
the system. But when the system load increases, using more parallel workers
can add overhead instead of improving on the existing behavior.

In such cases, the number of parallel workers needs to be reduced by changing
the GUC settings. Instead of that, I just thought: how about doing the same
automatically?
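
As an illustration only, here is one possible policy for such automatic adjustment. It is an assumption for the sake of discussion, not what the POC patch does: the function name is hypothetical, and "load" is taken to mean a load-average style figure that is comparable to the number of CPUs.

```c
/*
 * Hypothetical policy: grant at most as many workers as there are idle
 * CPUs, never exceeding the configured limit (max_workers stands in for
 * the existing GUC setting).
 */
static int
workers_for_load(int max_workers, double load, int ncpus)
{
    double  idle = (double) ncpus - load;
    int     workers;

    if (idle <= 0.0)
        return 0;               /* system is saturated: run serially */

    workers = (int) idle;       /* truncate toward zero; idle is positive */
    if (workers > max_workers)
        workers = max_workers;  /* the GUC remains the upper bound */
    return workers;
}
```

With a limit of 4 workers on an 8-CPU machine, a load of 0.5 still yields 4 workers, a load of 6.2 yields 1, and a load of 9.0 yields 0.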


Regards,
Hari Babu
Fujitsu Australia
