Discussion: OT: Performance of VM
This is a bit off-topic, since it is not about the performance of PG itself. But maybe some of you have the same issue.

We run PostgreSQL in virtual machines which are provided by our customer. We are not responsible for the hypervisor and have no access to it.

The I/O performance of our application was terribly slow yesterday. The users blamed us, but it seems that there was something wrong with the hypervisor. For the next time I would like to have reliable figures to underline my guess that the hypervisor (and not our application) is the bottleneck.

I have the vague strategy of running some I/O performance check every N minutes and recording the numbers. Of course I could do some dirty scripting, but I would like to avoid reinventing things. I guess this was already solved by people who have more brains and more experience than I have :-)

What do you suggest to get some reliable figures?

Regards,
Thomas Güttler

--
Thomas Guettler http://www.thomas-guettler.de/
I am looking for feedback: https://github.com/guettli/programming-guidelines
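[Editor's note: the "dirty scripting" baseline can be as small as the sketch below -- a timestamped sequential-write probe suitable for a cron entry every N minutes. The /tmp paths and the 32 MiB size are illustrative placeholders; a proper benchmark tool such as fio gives far more detailed numbers.]

```shell
#!/bin/sh
# Hypothetical periodic I/O probe: one flushed sequential write, logged
# with a timestamp so spikes can be correlated with user complaints later.
TESTFILE="${TESTFILE:-/tmp/ioprobe.dat}"
LOGFILE="${LOGFILE:-/tmp/ioprobe.log}"

# 32 MiB write; conv=fdatasync forces the data to disk before dd exits,
# so the page cache does not hide the real device speed.
start=$(date +%s)
dd if=/dev/zero of="$TESTFILE" bs=1M count=32 conv=fdatasync 2>/dev/null
end=$(date +%s)
elapsed=$((end - start))
[ "$elapsed" -eq 0 ] && elapsed=1    # avoid division by zero on fast disks

echo "$(date '+%Y-%m-%d %H:%M:%S') write_MBps=$((32 / elapsed))" >>"$LOGFILE"
rm -f "$TESTFILE"
tail -n 1 "$LOGFILE"
```

A crontab entry like `*/10 * * * * /usr/local/bin/ioprobe.sh` would then build up a history you can plot or grep.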
Am 05.02.2018 um 14:14 schrieb Thomas Güttler:
> What do you suggest to get some reliable figures?

sar is often recommended, see https://blog.2ndquadrant.com/in-the-defense-of-sar/.

Can you exclude other reasons like vacuum / vacuum freeze?

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com
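[Editor's note: for readers unfamiliar with sar, a few typical live invocations look like this. It requires the sysstat package; the snippet is guarded so it degrades gracefully where sar is not installed.]

```shell
# Standard sysstat/sar usage: live sampling of CPU and I/O counters.
{
  if command -v sar >/dev/null 2>&1; then
      sar -u 1 2   # CPU utilization: 2 samples, 1 second apart (watch %iowait)
      sar -b 1 2   # overall I/O and transfer rates
      sar -d 1 2   # per-device I/O statistics
  else
      echo "sysstat/sar not installed"
  fi
} | tee /tmp/sar_demo.out
```

With the sysstat collector enabled, the same counters are also archived automatically, which is what makes after-the-fact charts possible.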
Have them check the memory and CPU allocation of the hypervisor, and make sure it's not overallocated. Make sure the partitions for storage are aligned (see here: https://blogs.vmware.com/vsphere/2011/08/guest-os-partition-alignment.html). Install tuned, and enable the throughput-performance profile. Oracle has a problem with transparent hugepages, and Postgres may well have the same problem, so consider disabling transparent hugepages. There is no reason why performance on a VM should be worse than performance on a physical server.
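[Editor's note: checking the transparent hugepage setting uses the standard Linux sysfs interface shown below. Disabling it at runtime needs root, so that write is shown commented out; for a persistent setting use a tuned profile or `transparent_hugepage=never` on the kernel command line.]

```shell
# Inspect (and optionally disable) transparent hugepages via sysfs.
THP=/sys/kernel/mm/transparent_hugepage/enabled
{
  if [ -r "$THP" ]; then
      cat "$THP"            # e.g. "always madvise [never]" - brackets mark the active mode
      # echo never > "$THP" # run as root to disable at runtime
  else
      echo "THP sysfs interface not present on this kernel"
  fi
} | tee /tmp/thp_check.out
```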
On Mon, Feb 5, 2018 at 7:26 AM, Andreas Kretschmer <andreas@a-kretschmer.de> wrote:
Am 05.02.2018 um 14:14 schrieb Thomas Güttler:
> What do you suggest to get some reliable figures?
sar is often recommended, see https://blog.2ndquadrant.com/in-the-defense-of-sar/.
Can you exclude other reasons like vacuum / vacuum freeze?
Regards, Andreas
--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com
--
Andrew W. Kerber
'If at first you don't succeed, don't take up skydiving.'
Am 05.02.2018 um 17:22 schrieb Andrew Kerber:
> Oracle has a problem with transparent hugepages, postgres may well
> have the same problem, so consider disabling transparent hugepages.

yes, that's true.

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com
Am 05.02.2018 um 14:26 schrieb Andreas Kretschmer:
> Am 05.02.2018 um 14:14 schrieb Thomas Güttler:
>> What do you suggest to get some reliable figures?
>
> sar is often recommended, see https://blog.2ndquadrant.com/in-the-defense-of-sar/.
>
> Can you exclude other reasons like vacuum / vacuum freeze?

In the current case it was a problem in the hypervisor.

But I want to be prepared for the next time.

The tool sar looks good. This way I can generate a chart where I can see peaks. Nice.

.... But one thing is still unclear. Imagine I see a peak in the chart. The peak was some hours ago. AFAIK sar has only the aggregated numbers.

But I need to know details if I want to answer the question "Why?". The peak has gone and ps/top/iotop don't help me anymore.

Any idea?

Regards,
Thomas Güttler

--
Thomas Guettler http://www.thomas-guettler.de/
I am looking for feedback: https://github.com/guettli/programming-guidelines
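[Editor's note: while sar's numbers are indeed aggregated, they can at least be narrowed to the time window of a past peak. This is standard sysstat usage; the archive directory differs between distributions, and the 09:00-11:00 window is an arbitrary example.]

```shell
# Query the sysstat archive for a specific past time window.
# Archive location varies: /var/log/sa (RHEL) vs. /var/log/sysstat (Debian).
{
  if command -v sar >/dev/null 2>&1; then
      SA_DIR=/var/log/sa
      [ -d /var/log/sysstat ] && SA_DIR=/var/log/sysstat
      # Per-device I/O between 09:00 and 11:00 from today's archive file
      sar -d -s 09:00:00 -e 11:00:00 -f "$SA_DIR/sa$(date +%d)" 2>/dev/null \
          || echo "no sysstat archive for today"
  else
      echo "sysstat/sar not installed"
  fi
} | tee /tmp/sar_hist.out
```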
On Tue, 2018-02-06 at 15:31 +0100, Thomas Güttler wrote:
> .... But one thing is still unclear. Imagine I see a peak in the chart. The peak was some hours ago. AFAIK sar has only the aggregated numbers. But I need to know details if I want to answer the question "Why?". The peak has gone and ps/top/iotop don't help me anymore.
The typical solution is to store stats on everything you can think of with munin, cacti, ganglia, or similar systems.
I know with ganglia at least, in addition to all the many details it already tracks on a system and the many plugins already available for it, you can write your own plugins or simple agents, so you can keep stats on anything you can code around.
Munin's probably the easiest to try out, though.
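[Editor's note: a custom munin plugin really is just an executable that prints graph metadata when called with "config" and field values otherwise. The sketch below is hypothetical -- the plugin name "ioprobe" and the dd-based latency measurement are illustrative, not part of munin itself.]

```shell
#!/bin/sh
# Minimal munin-style plugin sketch: "config" prints graph metadata,
# any other invocation measures one small flushed write and reports it.
OUT=/tmp/munin_ioprobe.out
case "$1" in
config)
    echo "graph_title Disk write latency probe"
    echo "graph_vlabel seconds"
    echo "latency.label 8 MiB flushed write"
    ;;
*)
    start=$(date +%s)
    dd if=/dev/zero of=/tmp/munin_probe.dat bs=1M count=8 conv=fdatasync 2>/dev/null
    end=$(date +%s)
    rm -f /tmp/munin_probe.dat
    echo "latency.value $((end - start))" | tee "$OUT"
    ;;
esac
```

Dropped into the munin plugins directory and symlinked into the active set, this would be polled every five minutes like any stock plugin.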
On Mon, Feb 5, 2018 at 5:22 PM, Andrew Kerber <andrew.kerber@gmail.com> wrote:
> Have them check the memory and CPU allocation of the hypervisor, make sure
> it's not overallocated. Make sure the partitions for storage are aligned (see
> here: https://blogs.vmware.com/vsphere/2011/08/guest-os-partition-alignment.html).
> Install tuned, and enable the throughput performance profile. Oracle has a
> problem with transparent hugepages, postgres may well have the same problem,
> so consider disabling transparent hugepages. There is no reason why
> performance on a VM would be worse than performance on a physical server.

Not theoretically. But in practice, if you run anything in a VM, as in this case, you do not know what else is working on that box. Analyzing these issues can be really cumbersome and tricky. This is why I am generally skeptical of running a resource-intensive application like an RDBMS in a VM. To get halfway predictable results you want at least a minimum of resources (CPU, memory, I/O bandwidth) reserved for that VM.

Anecdote: we once had a customer run our application in a VM (which is supported) and complain about slowness. Eventually we found out that they had overcommitted memory - not in sum for all VMs, which is common, but this single VM had been configured to have more memory than was physically available in the machine.

Kind regards

robert

--
[guy, jim, charlie].each {|him| remember.him do |as, often| as.you_can - without end}
http://blog.rubybestpractices.com/
I am a consultant who specializes in virtualizing Oracle enterprise-level workloads; I'm picking up Postgres as a secondary skill. You are right that if you don't manage it properly, you can have problems running enterprise workloads on VMs. But it can be done with proper management, and the HA and DR advantages of virtual systems are huge.

Sent from my iPhone

> On Feb 10, 2018, at 5:20 AM, Robert Klemme <shortcutter@googlemail.com> wrote:
>
>> On Mon, Feb 5, 2018 at 5:22 PM, Andrew Kerber <andrew.kerber@gmail.com> wrote:
>> Have them check the memory and CPU allocation of the hypervisor, make sure
>> it's not overallocated. [...]
>
> Not theoretically. But in practice if you have anything run in a VM
> like in this case you do not know what else is working on that box.
> Analyzing these issues can be really cumbersome and tricky. This is
> why I am generally skeptical of running a resource intensive
> application like a RDBMS in a VM. To get halfway predictable results
> you want at least a minimum of resources (CPU, memory, IO bandwidth)
> reserved for that VM.
>
> Anecdote: we once had a customer run our application in a VM (which is
> supported) and complain about slowness. Eventually we found out that
> they over committed memory - not in sum for all VMs which is common,
> but this single VM had been configured to have more memory than was
> physically available in the machine.
>
> Kind regards
>
> robert
Am 06.02.2018 um 15:31 schrieb Thomas Güttler:
> The tool sar looks good. This way I can generate a chart where I can see
> peaks. Nice.
>
> .... But one thing is still unclear. Imagine I see a peak in the chart.
> The peak was some hours ago. AFAIK sar has only the aggregated numbers.
>
> But I need to know details if I want to answer the question "Why?". The
> peak has gone and ps/top/iotop don't help me anymore.
>
> Any idea?

I love atop (atoptool.nl) for exactly that kind of situation. It will save a snapshot every 10 minutes by default, which you can then simply "scroll" back to. Helped me pinpoint nightly issues countless times.

Only really available for Linux though (in case you're on *BSD).

Best regards,
--
Gunnar "Nick" Bluth
RHCE/SCLA
Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
_____________________________________________________________
In 1984 mainstream users were choosing VMS over UNIX. Ten years later they are choosing Windows over UNIX. What part of that message aren't you getting? - Tom Payne
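[Editor's note: replaying a saved atop log for a given time of day looks roughly like this. It assumes the atop package with its logging daemon enabled; the log path is the conventional default and may differ, and the 14:00 jump target is an arbitrary example. `-P CPU` requests parseable output so the command works outside an interactive terminal.]

```shell
# Replay atop history: open today's raw log and jump to a past timestamp.
{
  if command -v atop >/dev/null 2>&1; then
      LOG=/var/log/atop/atop_$(date +%Y%m%d)
      if [ -r "$LOG" ]; then
          # Parseable CPU records from 14:00 onward, first snapshot only
          atop -P CPU -r "$LOG" -b 14:00 2>/dev/null | head -n 15
      else
          echo "no atop log for today"
      fi
  else
      echo "atop not installed"
  fi
} | tee /tmp/atop_demo.out
```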
+1 for atop. Be sure to adjust the sampling interval so it suits your needs. It'll tell you what caused the spike.
Alternatively you could probably use sysdig, but I expect that'd result in a fair performance hit if your system is already struggling.
Micky
On 14 February 2018 at 08:15, Gunnar "Nick" Bluth <gunnar.bluth@pro-open.de> wrote:
> I love atop (atoptool.nl) for exactly that kind of situation. It will
> save a snapshot every 10 minutes by default, which you can then simply
> "scroll" back to. Helped me pinpoint nightly issues countless times.
> Only really available for Linux though (in case you're on *BSD).
On 11/02/18 00:20, Robert Klemme wrote:
> Not theoretically. But in practice if you have anything run in a VM
> like in this case you do not know what else is working on that box.
> Analyzing these issues can be really cumbersome and tricky. [...]
>
> Anecdote: we once had a customer run our application in a VM (which is
> supported) and complain about slowness. Eventually we found out that
> they over committed memory - not in sum for all VMs which is common,
> but this single VM had been configured to have more memory than was
> physically available in the machine.

Agreed. If you can get the I/O layer to have some type of guaranteed performance (e.g. AWS Provisioned IOPS), then that is a big help. However (as you say above), debugging memory and CPU contention from within the guest is tricky indeed.

Anecdote: we concluded a VM needed more CPU, so went from 8 to 16 - performance got significantly *worse*. We prevailed on the devops guys (this was *not* AWS) to migrate the VM to a less busy host. Everything was fine thereafter.

regards
Mark