Postgres on Kubernetes/VMware

From George Sexton
Subject Postgres on Kubernetes/VMware
Date
Msg-id 82f63aa3-6d11-1361-1457-12829c05cf2a@mhsoftware.com
List pgsql-general

Everyone,

 I’ve run into an issue that’s got me stumped and I would be really grateful for any ideas. We’re deploying Postgres 11.8 on Kubernetes 1.17.9. The nodes are running RedHat EL 7.9 with the latest kernel for that distribution.

 We create a Kubernetes pod that runs Postgres. It’s set up with /pgdata going to a PersistentVolumeClaim that’s thin-provisioned with a size of 500 GB. Initially, the allocated size of the PersistentVolumeClaim’s corresponding disk file (VMware .VMDK) is around 12 GB. However, the .VMDK keeps growing at roughly 150-300 MB/hour even though the size of the /pgdata folder as shown by df is stable at around 230 MB. To try to troubleshoot this, I set up Postgres to put the pg_wal folder on a different partition. That didn’t make any difference.
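
To put a number on the filesystem side of that comparison, the usage that df reports can be sampled over time and turned into a growth rate to set against the .VMDK growth seen on the datastore. The following is only a minimal sketch: the helper names are hypothetical, and it assumes the script runs somewhere the /pgdata mount is visible.

```python
import shutil

MB = 1024 * 1024

def used_mb(path):
    """Space in use on the filesystem that contains path, in MB."""
    return shutil.disk_usage(path).used / MB

def growth_mb_per_hour(samples):
    """Average growth between the first and last (unix_seconds, used_mb) sample."""
    (t0, u0), (t1, u1) = samples[0], samples[-1]
    hours = (t1 - t0) / 3600.0
    if hours <= 0:
        raise ValueError("samples must span a positive time interval")
    return (u1 - u0) / hours

# Hypothetical usage: append (time.time(), used_mb("/pgdata")) to a list
# every few minutes, then call growth_mb_per_hour() on that list.
```

If the filesystem figure stays flat while the .VMDK grows, the growth comes from blocks that were written and freed again rather than from data that is still live.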

 Replication is turned on for this Postgres instance, and the replica’s .VMDK file also grows, at a rate of about 25 MB/hour.

 I’ve checked to make sure that the partition is mounted with the discard option. I created a test program that allocates disk space and then frees it. I confirmed that when the partition is mounted with discard, VMware reclaims the space as expected. For example, if the .VMDK grows to 20 GB during a test, it shrinks back down to something like 1 GB when the test completes.
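
The test program itself was not included in the post. A minimal sketch of that kind of allocate-then-free test might look like the following; the function name, file prefix, and sizes are all hypothetical, and it assumes the target directory sits on the filesystem being tested.

```python
import os
import tempfile

MB = 1024 * 1024

def allocate_and_free(directory, size_mb=64, chunk_mb=4):
    """Write at least size_mb of data to a scratch file, fsync it, then delete it.

    Returns the number of bytes written before the file was removed.
    """
    # Random data, so the blocks are really allocated rather than left sparse.
    chunk = os.urandom(chunk_mb * MB)
    written = 0
    fd, path = tempfile.mkstemp(dir=directory, prefix="reclaim-test-")
    try:
        with os.fdopen(fd, "wb") as f:
            while written < size_mb * MB:
                f.write(chunk)
                written += len(chunk)
            f.flush()
            os.fsync(f.fileno())  # push the data down to the block device
    finally:
        # Deleting the file frees its blocks; on a filesystem mounted with
        # discard, those blocks should then become eligible for reclamation.
        os.unlink(path)
    return written
```

Running it with a large size_mb against the mounted volume and watching the .VMDK size before and after gives the grow-then-shrink pattern described above.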

 To try to sort this out, I’ve done the following:

  1. Written the test program and confirmed that space reclamation works on the ESXi host/vCenter.
  2. Moved the pg_wal directory to another partition.
  3. Changed autovacuum to run less often (every 12 hours).
  4. Executed VACUUM FULL, followed by fstrim <mount point> from the Kubernetes node. This freed perhaps 12 MB from the /pgdata folder, and around 19 MB from the .VMDK size.
  5. Checked the /proc/<pid>/fd descriptors for the running connections and confirmed no unusual temp files are open.
  6. Confirmed the replication is working as expected.
  7. Set archive_mode off to eliminate that as a source of noise.
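
The descriptor check in the list above can be scripted. This is only a sketch under stated assumptions: it is Linux-only, the function names are hypothetical, and it must run as the postgres user or root in the same PID namespace as the backends. It flags descriptors that point at unlinked files (which the kernel marks with a " (deleted)" suffix) or at files under a pgsql_tmp directory, where Postgres keeps its temporary files.

```python
import os

def suspicious_targets(targets):
    """Keep link targets that look like unlinked files or Postgres temp files."""
    return [t for t in targets
            if t.endswith(" (deleted)") or "pgsql_tmp" in t]

def open_file_targets(pid):
    """Resolve each descriptor under /proc/<pid>/fd to what it points at."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
    return targets

# Hypothetical usage for one backend process:
#   print(suspicious_targets(open_file_targets(backend_pid)))
```

An unlinked file that is still held open keeps its blocks allocated without showing up in a directory listing, which is why it is worth checking for separately.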

 Does anyone have any ideas about what’s causing this? Is there anything unusual about how Postgres allocates temporary data files or frees them? I’m really just grasping. Thanks for looking!

  George
