Re: pg_restore out of memory

Поиск
Список
Период
Сортировка
От Felipe Santos
Тема Re: pg_restore out of memory
Дата
Msg-id CAPYcRiUmTBVp7QPtsYD0zpRWonfjqCoqjcV+auOp5SCZhp=zkA@mail.gmail.com
обсуждение исходный текст
Ответ на pg_restore out of memory  (Miguel Ramos <org.postgresql@miguel.ramos.name>)
Ответы Re: pg_restore out of memory  (Miguel Ramos <org.postgresql@miguel.ramos.name>)
Список pgsql-general


2016-07-12 8:25 GMT-03:00 Miguel Ramos <org.postgresql@miguel.ramos.name>:

Hi,

We have backed up a database and now when trying to restore it to the same server we get this:

> # pg_restore -d recovery /mnt/paysdeloire2013_convertida2.1.dump
> pg_restore: [custom archiver] out of memory
> 12:09:56.58  9446.593u+1218.508s 24.3%  167+2589k  6+0io  0pf+0sw 6968822cs


Some information about the application:

- We have sensor data, including pictures, and number crunshing output, then so the large tables on this database have 319, 279, 111 and 26GB. Mostly on TOAST pages, but the 279GB one divides it evenly. This database is 765GB. We try to keep them under 4TB.
- Transactions are large, some 100 MB at a time.
- We also use PostGIS.

About the server (dedicated):

- FreeBSD 9.1-RELEASE #0 on AMD64
- 16 GB of RAM
- 8x3GB hardware RAID 10
- 10TB slice for pgdata UFS-formatted and 32kB block
- PostgreSQL 9.1.8 custom compiled to get 32kB blocks
- Installed in 2013 with ~10 people working with it, 145 days uptime today.

I found two relevant threads on the mailing-lists.
The most recent one sugested that postgresql was being configured to use more memory than what's available.
The older one sugested that the system limits on the size of the data or stack segments were lower than required.

So here are some server parameters (relevant or otherwise):

> max_connections = 100
> shared_buffers = 4GB  -- 25% of RAM
> temp_buffers = 32MB  -- irrelevant?
> work_mem = 64MB
> maintenance_work_mem = was 1G lowered to 256M then 64M
> wal_buffers = -1  -- should mean 1/32 of shared_buffers = 128MB
> checkpoint_segments = 64  -- WAL segments are 16MB
> effective_cache_size = 8GB  -- irrelevant?


I suspect that the restore fails when constructing the indices. After the process is aborted, the data appears to be all or most there, but no indices.
So, all I did so far, was lowering maintenance_work_mem and it didn't work.

System limits, as you can see, are at defaults (32GB for data and 512MB for stack):

> # limit
> cputime      unlimited
> filesize     unlimited
> datasize     33554432 kbytes
> stacksize    524288 kbytes
> coredumpsize unlimited
> memoryuse    unlimited
> vmemoryuse   unlimited
> descriptors  11095
> memorylocked unlimited
> maxproc      5547
> sbsize       unlimited
> swapsize     unlimited

Shared memory is configured to allow for the single shared memory segment postgresql appears to use, plus a bit of extra (8GB):

> # ipcs -M
> shminfo:
>         shmmax:   8589934592    (max shared memory segment size)
>         shmmin:            1    (min shared memory segment size)
>         shmmni:          192    (max number of shared memory identifiers)
>         shmseg:          128    (max shared memory segments per process)
>         shmall:      2097152    (max amount of shared memory in pages)

And semaphores (irrelevant?)...

> # ipcs -S
> seminfo:
>         semmni:          256    (# of semaphore identifiers)
>         semmns:          512    (# of semaphores in system)
>         semmnu:          256    (# of undo structures in system)
>         semmsl:          340    (max # of semaphores per id)
>         semopm:          100    (max # of operations per semop call)
>         semume:           50    (max # of undo entries per process)
>         semusz:          632    (size in bytes of undo structure)
>         semvmx:        32767    (semaphore maximum value)
>         semaem:        16384    (adjust on exit max value)


I don't know what else to try.
I lowered maintenance_work_mem without restarting the server.
In some of the attempts, but not all, the restore was done while people were working.

Each attempt takes 12 hours...
We couldn't use the directory -Fd dump because it's postgresql 9.1.
The original database is still on the server, this is a test restore.

We have about one or two months of slack before we really need to remove them from the server to recover space.


--
Miguel Ramos


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Hi Miguel,

I would try lowering max_connections to 50 and then set work_mem to 128MB.

After that restart your server and retry the restore.

Tell us if that helps.

Regards,

В списке pgsql-general по дате отправления:

Предыдущее
От: Miguel Ramos
Дата:
Сообщение: pg_restore out of memory
Следующее
От: David Rowley
Дата:
Сообщение: Re: Question about antijoin