Обсуждение: Postgres 10.3 delete on partition table cannot execute, out of memory.

Поиск
Список
Период
Сортировка

Postgres 10.3 delete on partition table cannot execute, out of memory.

От
Mark Magnus
Дата:
Bug

Partitioned table has 7202 partitions. No partition contains more than 50 records. Partitioning is done on a foreign key. Any delete operation i.e.
"delete from contacts where id = ?" or
"delete from contacts where id = ? and account_id = ?" or
"delete from contacts where account_id = ?"
results in out of memory condition.

Default Postgres Configuration with exception
max_locks_per_transaction = 1024

Postgres Logs

2018-03-15 14:26:40.340 AEDT [7120] LOG:  server process (PID 8177) was terminated by signal 9: Killed
2018-03-15 14:26:40.340 AEDT [7120] DETAIL:  Failed process was running: delete from contacts where id = 82398 and account_id = 9000
2018-03-15 14:26:40.354 AEDT [7120] LOG:  terminating any other active server processes
2018-03-15 14:26:40.367 AEDT [3821] WARNING:  terminating connection because of crash of another server process
2018-03-15 14:26:40.367 AEDT [3821] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-03-15 14:26:40.367 AEDT [3821] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres WARNING:  terminating connection because of crash of another server process
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-03-15 14:26:40.369 AEDT [7726] mark@postgres HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development WARNING:  terminating connection because of crash of another server process
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2018-03-15 14:26:40.569 AEDT [7120] LOG:  all server processes terminated; reinitializing
2018-03-15 14:26:40.639 AEDT [9244] LOG:  database system was interrupted; last known up at 2018-03-15 13:08:47 AEDT
2018-03-15 14:26:41.745 AEDT [9251] mark@postgres FATAL:  the database system is in recovery mode
2018-03-15 14:26:41.746 AEDT [9252] mark@postgres FATAL:  the database system is in recovery mode
2018-03-15 14:26:44.778 AEDT [9244] LOG:  database system was not properly shut down; automatic recovery in progress
2018-03-15 14:26:44.798 AEDT [9244] LOG:  redo starts at 0/56782CE0
2018-03-15 14:26:44.798 AEDT [9244] LOG:  invalid record length at 0/56782D18: wanted 24, got 0
2018-03-15 14:26:44.798 AEDT [9244] LOG:  redo done at 0/56782CE0
2018-03-15 14:26:44.870 AEDT [7120] LOG:  database system is ready to accept connections


System

ubuntu-16.4 LTS

mark-thinkpad-x230       
    description: Notebook
    product: 2306CTO (LENOVO_MT_2306)
    vendor: LENOVO
    version: ThinkPad X230
    serial: PK12NFA
    width: 64 bits
    capabilities: smbios-2.7 dmi-2.7 vsyscall32
    configuration: administrator_password=disabled chassis=notebook family=ThinkPad X230 power-on_password=disabled sku=LENOVO_MT_2306 uuid=814592E2-9652-CB11-A65F-83069E2A672A
  *-core
       description: Motherboard
       product: 2306CTO
       vendor: LENOVO
       physical id: 0
       version: Not Defined
       serial: 1ZST039410C
       slot: Not Available
     *-cpu
          description: CPU
          product: Intel(R) Core(TM) i3-3130M CPU @ 2.60GHz
          vendor: Intel Corp.
          physical id: 1
          bus info: cpu@0
          version: Intel(R) Core(TM) i3-3130M CPU @ 2.60GHz
          serial: None
          slot: CPU Socket - U3E1
          size: 2208MHz
          capacity: 2600MHz
          width: 64 bits
          clock: 100MHz
          capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx f16c lahf_lm epb retpoline kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm arat pln pts cpufreq
          configuration: cores=2 enabledcores=2 threads=4

P.S. Please feel free to contact me for more information.

Regards

Mark Magnus
Software Developer 

Impact Data Pty Ltd
Data Driven. Direct. Digital.

a:  Level 1/428 Little Burke Street, Melbourne, VIC 3000

Re: Postgres 10.3 delete on partition table cannot execute, out ofmemory.

От
Fabio Pardi
Дата:
Hi Mark,


To me it looks like OOM was triggered.

Could you post to us a bit more info?

like how much RAM on the machine, Postgres memory settings (like work_mem shared buffers maintenance_work_mem ,
max_connections),what was happening on the DB at the time of the incident and linux logs (if OOM was triggered, then is
likelyyou will find something like that), please?
 

Regards,

Fabio Pardi

On 03/15/2018 05:15 AM, Mark Magnus wrote:
> Bug
> 
> Partitioned table has 7202 partitions. No partition contains more than 50 records. Partitioning is done on a foreign
key.Any delete operation i.e.
 
> "delete from contacts where id = ?" or
> "delete from contacts where id = ? and account_id = ?" or
> "delete from contacts where account_id = ?"
> results in out of memory condition.
> 
> Default Postgres Configuration with exception
> max_locks_per_transaction = 1024
> 
> Postgres Logs
> 
> 2018-03-15 14:26:40.340 AEDT [7120] LOG:  server process (PID 8177) was terminated by signal 9: Killed
> 2018-03-15 14:26:40.340 AEDT [7120] DETAIL:  Failed process was running: delete from contacts where id = 82398 and
account_id= 9000
 
> 2018-03-15 14:26:40.354 AEDT [7120] LOG:  terminating any other active server processes
> 2018-03-15 14:26:40.367 AEDT [3821] WARNING:  terminating connection because of crash of another server process
> 2018-03-15 14:26:40.367 AEDT [3821] DETAIL:  The postmaster has commanded this server process to roll back the
currenttransaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
 
> 2018-03-15 14:26:40.367 AEDT [3821] HINT:  In a moment you should be able to reconnect to the database and repeat
yourcommand.
 
> 2018-03-15 14:26:40.369 AEDT [7726] mark@postgres WARNING:  terminating connection because of crash of another server
process
> 2018-03-15 14:26:40.369 AEDT [7726] mark@postgres DETAIL:  The postmaster has commanded this server process to roll
backthe current transaction and exit, because another server process exited abnormally and possibly corrupted shared
memory.
> 2018-03-15 14:26:40.369 AEDT [7726] mark@postgres HINT:  In a moment you should be able to reconnect to the database
andrepeat your command.
 
> 2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development WARNING:  terminating connection because of crash
ofanother server process
 
> 2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development DETAIL:  The postmaster has commanded this server
processto roll back the current transaction and exit, because another server process exited abnormally and possibly
corruptedshared memory.
 
> 2018-03-15 14:26:40.392 AEDT [7749] mark@partitioning_development HINT:  In a moment you should be able to reconnect
tothe database and repeat your command.
 
> 2018-03-15 14:26:40.569 AEDT [7120] LOG:  all server processes terminated; reinitializing
> 2018-03-15 14:26:40.639 AEDT [9244] LOG:  database system was interrupted; last known up at 2018-03-15 13:08:47 AEDT
> 2018-03-15 14:26:41.745 AEDT [9251] mark@postgres FATAL:  the database system is in recovery mode
> 2018-03-15 14:26:41.746 AEDT [9252] mark@postgres FATAL:  the database system is in recovery mode
> 2018-03-15 14:26:44.778 AEDT [9244] LOG:  database system was not properly shut down; automatic recovery in progress
> 2018-03-15 14:26:44.798 AEDT [9244] LOG:  redo starts at 0/56782CE0
> 2018-03-15 14:26:44.798 AEDT [9244] LOG:  invalid record length at 0/56782D18: wanted 24, got 0
> 2018-03-15 14:26:44.798 AEDT [9244] LOG:  redo done at 0/56782CE0
> 2018-03-15 14:26:44.870 AEDT [7120] LOG:  database system is ready to accept connections
> 
> 
> System
> 
> ubuntu-16.4 LTS
> 
> mark-thinkpad-x230       
>     description: Notebook
>     product: 2306CTO (LENOVO_MT_2306)
>     vendor: LENOVO
>     version: ThinkPad X230
>     serial: PK12NFA
>     width: 64 bits
>     capabilities: smbios-2.7 dmi-2.7 vsyscall32
>     configuration: administrator_password=disabled chassis=notebook family=ThinkPad X230 power-on_password=disabled
sku=LENOVO_MT_2306uuid=814592E2-9652-CB11-A65F-83069E2A672A
 
>   *-core
>        description: Motherboard
>        product: 2306CTO
>        vendor: LENOVO
>        physical id: 0
>        version: Not Defined
>        serial: 1ZST039410C
>        slot: Not Available
>      *-cpu
>           description: CPU
>           product: Intel(R) Core(TM) i3-3130M CPU @ 2.60GHz
>           vendor: Intel Corp.
>           physical id: 1
>           bus info: cpu@0
>           version: Intel(R) Core(TM) i3-3130M CPU @ 2.60GHz
>           serial: None
>           slot: CPU Socket - U3E1
>           size: 2208MHz
>           capacity: 2600MHz
>           width: 64 bits
>           clock: 100MHz
>           capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl
xtopologynonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid
sse4_1sse4_2 x2apic popcnt tsc_deadline_timer xsave avx f16c lahf_lm epb retpoline kaiser tpr_shadow vnmi flexpriority
eptvpid fsgsbase smep erms xsaveopt dtherm arat pln pts cpufreq
 
>           configuration: cores=2 enabledcores=2 threads=4
> 
> P.S. Please feel free to contact me for more information.
> 
> Regards
> 
> Mark Magnus
> Software Developer 
> 
> Impact Data Pty Ltd
> Data Driven. Direct. Digital.
> 
> a:  Level 1/428 Little Burke Street, Melbourne, VIC 3000
> e:  mark.magnus@impactdata.com.au <mailto:jay@impactdata.com.au>
> w:  www.impactdata.com.au <http://www.impactdata.com.au/>


Re: Postgres 10.3 delete on partition table cannot execute, out ofmemory.

От
Amit Langote
Дата:
Hi.

On 2018/03/15 13:15, Mark Magnus wrote:
> Bug
> 
> Partitioned table has 7202 partitions. No partition contains more than 50
> records. Partitioning is done on a foreign key. Any delete operation i.e.
> "delete from contacts where id = ?" or
> "delete from contacts where id = ? and account_id = ?" or
> "delete from contacts where account_id = ?"
> results in out of memory condition.
> 
> Default Postgres Configuration with exception
> max_locks_per_transaction = 1024

As Fabio says, it seems most likely that OOM was triggered while planning
the delete operation.  I can reproduce OOM being triggered on my modest
development machine, so perhaps that's what's happening in your case too.

This is unfortunately expected, given that the underlying planning
mechanism cannot cope beyond a few hundred partitions. :-(  See a relevant
note in the documentation; last line of the page at this link:
https://www.postgresql.org/docs/devel/static/ddl-partitioning.html.

Until things improve in that area, one workaround might be to perform the
delete operation directly on the partition, as it's possible to do that.
Or redesign your schema to use less number of partitions.

Thanks,
Amit