8.3rc1 Out of memory when performing update

From
"Stephen Denne"
Date:
A simple update query, over roughly 17 million rows, populating a newly added column in a table, resulted in an out of
memory error when the process memory usage reached 2GB. Could this be due to a poor choice of some configuration
parameter, or is there a limit on how many rows I can update in a single statement?
 

Log:
...
2008-01-25 09:42:08.119 NZDT [3432]: [1-1] LOG:  checkpoint starting: time
2008-01-25 09:42:08.697 NZDT [3432]: [2-1] LOG:  checkpoint complete: wrote 2 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.218 s, sync=0.047 s, total=0.578 s

...
2008-01-25 10:44:49.011 NZDT [3824]: [1-1] LOG:  connection received: host=(removed) port=3207
2008-01-25 10:44:49.042 NZDT [3824]: [2-1] LOG:  connection authorized: user=postgres database=(removed)
2008-01-25 10:52:08.204 NZDT [3432]: [3-1] LOG:  checkpoint starting: time
2008-01-25 10:52:39.673 NZDT [3432]: [4-1] LOG:  checkpoint complete: wrote 275 buffers (6.7%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=27.078 s, sync=1.485 s, total=31.407 s

2008-01-25 11:02:08.055 NZDT [3432]: [5-1] LOG:  checkpoint starting: time
2008-01-25 11:02:32.759 NZDT [3432]: [6-1] LOG:  checkpoint complete: wrote 222 buffers (5.4%); 0 transaction log file(s) added, 0 removed, 69 recycled; write=22.766 s, sync=0.968 s, total=24.704 s

2008-01-25 11:12:08.344 NZDT [3432]: [7-1] LOG:  checkpoint starting: time
2008-01-25 11:12:38.423 NZDT [3432]: [8-1] LOG:  checkpoint complete: wrote 268 buffers (6.5%); 0 transaction log file(s) added, 0 removed, 77 recycled; write=27.875 s, sync=1.312 s, total=30.094 s

2008-01-25 11:22:08.088 NZDT [3432]: [9-1] LOG:  checkpoint starting: time
2008-01-25 11:22:29.526 NZDT [3432]: [10-1] LOG:  checkpoint complete: wrote 188 buffers (4.6%); 0 transaction log file(s) added, 0 removed, 48 recycled; write=18.155 s, sync=1.391 s, total=21.312 s

2008-01-25 11:32:08.362 NZDT [3432]: [11-1] LOG:  checkpoint starting: time
2008-01-25 11:33:21.706 NZDT [3432]: [12-1] LOG:  checkpoint complete: wrote 672 buffers (16.4%); 0 transaction log file(s) added, 0 removed, 59 recycled; write=70.423 s, sync=1.562 s, total=73.375 s

2008-01-25 11:42:08.244 NZDT [3432]: [13-1] LOG:  checkpoint starting: time
2008-01-25 11:42:27.010 NZDT [3432]: [14-1] LOG:  checkpoint complete: wrote 175 buffers (4.3%); 0 transaction log file(s) added, 0 removed, 51 recycled; write=17.077 s, sync=1.204 s, total=18.813 s

2008-01-25 11:52:08.299 NZDT [3432]: [15-1] LOG:  checkpoint starting: time
2008-01-25 11:52:33.627 NZDT [3432]: [16-1] LOG:  checkpoint complete: wrote 233 buffers (5.7%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=23.328 s, sync=1.468 s, total=25.391 s
 
TopMemoryContext: 49816 total in 6 blocks; 5656 free (6 chunks); 44160 used
  RI compare cache: 8192 total in 1 blocks; 1800 free (0 chunks); 6392 used
  RI query cache: 8192 total in 1 blocks; 5968 free (0 chunks); 2224 used
  TopTransactionContext: 8192 total in 1 blocks; 7792 free (0 chunks); 400 used
  Operator class cache: 8192 total in 1 blocks; 3848 free (0 chunks); 4344 used
  Operator lookup cache: 24576 total in 2 blocks; 14072 free (6 chunks); 10504 used
  MessageContext: 40960 total in 3 blocks; 19960 free (5 chunks); 21000 used
  smgr relation table: 8192 total in 1 blocks; 2808 free (0 chunks); 5384 used
  TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0 chunks); 16 used
  Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
  PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
    PortalHeapMemory: 1024 total in 1 blocks; 760 free (0 chunks); 264 used
      ExecutorState: 2044715008 total in 270 blocks; 21056 free (262 chunks); 2044693952 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  Relcache by OID: 8192 total in 1 blocks; 3376 free (0 chunks); 4816 used
  CacheMemoryContext: 667472 total in 20 blocks; 182800 free (1 chunks); 484672 used
  location_ix: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
...    [Cut 58 indexes with very similar lines to the above, to save space]
  MdSmgr: 8192 total in 1 blocks; 7240 free (0 chunks); 952 used
  LOCALLOCK hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
  Timezones: 49432 total in 2 blocks; 5968 free (0 chunks); 43464 used
  ErrorContext: 8192 total in 1 blocks; 8176 free (3 chunks); 16 used
2008-01-25 11:53:10.315 NZDT [3824]: [3-1] ERROR:  out of memory
2008-01-25 11:53:10.362 NZDT [3824]: [4-1] DETAIL:  Failed on request of size 28.
2008-01-25 11:53:10.362 NZDT [3824]: [5-1] STATEMENT:  UPDATE document_file SET document_type_id = (SELECT document_type_id FROM document d where d.id = document_id);
 
2008-01-25 12:00:53.571 NZDT [3604]: [1-1] LOG:  connection received: host=(removed) port=3399
2008-01-25 12:00:54.274 NZDT [3604]: [2-1] LOG:  connection authorized: user=postgres database=(removed)
2008-01-25 12:00:55.727 NZDT [3604]: [3-1] LOG:  duration: 1264.999 ms  statement: SET DateStyle=ISO;SELECT oid, pg_encoding_to_char(encoding) AS encoding, datlastsysoid FROM pg_database WHERE oid = 16466
2008-01-25 12:02:08.322 NZDT [3432]: [17-1] LOG:  checkpoint starting: time
2008-01-25 12:07:03.591 NZDT [3432]: [18-1] LOG:  checkpoint complete: wrote 2784 buffers (68.0%); 0 transaction log file(s) added, 0 removed, 92 recycled; write=292.488 s, sync=1.515 s, total=295.473 s
 
2008-01-25 12:10:07.031 NZDT [3604]: [4-1] LOG:  duration: 539646.999 ms  statement: select count(*) from document_file;
2008-01-25 12:12:08.048 NZDT [3432]: [19-1] LOG:  checkpoint starting: time
2008-01-25 12:15:22.176 NZDT [3432]: [20-1] LOG:  checkpoint complete: wrote 949 buffers (23.2%); 0 transaction log file(s) added, 0 removed, 8 recycled; write=193.097 s, sync=0.936 s, total=194.127 s
 

Environment:
OS: Windows XP
PostgreSQL: 8.3RC1

Non default Resource and WAL configuration settings:
shared_buffers = 32MB
max_fsm_pages = 204800
checkpoint_segments = 300
checkpoint_timeout = 10min

The previous query (not logged due to log_min_duration_statement = 500) had been:
ALTER TABLE document_file ADD document_type_id integer;

The query plan:
Seq Scan on document_file  (cost=0.00..280337907.00 rows=27619541 width=617)
  SubPlan
    ->  Index Scan using pk_document_id on document d  (cost=0.00..10.12 rows=1 width=4)
          Index Cond: (id = $0)
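
(A workaround sketch, not from the original thread: one common way to bound the resources any single statement consumes is to batch the update by primary-key range, so each statement touches a limited number of rows. The 1M-wide slice is illustrative only.)

UPDATE document_file
SET document_type_id = (SELECT document_type_id
                        FROM document d
                        WHERE d.id = document_id)
WHERE id BETWEEN 1 AND 1000000;   -- repeat for each 1M-wide id slice
VACUUM document_file;             -- between batches, to limit dead-tuple bloat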

Stephen Denne



Re: 8.3rc1 Out of memory when performing update

From
Fernando Ike
Date:
On Fri, 25 Jan 2008 12:46:20 +1300,
"Stephen Denne" <Stephen.Denne@datamail.co.nz> wrote:

> A simple update query, over roughly 17 million rows, populating a
> newly added column in a table, resulted in an out of memory error
> when the process memory usage reached 2GB. Could this be due to a
> poor choice of some configuration parameter, or is there a limit on
> how many rows I can update in a single statement?
>

   I believe it is a platform problem, because on *nix this limit
doesn't occur. But I'm not a Windows specialist.



Kind Regards,
--
Fernando Ike
http://www.midstorm.org/~fike/weblog

Re: 8.3rc1 Out of memory when performing update

From
Tom Lane
Date:
"Stephen Denne" <Stephen.Denne@datamail.co.nz> writes:
> A simple update query, over roughly 17 million rows, populating a newly
> added column in a table, resulted in an out of memory error when the
> process memory usage reached 2GB. Could this be due to a poor choice of
> some configuration parameter, or is there a limit on how many rows I can
> update in a single statement?

Do you have any triggers or foreign keys on that table?  For that
matter, let's see its whole schema definition.

            regards, tom lane

Re: 8.3rc1 Out of memory when performing update

From
"Stephen Denne"
Date:
> "Stephen Denne" <Stephen.Denne@datamail.co.nz> writes:
> > A simple update query, over roughly 17 million rows,
> > populating a newly added column in a table, resulted in an
> > out of memory error when the process memory usage reached
> > 2GB. Could this be due to a poor choice of some configuration
> > parameter, or is there a limit on how many rows I can update
> > in a single statement?
>
> Do you have any triggers or foreign keys on that table?  For that
> matter, let's see its whole schema definition.
>
>             regards, tom lane

No triggers on that table, one primary key, one foreign key, two indexes.
The foreign key references a primary key which is also an integer.
No other tables which reference document_file.
No inherited tables.
There are as many document_file rows as there are rows in the document table;
document_file.document_id is unique, though not constrained.
(Designed as a one-to-many relationship, but only ever used as one-to-one.)
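
(If one wanted to enforce that one-to-one usage, a hypothetical constraint along these lines would do it; the constraint name is invented, not from the thread:)

ALTER TABLE document_file
  ADD CONSTRAINT uq_document_file_document_id UNIQUE (document_id);
-- this also creates a unique index on document_id, making a separate
-- non-unique index on that column redundant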


I altered the update statement slightly, and reran the query.

I disabled autovacuum after a while and cancelled the autovacuum process that was trying to vacuum analyze
document_file.

The altered query has been running over 3 hours now,
without using lots of memory (38M private bytes).
2046 temp files were created (2.54GB worth),
which have recently changed from slowly growing in size
to very very slowly reducing in number.
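
(For reference, a superuser-only way to watch those temp files from within SQL; a sketch, not something used in the thread, and it assumes the base/pgsql_tmp directory under the data directory currently exists:)

SELECT count(*) AS temp_files,
       sum((pg_stat_file('base/pgsql_tmp/' || f)).size) AS temp_bytes
FROM pg_ls_dir('base/pgsql_tmp') AS f;
-- note: a file deleted between the directory listing and pg_stat_file
-- will make the query error out; just rerun it in that case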


Altered query that has not crashed:
UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;

Hash Join  (cost=674810.80..6701669.63 rows=16972702 width=621)
  Hash Cond: (df.document_id = d.id)
  ->  Seq Scan on document_file df  (cost=0.00..750298.65 rows=27702365 width=617)
  ->  Hash  (cost=396352.02..396352.02 rows=16972702 width=8)
        ->  Seq Scan on document d  (cost=0.00..396352.02 rows=16972702 width=8)


cf. the original (re-explained):
UPDATE document_file SET document_type_id = (SELECT document_type_id FROM document d where d.id = document_id);

Seq Scan on document_file  (cost=0.00..281183329.64 rows=27702834 width=617)
  SubPlan
    ->  Index Scan using pk_document_id on document d  (cost=0.00..10.12 rows=1 width=4)
          Index Cond: (id = $0)



Schema as reported by pgadmin:

CREATE TABLE document_file
(
  id integer NOT NULL DEFAULT nextval(('document_file_seq'::text)::regclass),
  document_id integer NOT NULL,
  archive_directory_location character varying(255) NOT NULL,
  mime_type character varying(255),
  file_name character varying(255) NOT NULL,
  life_cycle_status character varying(255),
  version integer DEFAULT 0,
  is_current boolean DEFAULT true,
  file_size integer NOT NULL,
  document_type_id integer,
  CONSTRAINT pk_document_file_id PRIMARY KEY (id),
  CONSTRAINT fk_document_id FOREIGN KEY (document_id)
      REFERENCES document (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE
)
WITH (OIDS=FALSE);
ALTER TABLE document_file OWNER TO postgres;
GRANT ALL ON TABLE document_file TO postgres;
GRANT ALL ON TABLE document_file TO vapps;
GRANT ALL ON TABLE document_file TO vrconfig;

CREATE INDEX location_ix
  ON document_file
  USING btree
  (archive_directory_location);

CREATE INDEX tc_file_document
  ON document_file
  USING btree
  (document_id);
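
(As tried later in this thread, the foreign key and indexes can be dropped for the duration of the bulk update and recreated afterwards; a sketch using the definitions above. Note that re-adding the constraint re-scans the table to validate it.)

ALTER TABLE document_file DROP CONSTRAINT fk_document_id;
DROP INDEX location_ix;
DROP INDEX tc_file_document;

-- ... run the bulk UPDATE here ...

ALTER TABLE document_file
  ADD CONSTRAINT fk_document_id FOREIGN KEY (document_id)
      REFERENCES document (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE;
CREATE INDEX location_ix ON document_file USING btree (archive_directory_location);
CREATE INDEX tc_file_document ON document_file USING btree (document_id);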




Re: 8.3rc1 Out of memory when performing update

From
Tom Lane
Date:
"Stephen Denne" <Stephen.Denne@datamail.co.nz> writes:
> I altered the update statement slightly, and reran the query.
> The altered query has been running over 3 hours now,
> without using lots of memory (38M private bytes).
> 2046 temp files were created (2.54GB worth),
> which have recently changed from slowly growing in size
> to very very slowly reducing in number.

Hmm.  I think what that really means is you haven't got to the part of
the query where the leak is :-(.  In my attempt to reproduce this
I found that 8.3 has introduced a memory leak into the RI trigger
support, such that even if an UPDATE doesn't change the FK columns
it's still likely to leak a few dozen bytes per updated row.

Please see if the attached patch makes it better for you.

            regards, tom lane
Index: ri_triggers.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/ri_triggers.c,v
retrieving revision 1.101
diff -c -r1.101 ri_triggers.c
*** ri_triggers.c    3 Jan 2008 21:23:15 -0000    1.101
--- ri_triggers.c    25 Jan 2008 04:45:56 -0000
***************
*** 3099,3104 ****
--- 3099,3106 ----
          elog(ERROR, "conkey is not a 1-D smallint array");
      riinfo->nkeys = numkeys;
      memcpy(riinfo->fk_attnums, ARR_DATA_PTR(arr), numkeys * sizeof(int16));
+     if ((Pointer) arr != DatumGetPointer(adatum))
+         pfree(arr);                /* free de-toasted copy, if any */

      adatum = SysCacheGetAttr(CONSTROID, tup,
                               Anum_pg_constraint_confkey, &isNull);
***************
*** 3113,3118 ****
--- 3115,3122 ----
          ARR_ELEMTYPE(arr) != INT2OID)
          elog(ERROR, "confkey is not a 1-D smallint array");
      memcpy(riinfo->pk_attnums, ARR_DATA_PTR(arr), numkeys * sizeof(int16));
+     if ((Pointer) arr != DatumGetPointer(adatum))
+         pfree(arr);                /* free de-toasted copy, if any */

      adatum = SysCacheGetAttr(CONSTROID, tup,
                               Anum_pg_constraint_conpfeqop, &isNull);
***************
*** 3127,3132 ****
--- 3131,3138 ----
          ARR_ELEMTYPE(arr) != OIDOID)
          elog(ERROR, "conpfeqop is not a 1-D Oid array");
      memcpy(riinfo->pf_eq_oprs, ARR_DATA_PTR(arr), numkeys * sizeof(Oid));
+     if ((Pointer) arr != DatumGetPointer(adatum))
+         pfree(arr);                /* free de-toasted copy, if any */

      adatum = SysCacheGetAttr(CONSTROID, tup,
                               Anum_pg_constraint_conppeqop, &isNull);
***************
*** 3141,3146 ****
--- 3147,3154 ----
          ARR_ELEMTYPE(arr) != OIDOID)
          elog(ERROR, "conppeqop is not a 1-D Oid array");
      memcpy(riinfo->pp_eq_oprs, ARR_DATA_PTR(arr), numkeys * sizeof(Oid));
+     if ((Pointer) arr != DatumGetPointer(adatum))
+         pfree(arr);                /* free de-toasted copy, if any */

      adatum = SysCacheGetAttr(CONSTROID, tup,
                               Anum_pg_constraint_conffeqop, &isNull);
***************
*** 3155,3160 ****
--- 3163,3170 ----
          ARR_ELEMTYPE(arr) != OIDOID)
          elog(ERROR, "conffeqop is not a 1-D Oid array");
      memcpy(riinfo->ff_eq_oprs, ARR_DATA_PTR(arr), numkeys * sizeof(Oid));
+     if ((Pointer) arr != DatumGetPointer(adatum))
+         pfree(arr);                /* free de-toasted copy, if any */

      ReleaseSysCache(tup);
  }

Re: 8.3rc1 Out of memory when performing update

From
"Stephen Denne"
Date:
I don't have a PostgreSQL build environment.
 
It is now Friday night for me. I left the alternate query running, and will find out on Monday what happened.
 
If I drop the fk constraint, and/or its index, would I still be affected by the leak you found?
 
Regards,
Stephen Denne.
 


From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Fri 25/01/2008 5:50 p.m.
To: Stephen Denne
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] 8.3rc1 Out of memory when performing update

"Stephen Denne" <Stephen.Denne@datamail.co.nz> writes:
> I altered the update statement slightly, and reran the query.
> The altered query has been running over 3 hours now,
> without using lots of memory (38M private bytes).
> 2046 temp files were created (2.54GB worth),
> which have recently changed from slowly growing in size
> to very very slowly reducing in number.

Hmm.  I think what that really means is you haven't got to the part of
the query where the leak is :-(.  In my attempt to reproduce this
I found that 8.3 has introduced a memory leak into the RI trigger
support, such that even if an UPDATE doesn't change the FK columns
it's still likely to leak a few dozen bytes per updated row.

Please see if the attached patch makes it better for you.

                        regards, tom lane


Re: 8.3rc1 Out of memory when performing update

From
cgallant@gmail.com
Date:
>
> > A simple update query, over roughly 17 million rows, populating a
> > newly added column in a table, resulted in an out of memory error
> > when the process memory usage reached 2GB. Could this be due to a
> > poor choice of some configuration parameter, or is there a limit on
> > how many rows I can update in a single statement?
> >
>
>    I believe it is a platform problem, because on *nix this limit
> doesn't occur. But I'm not a Windows specialist.

On most Windows Server editions (except for a few variants such as the
database edition), 2GB is the most that can be addressed by a single
process without booting the machine with a special switch called /3GB,
which allows allocating up to 3GB per process. As far as I know that is
the limit unless you get special versions of Windows Server 2003. If you
do a Google search on the /3GB switch with Windows you will find what I
am referring to.

Hope that helps a bit.

Cheers,

---
Curtis Gallant
cgallant@gmail.com

Re: 8.3rc1 Out of memory when performing update

From
"Roberts, Jon"
Date:
> Subject: Re: [PERFORM] 8.3rc1 Out of memory when performing update
>
> >
> > > A simple update query, over roughly 17 million rows, populating a
> > > newly added column in a table, resulted in an out of memory error
> > > when the process memory usage reached 2GB. Could this be due to a
> > > poor choice of some configuration parameter, or is there a limit on
> > > how many rows I can update in a single statement?
> > >
> >
> >    I believe it is a platform problem, because on *nix this limit
> > doesn't occur. But I'm not a Windows specialist.
>
> On most Windows Server editions (except for a few variants such as the
> database edition), 2GB is the most that can be addressed by a single
> process without booting the machine with a special switch called /3GB,
> which allows allocating up to 3GB per process. As far as I know that is
> the limit unless you get special versions of Windows Server 2003. If you
> do a Google search on the /3GB switch with Windows you will find what I
> am referring to.

Windows 32 bit is limited to 2 or 3 GB as you state, but 64-bit Windows
isn't. 32-bit Linux has similar limits too.


Jon

Re: 8.3rc1 Out of memory when performing update

From
Magnus Hagander
Date:
Roberts, Jon wrote:
>> Subject: Re: [PERFORM] 8.3rc1 Out of memory when performing update
>>
>>>> A simple update query, over roughly 17 million rows, populating a
>>>> newly added column in a table, resulted in an out of memory error
>>>> when the process memory usage reached 2GB. Could this be due to a
>>>> poor choice of some configuration parameter, or is there a limit on
>>>> how many rows I can update in a single statement?
>>>>
>>>    I believe it is a platform problem, because on *nix this limit
>>> doesn't occur. But I'm not a Windows specialist.
>> On most Windows Server editions (except for a few variants such as the
>> database edition), 2GB is the most that can be addressed by a single
>> process without booting the machine with a special switch called /3GB,
>> which allows allocating up to 3GB per process. As far as I know that is
>> the limit unless you get special versions of Windows Server 2003. If you
>> do a Google search on the /3GB switch with Windows you will find what I
>> am referring to.
>
> Windows 32 bit is limited to 2 or 3 GB as you state, but 64-bit Windows
> isn't. 32-bit Linux has similar limits too.

Well, PostgreSQL on Windows is a 32-bit binary, so the limit applies to
this case.

//Magnus

Re: 8.3rc1 Out of memory when performing update

From
"Stephen Denne"
Date:
>>"Stephen Denne" <Stephen.Denne@datamail.co.nz> writes:
>>> I altered the update statement slightly, and reran the query.
>>> The altered query has been running over 3 hours now,
>>> without using lots of memory (38M private bytes).
>>> 2046 temp files were created (2.54GB worth),
>>> which have recently changed from slowly growing in size
>>> to very very slowly reducing in number.

To which Tom Lane replied:
>>Hmm.  I think what that really means is you haven't got to the part of
>>the query where the leak is :-(.

I then said:
> It is now Friday night for me. I left the alternate query running, and will find out on Monday what happened.

Well, it is now Monday morning for me, and those temp files are still slowly reducing in number.
There are now only 1629 of them left, so I'm guessing that the query is about 20% done.
The PC seems to have been steadily but very slowly working away at this very simple query for close to 70 hours.
I decided not to leave this query running for a fortnight to find out if I would then strike the memory leak.
Private Bytes had grown to 685MB.
I cancelled the query.

Rough snapshot of what was happening with IO (a single 7200rpm IDE disk):

The process for the update query was reading about 500KB/second, writing between 80KB/second and 200KB/second.
The stats collector process was writing about 100KB/second
The wal writer process was writing about 200KB/second
The writer process was writing about 400KB/second
Checkpoints were 10 minutes apart, taking about 85 seconds to write 1000 buffers.

What could cause such poor performance?
I presume that the disk was being forced to move the head a great deal.

I also asked:
> If I drop the fk constraint, and/or its index, would I still be affected by the leak you found?

I dropped two indexes and one fk constraint and ran VACUUM FULL VERBOSE ANALYZE document_file;
As an indication of the disk performance: at its peak the vacuum process was reading and writing 20MB/second
(sustained), completing in less than 11 minutes.

I reran the original query.
It used constant memory (6.7MB private bytes)
It was reading 2 to 3MB/second, writing 3 to 6MB/second.
The stats collector process was writing about 100KB/second
The wal writer process was writing about 200KB/second
The writer process was initially writing about 1MB/second, increasing to about 3MB/second

Checkpoints in the middle of this query were taking up to 13 seconds to write 100 buffers.

The checkpoint after the query took 300 seconds (exactly half the checkpoint interval), and was writing about
60KB/second. It wrote 2148 buffers.

So dropping the fk constraint and index results in successful query execution with constant memory usage. Does this
confirm that the memory leak you found is the one I was suffering from?

I'd also class the extremely poor performance of the alternate query as a bug.
Why take a fortnight when you could take three quarters of an hour? (Granted there were two fewer indexes to update,
but that is still too long.)

Aside: I must say that I am impressed with PostgreSQL's handling of this connection. It recovers extremely well from
running out of memory, cancelling very long running queries, reloading config (to turn autovacuum off), and continues
to work as expected (the 3-day-old connection, that is).

Stephen Denne.



Re: 8.3rc1 Out of memory when performing update

From
Tom Lane
Date:
"Stephen Denne" <Stephen.Denne@datamail.co.nz> writes:
> So dropping the fk constraint and index results in successful query
> execution with constant memory usage. Does this confirm that the memory
> leak you found is the one I was suffering from?

Well, it confirms that you were suffering from that memory leak.  What's
not clear is whether the leak completely explains the bad performance
you saw.  The leak would have resulted in extra swapping, but I wouldn't
have thought it would drive the machine completely into swap hell.
Would the monitoring tools you were using have shown swapping I/O?

If you want to try a focused experiment, please apply the patch shown
here:
http://archives.postgresql.org/pgsql-committers/2008-01/msg00322.php
and see if the behavior gets better.

> I'd also class the extremely poor performance of the alternate query as a bug.

Yeah, we should look into that.  The plan you showed before estimated
about 16.9M rows in "document" --- is that about right?  What have you
got work_mem set to?

> Aside: I must say that I am impressed with PostgreSQL's handling of this
> connection. It recovers extremely well from running out of memory,
> cancelling very long running queries, reloading config (to turn
> autovacuum off), and continues to work as expected (the 3-day-old
> connection, that is).

Hey, we've had plenty of opportunity to improve the system's robustness
in the face of such things ;-)

            regards, tom lane

Re: 8.3rc1 Out of memory when performing update

From
"Stephen Denne"
Date:
Tom Lane wrote:
> "Stephen Denne" <Stephen.Denne@datamail.co.nz> writes:
> > So dropping the fk constraint and index results in
> successful query execution with constant memory usage. Does
> this confirm that the memory leak you found is the one I was
> suffering from?
>
> Well, it confirms that you were suffering from that memory
> leak.  What's
> not clear is whether the leak completely explains the bad performance
> you saw.  The leak would have resulted in extra swapping, but
> I wouldn't
> have thought it would drive the machine completely into swap hell.

The query crashing from out of memory did so after an hour,
which isn't bad performance given that the workaround with fewer indexes to update succeeded after 45 minutes.

It was the rewritten one which I killed after 3 days.

> Would the monitoring tools you were using have shown swapping I/O?

I was using Process Explorer, which shows page faults and deltas,
which are not included in the read & write IO stats.
The query with poor IO performance wasn't swapping.

> > I'd also class the extremely poor performance of the
> alternate query as a bug.
>
> Yeah, we should look into that.  The plan you showed before estimated
> about 16.9M rows in "document" --- is that about right?  What have you
> got work_mem set to?

Yes, 16894164 rows.
Exactly the same number of rows in document as in document_file.
[count(*) queries taking 38 and 63 seconds]

work_mem appears to be left as the default 1MB
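
(With work_mem at 1MB, a hash join over 17 million rows inevitably spills into a large number of temp-file batches. A sketch of a session-level override, not something done in the thread; the value is illustrative and should be sized to what the machine can actually spare:)

SET work_mem = '256MB';   -- session-local; does not touch postgresql.conf
UPDATE ONLY document_file AS df
  SET document_type_id = d.document_type_id
  FROM document AS d
  WHERE d.id = document_id;
-- with a larger hash area the join needs far fewer batches,
-- and therefore far fewer temp files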

I get 1023 temp files created straight away, which take four minutes (250s) to grow to about 448K each
(reading @ 5MB/s writing @ 2MB/s)
memory usage during this first phase slowly increased from 13.4M to 14.4M
then 1023 more temp files are created, and they grow to about 2170K each
(reading @ 2MB/s writing @ 2MB/s until the checkpoint starts, when the speed decreases to 200K/s, and doesn't increase
again after the checkpoint finishes.)
memory usage during this second phase slowly increased from 22.5M to 26.4M
My concern is with what it then does. (Spends a fortnight doing really slow IO)

An hour's worth of logs from during this phase show 6 checkpoints, and 6 temp files reported (which seems to coincide
with them being deleted):

2008-01-26 06:02:08.086 NZDT [3432]: [233-1] LOG:  checkpoint starting: time
2008-01-26 06:03:28.916 NZDT [3432]: [234-1] LOG:  checkpoint complete: wrote 899 buffers (21.9%); 0 transaction log file(s) added, 0 removed, 11 recycled; write=77.798 s, sync=2.750 s, total=80.830 s
2008-01-26 06:12:08.094 NZDT [3432]: [235-1] LOG:  checkpoint starting: time
2008-01-26 06:12:23.407 NZDT [3824]: [209-1] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp3824.1321", size 2224520
2008-01-26 06:12:23.407 NZDT [3824]: [210-1] STATEMENT:  UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:12:24.157 NZDT [3824]: [211-1] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp3824.477", size 461356
2008-01-26 06:12:24.157 NZDT [3824]: [212-1] STATEMENT:  UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:13:21.876 NZDT [3432]: [236-1] LOG:  checkpoint complete: wrote 724 buffers (17.7%); 0 transaction log file(s) added, 0 removed, 17 recycled; write=71.500 s, sync=2.108 s, total=73.781 s
2008-01-26 06:22:08.024 NZDT [3432]: [237-1] LOG:  checkpoint starting: time
2008-01-26 06:23:25.415 NZDT [3432]: [238-1] LOG:  checkpoint complete: wrote 877 buffers (21.4%); 0 transaction log file(s) added, 0 removed, 11 recycled; write=74.141 s, sync=2.985 s, total=77.391 s
2008-01-26 06:29:36.311 NZDT [3824]: [213-1] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp3824.1350", size 2220990
2008-01-26 06:29:36.311 NZDT [3824]: [214-1] STATEMENT:  UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:29:36.982 NZDT [3824]: [215-1] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp3824.516", size 463540
2008-01-26 06:29:36.982 NZDT [3824]: [216-1] STATEMENT:  UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:32:08.016 NZDT [3432]: [239-1] LOG:  checkpoint starting: time
2008-01-26 06:33:19.501 NZDT [3432]: [240-1] LOG:  checkpoint complete: wrote 872 buffers (21.3%); 0 transaction log file(s) added, 0 removed, 15 recycled; write=69.062 s, sync=2.171 s, total=71.484 s
2008-01-26 06:42:08.101 NZDT [3432]: [241-1] LOG:  checkpoint starting: time
2008-01-26 06:43:27.431 NZDT [3432]: [242-1] LOG:  checkpoint complete: wrote 813 buffers (19.8%); 0 transaction log file(s) added, 0 removed, 14 recycled; write=76.579 s, sync=2.592 s, total=79.329 s
2008-01-26 06:46:45.558 NZDT [3824]: [217-1] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp3824.1940", size 2229130
2008-01-26 06:46:45.558 NZDT [3824]: [218-1] STATEMENT:  UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:46:46.246 NZDT [3824]: [219-1] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp3824.631", size 459564
2008-01-26 06:46:46.246 NZDT [3824]: [220-1] STATEMENT:  UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:52:08.078 NZDT [3432]: [243-1] LOG:  checkpoint starting: time
2008-01-26 06:53:31.173 NZDT [3432]: [244-1] LOG:  checkpoint complete: wrote 983 buffers (24.0%); 0 transaction log file(s) added, 0 removed, 13 recycled; write=78.203 s, sync=4.641 s, total=83.094 s

Stephen Denne.



Re: 8.3rc1 Out of memory when performing update

From
"Guillaume Smet"
Date:
On Jan 25, 2008 5:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm.  I think what that really means is you haven't got to the part of
> the query where the leak is :-(.  In my attempt to reproduce this
> I found that 8.3 has introduced a memory leak into the RI trigger
> support, such that even if an UPDATE doesn't change the FK columns
> it's still likely to leak a few dozen bytes per updated row.
>
> Please see if the attached patch makes it better for you.

Just FYI, somebody on #postgresql had the exact same problem of
memleaks during update yesterday and your patch fixed it for him too.

--
Guillaume