Thread: Out of memory


Out of memory

From
Jeremy Palmer
Date:
Hi,

I've been having repeated troubles trying to get a PostgreSQL app to play nicely on Ubuntu. I recently posted a message on this list about an out of memory error and got a resolution by reducing the work_mem setting. However I'm now getting further out of memory issues during the same stage of the plpgsql function as mentioned before.

The function itself is run as part of a larger transaction which does the following:

1/ Maintains 104 tables (15 PostGIS tables) by loading and applying incremental table changes. A typical incremental load will maintain about 10,000 rows.

2/ When each of these tables is updated, an AFTER trigger fires that maintains an associated table revision table.

3/ After all of the tables are maintained, a plpgsql function is called to build/maintain a set of de-normalised tables. These tables total about 20GB. Each of these tables is compared against the previous table revision to determine its row changes. It's in this function that the out of memory exception is occurring.

The server log error message I'm getting in the function is here: http://pastebin.com/346zi2sS. It's very long and contains the top transaction memory debug info.

My initial observation about this error is that maybe PostgreSQL is encountering a memory corruption error, because the amount of OS memory does not seem to run out. The plpgsql function uses functions from both PostGIS and pgc_checksum (http://pgfoundry.org/projects/pg-comparator), so maybe they are the cause of the problem. Or maybe I have configured something wrong...

I did some memory logging during the execution of the function. It shows that for the majority of the transaction execution the actual memory used is about 1GB (growing from the initial 600MB), with about 6.5GB cached for the OS:

             total       used       free     shared    buffers     cached
Mem:          8004       7839        165          0          0       6802
-/+ buffers/cache:       1037       6967
Swap:          397          0        397

But just before the out of memory error occurs there is a spike to 2.5GB of used memory, though there is still 4.5GB cached by the OS:

             total       used       free     shared    buffers     cached
Mem:          8004       7702        301          0          0       4854
-/+ buffers/cache:       2848       5156
Swap:          397          0        397

Then after the error the memory slowly returns to this state:

             total       used       free     shared    buffers     cached
Mem:          8004       1478       6526          0          0       1133
-/+ buffers/cache:        345       7659
Swap:          397          0        397

The OS I'm running is:

Linux TSTLHAPP01 2.6.32-29-server #58-Ubuntu SMP Fri Feb 11 21:06:51 UTC 2011 x86_64 GNU/Linux.

It's running on VMware and has 2 CPUs and 8GB of RAM. This VM is dedicated to PostgreSQL; not much else is running other than cacti, ssh and ftp server daemons. The main OS parameters I have tuned are:

vm.swappiness=0
vm.overcommit_memory=2
kernel.shmmax = 4196769792
kernel.shmall = 1024602

And the PostgreSQL version is:

PostgreSQL 9.0.3 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit.

The main changed postgresql.conf parameters I've tuned are:

shared_buffers = 512MB
maintenance_work_mem = 512MB
temp_buffers = 256MB
work_mem = 1MB
wal_buffers = 16MB
effective_cache_size = 4094MB

The size of the database is 350GB. The typical number of users connected to the database is 1 or 2. This database is used for loading external data, managing revision table information, and generating and outputting de-normalised datasets, so it does not have a high number of transactions running: typically 1 large one per day.

Two questions:

1) Have I set the OS and PostgreSQL parameters to sensible values given the hardware and database utilisation?
2) Can anyone help me make sense of the top transaction memory error to help track down the issue?

Any other suggestions would be greatly appreciated.

Thanks
Jeremy

________________________________________
From: Jeremy Palmer
Sent: Saturday, 26 March 2011 9:57 p.m.
To: Scott Marlowe
Cc: pgsql-general@postgresql.org
Subject: RE: [GENERAL] Out of memory

Hi Scott,

It was the work_mem that was set too high. I reduced it to 32MB and the function executed.

Just so I understand this: every time a sort is performed within a function, the sort memory is allocated and then not released until the function completes, rather than being deallocated after each sort operation has completed?

Thanks,
Jeremy

________________________________________
From: Scott Marlowe [scott.marlowe@gmail.com]
Sent: Friday, 25 March 2011 5:04 p.m.
To: Jeremy Palmer
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Out of memory

On Thu, Mar 24, 2011 at 9:23 PM, Jeremy Palmer <JPalmer@linz.govt.nz> wrote:
> I’ve been getting database out of memory failures with some queries which
> deal with a reasonable amount of data.
>
> I was wondering what I should be looking at to stop this from happening.
>
> The typical messages I've been getting are like this:
> http://pastebin.com/Jxfu3nYm
> The OS is:
>
> Linux TSTLHAPP01 2.6.32-29-server #58-Ubuntu SMP Fri Feb 11 21:06:51 UTC
> 2011 x86_64 GNU/Linux.
>
> It's running on VMware and has 2 CPUs and 8GB of RAM. This VM is
> dedicated to PostgreSQL. The main OS parameters I have tuned are:
>
> work_mem = 200MB

That's a really big work_mem.  I have mainline db servers with 128G of
ram that have work_mem set to 16M and that is still considered a
little high in my book.  If you drop work_mem down to 1MB does the out
of memory go away?  work_mem is how much memory EACH sort can use on
its own; if you have a plpgsql procedure that keeps running query
after query, it could use a LOT of memory really fast.
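
A minimal way to see the per-sort behaviour Scott describes is to check the "Sort Method" line that EXPLAIN ANALYZE prints for each sort node; the table and query below are placeholders, not from this thread:

    SET work_mem = '1MB';
    EXPLAIN ANALYZE
    SELECT some_column
    FROM some_large_table          -- placeholder table name
    ORDER BY some_column;
    -- With a small work_mem the plan shows "Sort Method: external merge  Disk: ...";
    -- with a larger setting the same sort reports "quicksort  Memory: ...". Each
    -- sort (or hash) node in a plan may use up to work_mem on its own.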

Re: Out of memory

From
Jeff Davis
Date:
On Tue, 2011-04-05 at 21:50 +1200, Jeremy Palmer wrote:
> Hi,
>
> I've been having repeated troubles trying to get a PostgreSQL app to play nicely on Ubuntu. I recently posted a message on this list about an out of memory error and got a resolution by reducing the work_mem setting. However I'm now getting further out of memory issues during the same stage of the plpgsql function as mentioned before.
>
> The function itself is run as part of larger transaction which does the following:

Where is the source to the function?

Regards,
    Jeff Davis


Re: Out of memory

From
John R Pierce
Date:
On 04/05/11 2:50 AM, Jeremy Palmer wrote:
> I've been having repeated troubles trying to get a PostgreSQL app to play nicely on Ubuntu. I recently posted a message on this list about an out of memory error and got a resolution by reducing the work_mem setting. However I'm now getting further out of memory issues during the same stage of the plpgsql function as mentioned before.
>
> The function itself is run as part of larger transaction which does the following:
>
> 1/ Maintains 104 tables (15 PostGIS tables) by loading and applying incremental table changes. A typical incremental load will maintain about 10,000 rows.
>
> 2/ When each one of these tables is updated an after trigger is fired that maintains an associated table revision
table.
>
> 3/ After all of the tables are maintained, a plpgsql function is called to build/maintain a set of de-normalised tables. These tables total about 20GB. Each of these tables is compared against the previous table revision to determine its row changes. It's in this function that the out of memory exception is occurring.

a few random questions...

Does that all really have to be a single transaction?

Do you really need to use triggers for your revision tracking, and can't
rely on your daily update cycle to manually set the revision information?

Is it really necessary to generate massive denormalized tables, rather
than using views to join the data?


> shared_buffers = 512MB
> maintenance_work_mem = 512MB
> temp_buffers = 256MB
> work_mem = 1MB
> wal_buffers = 16MB
> effective_cache_size = 4094MB
>
> The size of the database is 350GB. The typical number of users connected to the database is 1 or 2. This database is
usedfor loading external data, managing revision table information and generating and outputting de-normalised
datasets,so it does not have a high number of transactions running. Typically 1 large one per day. 


with only 1-2 connections, you certainly could increase the work_mem.
Alternatively, this single giant transaction could manually set a larger
work_mem which would only apply to it.  Personally, given your 8GB
system and what you've described, I think I'd set the tuning parameters
something like...

    shared_buffers = 1GB
    maintenance_work_mem = 128MB
    temp_buffers = 64MB
    work_mem = 16MB
    wal_buffers = 16MB
    effective_cache_size = 4094MB


adjust effective_cache_size to somewhat less than the 'cached' value
shown in `free -m` after your system has been running for a while.
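
A sketch of the per-transaction override John mentions; SET LOCAL lasts only until the end of the current transaction, so the conservative global setting stays in force for everything else (the values here are illustrative):

    BEGIN;
    SET LOCAL work_mem = '256MB';              -- visible only inside this transaction
    SET LOCAL maintenance_work_mem = '512MB';  -- optional, if index builds are involved
    -- ... run the large maintenance / denormalisation steps here ...
    COMMIT;                                    -- both settings revert automatically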

Re: Out of memory

From
Jeremy Palmer
Date:
Hi John,

> Does that all really have to be a single transaction?

Yes - I need to ensure that all of the changesets and denormalised tables are created in the same transaction, so that if an error occurs the database is rolled back to the last successfully applied changeset. I don't want to get into the business of splitting it into separate transactions and then having to revert changes that were applied in a previous transaction step.

> Do you really need to use triggers for your revision tracking, and can't
> rely on your daily update cycle to manually set the revision information?

They are not necessary, but they give the cleanest code implementation and make the revision maintenance of the tables almost transparent. If they are causing the problem I could change the logic...

> Is it really necessary to generate massive denormalized tables, rather
> than using views to join the data?

Yes - creating the tables is complex, and often involves complex functions and multiple temp tables. The overall time to create these tables is somewhere in the area of 3 hours on this server. I'm also unloading these tables multiple times for separate purposes, so they would need to be materialised anyway.

> with only 1-2 connections, you certainly could increase the work_mem.

I can't increase this value at the moment on this server because I was getting out of memory errors with the initial population of the database (which builds the denormalized tables, but does not determine the changeset against the previous table revision).

I tried values from 256MB down to 2MB and could only get the query to run with 1MB. I suspect even this was pushing the boundary, so when I got to the next stage in my testing - applying incremental updates - the memory issue raised its head again.

Regards,
Jeremy


Re: Out of memory

From
Jeremy Palmer
Date:
Hi Jeff,

> Where is the source to the function?

The source is located here: https://github.com/linz/linz_bde_uploader

The main function LDS_MaintainSimplifiedLayers that is being called is on line 37 of https://github.com/linz/linz_bde_uploader/blob/master/sql/lds_layer_functions.sql.

The actual out of memory exception was caught in the bde_GetTableDifferences function, at line 3263 of https://github.com/linz/linz_bde_uploader/blob/master/sql/bde_control_functions.sql.

When I was getting an out of memory issue while creating the tables (not maintaining them), the query that seemed to kill the transaction was the one located at line 1463 of https://github.com/linz/linz_bde_uploader/blob/master/sql/lds_layer_functions.sql. After I dropped the work_mem to 1MB it got past that and completed OK. But during the maintenance of the table the row differences need to be calculated and then applied to the table. See the LDS_ApplyTableDifferences function on line 353.
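
Conceptually, the apply step amounts to something like the sketch below. This is a simplified illustration only, with hypothetical table names and just the D/U/I actions; the real function is generated per table and also handles the 'X' case:

    -- table_diff(action, id) is assumed to hold the output of bde_GetTableDifferences
    DELETE FROM target_table t
    USING  table_diff d
    WHERE  d.action = 'D' AND t.id = d.id;

    UPDATE target_table t
    SET    col1 = s.col1,
           col2 = s.col2
    FROM   source_table s, table_diff d
    WHERE  d.action = 'U' AND d.id = t.id AND s.id = d.id;

    INSERT INTO target_table
    SELECT s.*
    FROM   source_table s
    JOIN   table_diff d ON d.id = s.id AND d.action = 'I';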

Regards,
Jeremy

Re: Out of memory

From
Jeremy Palmer
Date:
Well, after a few days of further investigation I still can't track the issue down. The main problem is that I can only reproduce the error by running the whole transaction. So I can't isolate the problem down to a simple use case or even a smaller subset of the transaction, which would have been nice for posting to this list.

Does anyone have an idea of how I might go about tackling this problem now? Should I try further reducing the memory settings? Or install a debug build of PostgreSQL and get some further information about what is going on before the error? Any advice would be greatly appreciated.

Regards,
Jeremy

________________________________________
From: Jeremy Palmer
Sent: Tuesday, 5 April 2011 9:50 p.m.
To: pgsql-general@postgresql.org
Subject: Out of memory


9.0 Out of memory

From
Jeremy Palmer
Date:
Hi All,

I'm running PostgreSQL 9.0.3 and getting an out of memory error while running a big transaction. This error does not crash the backend.

The nature of this transaction is that it sequentially applies data updates to a large number (104) of tables; then, after applying those updates, a series of materialised views are created or updated using a set of pl/pgsql functions. I need this operation to be one transaction so I can roll back the update if any issue occurs.

The size of the database is 350GB and I would characterise this system as a data warehouse.

At this stage I can't isolate the problem down to a simple use case or even a smaller subset of the transaction, which would have been nice for posting to this list. I can only replicate the error when I run the "entire" transaction - running the table update part or the materialised view part separately works fine. To make matters worse I do not see any context around my "out of memory" error message - which makes the message useless.

The actual error occurs during a table compare operation within a function which uses 2 cursors to scan for differences. However, directly before this I have a query that generates a temp table that I had trouble with earlier (when initially tuning the server) and had to set the work_mem to 1MB so it would run:

    CREATE TEMP TABLE tmp_titles AS
    SELECT
        TTL.audit_id AS id,
        TTL.title_no,
        TTL.status,
        TTLT.char_value AS type,
        LOC.name AS land_district,
        TTL.issue_date,
        TTLG.char_value AS guarantee_status,
        string_agg(
            DISTINCT(
                ETTT.char_value || ', ' ||
                ETT.share || COALESCE(', ' || LGD.legal_desc_text, '') ||
                COALESCE(', ' || to_char(ROUND(LGD.total_area, 0), 'FM9G999G999G999G999') || ' m2', '')
            ),
            E'\r\n'
            ORDER BY
                ETTT.char_value || ', ' ||
                ETT.share || COALESCE(', ' || LGD.legal_desc_text, '') ||
                COALESCE(', ' || to_char(ROUND(LGD.total_area, 0), 'FM9G999G999G999G999') || ' m2', '') ASC
        ) AS estate_description,
        string_agg(
            DISTINCT
                CASE PRP.type
                    WHEN 'CORP' THEN PRP.corporate_name
                    WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
                END,
            ', '
            ORDER BY
                CASE PRP.type
                    WHEN 'CORP' THEN PRP.corporate_name
                    WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
                END ASC
        ) AS owners,
        count(
            DISTINCT
                CASE PRP.type
                    WHEN 'CORP' THEN PRP.corporate_name
                    WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
                END
        ) AS number_owners,
        TPA.title_no IS NOT NULL AS part_share,
        -- With Postgis 1.5.2 the ST_Collect aggregate returns a truncated
        -- collection when a null value is found. To fix this the shapes
        -- are order so all null shapes row are at the end of input list.
        ST_Multi(ST_Collect(PAR.shape ORDER BY PAR.shape ASC)) AS shape
    FROM
        crs_title TTL
        LEFT JOIN crs_title_estate ETT ON TTL.title_no = ETT.ttl_title_no AND ETT.status = 'REGD'
        LEFT JOIN crs_estate_share ETS ON ETT.id = ETS.ett_id AND ETT.status = 'REGD'
        LEFT JOIN crs_proprietor PRP ON ETS.id = PRP.ets_id AND PRP.status = 'REGD'
        LEFT JOIN crs_legal_desc LGD ON ETT.lgd_id = LGD.id AND LGD.type = 'ETT' AND LGD.status = 'REGD'
        LEFT JOIN crs_legal_desc_prl LGP ON LGD.id = LGP.lgd_id
        LEFT JOIN (
            SELECT
                title_no
            FROM
                tmp_parcel_titles
            GROUP BY
                title_no
            HAVING
                count(*) > 1
        ) TPA ON TTL.title_no = TPA.title_no
        LEFT JOIN (
            SELECT
                id,
                (ST_Dump(shape)).geom AS shape
            FROM
                crs_parcel
            WHERE
                status = 'CURR' AND
                ST_GeometryType(shape) IN ('ST_MultiPolygon', 'ST_Polygon')
        ) PAR ON LGP.par_id = PAR.id
        JOIN crs_locality LOC ON TTL.ldt_loc_id = LOC.id
        JOIN crs_sys_code TTLG ON TTL.guarantee_status = TTLG.code AND TTLG.scg_code = 'TTLG'
        JOIN crs_sys_code TTLT ON TTL.type = TTLT.code AND TTLT.scg_code = 'TTLT'
        LEFT JOIN crs_sys_code ETTT ON ETT.type = ETTT.code AND ETTT.scg_code = 'ETTT'
    WHERE
        TTL.status IN ('LIVE', 'PRTC') AND
        TTL.title_no NOT IN (SELECT title_no FROM tmp_excluded_titles)
    GROUP BY
        TTL.audit_id,
        TTL.title_no,
        TTL.status,
        TTLT.char_value,
        LOC.name,
        TTL.issue_date,
        TTLG.char_value,
        TPA.title_no;

This query does a lot of string concatenation and uses the new 9.0 string_agg function. The result set is about 2 million rows.

So down to questions:

1) Is there anything I can do to get more information out about this memory error message? A debug build maybe, or attach a debugger and set some breakpoints?

2) Is there anything I can do to optimise the memory so this transaction might be able to run?

3) I noticed that the Top Memory Context dump in the server log contained a lot of the MCs for table indexes/PKs (about 850). See http://pastebin.com/346zi2sS. Is this a problem? Could these MCs be cleaned up part way through the transaction? Maybe use a savepoint? Or do they exist for the life of the session or transaction?

4) Can anyone help me make sense of the top transaction memory error to help track down the issue? What number should I be looking for?

Well, I'm pretty much at the end of the line in terms of getting PostgreSQL to work for this task, so any guidance would be so much appreciated.

Thanks again,
Jeremy


PS. My config:

The OS I'm running is:

Linux TSTLHAPP01 2.6.32-29-server #58-Ubuntu SMP Fri Feb 11 21:06:51 UTC 2011 x86_64 GNU/Linux.

It's running on VMware and has 2 CPUs and 8GB of RAM. This VM is dedicated to PostgreSQL; not much else is running other than cacti, ssh and ftp server daemons. The main OS parameters I have tuned are:

vm.swappiness=0
vm.overcommit_memory=2
kernel.shmmax = 4196769792
kernel.shmall = 1024602

And the PostgreSQL version is:

PostgreSQL 9.0.3 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit.

The main changed postgresql.conf parameters I've tuned are:

shared_buffers = 512MB
maintenance_work_mem = 512MB
temp_buffers = 256MB
work_mem = 1MB
wal_buffers = 16MB
effective_cache_size = 4094MB

The typical number of users connected to the database is 1 or 2.


Re: 9.0 Out of memory

From
Tom Lane
Date:
Jeremy Palmer <JPalmer@linz.govt.nz> writes:
> I'm running PostgreSQL 9.0.3 and getting an out of memory error while running a big transaction. This error does not crash the backend.

If it's a standard "out of memory" message, there should be a memory
context map dumped to postmaster's stderr.  (Which is inconvenient for
some logging arrangements, but it's hard to improve that without risking
not being able to print the map for lack of memory ...)  If you could
show us the map it might help to figure out what's happening.

            regards, tom lane
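
If the postmaster's stderr is not already being captured anywhere, one standard arrangement is to let the logging collector write it to files; these are ordinary postgresql.conf parameters, shown here with illustrative values:

    # postgresql.conf - capture stderr (including memory context dumps) in log files
    logging_collector = on
    log_destination = 'stderr'
    log_directory = 'pg_log'            # relative to the data directory
    log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'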

Re: 9.0 Out of memory

From
Tom Lane
Date:
Jeremy Palmer <JPalmer@linz.govt.nz> writes:
> Ok I have attached the map, or at least what I think the map is.

Yup, that's what I was after.  It looks like the main problem is here:

>     PortalHeapMemory: 16384 total in 4 blocks; 5944 free (0 chunks); 10440 used
>       ExecutorState: 122880 total in 4 blocks; 63984 free (8 chunks); 58896 used
>         ExprContext: 2622363000 total in 9 blocks; 21080 free (15 chunks); 2622341920 used

You've evidently got a leak during execution of a query that's being run
in a "portal", which most likely is a cursor or plpgsql FOR-IN-SELECT
query.  Unfortunately there's not enough information here to tell which
query that is, but maybe you can narrow it down now.  I'm guessing that
some specific function or operator you're using in that query is leaking
memory with successive executions.

            regards, tom lane
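
One low-tech way to narrow it down is to add temporary RAISE NOTICE progress markers around the candidate queries inside the function, then match the timestamps against memory growth seen in top or ps. A stand-alone sketch of the pattern (the statement here is only a stand-in):

    -- In practice the two RAISE NOTICE lines would surround each suspect statement
    -- inside the real plpgsql function rather than live in a DO block.
    DO $$
    BEGIN
        RAISE NOTICE '% - before suspect query', clock_timestamp();
        PERFORM count(*) FROM pg_class;   -- stand-in for the suspect statement
        RAISE NOTICE '% - after suspect query', clock_timestamp();
    END
    $$;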

Re: 9.0 Out of memory

From
Jeremy Palmer
Date:
Hi Tom,

Wow thank you so much for the hint!

The plpgsql code that could be to blame is in the snippet below. I had a look and I'm not sure why it might be leaking. Is it because I assign v_id1 and v_id2 to the return table's 'id' column, return it, and then assign to v_id1 or v_id2 again from the cursor?

CREATE OR REPLACE FUNCTION bde_control.bde_gettabledifferences(p_table1 regclass, p_table2 regclass, p_compare_key name)
  RETURNS TABLE("action" character, id bigint) AS
...
...
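    -- Both cursors return rows ordered by the compare key, so the loop below walks
    -- them in step like a merge join: ids present only in cursor 1 are deletes ('D'),
    -- ids present only in cursor 2 are inserts ('I'), and matching ids are compared
    -- via the unique/checksum columns ('X'/'U').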
    FETCH FIRST FROM v_table_cur1 INTO v_id1, v_check1, v_uniq1;
    FETCH FIRST FROM v_table_cur2 INTO v_id2, v_check2, v_uniq2;

    WHILE v_id1 IS NOT NULL AND v_id2 IS NOT NULL LOOP
        IF v_id1 < v_id2 THEN
            action := 'D';
            id := v_id1;
            RETURN NEXT;
            FETCH NEXT FROM v_table_cur1 INTO v_id1, v_check1, v_uniq1;
            CONTINUE;
        ELSIF v_id2 < v_id1 THEN
            action := 'I';
            id := v_id2;
            RETURN NEXT;
            FETCH NEXT FROM v_table_cur2 INTO v_id2, v_check2, v_uniq2;
            CONTINUE;
        ELSIF v_uniq1 <> v_uniq2 THEN
            action := 'X';
            id := v_id1;
            RETURN NEXT;
        ELSIF v_check1 <> v_check2 THEN
            action := 'U';
            id := v_id1;
            RETURN NEXT;
        END IF;
        FETCH NEXT FROM v_table_cur1 INTO v_id1, v_check1, v_uniq1;
        FETCH NEXT FROM v_table_cur2 INTO v_id2, v_check2, v_uniq2;
    END LOOP;

    WHILE v_id1 IS NOT NULL LOOP
        action := 'D';
        id := v_id1;
        RETURN NEXT;
        FETCH NEXT FROM v_table_cur1 INTO v_id1, v_check1, v_uniq1;
    END LOOP;

    WHILE v_id2 IS NOT NULL LOOP
        action := 'I';
        id := v_id2;
        RETURN NEXT;
        FETCH NEXT FROM v_table_cur2 INTO v_id2, v_check2, v_uniq2;
    END LOOP;

    CLOSE v_table_cur1;
    CLOSE v_table_cur2;

    RETURN;

The full function can be read here:

https://github.com/linz/linz_bde_uploader/blob/master/sql/bde_control_functions.sql#L3263

The actual query for both cursors, in the case of the table that was probably causing the error, looks like this (v_table_cur1 and v_table_cur2 only differ by the table referenced):

SELECT
   id AS ID,
   COALESCE('V|' || CAST(T.estate_description AS TEXT), '|N') || '|V' ||
   CAST(T.guarantee_status AS TEXT) || '|V' ||
   CAST(T.issue_date AS TEXT) || '|V' ||
   CAST(T.land_district AS TEXT) || '|V' ||
   CAST(T.number_owners AS TEXT) || '|V' ||
   CAST(T.part_share AS TEXT) ||
   COALESCE('V|' || CAST(T.shape AS TEXT), '|N') || '|V' ||
   CAST(T.status AS TEXT) || '|V' ||
   CAST(T.title_no AS TEXT) || '|V' ||
   CAST(T.type AS TEXT) AS check_sum,
   '' AS check_uniq
FROM
   lds.titles AS T
ORDER BY
   id ASC;

The definition for the table looks like this:

CREATE TABLE titles (
    id INTEGER NOT NULL PRIMARY KEY,
    title_no VARCHAR(20) NOT NULL,
    status VARCHAR(4) NOT NULL,
    type TEXT NOT NULL,
    land_district VARCHAR(100) NOT NULL,
    issue_date TIMESTAMP NOT NULL,
    guarantee_status TEXT NOT NULL,
    estate_description TEXT,
    number_owners INT8 NOT NULL,
    part_share BOOLEAN NOT NULL,
    shape GEOMETRY
);

CREATE INDEX shx_title_shape ON titles USING gist (shape);

Thanks,
Jeremy
________________________________________
From: Tom Lane [tgl@sss.pgh.pa.us]
Sent: Wednesday, 13 April 2011 5:44 p.m.
To: Jeremy Palmer
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] 9.0 Out of memory


Re: 9.0 Out of memory

From
Tom Lane
Date:
Jeremy Palmer <JPalmer@linz.govt.nz> writes:
> The plpgsql code that could be to blame is in the snippet below. I had a look and I'm not sure why it might be leaking. Is it because I assign v_id1 and v_id2 to the return table's 'id' column, return it, and then assign to v_id1 or v_id2 again from the cursor?

No, given the info from the memory map I'd have to say that the leakage
is in the cursor not in what you do in the plpgsql function.  The cursor
query looks fairly unexciting except for the cast from geometry to text.
I don't have PostGIS installed here so I can't do any testing, but I
wonder whether the leak goes away if you remove that part of the query
(ie, leave the shape out of the "checksum" for testing purposes).
If so, you probably ought to file the issue as a PostGIS bug.

            regards, tom lane

Re: 9.0 Out of memory

From
Jeremy Palmer
Date:
> No, given the info from the memory map I'd have to say that the leakage
> is in the cursor not in what you do in the plpgsql function.  The cursor
> query looks fairly unexciting except for the cast from geometry to text.
> I don't have PostGIS installed here so I can't do any testing, but I
> wonder whether the leak goes away if you remove that part of the query
> (ie, leave the shape out of the "checksum" for testing purposes).
> If so, you probably ought to file the issue as a PostGIS bug.

Ok I removed the geometry column from the cursor query within the function and the session still runs out of memory. I'm still seeing the same error message as well:

    PortalHeapMemory: 16384 total in 4 blocks; 5944 free (0 chunks); 10440 used
      ExecutorState: 122880 total in 4 blocks; 63984 free (8 chunks); 58896 used
        ExprContext: 2496819768 total in 9 blocks; 21080 free (15 chunks); 2496798688 used

So I guess it's not likely to be the PostGIS geometry to text cast that is leaking the memory.

One thing that has got me interested now is the query that executes directly before it (see SQL below). If I remove the geometry column that is generated using the ST_Collect aggregate function, the subsequent function involving the cursor query completes and the transaction also runs to completion.

Is there any way that ST_Collect could be leaking memory into a context that does not get cleaned up after the query runs? Or do I have two leaks going on here?!

Cheers,
Jeremy

CREATE TEMP TABLE tmp_titles AS
    SELECT
        TTL.audit_id AS id,
        TTL.title_no,
        TTL.status,
        TTLT.char_value AS type,
        LOC.name AS land_district,
        TTL.issue_date,
        TTLG.char_value AS guarantee_status,
        string_agg(
            DISTINCT(
                ETTT.char_value || ', ' ||
                ETT.share || COALESCE(', ' || LGD.legal_desc_text, '') ||
                COALESCE(', ' || to_char(ROUND(LGD.total_area, 0), 'FM9G999G999G999G999') || ' m2', '')
            ),
            E'\r\n'
            ORDER BY
                ETTT.char_value || ', ' ||
                ETT.share || COALESCE(', ' || LGD.legal_desc_text, '') ||
                COALESCE(', ' || to_char(ROUND(LGD.total_area, 0), 'FM9G999G999G999G999') || ' m2', '') ASC
        ) AS estate_description,
        string_agg(
            DISTINCT
                CASE PRP.type
                    WHEN 'CORP' THEN PRP.corporate_name
                    WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
                END,
            ', '
            ORDER BY
                CASE PRP.type
                    WHEN 'CORP' THEN PRP.corporate_name
                    WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
                END ASC
        ) AS owners,
        count(
            DISTINCT
                CASE PRP.type
                    WHEN 'CORP' THEN PRP.corporate_name
                    WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
                END
        ) AS number_owners,
        TPA.title_no IS NOT NULL AS part_share,
        -- With Postgis 1.5.2 the ST_Collect aggregate returns a truncated
        -- collection when a null value is found. To fix this the shapes
        -- are order so all null shapes row are at the end of input list.
        ST_Multi(ST_Collect(PAR.shape ORDER BY PAR.shape ASC)) AS shape
    FROM
        crs_title TTL
        LEFT JOIN crs_title_estate ETT ON TTL.title_no = ETT.ttl_title_no AND ETT.status = 'REGD'
        LEFT JOIN crs_estate_share ETS ON ETT.id = ETS.ett_id AND ETT.status = 'REGD'
        LEFT JOIN crs_proprietor PRP ON ETS.id = PRP.ets_id AND PRP.status = 'REGD'
        LEFT JOIN crs_legal_desc LGD ON ETT.lgd_id = LGD.id AND LGD.type = 'ETT' AND LGD.status = 'REGD'
        LEFT JOIN crs_legal_desc_prl LGP ON LGD.id = LGP.lgd_id
        LEFT JOIN (
            SELECT
                title_no
            FROM
                tmp_parcel_titles
            GROUP BY
                title_no
            HAVING
                count(*) > 1
        ) TPA ON TTL.title_no = TPA.title_no
        LEFT JOIN (
            SELECT
                id,
                (ST_Dump(shape)).geom AS shape
            FROM
                crs_parcel
            WHERE
                status = 'CURR' AND
                ST_GeometryType(shape) IN ('ST_MultiPolygon', 'ST_Polygon')
        ) PAR ON LGP.par_id = PAR.id
        JOIN crs_locality LOC ON TTL.ldt_loc_id = LOC.id
        JOIN crs_sys_code TTLG ON TTL.guarantee_status = TTLG.code AND TTLG.scg_code = 'TTLG'
        JOIN crs_sys_code TTLT ON TTL.type = TTLT.code AND TTLT.scg_code = 'TTLT'
        LEFT JOIN crs_sys_code ETTT ON ETT.type = ETTT.code AND ETTT.scg_code = 'ETTT'
    WHERE
        TTL.status IN ('LIVE', 'PRTC') AND
        TTL.title_no NOT IN (SELECT title_no FROM tmp_excluded_titles)
    GROUP BY
        TTL.audit_id,
        TTL.title_no,
        TTL.status,
        TTLT.char_value,
        LOC.name,
        TTL.issue_date,
        TTLG.char_value,
        TPA.title_no;

Re: 9.0 Out of memory

From
Tom Lane
Date:
Jeremy Palmer <JPalmer@linz.govt.nz> writes:
> Ok I removed the geometry column from the cursor query within the function and the session still runs out of memory. I'm still seeing the same error message as well:

>     PortalHeapMemory: 16384 total in 4 blocks; 5944 free (0 chunks); 10440 used
>       ExecutorState: 122880 total in 4 blocks; 63984 free (8 chunks); 58896 used
>         ExprContext: 2496819768 total in 9 blocks; 21080 free (15 chunks); 2496798688 used

> So I guess it's not likely to be the PostGIS geometry to text cast that is leaking the memory.

OK, so that was a wrong guess.

> One thing that has got me interested now is the query that executes directly before it (see SQL below). If I remove the geometry column that is generated using the ST_Collect aggregate function, the subsequent function involving the cursor query completes and the transaction also runs to completion.

Hrm.  We were pretty much guessing as to which query was running in that
portal, I think.  It seems entirely plausible that this other query is
the one at fault instead.  It might be premature to blame ST_Collect per
se though --- in particular I'm wondering about the ORDER BY on the
ST_Collect's input.  But if this line of thought is correct, you ought
to be able to exhibit a memory leak using just that sub-part of that
query, without the surrounding function or any other baggage.  Maybe the
leak wouldn't drive the backend to complete failure without that
additional overhead; but a leak of a couple gig ought to be pretty
obvious when watching the process with "top" or similar tool.

            regards, tom lane
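
A sketch of the kind of stand-alone test Tom suggests: run just the geometry-aggregation sub-part of the temp-table query in its own session, note that backend's PID, and watch its memory from the OS (for example with top -p <pid> or ps) while it runs. Table and column names are taken from the query above; the grouping key is chosen only for illustration:

    SELECT pg_backend_pid();   -- note this PID, then watch it with top/ps in a shell

    SELECT LGP.lgd_id,
           ST_Multi(ST_Collect(PAR.shape ORDER BY PAR.shape ASC)) AS shape
    FROM   crs_legal_desc_prl LGP
           JOIN (
               SELECT id, (ST_Dump(shape)).geom AS shape
               FROM   crs_parcel
               WHERE  status = 'CURR' AND
                      ST_GeometryType(shape) IN ('ST_MultiPolygon', 'ST_Polygon')
           ) PAR ON LGP.par_id = PAR.id
    GROUP BY LGP.lgd_id;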

Re: 9.0 Out of memory

From
Jeremy Palmer
Date:
>> Ok I removed the geometry column from the cursor query within the function and the session still runs out of memory. I'm still seeing the same error message as well:

>>     PortalHeapMemory: 16384 total in 4 blocks; 5944 free (0 chunks); 10440 used
>>       ExecutorState: 122880 total in 4 blocks; 63984 free (8 chunks); 58896 used
>>         ExprContext: 2496819768 total in 9 blocks; 21080 free (15 chunks); 2496798688 used

>> So I guess it's not likely to be the PostGIS geometry to text cast that is leaking the memory.

> OK, so that was a wrong guess.

Hi Tom,

I finally tracked down the issue! The query that was generating the temp table which was used as input into the 2 cursor queries was generating an invalid, very large geometry (like 200MB).

It turned out I had a bug in the previous testing I was doing, and I didn't actually remove the geometry column from the second cursor. So I guess a 200MB geometry being cast to text used too much memory. Not sure if there is still a leak... but I guess that depends on whether the geometry expands to over 1GB when converted to text.
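
A quick way to spot such oversized or invalid geometries in the generated temp table is to rank rows by vertex count; the sketch below uses standard PostGIS 1.5 functions and has to be run in the session that still holds the temp table:

    SELECT id,
           title_no,
           ST_NPoints(shape) AS vertex_count,
           ST_IsValid(shape) AS is_valid
    FROM   tmp_titles
    WHERE  shape IS NOT NULL
    ORDER  BY ST_NPoints(shape) DESC
    LIMIT  20;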

Anyway, I would like to personally thank you for your time in helping me with this issue.

Regards,
Jeremy

