Discussion: High memory usage / performance issue ( temp tables ? )


High memory usage / performance issue ( temp tables ? )

From: gmb
Date:
Hi

I suspect there is some issue with temp tables and memory usage. I have seen
a couple of posts online regarding this, but most of those issues have since
been resolved or shown to be unrelated to Postgres.
I'd appreciate it if someone could assist me with this.

My situation: 

Version: "PostgreSQL 9.2.6 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit", running on Linux.

I have a plpgsql function which takes an XML string payload as input and
does some processing on that payload data. This function makes use of
multiple other Postgres functions (sql, plpgsql, plpython3u).
The main purpose of the process is to populate multiple tables: basically
doing inserts on financial transaction tables, which in turn fire triggers
on other tables, which fire triggers on yet more tables, and so on.
The process follows a method where the XML payload data is inserted into
multiple TEMP tables (up to 10 tables for each function call).
These are created using ON COMMIT DROP.

My problem:

We have a process using the above function to process batches of XML payload
files.
After running a batch of 50000 XML files, I definitely see a deterioration
in performance. At first glance, I put this down to some sort of memory
problem.

$ top
  Cpu(s): 25.1%us,  0.1%sy,  0.0%ni, 74.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
  Mem:  14371048k total, 14195464k used,   175584k free,   424788k buffers
  Swap:  6288380k total,    11972k used,  6276408k free, 12114744k cached

    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  20596 postgres  20   0 1688m 1.3g 1.2g S    0  9.1   0:24.38 postgres
  29164 postgres  20   0 1721m 966m 933m R  100  6.9   4:30.18 postgres
  28165 postgres  20   0 1782m 630m 568m S    0  4.5   0:23.19 postgres
  28155 postgres  20   0 1780m 460m 370m S    0  3.3   0:43.76 postgres
 

(I have to admit, I'm not a linux expert -- I have some guys at the office
who I can ask for help tomorrow).
Now according to this post (
http://www.postgresql.org/message-id/165E6919-697C-4C50-9EEE-38728AC6D982@tcdi.com
), this can be a display issue in top, rather than a real memory problem.


I get the idea that the method of creating temp tables is probably causing
the problem here. At the very least, it could be a cause of performance
issues.
Has anyone else been in this situation with regard to temp tables in
Postgres?

I'd appreciate any comments / advice / reprimands.


Regards 

gmb



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/High-memory-usage-performance-issue-temp-tables-tp5815108.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.



Re: High memory usage / performance issue ( temp tables ? )

From: Marc Mamin
Date:
>Hi
>
>I feel that there is some issue with temp tables and memory usage. Have seen
>a couple of posts online regarding this, but most issues have since been
>resolved or have been proved as problem unrelated to Postgres.
>I'd appreciate if someone can assist me in this.
>
>My situation:
>
>Version "PostgreSQL 9.2.6 on x86_64-unknown-linux-gnu, compiled by gcc
>(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit" ) running on linux.
>
>I have a plpgsql function which takes an XML string payload as input and
>does some processing using that payload data. This function makes use of
>multiple other postgres functions (sql, plpgsql, plpython3u).
>Main purpose of the process is to populate multiple tables, basically doing
>inserts on financial transactional tables which in turn triggers to other
>tables triggering to other tables again, etc. , etc.
>The process follows a method where the XML payload data is inserted into
>multiple TEMP tables ( can be up to 10 tables for each function call ) .
>These are created using ON COMMIT DROP .

Are you using the same temp tables for the whole batch, or do you generate a few 100K
of them? The latter may cause performance degradation, as temp objects are written to the catalog,
resulting in some bloat. It would be interesting to see the output of VACUUM FULL VERBOSE,
at least on pg_class and pg_attribute (beware that this may take locks on busy systems).
I avoid temp tables wherever I can. It is often possible with CTEs or unlogged tables.
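As a minimal sketch of the CTE alternative, the staging step can live inside the INSERT itself, so nothing is ever written to the catalog (the target table `transactions` and the XML layout here are hypothetical, not from the original thread):

```sql
-- Instead of:
--   CREATE TEMP TABLE stage_lines (...) ON COMMIT DROP;
--   INSERT INTO stage_lines SELECT ... ;
--   INSERT INTO transactions SELECT ... FROM stage_lines;
-- the staging rows can be produced by a CTE:
WITH stage_lines AS (
    SELECT (xpath('//line/account/text()', payload))[1]::text          AS account,
           (xpath('//line/amount/text()',  payload))[1]::text::numeric AS amount
    FROM   (SELECT '<line><account>4711</account><amount>10.50</amount></line>'::xml
            AS payload) AS p
)
INSERT INTO transactions (account, amount)
SELECT account, amount FROM stage_lines;
```

This only works when the staging data is consumed within a single statement; when several later statements need the same staging rows, an unlogged table is the closer substitute.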


Another possibility is that the table statistics are getting outdated as your batch progresses.
If you start with empty target tables, Postgres may choose full scans and keep using them
as long as the tables are not analyzed. It may help to call ANALYZE explicitly on the touched tables
a few times during your process. Here a look at the monitoring statistics may give some clue.
(http://blog.pgaddict.com/posts/the-two-kinds-of-stats-in-postgresql)
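Both suggestions can be sketched as follows (the table name `transactions` is a stand-in for the actual target tables):

```sql
-- Refresh planner statistics mid-batch on a heavily inserted table:
ANALYZE transactions;

-- Check whether the planner is still choosing sequential scans and
-- when the table was last analyzed:
SELECT relname, seq_scan, idx_scan, n_live_tup,
       last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'transactions';
```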

regards,
Marc Mamin



Re: High memory usage / performance issue ( temp tables ? )

From: gmb
Date:
>> Are you using the same temp tables for the whole batch or do you generate
>> a few 100K of them ?

The process re-creates the 10 temp tables for each instance of the function
being called.
I.e. this equates to 500k temp tables for 50k XML files.
The "ON COMMIT DROP" part was added at some stage as an attempt to solve
some performance issues. The argument was that, since a COMMIT is done
after each of the 50k XML files, the number of temp tables would not build
up and cause any problems.

I can understand the performance issue due to load on the catalog, but I
would not have expected it to have the impact I'm experiencing.
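One way to keep the per-call staging tables without re-creating them 500k times is to create them once per session with ON COMMIT DELETE ROWS, so each COMMIT merely truncates them instead of dropping and re-creating catalog entries (a sketch; the table definition is hypothetical):

```sql
-- Created once per connection, not once per XML file:
CREATE TEMP TABLE stage_lines (
    account text,
    amount  numeric
) ON COMMIT DELETE ROWS;

-- Each batch item then just inserts and commits; the rows vanish at
-- COMMIT, but the table and its catalog entries stay in place:
BEGIN;
INSERT INTO stage_lines VALUES ('4711', 10.50);
-- ... process stage_lines ...
COMMIT;  -- the table is emptied, not dropped
```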

>> It may help to call analyze  explicitly on the touched tables 
>> a few times during your process. Here a look at the monitoring statistics
>> may give some clue. 
>> (http://blog.pgaddict.com/posts/the-two-kinds-of-stats-in-postgresql) 

Thanks, I'll try this and see if it makes any difference.

Thanks for the input.

Regards

gmb






Re: High memory usage / performance issue ( temp tables ? )

From: gmb
Date:
Something else I intended to ask in the original post, but forgot:

What effect, if any, will the autovacuum setting have in this type of
scenario?
Will performance be improved by disabling it?
I assume that a vacuum process will be required on at least the pg_catalog
tables, due to the dropping of temp tables during the process.

Regards

gmb






Re: High memory usage / performance issue ( temp tables ? )

From: Pavel Stehule
Date:
Hi


2014-08-17 21:47 GMT+02:00 gmb <gmbouwer@gmail.com>:
> Something else I intended to ask in the original post, but forgot:
>
> What effect , if any , will the autovacuum setting have in this type of
> scenario ?
> Will performance be improved by disabling this ?

No. When you use temporary tables intensively, you must vacuum pg_catalog intensively: vacuuming should be more aggressive and faster than the bloating of the system catalog.

You should probably also REINDEX some system indexes more often than usual.
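The kind of catalog maintenance described here can be sketched as follows (run as superuser; REINDEX takes exclusive locks on the catalog tables, so schedule it for a quiet window):

```sql
-- Keep the catalog tables that temp-table churn bloats under control:
VACUUM VERBOSE pg_catalog.pg_class;
VACUUM VERBOSE pg_catalog.pg_attribute;

-- Their indexes bloat too and are not shrunk by plain VACUUM:
REINDEX TABLE pg_catalog.pg_class;
REINDEX TABLE pg_catalog.pg_attribute;
```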

Regards

Pavel
 




Re: High memory usage / performance issue ( temp tables ? )

From: gmb
Date:
Thanks for the replies.

I tried a couple of alternative methods over the weekend in the hope of
improving performance, but unfortunately to no avail.
One of these was to share the processing of the 500K XML files between
multiple threads (multiple connections).
In an attempt to "force" the dropping of the temp tables, each thread
creates its own connection, runs the function with the XML payload and then
disconnects.
The impression I got was that the avg time per transaction still increases
as the process progresses.

My one concern with this method was locking, which I'm unfortunately quite
unfamiliar with.

Is it possible that locking could be a key problem when following this
multi-threaded approach?
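A quick way to check whether the parallel sessions are actually blocking each other is to join pg_locks with pg_stat_activity (this form works on 9.2; later releases also offer `pg_blocking_pids()`):

```sql
-- List backends that are waiting on a lock, and what they are running:
SELECT a.pid, a.state, l.locktype, l.mode, l.granted, a.query
FROM   pg_locks l
JOIN   pg_stat_activity a ON a.pid = l.pid
WHERE  NOT l.granted
ORDER  BY a.pid;
```

If this stays empty while throughput degrades, locking is unlikely to be the bottleneck.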

Regards

gmb






Re: High memory usage / performance issue ( temp tables ? )

From: Pavel Stehule
Date:



2014-08-18 7:33 GMT+02:00 gmb <gmbouwer@gmail.com>:
> Thanks for the replies.
>
> I tried a couple of alternative methods over the weekend in the hope of
> improving performance, but unfortunately to no avail.
> One of these was to have the processing of the 500K xml files shared between
> multiple threads ( multiple connections ).
> In an attempt to "force" the dropping of the temp tables , each thread
> creates its own connection, run the function with XML payload and the
> disconnects.
> The impression I got was that the avg time per transaction still increases
> as the process progresses.

The default temp_buffers is 8MB, so with ~10..20 clients everything is done via I/O, which is relatively slow. Changes to system tables are not fast either on a system under high load.
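If the per-session staging data fits in memory, raising temp_buffers for the batch sessions keeps temp-table pages out of the I/O path. A sketch (the value and the role name `batch_loader` are illustrative; the setting must be changed before the session first touches a temp table):

```sql
-- Per session, before the first temp-table access:
SET temp_buffers = '64MB';

-- Or persistently, for a dedicated batch-processing role:
ALTER ROLE batch_loader SET temp_buffers = '64MB';
```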
 

> My one concern with this method was locking  , which I'm unfortunately quite
> unfamiliar with.
>
> Is it possible that locking could be a key problem when following this
> multi-thread approach ?

You could write a PostgreSQL extension in C and keep the XML only in memory.

Temp tables are best when you need to run queries against the data or need indexes, but they are a terribly slow cache.

More generally: Postgres is good as a database and very slow as a cache. It is good for prototyping and for lightly or moderately loaded servers. For anything else, use different software.

Maybe memcached,

Regards

Pavel
 

Regards

gmb





Re: High memory usage / performance issue ( temp tables ? )

From: gmb
Date:
Just to make sure I understand this correctly:

> You can write PostgreSQL extension in C - and store XML only in memory.

Meaning that I write function(s) which process the XML in C, which will allow me more options when processing the XML? Will this allow me to cache the payload data in memory for the connection, and not only for a single function? If not, this approach will not work unless I rewrite a lot of the code / functions. The current solution makes use of a separate function for each different part of the XML. The reason for this is that a lot of validation is also done on the data.

> Temp tables are best when you do some queries or when you need indexes, but
> it is terrible slow cache.
>
> else - Postgres is good as database and very slow as cache. It is good for
> prototyping and for less or middle load servers. For any other use
> different software

So your point is that the use of temp tables in this kind of scenario is not very efficient. I'll then try to replace the temp tables with actual tables with some kind of UID.

Thanks for the response.

gmb
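The "actual tables with some kind of UID" idea can be sketched with an UNLOGGED table keyed by a per-batch identifier, which avoids both the catalog churn of temp tables and WAL traffic (table and column names here are hypothetical):

```sql
-- Created once, shared by all batch sessions:
CREATE UNLOGGED TABLE stage_lines (
    batch_id bigint,
    account  text,
    amount   numeric
);
CREATE INDEX ON stage_lines (batch_id);

-- Each function call works only with its own batch_id ...
INSERT INTO stage_lines VALUES (42, '4711', 10.50);

-- ... and deletes its rows when done:
DELETE FROM stage_lines WHERE batch_id = 42;
```

Note that unlogged tables are not crash-safe (they are truncated on recovery), which is usually acceptable for transient staging data.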

Re: High memory usage / performance issue ( temp tables ? )

From: Pavel Stehule
Date:



2014-08-18 13:20 GMT+02:00 gmb <gmbouwer@gmail.com>:
> Just to make sure I understand this correct : [...]
> Will this allow me to cache the payload data in memory for the connection
> and not only for a single function ? [...]
> I'll then try to replace the temp tables with actual tables with some kind
> of UID.

Postgres has its own memory management.

Data can live in a shared memory context (across connections), a session memory context, a transaction memory context, or a query memory context.


Regards

Pavel

 