Обсуждение: High memory usage / performance issue ( temp tables ? )
Hi I feel that there is some issue with temp tables and memory usage. Have seen a couple of posts online regarding this, but most issues have since been resolved or have been proved as problem unrelated to Postgres. I'd appreciate if someone can assist me in this. My situation: Version "PostgreSQL 9.2.6 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit" ) running on linux. I have a plpgsql function which takes an XML string payload as input and does some processing using that payload data. This function makes use of multiple other postgres functions (sql, plpgsql, plpython3u). Main purpose of the process is to populate multiple tables, basically doing inserts on financial transactional tables which in turn triggers to other tables triggering to other tables again, etc. , etc. The process follows a method where the XML payload data is inserted into multiple TEMP tables ( can be up to 10 tables for each function call ) . These are created using ON COMMIT DROP . My problem: We have a process using the above function to process batches of XML payload files. After running a batch of 50000 xml files , I definitely see a deterioration in performance. At first glance, I wrote this down to some sort of memory problem. $top Cpu(s): 25.1%us, 0.1%sy, 0.0%ni, 74.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 14371048k total, 14195464k used, 175584k free, 424788k buffers Swap: 6288380k total, 11972k used, 6276408kfree, 12114744k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 20596 postgres 20 0 1688m 1.3g 1.2g S 0 9.1 0:24.38 postgres 29164 postgres 20 0 1721m 966m 933m R 100 6.9 4:30.18 postgres 28165 postgres 20 0 1782m630m 568m S 0 4.5 0:23.19 postgres 28155 postgres 20 0 1780m 460m 370m S 0 3.3 0:43.76 postgres (I have to admit, I'm not a linux expert -- I have some guys at the office who I can ask for help tomorrow). Now accoring to this post ( http://www.postgresql.org/message-id/165E6919-697C-4C50-9EEE-38728AC6D982@tcdi.com ) , this can be a display issue in top , rather than a real memory problem. I get the idea that the method of creating temp tables is probably causing the problem here . At the very least , it could be cause of performance issues. Has anyone else been in this same situation with regards to temp tables in Postgres I'd appreciate if there are any comments / advice / reprimands. Regards gmb -- View this message in context: http://postgresql.1045698.n5.nabble.com/High-memory-usage-performance-issue-temp-tables-tp5815108.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
>Hi > >I feel that there is some issue with temp tables and memory usage. Have seen >a couple of posts online regarding this, but most issues have since been >resolved or have been proved as problem unrelated to Postgres. >I'd appreciate if someone can assist me in this. > >My situation: > >Version "PostgreSQL 9.2.6 on x86_64-unknown-linux-gnu, compiled by gcc >(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit" ) running on linux. > >I have a plpgsql function which takes an XML string payload as input and >does some processing using that payload data. This function makes use of >multiple other postgres functions (sql, plpgsql, plpython3u). >Main purpose of the process is to populate multiple tables, basically doing >inserts on financial transactional tables which in turn triggers to other >tables triggering to other tables again, etc. , etc. >The process follows a method where the XML payload data is inserted into >multiple TEMP tables ( can be up to 10 tables for each function call ) . >These are created using ON COMMIT DROP . Are you using the same temp tables for the whole batch or do you generate a few 100K of them ? The latter may cause performance degradation as temp objects are written to the catalog, resulting in some bloating. It would be interesting to see the output of vacuum full verbose, at least on pg_class and pg_attributes (beware that this may lock on busy systems). I avoid temp tables wherever I can. It is often possible with CTEs or unlogged tables. Another possibility is that the table statistics are getting outdated as your batch processes. If you start with empty target tables, postgres may use full scans and keep with this as long as the tables are not analyzed. It may help to call analyze explicitly on the touched tables a few times during your process. Here a look at the monitoring statistics may give some clue. (http://blog.pgaddict.com/posts/the-two-kinds-of-stats-in-postgresql) regards, Marc Mamin > >My problem: > >We have a process using the above function to process batches of XML payload >files. >After running a batch of 50000 xml files , I definitely see a deterioration >in performance. At first glance, I wrote this down to some sort of memory >problem. > >$top > > Cpu(s): 25.1%us, 0.1%sy, 0.0%ni, 74.8%id, 0.0%wa, 0.0%hi, 0.0%si, >0.0%st > Mem: 14371048k total, 14195464k used, 175584k free, 424788k buffers > Swap: 6288380k total, 11972k used, 6276408k free, 12114744k cached > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 20596 postgres 20 0 1688m 1.3g 1.2g S 0 9.1 0:24.38 postgres > 29164 postgres 20 0 1721m 966m 933m R 100 6.9 4:30.18 postgres > 28165 postgres 20 0 1782m 630m 568m S 0 4.5 0:23.19 postgres > 28155 postgres 20 0 1780m 460m 370m S 0 3.3 0:43.76 postgres > >(I have to admit, I'm not a linux expert -- I have some guys at the office >who I can ask for help tomorrow). >Now accoring to this post ( >http://www.postgresql.org/message-id/165E6919-697C-4C50-9EEE-38728AC6D982@tcdi.com >) , this can be a display issue in top , rather than a real memory problem. > > >I get the idea that the method of creating temp tables is probably causing >the problem here . At the very least , it could be cause of performance >issues. >Has anyone else been in this same situation with regards to temp tables in >Postgres > >I'd appreciate if there are any comments / advice / reprimands. > > >Regards > >gmb
>> Are you using the same temp tables for the whole batch or do you generate a few 100K >> of them ? The process re-creates the 10 temp table for each instance of the function being called. I.e. this will equate to 500k temp tables for 50k xml files. The "ON COMMIT DROP" part was added at some stage as an attempt to solve some performance issues. THe argument was that , since a COMMIT is done after each of the 50k xml files , the number of temp tables will not build up and cause any problems. I can understand the performance issue due to load on the catalog, but I would not have expected this to have the impact I'm experiencing. >> It may help to call analyze explicitly on the touched tables >> a few times during your process. Here a look at the monitoring statistics >> may give some clue. >> (http://blog.pgaddict.com/posts/the-two-kinds-of-stats-in-postgresql) Thanks, I'll try this and see of this makes any difference. THanks for the input. Regards gmb -- View this message in context: http://postgresql.1045698.n5.nabble.com/High-memory-usage-performance-issue-temp-tables-tp5815108p5815111.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
Something else I intended to ask in the original post, but forgot: What effect , if any , will the autovacuum setting have in this type of scenario ? Will performance be improved by disabling this ? I assume that a vacuum process will be required on at least the pg_catalog due to dropping of temp tables during the process. Regards gmb -- View this message in context: http://postgresql.1045698.n5.nabble.com/High-memory-usage-performance-issue-temp-tables-tp5815108p5815126.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
Hi
			
		2014-08-17 21:47 GMT+02:00 gmb <gmbouwer@gmail.com>:
 
Something else I intended to ask in the original post, but forgot:
What effect , if any , will the autovacuum setting have in this type of
scenario ?
Will performance be improved by disabling this ?
no, when you use temporary tables intensively - you must vacuum pg_catalog intensively - vaccum should be more aggressive and faster then a bloating of system catalog. 
Probably you should to do REINDEX some system indexes more often than is usually.
Regards
Pavel
Pavel
I assume that a vacuum process will be required on at least the pg_catalog
due to dropping of temp tables during the process.
Regards
gmb
--
View this message in context: http://postgresql.1045698.n5.nabble.com/High-memory-usage-performance-issue-temp-tables-tp5815108p5815126.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
Thanks for the replies. I tried a couple of alternative methods over the weekend in the hope of improving performance, but unfortunately to no avail. One of these was to have the processing of the 500K xml files shared between multiple threads ( multiple connections ). In an attempt to "force" the dropping of the temp tables , each thread creates its own connection, run the function with XML payload and the disconnects. The impression I got was that the avg time per transaction still increases as the process progresses. My one concern with this method was locking , which I'm unfortunately quite unfamiliar with. Is it possible that locking could be a key problem when following this multi-thread approach ? Regards gmb -- View this message in context: http://postgresql.1045698.n5.nabble.com/High-memory-usage-performance-issue-temp-tables-tp5815108p5815175.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
2014-08-18 7:33 GMT+02:00 gmb <gmbouwer@gmail.com>:
 
 
Thanks for the replies.
I tried a couple of alternative methods over the weekend in the hope of
improving performance, but unfortunately to no avail.
One of these was to have the processing of the 500K xml files shared between
multiple threads ( multiple connections ).
In an attempt to "force" the dropping of the temp tables , each thread
creates its own connection, run the function with XML payload and the
disconnects.
The impression I got was that the avg time per transaction still increases
as the process progresses.
default temp_buffers = 8MB - so with ~ 10..20 clients all is done via IO, what is relative slow. Changes of system tables are not fast too on system with high load.
My one concern with this method was locking , which I'm unfortunately quite
unfamiliar with.
Is it possible that locking could be a key problem when following this
multi-thread approach ?
You can write PostgreSQL extension in C - and store XML only in memory. 
Temp tables are best when you do some queries or when you need indexes, but it is terrible slow cache.
else - Postgres is good as database and very slow as cache. It is good for prototyping and for less or middle load servers. For any other use different software 
Maybe memcached, 
Regards
Pavel
Pavel
Regards
gmb
--
View this message in context: http://postgresql.1045698.n5.nabble.com/High-memory-usage-performance-issue-temp-tables-tp5815108p5815175.htmlSent from the PostgreSQL - sql mailing list archive at Nabble.com.
--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
On Mon, Aug 18, 2014 at 8:45 AM, Pavel Stehule [via PostgreSQL] <[hidden email]> wrote:
2014-08-18 7:33 GMT+02:00 gmb <[hidden email]>:Thanks for the replies.
I tried a couple of alternative methods over the weekend in the hope of
improving performance, but unfortunately to no avail.
One of these was to have the processing of the 500K xml files shared between
multiple threads ( multiple connections ).
In an attempt to "force" the dropping of the temp tables , each thread
creates its own connection, run the function with XML payload and the
disconnects.
The impression I got was that the avg time per transaction still increases
as the process progresses.default temp_buffers = 8MB - so with ~ 10..20 clients all is done via IO, what is relative slow. Changes of system tables are not fast too on system with high load.My one concern with this method was locking , which I'm unfortunately quite
unfamiliar with.
Is it possible that locking could be a key problem when following this
multi-thread approach ?You can write PostgreSQL extension in C - and store XML only in memory.Temp tables are best when you do some queries or when you need indexes, but it is terrible slow cache.else - Postgres is good as database and very slow as cache. It is good for prototyping and for less or middle load servers. For any other use different softwareMaybe memcached,Regards
Pavel
Regards
gmb
--
View this message in context: http://postgresql.1045698.n5.nabble.com/High-memory-usage-performance-issue-temp-tables-tp5815108p5815175.htmlSent from the PostgreSQL - sql mailing list archive at Nabble.com.Sent via pgsql-sql mailing list ([hidden email])
--http://postgresql.1045698.n5.nabble.com/High-memory-usage-performance-issue-temp-tables-tp5815108p5815187.htmlIf you reply to this email, your message will be added to the discussion below:
View this message in context: Re: High memory usage / performance issue ( temp tables ? )
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
Just to make sure I understand this correct : > You can write PostgreSQL extension in C - and store XML only in memory.Meaning that I write function(s) which processes the XML using C which will allow me more option with processing theXML ? Will this allow me to cache the payload data in memory for the connection and not only for a single function ? Ifnot , this approach will not work , unless I rewrite a lot of the code / functions. The current solution makes use of aseparate function for each different part of the XML. Reason for this is that a lot of validation is also done on the data.> Temp tables are best when you do some queries or when you need indexes, but > it is terrible slow cache. >> else - Postgres is good as database and very slow as cache. It is good for > prototyping and for less or middleload servers. For any other use > different software So your point is that the use of temp tables in this kind ofscenario is not vey efficient. I'll then try to replace the temp tables with actual tables with some kind of UID. Thanksfor the response. gmb <br /><hr align="left" width="300" /> View this message in context: <a href="http://postgresql.1045698.n5.nabble.com/High-memory-usage-performance-issue-temp-tables-tp5815108p5815206.html">Re: Highmemory usage / performance issue ( temp tables ? )</a><br /> Sent from the <a href="http://postgresql.1045698.n5.nabble.com/PostgreSQL-sql-f2142323.html">PostgreSQL- sql mailing list archive</a> at Nabble.com.<br/>
2014-08-18 13:20 GMT+02:00 gmb <gmbouwer@gmail.com>:
 
Just to make sure I understand this correct : > You can write PostgreSQL extension in C - and store XML only in memory. Meaning that I write function(s) which processes the XML using C which will allow me more option with processing the XML ? Will this allow me to cache the payload data in memory for the connection and not only for a single function ? If not , this approach will not work , unless I rewrite a lot of the code / functions. The current solution makes use of a separate function for each different part of the XML. Reason for this is that a lot of validation is also done on the data. > Temp tables are best when you do some queries or when you need indexes, but > it is terrible slow cache. > > else - Postgres is good as database and very slow as cache. It is good for > prototyping and for less or middle load servers. For any other use > different software So your point is that the use of temp tables in this kind of scenario is not vey efficient. I'll then try to replace the temp tables with actual tables with some kind of UID. Thanks for the response. gmb
Postgres has own memory management 
data can live in shared memory context (across connections), session memory context, transaction memory context or query memory context.
Regards
Pavel
Pavel
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.