Discussion: High memory usage with savepoints & encoding differences
I'm running into a situation where postmaster is consuming significantly
more memory than I would expect. This only seems to happen when I combine
savepoints with differences between client and database encoding. I
originally discovered this while running some Java code which uses JDBC
(the postgres JDBC driver always sets client_encoding to UNICODE) to
connect to a latin1 encoded database, but it's reproducible with psql as
well.

Here's a script which generates SQL that triggers the unexpected behavior:

    dadams@postgres:/tmp > cat pg_savepoint.bash
    #!/bin/bash
    echo "SET client_encoding TO $1;"
    echo "BEGIN WORK;"
    for ((i=0; i<$2; i++))
    do
        echo "SAVEPOINT x;"
        echo "RELEASE SAVEPOINT x;"
    done
    echo "COMMIT WORK;"

Given these databases:

    dadams@postgres:/tmp > psql -l
            List of databases
        Name     |  Owner   | Encoding
    -------------+----------+----------
     postgres    | postgres | UTF8
     template0   | postgres | UTF8
     template1   | postgres | UTF8
     test_latin1 | dadams   | LATIN1
     test_utf8   | dadams   | UTF8
    (5 rows)

All memory usages are those reported by top as VIRT. If I run
"./pg_savepoint.bash UNICODE 100000 | psql -q test_utf8" or
"./pg_savepoint.bash latin1 100000 | psql -q test_latin1", the memory
usage of postmaster remains relatively constant at about 50 MB. But when I
run "./pg_savepoint.bash UNICODE 100000 | psql -q test_latin1" or
"./pg_savepoint.bash latin1 100000 | psql -q test_utf8", the postmaster
quickly consumes hundreds of MB, topping out at 800-900 MB.

If I omit the SAVEPOINT/RELEASE SAVEPOINT statements and instead do
repeated INSERTs into a temporary table, the memory usage remains
reasonable (~50 MB), regardless of differences between client and
database encoding.

This doesn't seem to happen in all cases of mixed encodings. Mixing
win1252 and latin1 seems to be fine. It seems that only UNICODE/UTF8 <->
single-byte charsets triggers this, although I wasn't exhaustive in my
testing.

I've tested this with 8.3.7 and 8.3.5, running on CentOS 5.2 and 4
update 6, respectively.
Postgres was installed from the official RPMs downloaded from
http://www.postgresql.org/ftp/binary/. I'm seeing the same behavior with
EnterpriseDB's 8.3.5 one-click installer for Postgres on Windows
(although using CP1252 instead of LATIN1). postgresql.conf is as initdb
created it.

Is this expected behavior? Or am I missing something? This seems really
weird.

thanks,
dylan
"Dylan Adams" <dadams@bybaxter.com> writes:
> I'm running into a situation where postmaster is consuming
> significantly more memory than I would expect. This only seems to
> happen when I combine savepoints with differences between client and
> database encoding. I originally discovered this while running some
> Java code which uses JDBC (the postgres JDBC driver always sets
> client_encoding to UNICODE) to connect to a latin1 encoded database,
> but it's reproducible with psql as well.

I think this example is pretty artificial. The fundamental reason
memory is increasing is that each subtransaction can require some state
storage. In the example the per-subtransaction CurTransactionContexts
are not getting used for anything except encoding conversion on the
SAVEPOINT/RELEASE command completion messages --- but if you were doing
any real work in the subtransactions then most likely there would be
additional stuff there, so I'm not excited about trying to suppress
this particular symptom.

The bottom line is that lots and lots of subtransactions isn't a very
good thing for performance, especially with a couple of network round
trips for each one. Consider pushing whatever work is involved here
into a server-side function.

			regards, tom lane
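Tom's server-side-function suggestion can be sketched in the same style as pg_savepoint.bash: a small bash generator that emits the SQL. The PL/pgSQL function below wraps the INSERT and swallows duplicate-key errors inside a single server call; the table name `target`, its column `val`, and the function name are illustrative assumptions, not taken from the thread.

```shell
#!/bin/bash
# Emit SQL for a server-side function that replaces the per-row
# SAVEPOINT/INSERT/RELEASE pattern. Table "target", column "val",
# and the function name are hypothetical.
gen_insert_fn() {
cat <<'SQL'
CREATE OR REPLACE FUNCTION insert_ignoring_dups(v text) RETURNS void AS $$
BEGIN
    -- A block with an EXCEPTION clause runs its body in an implicit
    -- subtransaction, but that subtransaction begins and ends within
    -- one server call: no per-savepoint network round trip, and no
    -- encoding conversion of SAVEPOINT/RELEASE command tags.
    INSERT INTO target (val) VALUES (v);
EXCEPTION
    WHEN unique_violation THEN
        NULL;  -- duplicate row: ignore, as the old platform did
END;
$$ LANGUAGE plpgsql;
SQL
}

gen_insert_fn
```

Loading this once and then issuing `SELECT insert_ignoring_dups(...)` per row would keep the subtransaction machinery entirely server-side.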
Tom Lane writes:
> I think this example is pretty artificial. The fundamental reason
> memory is increasing is that each subtransaction can require some
> state storage. In the example the per-subtransaction
> CurTransactionContexts are not getting used for anything except
> encoding conversion on the SAVEPOINT/RELEASE command completion
> messages --- but if you were doing any real work in the
> subtransactions then most likely there would be additional stuff
> there, so I'm not excited about trying to suppress this particular
> symptom.

I was under the impression that the majority of the storage was freed
when the savepoint was RELEASEd. I was trying to reduce this down to
the simplest repeatable example. In the actual code, the savepoint was
bookending an INSERT. It seemed this was a reasonable approach, given
the discussion around adding automatic savepoint logic to the JDBC
driver:

http://archives.postgresql.org/pgsql-jdbc/2005-01/msg00131.php
http://archives.postgresql.org/pgsql-jdbc/2007-04/msg00085.php

The actual code (including the INSERT) is fine if the database it's
executing against is UTF8 encoded. Unfortunately, we can't convert to
that, due to some other legacy applications that can't handle multibyte
encodings. I am pretty surprised by how much memory the transcoding
requires.

> The bottom line is that lots and lots of subtransactions isn't a very
> good thing for performance, especially with a couple of network round
> trips for each one. Consider pushing whatever work is involved here
> into a server-side function.

This was encountered as part of a migration process from an existing
DBMS. We decided to go with savepoints to replicate the "error occurred
during transaction" behavior of the previous database platform. In this
case, it'll just mean reworking the code to check for duplicates rather
than having Postgres do it for us.

Thanks again!
dylan
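The duplicate-check rework mentioned above can also be done in the INSERT statement itself, avoiding savepoints (and hence subtransactions) entirely; 8.3 has no ON CONFLICT, but a NOT EXISTS guard works. A sketch, again as a SQL-emitting bash function; the table name `target` and column `val` are made up, values are assumed to be pre-escaped, and the pattern is only race-free for a single-writer migration like this one.

```shell
#!/bin/bash
# Emit an INSERT that skips rows already present, so no savepoint is
# needed per row. Table "target" and column "val" are hypothetical;
# $1 is interpolated as-is and must already be SQL-escaped.
gen_dup_safe_insert() {
cat <<SQL
INSERT INTO target (val)
SELECT '$1'
WHERE NOT EXISTS (SELECT 1 FROM target WHERE val = '$1');
SQL
}

gen_dup_safe_insert 'some value'
```

With a single migration stream this checks and inserts in one round trip per row, with no subtransaction state to accumulate.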