Обсуждение: Temporary tables and disk activity
Dear All, I sent a message last weekend asking about temporary tables being written to disk but didn't get any replies. I'm sure there is someone out there who knows something about this - please help! Here is the question again: Looking at vmstat output on my database server I have been suprised to see lots of disk writes going on while it is doing what should be exclusively read-only transactions. I see almost no disk reads as the database concerned is small enough to fit into the OS disk cache. I suspect that it might be something to do with temporary tables. There are a couple of places where I create temporary tables to "optimise" queries by factoring out what would otherwise be duplicate work. The amount of data being written is of the right order of magnitude for this to be the cause. I fear that perhaps Postgresql is flushing these tables to disk, even though they will be dropped before the end of the transaction. Is this a possibility? What issues should I be aware of with temporary tables? Are there any other common causes of lots of disk writes within read-only transactions? Is there any debug output that I can look at to track this down? Thanks in advance for any help that you can offer. Regards, Phil Endecott.
I don't think temporary tables have any special rules regarding disk writes, so I'd expect them ot get written out like everything else. The database doesn't know you're going to delete them later. Are the tables big? On Thu, Dec 09, 2004 at 10:10:21PM +0000, Phil Endecott wrote: > Dear All, > > I sent a message last weekend asking about temporary tables being > written to disk but didn't get any replies. I'm sure there is someone > out there who knows something about this - please help! Here is the > question again: > > > Looking at vmstat output on my database server I have been suprised to > see lots of disk writes going on while it is doing what should be > exclusively read-only transactions. I see almost no disk reads as the > database concerned is small enough to fit into the OS disk cache. > > I suspect that it might be something to do with temporary tables. There > are a couple of places where I create temporary tables to "optimise" > queries by factoring out what would otherwise be duplicate work. The > amount of data being written is of the right order of magnitude for this > to be the cause. I fear that perhaps Postgresql is flushing these tables > to disk, even though they will be dropped before the end of the > transaction. Is this a possibility? What issues should I be aware of > with temporary tables? Are there any other common causes of lots of disk > writes within read-only transactions? Is there any debug output that I > can look at to track this down? > > Thanks in advance for any help that you can offer. > > Regards, > > Phil Endecott. > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Вложения
Martijn van Oosterhout <kleptog@svana.org> writes: > I don't think temporary tables have any special rules regarding disk > writes, so I'd expect them ot get written out like everything else. They'll be written out from PG's internal buffers, but IIRC they will never be fsync'd, and they definitely aren't WAL-logged. (These statements hold true in 8.0, but not sure how far back.) In principle, therefore, the kernel could hold temp table data in its own disk buffers and never write it out to disk until the file is deleted. In practice, of course, the kernel doesn't know the data is transient and will probably push it out whenever it has nothing else to do. One of the things on the TODO list is making the size of temp-table buffers user-configurable. (Temp table buffers are per-backend, they are not part of the shared buffer arena.) With a large temp-table arena we'd never need to write to the kernel in the first place. Right now you could manually increase the #define that sets it, but it would not pay to make it very large because the management algorithms are very stupid (linear scans). That has to be fixed first :-( regards, tom lane
Tom Lane wrote: > They [temporary tables] > will be written out from PG's internal buffers, but IIRC they will > never be fsync'd, and they definitely aren't WAL-logged. (These > statements hold true in 8.0, but not sure how far back.) > > In principle, therefore, the kernel could hold temp table data in its > own disk buffers and never write it out to disk until the file is > deleted. In practice, of course, the kernel doesn't know the data is > transient and will probably push it out whenever it has nothing else to > do. That makes sense. I suspect that I am seeing writes every 5 seconds, which looks like bdflush / update. But my connections normally only last for a second at most. In this case, surely the table would normally have been deleted before the kernel decided to write anything. This is with 7.4.2 on linux 2.4.26. Does anyone have any experience with this type of situation? Is there any kernel-tweaking I can play with? Regards, Phil.
Phil Endecott <spam_from_postgresql_general@chezphil.org> writes: > Tom Lane wrote: >> In principle, therefore, the kernel could hold temp table data in its >> own disk buffers and never write it out to disk until the file is >> deleted. In practice, of course, the kernel doesn't know the data is >> transient and will probably push it out whenever it has nothing else to >> do. > That makes sense. I suspect that I am seeing writes every 5 seconds, > which looks like bdflush / update. > But my connections normally only last for a second at most. In this > case, surely the table would normally have been deleted before the > kernel decided to write anything. That does seem a bit odd, then. Can you strace a typical backend session and see if it's doing anything to force a disk write? (I'm too lazy to go check right now whether 7.4 handled temp tables exactly the same as CVS tip does. I think it's the same but I might be wrong.) regards, tom lane
Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > I don't think temporary tables have any special rules regarding disk > > writes, so I'd expect them ot get written out like everything else. > > They'll be written out from PG's internal buffers, but IIRC they will > never be fsync'd, and they definitely aren't WAL-logged. (These > statements hold true in 8.0, but not sure how far back.) > > In principle, therefore, the kernel could hold temp table data in its > own disk buffers and never write it out to disk until the file is > deleted. In practice, of course, the kernel doesn't know the data is > transient and will probably push it out whenever it has nothing else to > do. > > One of the things on the TODO list is making the size of temp-table > buffers user-configurable. (Temp table buffers are per-backend, they > are not part of the shared buffer arena.) With a large temp-table arena > we'd never need to write to the kernel in the first place. Right now > you could manually increase the #define that sets it, but it would not > pay to make it very large because the management algorithms are very > stupid (linear scans). That has to be fixed first :-( I assume you mean your TODO list because the official one has no mention of this. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> One of the things on the TODO list is making the size of temp-table >> buffers user-configurable. (Temp table buffers are per-backend, they >> are not part of the shared buffer arena.) With a large temp-table arena >> we'd never need to write to the kernel in the first place. Right now >> you could manually increase the #define that sets it, but it would not >> pay to make it very large because the management algorithms are very >> stupid (linear scans). That has to be fixed first :-( > I assume you mean your TODO list because the official one has no mention > of this. Doesn't it? We've surely discussed the problem enough times, eg http://archives.postgresql.org/pgsql-hackers/2002-08/msg00380.php http://archives.postgresql.org/pgsql-hackers/2002-09/msg01368.php or for that matter here's Vadim complaining about it seven years ago: http://archives.postgresql.org/pgsql-hackers/1997-12/msg00215.php regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >> One of the things on the TODO list is making the size of temp-table > >> buffers user-configurable. (Temp table buffers are per-backend, they > >> are not part of the shared buffer arena.) With a large temp-table arena > >> we'd never need to write to the kernel in the first place. Right now > >> you could manually increase the #define that sets it, but it would not > >> pay to make it very large because the management algorithms are very > >> stupid (linear scans). That has to be fixed first :-( > > > I assume you mean your TODO list because the official one has no mention > > of this. > > Doesn't it? We've surely discussed the problem enough times, eg > http://archives.postgresql.org/pgsql-hackers/2002-08/msg00380.php > http://archives.postgresql.org/pgsql-hackers/2002-09/msg01368.php > or for that matter here's Vadim complaining about it seven years ago: > http://archives.postgresql.org/pgsql-hackers/1997-12/msg00215.php OK, added: * Allow the size of the buffer cache used by temporary objects to be specified as a GUC variable Larger local buffer cache sizes requires more efficient handling of local cache lookups. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Hi Tom, I thought of a quicker way to investiage this than strace and did an ls -lt in the data directory and looked up the tables that seem to change on every transaction in pg_class. They are the catalog tables: # ls -lt /var/lib/postgres/data/base/17142/ total 530108 -rw------- 1 postgres postgres 6488064 Dec 13 18:44 1259 -rw------- 1 postgres postgres 3670016 Dec 13 18:44 1247 -rw------- 1 postgres postgres 38715392 Dec 13 18:44 1249 -rw------- 1 postgres postgres 3317760 Dec 13 18:44 16390 -rw------- 1 postgres postgres 13467648 Dec 13 18:44 16599 -rw------- 1 postgres postgres 16957440 Dec 13 18:44 16610 -rw------- 1 postgres postgres 4808704 Dec 13 18:44 16613 -rw------- 1 postgres postgres 17072128 Dec 13 18:44 16624 -rw------- 1 postgres postgres 14352384 Dec 13 18:44 16625 -rw------- 1 postgres postgres 483328 Dec 13 18:44 16630 -rw------- 1 postgres postgres 2228224 Dec 13 18:44 16652 -rw------- 1 postgres postgres 5742592 Dec 13 18:44 16653 -rw------- 1 postgres postgres 63578112 Dec 13 18:44 16609 -rw------- 1 postgres postgres 13787136 Dec 13 18:44 16614 -rw------- 1 postgres postgres 483328 Dec 13 18:44 16629 => select pc.relfilenode, pc.relname, pn.nspname from pg_class pc join pg_namespace pn on (pc.relnamespace=pn.oid) where pc.relfilenode in ('1259','1247','1249','16390','16599','16610','16613','16624','16625','16630','16652','16653','16609','16614','16629'); relfilenode | relname | nspname -------------+---------------------------------+------------ 16599 | pg_depend | pg_catalog 16390 | pg_index | pg_catalog 1259 | pg_class | pg_catalog 1249 | pg_attribute | pg_catalog 1247 | pg_type | pg_catalog 16653 | pg_type_typname_nsp_index | pg_catalog 16652 | pg_type_oid_index | pg_catalog 16630 | pg_index_indexrelid_index | pg_catalog 16629 | pg_index_indrelid_index | pg_catalog 16625 | pg_depend_reference_index | pg_catalog 16624 | pg_depend_depender_index | pg_catalog 16614 | pg_class_relname_nsp_index | pg_catalog 16613 | pg_class_oid_index | pg_catalog 16610 | pg_attribute_relid_attnum_index | pg_catalog 16609 | pg_attribute_relid_attnam_index | pg_catalog (15 rows) Does this make sense? I imagine that the temporary table is being added to these tables and then removed again. I do have quite a large number of tables in the database; I have one schema per user and of the order of 20 tables per user and 200 users. I can imagine that in a system with fewer tables this would be insignificant, yet in my case it seems to be writing of the order of a megabyte in each 5-second update. I should mention that I ANALYSE the temporary table after creating it and before using it for anything; I'm not sure if this does any good but I put it in as it "couldn't do any harm". Any thoughts? Regards, Phil. Tom Lane wrote: > Phil Endecott <spam_from_postgresql_general@chezphil.org> writes: > >>Tom Lane wrote: >> >>>In principle, therefore, the kernel could hold temp table data in its >>>own disk buffers and never write it out to disk until the file is >>>deleted. In practice, of course, the kernel doesn't know the data is >>>transient and will probably push it out whenever it has nothing else to >>>do. > > >>That makes sense. I suspect that I am seeing writes every 5 seconds, >>which looks like bdflush / update. > > >>But my connections normally only last for a second at most. In this >>case, surely the table would normally have been deleted before the >>kernel decided to write anything. > > > That does seem a bit odd, then. Can you strace a typical backend > session and see if it's doing anything to force a disk write? > > (I'm too lazy to go check right now whether 7.4 handled temp tables > exactly the same as CVS tip does. I think it's the same but I might > be wrong.) > > regards, tom lane > >
Phil Endecott <spam_from_postgresql_general@chezphil.org> writes: > Does this make sense? I imagine that the temporary table is being added > to these tables and then removed again. Yes, a temp table has the same catalog infrastructure as a regular table, so creation and deletion of a temp table will cause some activity in those catalogs. I thought you were concerned about the data within the temp table, though. > I do have quite a large number of tables in the database; I have one > schema per user and of the order of 20 tables per user and 200 users. I > can imagine that in a system with fewer tables this would be > insignificant, yet in my case it seems to be writing of the order of a > megabyte in each 5-second update. That seems like a lot. How often do you create/delete temp tables? > I should mention that I ANALYSE the temporary table after creating it > and before using it for anything; I'm not sure if this does any good > but I put it in as it "couldn't do any harm". This is a good idea (if you analyze after filling the table) ... but it will cause catalog traffic too, because again the pg_statistic rows go into the regular pg_statistic catalog. regards, tom lane
Tom Lane wrote: > Phil Endecott <spam_from_postgresql_general@chezphil.org> writes: > >>Does this make sense? I imagine that the temporary table is being added >>to these tables and then removed again. > > > Yes, a temp table has the same catalog infrastructure as a regular > table, so creation and deletion of a temp table will cause some activity > in those catalogs. I thought you were concerned about the data within > the temp table, though. I'm concerned about unnecessary disk activity, whatever its cause. I guessed that it was the temp table contents. >>I do have quite a large number of tables in the database; I have one >>schema per user and of the order of 20 tables per user and 200 users. I >>can imagine that in a system with fewer tables this would be >>insignificant, yet in my case it seems to be writing of the order of a >>megabyte in each 5-second update. > > > That seems like a lot. How often do you create/delete temp tables? Only once or twice per 5-sec update period. I agree that it sounds like a lot which makes me think this could all be a red herring; I suspect that there is something else going on as well as this temp table stuff (possibly nothing to do with postgresql). But FYI this is treefic.com, a family tree website. Have a look at, for example, http://treefic.com/treefic/royal92?a=tree_page&root_id=10286&direction=up The first step in building that diagram is to find the ancestors of the root individual. I have a pl_pgsql function that itteratively finds all of the ancestors, progressively adding them to a temporary table. So it will create, populate, read and then drop one table for each page that it generates. This is reasonably fast; overall speed is not limited by postgres. What would happen if I were to rollback at the end of the transaction, rather than committing (having made no changes)? Would that eliminate some or all of the catalog writes? Many thanks for helping me understand this. Regards, Phil.
Phil Endecott <spam_from_postgresql_general@chezphil.org> writes: > What would happen if I were to rollback at the end of the transaction, > rather than committing (having made no changes)? Would that eliminate > some or all of the catalog writes? It would avoid fsync'ing the changes at commit time, but not really reduce the write volume per se. regards, tom lane