Обсуждение: Temporary tables and disk activity

Поиск
Список
Период
Сортировка

Temporary tables and disk activity

От
Phil Endecott
Дата:
Dear All,

I sent a message last weekend asking about temporary tables being
written to disk but didn't get any replies.  I'm sure there is someone
out there who knows something about this - please help!  Here is the
question again:


Looking at vmstat output on my database server I have been suprised to
see lots of disk writes going on while it is doing what should be
exclusively read-only transactions. I see almost no disk reads as the
database concerned is small enough to fit into the OS disk cache.

I suspect that it might be something to do with temporary tables. There
are a couple of places where I create temporary tables to "optimise"
queries by factoring out what would otherwise be duplicate work. The
amount of data being written is of the right order of magnitude for this
to be the cause. I fear that perhaps Postgresql is flushing these tables
to disk, even though they will be dropped before the end of the
transaction. Is this a possibility? What issues should I be aware of
with temporary tables? Are there any other common causes of lots of disk
writes within read-only transactions? Is there any debug output that I
can look at to track this down?

Thanks in advance for any help that you can offer.

Regards,

Phil Endecott.




Re: Temporary tables and disk activity

От
Martijn van Oosterhout
Дата:
I don't think temporary tables have any special rules regarding disk
writes, so I'd expect them ot get written out like everything else. The
database doesn't know you're going to delete them later.

Are the tables big?

On Thu, Dec 09, 2004 at 10:10:21PM +0000, Phil Endecott wrote:
> Dear All,
>
> I sent a message last weekend asking about temporary tables being
> written to disk but didn't get any replies.  I'm sure there is someone
> out there who knows something about this - please help!  Here is the
> question again:
>
>
> Looking at vmstat output on my database server I have been suprised to
> see lots of disk writes going on while it is doing what should be
> exclusively read-only transactions. I see almost no disk reads as the
> database concerned is small enough to fit into the OS disk cache.
>
> I suspect that it might be something to do with temporary tables. There
> are a couple of places where I create temporary tables to "optimise"
> queries by factoring out what would otherwise be duplicate work. The
> amount of data being written is of the right order of magnitude for this
> to be the cause. I fear that perhaps Postgresql is flushing these tables
> to disk, even though they will be dropped before the end of the
> transaction. Is this a possibility? What issues should I be aware of
> with temporary tables? Are there any other common causes of lots of disk
> writes within read-only transactions? Is there any debug output that I
> can look at to track this down?
>
> Thanks in advance for any help that you can offer.
>
> Regards,
>
> Phil Endecott.
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faqs/FAQ.html

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Вложения

Re: Temporary tables and disk activity

От
Tom Lane
Дата:
Martijn van Oosterhout <kleptog@svana.org> writes:
> I don't think temporary tables have any special rules regarding disk
> writes, so I'd expect them ot get written out like everything else.

They'll be written out from PG's internal buffers, but IIRC they will
never be fsync'd, and they definitely aren't WAL-logged.  (These
statements hold true in 8.0, but not sure how far back.)

In principle, therefore, the kernel could hold temp table data in its
own disk buffers and never write it out to disk until the file is
deleted.  In practice, of course, the kernel doesn't know the data is
transient and will probably push it out whenever it has nothing else to
do.

One of the things on the TODO list is making the size of temp-table
buffers user-configurable.  (Temp table buffers are per-backend, they
are not part of the shared buffer arena.)  With a large temp-table arena
we'd never need to write to the kernel in the first place.  Right now
you could manually increase the #define that sets it, but it would not
pay to make it very large because the management algorithms are very
stupid (linear scans).  That has to be fixed first :-(

            regards, tom lane

Re: Temporary tables and disk activity

От
Phil Endecott
Дата:
Tom Lane wrote:
> They [temporary tables]
 > will be written out from PG's internal buffers, but IIRC they will
> never be fsync'd, and they definitely aren't WAL-logged.  (These
> statements hold true in 8.0, but not sure how far back.)
>
> In principle, therefore, the kernel could hold temp table data in its
> own disk buffers and never write it out to disk until the file is
> deleted.  In practice, of course, the kernel doesn't know the data is
> transient and will probably push it out whenever it has nothing else to
> do.

That makes sense.  I suspect that I am seeing writes every 5 seconds,
which looks like bdflush / update.

But my connections normally only last for a second at most.  In this
case, surely the table would normally have been deleted before the
kernel decided to write anything.  This is with 7.4.2 on linux 2.4.26.
Does anyone have any experience with this type of situation?  Is there
any kernel-tweaking I can play with?

Regards,

Phil.


Re: Temporary tables and disk activity

От
Tom Lane
Дата:
Phil Endecott <spam_from_postgresql_general@chezphil.org> writes:
> Tom Lane wrote:
>> In principle, therefore, the kernel could hold temp table data in its
>> own disk buffers and never write it out to disk until the file is
>> deleted.  In practice, of course, the kernel doesn't know the data is
>> transient and will probably push it out whenever it has nothing else to
>> do.

> That makes sense.  I suspect that I am seeing writes every 5 seconds,
> which looks like bdflush / update.

> But my connections normally only last for a second at most.  In this
> case, surely the table would normally have been deleted before the
> kernel decided to write anything.

That does seem a bit odd, then.  Can you strace a typical backend
session and see if it's doing anything to force a disk write?

(I'm too lazy to go check right now whether 7.4 handled temp tables
exactly the same as CVS tip does.  I think it's the same but I might
be wrong.)

            regards, tom lane

Re: Temporary tables and disk activity

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > I don't think temporary tables have any special rules regarding disk
> > writes, so I'd expect them ot get written out like everything else.
>
> They'll be written out from PG's internal buffers, but IIRC they will
> never be fsync'd, and they definitely aren't WAL-logged.  (These
> statements hold true in 8.0, but not sure how far back.)
>
> In principle, therefore, the kernel could hold temp table data in its
> own disk buffers and never write it out to disk until the file is
> deleted.  In practice, of course, the kernel doesn't know the data is
> transient and will probably push it out whenever it has nothing else to
> do.
>
> One of the things on the TODO list is making the size of temp-table
> buffers user-configurable.  (Temp table buffers are per-backend, they
> are not part of the shared buffer arena.)  With a large temp-table arena
> we'd never need to write to the kernel in the first place.  Right now
> you could manually increase the #define that sets it, but it would not
> pay to make it very large because the management algorithms are very
> stupid (linear scans).  That has to be fixed first :-(

I assume you mean your TODO list because the official one has no mention
of this.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Temporary tables and disk activity

От
Tom Lane
Дата:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> One of the things on the TODO list is making the size of temp-table
>> buffers user-configurable.  (Temp table buffers are per-backend, they
>> are not part of the shared buffer arena.)  With a large temp-table arena
>> we'd never need to write to the kernel in the first place.  Right now
>> you could manually increase the #define that sets it, but it would not
>> pay to make it very large because the management algorithms are very
>> stupid (linear scans).  That has to be fixed first :-(

> I assume you mean your TODO list because the official one has no mention
> of this.

Doesn't it?  We've surely discussed the problem enough times, eg
http://archives.postgresql.org/pgsql-hackers/2002-08/msg00380.php
http://archives.postgresql.org/pgsql-hackers/2002-09/msg01368.php
or for that matter here's Vadim complaining about it seven years ago:
http://archives.postgresql.org/pgsql-hackers/1997-12/msg00215.php

            regards, tom lane

Re: Temporary tables and disk activity

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> One of the things on the TODO list is making the size of temp-table
> >> buffers user-configurable.  (Temp table buffers are per-backend, they
> >> are not part of the shared buffer arena.)  With a large temp-table arena
> >> we'd never need to write to the kernel in the first place.  Right now
> >> you could manually increase the #define that sets it, but it would not
> >> pay to make it very large because the management algorithms are very
> >> stupid (linear scans).  That has to be fixed first :-(
>
> > I assume you mean your TODO list because the official one has no mention
> > of this.
>
> Doesn't it?  We've surely discussed the problem enough times, eg
> http://archives.postgresql.org/pgsql-hackers/2002-08/msg00380.php
> http://archives.postgresql.org/pgsql-hackers/2002-09/msg01368.php
> or for that matter here's Vadim complaining about it seven years ago:
> http://archives.postgresql.org/pgsql-hackers/1997-12/msg00215.php

OK, added:

* Allow the size of the buffer cache used by temporary objects to be
  specified as a GUC variable

  Larger local buffer cache sizes requires more efficient handling of
  local cache lookups.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Temporary tables and disk activity

От
Phil Endecott
Дата:
Hi Tom,

I thought of a quicker way to investiage this than strace and did an ls
-lt in the data directory and looked up the tables that seem to change
on every transaction in pg_class.  They are the catalog tables:

# ls -lt /var/lib/postgres/data/base/17142/
total 530108
-rw-------    1 postgres postgres  6488064 Dec 13 18:44 1259
-rw-------    1 postgres postgres  3670016 Dec 13 18:44 1247
-rw-------    1 postgres postgres 38715392 Dec 13 18:44 1249
-rw-------    1 postgres postgres  3317760 Dec 13 18:44 16390
-rw-------    1 postgres postgres 13467648 Dec 13 18:44 16599
-rw-------    1 postgres postgres 16957440 Dec 13 18:44 16610
-rw-------    1 postgres postgres  4808704 Dec 13 18:44 16613
-rw-------    1 postgres postgres 17072128 Dec 13 18:44 16624
-rw-------    1 postgres postgres 14352384 Dec 13 18:44 16625
-rw-------    1 postgres postgres   483328 Dec 13 18:44 16630
-rw-------    1 postgres postgres  2228224 Dec 13 18:44 16652
-rw-------    1 postgres postgres  5742592 Dec 13 18:44 16653
-rw-------    1 postgres postgres 63578112 Dec 13 18:44 16609
-rw-------    1 postgres postgres 13787136 Dec 13 18:44 16614
-rw-------    1 postgres postgres   483328 Dec 13 18:44 16629

=> select pc.relfilenode, pc.relname, pn.nspname from pg_class pc join
pg_namespace pn on (pc.relnamespace=pn.oid) where pc.relfilenode in
('1259','1247','1249','16390','16599','16610','16613','16624','16625','16630','16652','16653','16609','16614','16629');

  relfilenode |             relname             |  nspname
-------------+---------------------------------+------------
        16599 | pg_depend                       | pg_catalog
        16390 | pg_index                        | pg_catalog
         1259 | pg_class                        | pg_catalog
         1249 | pg_attribute                    | pg_catalog
         1247 | pg_type                         | pg_catalog
        16653 | pg_type_typname_nsp_index       | pg_catalog
        16652 | pg_type_oid_index               | pg_catalog
        16630 | pg_index_indexrelid_index       | pg_catalog
        16629 | pg_index_indrelid_index         | pg_catalog
        16625 | pg_depend_reference_index       | pg_catalog
        16624 | pg_depend_depender_index        | pg_catalog
        16614 | pg_class_relname_nsp_index      | pg_catalog
        16613 | pg_class_oid_index              | pg_catalog
        16610 | pg_attribute_relid_attnum_index | pg_catalog
        16609 | pg_attribute_relid_attnam_index | pg_catalog
(15 rows)

Does this make sense?  I imagine that the temporary table is being added
to these tables and then removed again.

I do have quite a large number of tables in the database; I have one
schema per user and of the order of 20 tables per user and 200 users.  I
can imagine that in a system with fewer tables this would be
insignificant, yet in my case it seems to be writing of the order of a
megabyte in each 5-second update.

I should mention that I ANALYSE the temporary table after creating it
and before using it for anything;  I'm not sure if this does any good
but I put it in as it "couldn't do any harm".

Any thoughts?

Regards,

Phil.


Tom Lane wrote:
> Phil Endecott <spam_from_postgresql_general@chezphil.org> writes:
>
>>Tom Lane wrote:
>>
>>>In principle, therefore, the kernel could hold temp table data in its
>>>own disk buffers and never write it out to disk until the file is
>>>deleted.  In practice, of course, the kernel doesn't know the data is
>>>transient and will probably push it out whenever it has nothing else to
>>>do.
>
>
>>That makes sense.  I suspect that I am seeing writes every 5 seconds,
>>which looks like bdflush / update.
>
>
>>But my connections normally only last for a second at most.  In this
>>case, surely the table would normally have been deleted before the
>>kernel decided to write anything.
>
>
> That does seem a bit odd, then.  Can you strace a typical backend
> session and see if it's doing anything to force a disk write?
>
> (I'm too lazy to go check right now whether 7.4 handled temp tables
> exactly the same as CVS tip does.  I think it's the same but I might
> be wrong.)
>
>             regards, tom lane
>
>



Re: Temporary tables and disk activity

От
Tom Lane
Дата:
Phil Endecott <spam_from_postgresql_general@chezphil.org> writes:
> Does this make sense?  I imagine that the temporary table is being added
> to these tables and then removed again.

Yes, a temp table has the same catalog infrastructure as a regular
table, so creation and deletion of a temp table will cause some activity
in those catalogs.  I thought you were concerned about the data within
the temp table, though.

> I do have quite a large number of tables in the database; I have one
> schema per user and of the order of 20 tables per user and 200 users.  I
> can imagine that in a system with fewer tables this would be
> insignificant, yet in my case it seems to be writing of the order of a
> megabyte in each 5-second update.

That seems like a lot.  How often do you create/delete temp tables?

> I should mention that I ANALYSE the temporary table after creating it
> and before using it for anything;  I'm not sure if this does any good
> but I put it in as it "couldn't do any harm".

This is a good idea (if you analyze after filling the table) ... but it
will cause catalog traffic too, because again the pg_statistic rows go
into the regular pg_statistic catalog.

            regards, tom lane

Re: Temporary tables and disk activity

От
Phil Endecott
Дата:
Tom Lane wrote:
> Phil Endecott <spam_from_postgresql_general@chezphil.org> writes:
>
>>Does this make sense?  I imagine that the temporary table is being added
>>to these tables and then removed again.
>
>
> Yes, a temp table has the same catalog infrastructure as a regular
> table, so creation and deletion of a temp table will cause some activity
> in those catalogs.  I thought you were concerned about the data within
> the temp table, though.

I'm concerned about unnecessary disk activity, whatever its cause. I
guessed that it was the temp table contents.

>>I do have quite a large number of tables in the database; I have one
>>schema per user and of the order of 20 tables per user and 200 users.  I
>>can imagine that in a system with fewer tables this would be
>>insignificant, yet in my case it seems to be writing of the order of a
>>megabyte in each 5-second update.
>
>
> That seems like a lot.  How often do you create/delete temp tables?

Only once or twice per 5-sec update period.  I agree that it sounds like
a lot which makes me think this could all be a red herring;  I suspect
that there is something else going on as well as this temp table stuff
(possibly nothing to do with postgresql).  But FYI this is treefic.com,
a family tree website.  Have a look at, for example,
http://treefic.com/treefic/royal92?a=tree_page&root_id=10286&direction=up
The first step in building that diagram is to find the ancestors of the
root individual.  I have a pl_pgsql function that itteratively finds all
of the ancestors, progressively adding them to a temporary table.  So it
will create, populate, read and then drop one table for each page that
it generates.  This is reasonably fast; overall speed is not limited by
postgres.

What would happen if I were to rollback at the end of the transaction,
rather than committing (having made no changes)?  Would that eliminate
some or all of the catalog writes?

Many thanks for helping me understand this.

Regards,

Phil.


Re: Temporary tables and disk activity

От
Tom Lane
Дата:
Phil Endecott <spam_from_postgresql_general@chezphil.org> writes:
> What would happen if I were to rollback at the end of the transaction,
> rather than committing (having made no changes)?  Would that eliminate
> some or all of the catalog writes?

It would avoid fsync'ing the changes at commit time, but not really
reduce the write volume per se.

            regards, tom lane