Discussion: performance of temporary vs. regular tables

From:
Joachim Worringen
Date:

Greetings,

in
http://archives.postgresql.org/message-id/1056648218.7041.11.camel@jester,
it is stated that the performance of temporary tables is "the same as a
regular table but without WAL on the table contents".

I have a data-mining-type application which makes heavy use of temporary
tables to stage (potentially large amounts of) data between different
operations. (WAL is write-ahead logging.)

To effectively multi-thread this application, I (think I) need to switch
from temporary to regular tables, because
- the concurrent threads need to use different connections, not cursors,
to effectively operate concurrently
- temporary tables are not visible across connections (as they are
across cursors of the same connection)

Thus, I wonder how much this will affect performance. Access to the
temporary tables consists of inserting (millions of) rows once in a single
transaction, potentially updating them all once within a single
transaction, then selecting from them once or more.
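The access pattern described above can be sketched as follows; the table
and column names are invented for illustration, not taken from the thread:

```sql
-- Hedged sketch of the staging workload: one bulk insert per transaction,
-- one full-table update, then one or more reads.
BEGIN;
CREATE TEMPORARY TABLE staging (id bigint, payload text);
-- millions of rows in the real application
INSERT INTO staging
SELECT n, 'row ' || n FROM generate_series(1, 1000000) AS n;
COMMIT;

BEGIN;
UPDATE staging SET payload = upper(payload);  -- potentially update all rows once
COMMIT;

SELECT count(*) FROM staging;                 -- select once or more
```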

Of course, potentially losing the data in these tables is not a problem
at all. The threads are synchronized above the SQL level.

Thanks for any input on how to maximize performance for this application.

  Joachim


From:
Grzegorz Jaśkiewicz
Date:

Temporary tables are handled pretty much like regular tables. The
magic happens at the schema level: a new schema is set up for each
connection, so that it can access its own temporary tables.
Temporary tables are also not autovacuumed.
And those are pretty much all of the differences.
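The per-connection schema is easy to observe; this is a hedged
illustration with invented names, and the manual ANALYZE compensates for
the missing autovacuum:

```sql
-- Each session's temporary tables land in a private pg_temp_N schema
-- that other connections cannot see.
CREATE TEMPORARY TABLE session_scratch (x integer);

-- shows a schema name like pg_temp_3
SELECT n.nspname
FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname = 'session_scratch';

-- autovacuum skips temporary tables, so analyze manually after bulk loads
ANALYZE session_scratch;
```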

From:
Joachim Worringen
Date:

On 25.05.2010 10:49, Grzegorz Jaśkiewicz wrote:
> Temporary tables are handled pretty much like regular tables. The
> magic happens at the schema level: a new schema is set up for each
> connection, so that it can access its own temporary tables.
> Temporary tables are also not autovacuumed.
> And those are pretty much all of the differences.

Thanks. So, the Write-Ahead-Logging (being used or not) does not matter?

And, is there anything like RAM-only tables? I really don't care whether
the staging data is lost in the rare event of a machine crash, or
whether the query crashes due to lack of memory (I make sure there's
enough w/o paging) - I only care about performance here.

  Joachim


From:
Thom Brown
Date:

2010/5/25 Joachim Worringen <>:
> On 25.05.2010 10:49, Grzegorz Jaśkiewicz wrote:
>>
>> Temporary tables are handled pretty much like regular tables. The
>> magic happens at the schema level: a new schema is set up for each
>> connection, so that it can access its own temporary tables.
>> Temporary tables are also not autovacuumed.
>> And those are pretty much all of the differences.
>
> Thanks. So, the Write-Ahead-Logging (being used or not) does not matter?
>
> And, is there anything like RAM-only tables? I really don't care whether the
> staging data is lost on the rare event of a machine crash, or whether the
> query crashes due to lack of memory (I make sure there's enough w/o paging)
> - I only care about performance here.
>
>  Joachim
>

I think you can create a tablespace on a RAM disk, and create a table there.
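A hedged sketch of that suggestion, assuming a RAM disk is already
mounted at /mnt/pg_ram and owned by the postgres user (the path and all
names are examples):

```sql
CREATE TABLESPACE ram_ts LOCATION '/mnt/pg_ram';

-- either place individual tables there ...
CREATE TABLE staging (id bigint, payload text) TABLESPACE ram_ts;

-- ... or make it the session default, keeping the SQL statements
-- themselves tablespace-agnostic
SET default_tablespace = ram_ts;
```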

Thom

From:
Joachim Worringen
Date:

On 25.05.2010 11:15, Thom Brown wrote:
> 2010/5/25 Joachim Worringen<>:
>> And, is there anything like RAM-only tables? I really don't care whether the
>> staging data is lost on the rare event of a machine crash, or whether the
>> query crashes due to lack of memory (I make sure there's enough w/o paging)
>> - I only care about performance here.
>>
>>   Joachim
>>
>
> I think you can create a tablespace on a RAM disk, and create a table there.

True, but I think this makes the database server configuration more
complex (which is acceptable), and may add dependencies between the
server configuration and the SQL statements through the choice of
tablespace name (which would be a problem).

But I am a tablespace-novice and will look into this "workaround".

  thanks, Joachim


From:
Grzegorz Jaśkiewicz
Date:

WAL does for the database what journaling does for the filesystem.
Plus it allows you to restore to a past point in time (PITR).

As for RAM, the data will stay in RAM as long as the OS decides to keep
it in the page cache, and/or it is in shared_buffers.
Unless you have serious doubts about those two, I don't think it makes
much sense to set up a RAM-disk tablespace yourself. But try it, and
see for yourself.
Make sure you have logic in place that sets it up before
PostgreSQL starts, in case of a reboot or similar.
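One low-complexity way to provision such a RAM disk before PostgreSQL
starts is a tmpfs entry in /etc/fstab; the mount point, size, and
ownership below are examples, and the directory must exist at boot:

```
tmpfs  /mnt/pg_ram  tmpfs  size=8g,mode=0700,uid=postgres,gid=postgres  0  0
```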

From:
Joachim Worringen
Date:

On 25.05.2010 11:38, Grzegorz Jaśkiewicz wrote:
> WAL does for the database what journaling does for the filesystem.
> Plus it allows you to restore to a past point in time (PITR).
>
> As for RAM, the data will stay in RAM as long as the OS decides to keep
> it in the page cache, and/or it is in shared_buffers.

Or until I commit the transaction? I have not completely disabled
sync-to-disk in my setup, as there are of course situations where new
data comes into the database that needs to be stored in a safe manner.

> Unless you have serious doubts about those two, I don't think it makes
> much sense to set up a RAM-disk tablespace yourself. But try it, and
> see for yourself.
> Make sure you have logic in place that sets it up before
> PostgreSQL starts, in case of a reboot or similar.

That's what I had in mind when mentioning "increased setup
complexity". Simply adding a keyword like "NONPERSISTENT" to the table
creation statement would be preferable...
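As an editorial aside: later PostgreSQL releases (9.1 and up) added
almost exactly this feature. UNLOGGED tables skip WAL for the table
contents and, unlike temporary tables, are visible across connections;
the price is that they are truncated after a crash. The table name below
is an example:

```sql
-- PostgreSQL 9.1+: no WAL for the contents, visible to all sessions,
-- emptied automatically after a crash recovery.
CREATE UNLOGGED TABLE staging (id bigint, payload text);
```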

  Joachim


From:
Andres Freund
Date:

On Tuesday 25 May 2010 11:00:24 Joachim Worringen wrote:
> On 25.05.2010 10:49, Grzegorz Jaśkiewicz wrote:
> > Temporary tables are handled pretty much like regular tables. The
> > magic happens at the schema level: a new schema is set up for each
> > connection, so that it can access its own temporary tables.
> > Temporary tables are also not autovacuumed.
> > And those are pretty much all of the differences.
>
> Thanks. So, the Write-Ahead-Logging (being used or not) does not matter?
It does matter quite significantly in my experience, both from an I/O
and a CPU overhead perspective.

Andres

From:
Joachim Worringen
Date:

On 25.05.2010 12:41, Andres Freund wrote:
> On Tuesday 25 May 2010 11:00:24 Joachim Worringen wrote:
>> Thanks. So, the Write-Ahead-Logging (being used or not) does not matter?
> It does matter quite significantly in my experience, both from an I/O
> and a CPU overhead perspective.

O.k., looks as if I have to run my own experiments... I'll let you know
if possible.

  Joachim



From:
Grzegorz Jaśkiewicz
Date:

WAL matters for performance. Hence it is advisable to have it on a
separate drive :)

From:
Joachim Worringen
Date:

On 05/26/2010 06:03 PM, Joachim Worringen wrote:
> On 25.05.2010 12:41, Andres Freund wrote:
>> On Tuesday 25 May 2010 11:00:24 Joachim Worringen wrote:
>>> Thanks. So, the Write-Ahead-Logging (being used or not) does not matter?
>> It does matter quite significantly in my experience, both from an I/O
>> and a CPU overhead perspective.
>
> O.k., looks as if I have to run my own experiments... I'll let you know
> if possible.

As promised, I did a tiny benchmark - basically, 8 empty tables are
filled with 100k rows each within 8 transactions (somewhat typical for
my application). The test machine has 4 cores, 64G RAM and RAID1 10k
drives for data.

# INSERTs into a TEMPORARY table:
[joachim@testsrv scaling]$ time pb query -d scaling_qry_1.xml

real    3m18.242s
user    1m59.074s
sys     1m51.001s

# INSERTs into a standard table:
[joachim@testsrv scaling]$ time pb query -d scaling_qry_1.xml

real    3m35.090s
user    2m5.295s
sys     2m2.307s

Thus, there is a slight hit of about 10% (which may even be within
measurement variation) - your mileage may vary.

  Joachim


From:
Rob Wultsch
Date:

On Fri, May 28, 2010 at 4:04 AM, Joachim Worringen
<> wrote:
> On 05/26/2010 06:03 PM, Joachim Worringen wrote:
>>
>> On 25.05.2010 12:41, Andres Freund wrote:
>>>
>>> On Tuesday 25 May 2010 11:00:24 Joachim Worringen wrote:
>>>>
>>>> Thanks. So, the Write-Ahead-Logging (being used or not) does not matter?
>>>
>>> It does matter quite significantly in my experience, both from an I/O
>>> and a CPU overhead perspective.
>>
>> O.k., looks as if I have to run my own experiments... I'll let you know
>> if possible.
>
> As promised, I did a tiny benchmark - basically, 8 empty tables are filled
> with 100k rows each within 8 transactions (somewhat typical for my
> application). The test machine has 4 cores, 64G RAM and RAID1 10k drives for
> data.
>
> # INSERTs into a TEMPORARY table:
> [joachim@testsrv scaling]$ time pb query -d scaling_qry_1.xml
>
> real    3m18.242s
> user    1m59.074s
> sys     1m51.001s
>
> # INSERTs into a standard table:
> [joachim@testsrv scaling]$ time pb query -d scaling_qry_1.xml
>
> real    3m35.090s
> user    2m5.295s
> sys     2m2.307s
>
> Thus, there is a slight hit of about 10% (which may even be within
> measurement variation) - your mileage may vary.
>
>  Joachim
>
I think it would be interesting to create a RAM disk and insert into
it. In the MySQL community, even though MyISAM has fallen out of use,
the Memory table (based on MyISAM) is still somewhat used.


--
Rob Wultsch


From:
"Pierre C"
Date:

> As promised, I did a tiny benchmark - basically, 8 empty tables are
> filled with 100k rows each within 8 transactions (somewhat typical for
> my application). The test machine has 4 cores, 64G RAM and RAID1 10k
> drives for data.
>
> # INSERTs into a TEMPORARY table:
> [joachim@testsrv scaling]$ time pb query -d scaling_qry_1.xml
>
> real    3m18.242s
> user    1m59.074s
> sys     1m51.001s
>
> # INSERTs into a standard table:
> [joachim@testsrv scaling]$ time pb query -d scaling_qry_1.xml
>
> real    3m35.090s
> user    2m5.295s
> sys     2m2.307s
>
> Thus, there is a slight hit of about 10% (which may even be within
> measurement variation) - your mileage may vary.

Usually WAL causes a much larger performance hit than this.

Since the following command:

CREATE TABLE tmp AS SELECT n FROM generate_series(1,1000000) AS n

which inserts 1M rows, takes 1.6 seconds on my desktop, your 800k-row
INSERT taking more than 3 minutes is a bit suspicious unless:

- you have huge fields that need TOASTing; in this case TOAST compression
will eat a lot of CPU and you're benchmarking TOAST, not the rest of the
system
- you have some non-indexed foreign key
- some other reason?
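The first two suspects can be checked directly from the system catalogs;
this is a hedged sketch, and the table name is an example:

```sql
-- 1. TOAST pressure: how much of the relation lives outside the main heap
--    (this figure also includes indexes)?
SELECT pg_total_relation_size('staging') - pg_relation_size('staging')
       AS toast_and_index_bytes;

-- 2. Foreign keys on the table; each referencing column should have an
--    index to keep bulk DML cheap.
SELECT conname, conrelid::regclass AS referencing_table
FROM pg_constraint
WHERE contype = 'f' AND conrelid = 'staging'::regclass;
```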

From:
Joachim Worringen
Date:

On 02.06.2010 12:03, Pierre C wrote:
> Usually WAL causes a much larger performance hit than this.
>
> Since the following command:
>
> CREATE TABLE tmp AS SELECT n FROM generate_series(1,1000000) AS n
>
> which inserts 1M rows, takes 1.6 seconds on my desktop, your 800k-row
> INSERT taking more than 3 minutes is a bit suspicious unless:
>
> - you have huge fields that need TOASTing; in this case TOAST
> compression will eat a lot of CPU and you're benchmarking TOAST, not the
> rest of the system
> - you have some non-indexed foreign key
> - some other reason?

Yes, the "other" reason is that I am not issuing a single SQL command,
but importing data from plain ASCII files through the Python-based
framework into the database.

The difference between your measurement and my measurement is the upper
bound of potential improvement for my system (which has, on the other
hand, the advantage of being a bit more powerful and flexible than a
single SQL statement... ;-) )

  Joachim


From:
"Pierre C"
Date:

> Yes, the "other" reason is that I am not issuing a single SQL command,
> but importing data from plain ASCII files through the Python-based
> framework into the database.
>
> The difference between your measurement and my measurement is the upper
> bound of potential improvement for my system (which has, on the other
> hand, the advantage of being a bit more powerful and flexible than a
> single SQL statement... ;-) )

Ah, in that case ... ;)

You could give PyPy a try; sometimes it's a lot slower, sometimes it's a
lot faster.
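Another common lever on the Python side is feeding the data through COPY
instead of per-row INSERTs. A hedged sketch; the table and column names
and the psycopg2 call at the end are assumptions, not from the thread:

```python
# Serialize rows into a tab-separated buffer suitable for COPY ... FROM STDIN,
# so the bulk load bypasses per-row INSERT overhead.
import io

def rows_to_copy_buffer(rows):
    """Build an in-memory COPY stream (tab-separated, newline-terminated)."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(str(v) for v in row) + "\n")
    buf.seek(0)
    return buf

# With a live psycopg2 connection one would then run (hypothetical names):
#   cur.copy_expert("COPY staging (id, payload) FROM STDIN", rows_to_copy_buffer(rows))
```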