Обсуждение: Temp tables being written to disk. Avoidable?

Поиск
Список
Период
Сортировка

Temp tables being written to disk. Avoidable?

От
Paul McGarry
Дата:
Hi,

I have an application where I am using fulltextindex to
create a searchable index for a table and wish to return
pages of results, ranked in order of relevance. So for any
particular page view I need to know:
a) Total number of results relevant to search (so I can display   page [1] 2 3 4 5 6)
b) The results to be ranked by 'relevance' to search terms.
c) The details for a part of that range (ie 15 or so entries   per page.

Database structure is something like:

======
txdev400003=# \d entry_fti             Table "entry_fti"Attribute |         Type          | Modifier 
-----------+-----------------------+----------string    | character varying(25) | id        | oid                   | 
Index: entry_fti_idx

txdev400003=# \d entry                             Table "entry"   Attribute     |   Type   |               Modifier
               
 
------------------+----------+------------------------------------entry_id         | integer  | default
nextval('ent_id_seq'::text)name            | text     | description_html | text     | fti_text         | text     | 
 
Indices: entry_oid_idx,        entry_pkey
======
(The entry table is simplified here, real one has 24 columns).

My original plan was to do two/three queries, ie:
======
SELECT COUNT (entry_fti.id) as rating , id 
INTO TEMP TABLE searchtemp 
FROM entry_fti  
WHERE (entry_fti.string ~'^word1' OR entry_fti.string ~'^word2') 
GROUP BY id

SELECT entry.*, searchtemp.rating 
FROM searchtemp, entry 
WHERE entry.oid=searchtemp.id 
ORDER BY rating DESC 
LIMIT 15 OFFSET 0

SELECT count(*) FROM searchtemp; 
(optional, depending on number of rows returned by previous query)
======

This seemed to be the cheapest way to get all the information I need.
However, I noticed a disk access each time I did the search. I fiddled
with the SHARED_BUFFERS and SORT_MEM and discovered they weren't the
problem but that the SELECT INTO was causing the disk access. If I have
multiple searches occuring concurrently I imagine this becoming a 
problem. Can I avoid having the temp table written to disk (unless
required by running out of memory)? The temp table has a tiny lifespan
and needs to be accessed for each subsequent query so there's no
point in writing it to disk unless completely necessary.

For the moment I have moved to a query like:
======
SELECT COUNT (entry_fti.id) as rating , entry_id, name, description_html
FROM entry_fti, entry 
WHERE (entry_fti.string ~'^word1' OR entry_fti.string ~'^word2') 
AND entry_fti.id=entry.oid 
GROUP BY id ,entry_id, name, description_html
ORDER BY rating DESC
======
which seems significantly less elegant (I need to return all rows and
do my 'OFFSET' and 'LIMIT' manually, also bearing in mind that I'm
pulling 20+ cols from the entry table, not the 3 I have here) but 
doesn't cause any disk access as the entry_fti and entry tables sit
in memory.

Hmm, I should to do some proper benchmarking on this rather than
worrying about the noises coming from the disks..

However, even if the original method does turn out to be faster,
I imagine it could be faster still without the disk writes (though
I don't know anywhere near enough about the PostgreSQL internals
for that to be anything other than a gut feeling).

--
Paul McGarry            mailto:paulm@opentec.com.au 
Systems Integrator      http://www.opentec.com.au 
Opentec Pty Ltd         http://www.iebusiness.com.au
6 Lyon Park Road        Phone: (02) 9870 4718 
North Ryde NSW 2113     Fax:   (02) 9878 1755 
--------------------------------------------------------------------
This document and any attachments are intended solely for
the named addressee(s), are confidential, and may be subject to
legal professional privilege. Please notify us (on +61-2 9878 1744)
as soon as possible if you have received this document in error.
Any confidentiality or privilege is not waived or lost because this
email has been sent to you by mistake. This document and any
attachments are subject to copyright.  No part of them should be
reproduced or distributed by any means whatsoever without the
prior consent of the copyright owner.  Opentec does not warrant
that this email and any attachments are error or virus free.
--------------------------------------------------------------------


RE: Temp tables being written to disk. Avoidable?

От
Paul McGarry
Дата:
Hi Tom,

Thanks for your response, enlightening as always.

> Not at present --- temp tables are not different from real tables,
> except for some naming shenanigans.  So creation of a temp table will
> involve some disk hits.

Ok, would it be a good idea to modify that for the future? Given that
temp tables:-
a) can't be seen by other connections.
b) are likely to be selected upon heavily close to creation time.
c) are likely to be short lived.
is there any reason to move them out to disk unless strictly
necessary (aside from that it may take a fair bit of 
re-engineering and the core developers have other more important 
and/or more interesting things to be getting on with)?

> Do you really *need* a temp table, as opposed to writing a
> sub-SELECT-in-the-FROM-clause?  ISTM that that feature takes care
> of most of the simple notational reasons for wanting a temp table.

I have rewritten the query with such a subselect and it looks
much more elegant than my previous solution:

====
SELECT rating , entry.*
FROM  entry , (SELECT COUNT(entry_fti.id) as rating, idFROM entry_ftiWHERE (entry_fti.string ~'^blu' OR
entry_fti.string~'^word2') GROUP BY id[LIMIT X OFFSET Y])  vtable
 
WHERE vtable.id=entry.oid 
ORDER BY rating DESC
====

The only I'm missing now is the bit equivalent to the
SELECT COUNT(*) FROM searchtemp.

With the temporary table I was effectively trying to
cache that subselect (without the LIMIT) and do two
queries on it, one returning the total number of
rows and one returning information relating to a
LIMITed number of those rows.

Without a temporary table (unless there's a cunning
trick I haven't thought of) it seems I either have
a choice of:

a) Not knowing how many rows match the full text query.
b) Returning the entry details for all the rows even though  I only want a small number of them.
c) Doing two queries on the entry_fti table

It'll be interesting to see which of b) c) or using a temporary
table is more expensive. I imagine that could vary heavily
on the contents of the tables and the number of concurrent
searches that are occuring.

Thanks again.

--
Paul McGarry            mailto:paulm@opentec.com.au 
Systems Integrator      http://www.opentec.com.au 
Opentec Pty Ltd         http://www.iebusiness.com.au
6 Lyon Park Road        Phone: (02) 9870 4718 
North Ryde NSW 2113     Fax:   (02) 9878 1755  
--------------------------------------------------------------------
This document and any attachments are intended solely for
the named addressee(s), are confidential, and may be subject to
legal professional privilege. Please notify us (on +61-2 9878 1744)
as soon as possible if you have received this document in error.
Any confidentiality or privilege is not waived or lost because this
email has been sent to you by mistake. This document and any
attachments are subject to copyright.  No part of them should be
reproduced or distributed by any means whatsoever without the
prior consent of the copyright owner.  Opentec does not warrant
that this email and any attachments are error or virus free.
--------------------------------------------------------------------


Re: Temp tables being written to disk. Avoidable?

От
Bruce Momjian
Дата:
> Hi Tom,
> 
> Thanks for your response, enlightening as always.
> 
> > Not at present --- temp tables are not different from real tables,
> > except for some naming shenanigans.  So creation of a temp table will
> > involve some disk hits.
> 
> Ok, would it be a good idea to modify that for the future? Given that
> temp tables:-
> a) can't be seen by other connections.
> b) are likely to be selected upon heavily close to creation time.
> c) are likely to be short lived.
> is there any reason to move them out to disk unless strictly
> necessary (aside from that it may take a fair bit of 
> re-engineering and the core developers have other more important 
> and/or more interesting things to be getting on with)?

If the temp table doesn't fit in memory, we will have to put it in
backing store somewhere, and a disk is the logical place, right?  I
don't see a huge advantage of putting it in memory.  We could prevent
WAL writes for temp tables.  That would help.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


RE: Temp tables being written to disk. Avoidable?

От
Paul McGarry
Дата:
Howdy,

> If the temp table doesn't fit in memory, we will have to put it in
> backing store somewhere, and a disk is the logical place, right?  I
> don't see a huge advantage of putting it in memory.  We could prevent
> WAL writes for temp tables.  That would help.

Yes, if it won't fit in memory then clearly it needs to be put out 
to disk. I just thought if it could stay in memory then there is no
real reason to bother the disk at all. 

I do realise that it's probably vastly easier said than done though,
the more you start treating temp tables as a special case the more 
complex looking after them will get.

Just idle speculation on my part really, I just happen to be sitting
next to a server with a very noisy disk which remains idle most of
the time as selects are far more common in my app than insert/updates
The only exception to that is this temp table, which is been using
fairly frequently and makes the thing grind away.


--
Paul McGarry            mailto:paulm@opentec.com.au 
Systems Integrator      http://www.opentec.com.au 
Opentec Pty Ltd         http://www.iebusiness.com.au
6 Lyon Park Road        Phone: (02) 9870 4718 
North Ryde NSW 2113     Fax:   (02) 9878 1755  

--------------------------------------------------------------------
This document and any attachments are intended solely for
the named addressee(s), are confidential, and may be subject to
legal professional privilege. Please notify us (on +61-2 9878 1744)
as soon as possible if you have received this document in error.
Any confidentiality or privilege is not waived or lost because this
email has been sent to you by mistake. This document and any
attachments are subject to copyright.  No part of them should be
reproduced or distributed by any means whatsoever without the
prior consent of the copyright owner.  Opentec does not warrant
that this email and any attachments are error or virus free.
--------------------------------------------------------------------


Re: Temp tables being written to disk. Avoidable?

От
Tom Lane
Дата:
Paul McGarry <PaulM@opentec.com.au> writes:
> Can I avoid having the temp table written to disk

Not at present --- temp tables are not different from real tables,
except for some naming shenanigans.  So creation of a temp table will
involve some disk hits.

Do you really *need* a temp table, as opposed to writing a
sub-SELECT-in-the-FROM-clause?  ISTM that that feature takes care
of most of the simple notational reasons for wanting a temp table.
        regards, tom lane


Re: Temp tables being written to disk. Avoidable? [Another TO-DO]

От
"Chris Ruprecht"
Дата:
(Maybe this thread is dead now, but here goes anyway)
Suggestion: have an environment variable or a PostgreSQL parameter to
indicate where to write the temp-table to. This way, you could easily direct
it to a RAM disk (if small enough) or to a drive other than your main
database. Default could be $TMP.

Best regards,
Chris



_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com