Temp tables being written to disk. Avoidable?

From: Paul McGarry
Subject: Temp tables being written to disk. Avoidable?
Date:
Msg-id: F3FC09F8BCF8D411B73B00D0B7BD06E60F7069@otnrnsx1.opentec.com.au
Responses: Re: Temp tables being written to disk. Avoidable?
List: pgsql-sql
Hi,

I have an application where I am using fulltextindex to
create a searchable index for a table and wish to return
pages of results, ranked in order of relevance. So for any
particular page view I need to know:
a) The total number of results relevant to the search (so I can display "page [1] 2 3 4 5 6").
b) The results, ranked by 'relevance' to the search terms.
c) The details for part of that range (ie 15 or so entries per page).

Database structure is something like:

======
txdev400003=# \d entry_fti
          Table "entry_fti"
 Attribute |         Type          | Modifier
-----------+-----------------------+----------
 string    | character varying(25) |
 id        | oid                   |
Index: entry_fti_idx

txdev400003=# \d entry
                             Table "entry"
    Attribute     |   Type   |               Modifier
------------------+----------+------------------------------------
 entry_id         | integer  | default nextval('ent_id_seq'::text)
 name             | text     |
 description_html | text     |
 fti_text         | text     |
Indices: entry_oid_idx,
         entry_pkey
======
(The entry table is simplified here; the real one has 24 columns.)

My original plan was to do two or three queries, i.e.:
======
SELECT count(entry_fti.id) AS rating, id
INTO TEMP TABLE searchtemp
FROM entry_fti
WHERE (entry_fti.string ~ '^word1' OR entry_fti.string ~ '^word2')
GROUP BY id;

SELECT entry.*, searchtemp.rating
FROM searchtemp, entry
WHERE entry.oid = searchtemp.id
ORDER BY rating DESC
LIMIT 15 OFFSET 0;

-- optional, depending on the number of rows returned by the previous query
SELECT count(*) FROM searchtemp;
======

This seemed to be the cheapest way to get all the information I need.
However, I noticed a disk access each time I did the search. I fiddled
with the SHARED_BUFFERS and SORT_MEM and discovered they weren't the
problem but that the SELECT INTO was causing the disk access. If I have
multiple searches occurring concurrently, I imagine this becoming a
problem. Can I avoid having the temp table written to disk (unless
required by running out of memory)? The temp table has a tiny lifespan
and needs to be accessed for each subsequent query so there's no
point in writing it to disk unless completely necessary.
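
One knob that might help, assuming a PostgreSQL release that has the
temp_buffers setting: it sets the per-session memory used for
temporary-table pages, so raising it should let a small table like
searchtemp live entirely in local buffers (temp tables are never
WAL-logged, and their pages only need flushing if the local buffers
overflow). A minimal sketch:
======
-- Must be issued before the session first touches a temp table;
-- the value is in 8kB buffer pages, so 2000 is roughly 16MB.
SET temp_buffers = 2000;

-- ... then run the SELECT ... INTO TEMP TABLE searchtemp query as before.
======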

For the moment I have moved to a query like:
======
SELECT COUNT (entry_fti.id) as rating , entry_id, name, description_html
FROM entry_fti, entry 
WHERE (entry_fti.string ~'^word1' OR entry_fti.string ~'^word2') 
AND entry_fti.id=entry.oid 
GROUP BY id ,entry_id, name, description_html
ORDER BY rating DESC
======
which seems significantly less elegant (I need to return all the rows and
do my 'OFFSET' and 'LIMIT' manually, bearing in mind that I'm
pulling 20+ columns from the entry table, not the 3 I have here), but it
doesn't cause any disk access, as the entry_fti and entry tables sit
in memory.
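
For what it's worth, a release with window functions could collapse this
into a single query that returns one page of rows plus the total match
count, bringing LIMIT/OFFSET back without any temp table. A sketch,
assuming the same word1/word2 terms as above:
======
SELECT entry.*, s.rating, count(*) OVER () AS total_matches
FROM entry,
     (SELECT id, count(id) AS rating
      FROM entry_fti
      WHERE string ~ '^word1' OR string ~ '^word2'
      GROUP BY id) AS s
WHERE entry.oid = s.id
ORDER BY s.rating DESC
LIMIT 15 OFFSET 0;
======
count(*) OVER () is evaluated before the LIMIT is applied, so
total_matches is the full result count rather than 15.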

Hmm, I should do some proper benchmarking on this rather than
worrying about the noises coming from the disks.
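
A decent way to do that, assuming a release with EXPLAIN ANALYZE
(7.2 or later): it runs the query and reports actual row counts and
timings per plan node, which beats listening to the disks. For example:
======
EXPLAIN ANALYZE
SELECT count(entry_fti.id) AS rating, entry_id, name, description_html
FROM entry_fti, entry
WHERE (entry_fti.string ~ '^word1' OR entry_fti.string ~ '^word2')
AND entry_fti.id = entry.oid
GROUP BY entry_fti.id, entry_id, name, description_html
ORDER BY rating DESC;
======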

However, even if the original method does turn out to be faster,
I imagine it could be faster still without the disk writes (though
I don't know anywhere near enough about the PostgreSQL internals
for that to be anything other than a gut feeling).

--
Paul McGarry            mailto:paulm@opentec.com.au 
Systems Integrator      http://www.opentec.com.au 
Opentec Pty Ltd         http://www.iebusiness.com.au
6 Lyon Park Road        Phone: (02) 9870 4718 
North Ryde NSW 2113     Fax:   (02) 9878 1755 

