Discussion: Temp tables being written to disk. Avoidable?
Hi,

I have an application where I am using a full text index to create a searchable index for a table and wish to return pages of results, ranked in order of relevance. So for any particular page view I need to know:

a) The total number of results relevant to the search (so I can display page [1] 2 3 4 5 6).
b) The results, ranked by 'relevance' to the search terms.
c) The details for a part of that range (ie 15 or so entries per page).

The database structure is something like:

======
txdev400003=# \d entry_fti
            Table "entry_fti"
 Attribute |         Type          | Modifier
-----------+-----------------------+----------
 string    | character varying(25) |
 id        | oid                   |
Index: entry_fti_idx

txdev400003=# \d entry
                          Table "entry"
    Attribute     |   Type   |              Modifier
------------------+----------+------------------------------------
 entry_id         | integer  | default nextval('ent_id_seq'::text)
 name             | text     |
 description_html | text     |
 fti_text         | text     |
Indices: entry_oid_idx, entry_pkey
======

(The entry table is simplified here; the real one has 24 columns.)

My original plan was to do two/three queries, ie:

======
SELECT COUNT(entry_fti.id) AS rating, id
INTO TEMP TABLE searchtemp
FROM entry_fti
WHERE (entry_fti.string ~ '^word1' OR entry_fti.string ~ '^word2')
GROUP BY id;

SELECT entry.*, searchtemp.rating
FROM searchtemp, entry
WHERE entry.oid = searchtemp.id
ORDER BY rating DESC
LIMIT 15 OFFSET 0;

SELECT count(*) FROM searchtemp;
(optional, depending on the number of rows returned by the previous query)
======

This seemed to be the cheapest way to get all the information I need. However, I noticed a disk access each time I did the search. I fiddled with SHARED_BUFFERS and SORT_MEM and discovered they weren't the problem, but that the SELECT INTO was causing the disk access. If I have multiple searches occurring concurrently, I imagine this becoming a problem.

Can I avoid having the temp table written to disk (unless required by running out of memory)?
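[Editor's sketch of the two/three-query pattern above, runnable against SQLite via Python purely for illustration: the prefix-regex match `~ '^word1'` is approximated with LIKE, the `oid` join is replaced by an explicit `entry_id` column, and the toy data is invented.]

```python
import sqlite3

# Toy schema standing in for the real tables: 'oid' is replaced by an
# explicit entry_id column, and entry is reduced to two columns.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE entry (entry_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE entry_fti (string TEXT, id INTEGER)")
cur.executemany("INSERT INTO entry VALUES (?, ?)",
                [(1, "one"), (2, "two"), (3, "three")])
cur.executemany("INSERT INTO entry_fti VALUES (?, ?)",
                [("word1x", 1), ("word2y", 1), ("word1z", 2), ("word2w", 3)])

# Query 1: materialise per-entry ratings into a temp table
# (rating = number of index words matching the search terms).
cur.execute("""
    CREATE TEMP TABLE searchtemp AS
    SELECT COUNT(id) AS rating, id
    FROM entry_fti
    WHERE string LIKE 'word1%' OR string LIKE 'word2%'
    GROUP BY id
""")

# Query 2: one page of results, best-ranked first.
page = cur.execute("""
    SELECT entry.entry_id, entry.name, searchtemp.rating
    FROM searchtemp
    JOIN entry ON entry.entry_id = searchtemp.id
    ORDER BY rating DESC
    LIMIT 15 OFFSET 0
""").fetchall()

# Query 3: total match count, for rendering the pager.
total = cur.execute("SELECT COUNT(*) FROM searchtemp").fetchone()[0]
```

With this data, entry 1 matches both search words and ranks first with a rating of 2.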
The temp table has a tiny lifespan and needs to be accessed for each subsequent query, so there's no point in writing it to disk unless completely necessary.

For the moment I have moved to a query like:

======
SELECT COUNT(entry_fti.id) AS rating, entry_id, name, description_html
FROM entry_fti, entry
WHERE (entry_fti.string ~ '^word1' OR entry_fti.string ~ '^word2')
  AND entry_fti.id = entry.oid
GROUP BY id, entry_id, name, description_html
ORDER BY rating DESC;
======

which seems significantly less elegant (I need to return all rows and do my 'OFFSET' and 'LIMIT' manually, also bearing in mind that I'm pulling 20+ columns from the entry table, not the 3 I have here) but doesn't cause any disk access, as the entry_fti and entry tables sit in memory.

Hmm, I should do some proper benchmarking on this rather than worrying about the noises coming from the disks. However, even if the original method does turn out to be faster, I imagine it could be faster still without the disk writes (though I don't know anywhere near enough about the PostgreSQL internals for that to be anything other than a gut feeling).

--
Paul McGarry            mailto:paulm@opentec.com.au
Systems Integrator      http://www.opentec.com.au
Opentec Pty Ltd         http://www.iebusiness.com.au
6 Lyon Park Road        Phone: (02) 9870 4718
North Ryde NSW 2113     Fax:   (02) 9878 1755

--------------------------------------------------------------------
This document and any attachments are intended solely for the named addressee(s), are confidential, and may be subject to legal professional privilege. Please notify us (on +61-2 9878 1744) as soon as possible if you have received this document in error. Any confidentiality or privilege is not waived or lost because this email has been sent to you by mistake. This document and any attachments are subject to copyright. No part of them should be reproduced or distributed by any means whatsoever without the prior consent of the copyright owner.
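[Editor's sketch of the single-query workaround, again in Python/SQLite for illustration (invented toy data, LIKE standing in for the prefix regex, entry_id in place of oid). It shows the cost the mail describes: every matching row is returned to the client, and the page slice and total are computed in application code.]

```python
import sqlite3

# Same toy schema as above, with a description_html column added.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE entry (entry_id INTEGER PRIMARY KEY, "
            "name TEXT, description_html TEXT)")
cur.execute("CREATE TABLE entry_fti (string TEXT, id INTEGER)")
cur.executemany("INSERT INTO entry VALUES (?, ?, ?)",
                [(1, "one", "<p>1</p>"),
                 (2, "two", "<p>2</p>"),
                 (3, "three", "<p>3</p>")])
cur.executemany("INSERT INTO entry_fti VALUES (?, ?)",
                [("word1x", 1), ("word2y", 1), ("word1z", 2), ("word2w", 3)])

# Single query: join, rate, and sort in one pass; every matching row
# travels back to the client, however many there are.
rows = cur.execute("""
    SELECT COUNT(entry_fti.id) AS rating,
           entry.entry_id, entry.name, entry.description_html
    FROM entry_fti
    JOIN entry ON entry_fti.id = entry.entry_id
    WHERE entry_fti.string LIKE 'word1%' OR entry_fti.string LIKE 'word2%'
    GROUP BY entry.entry_id, entry.name, entry.description_html
    ORDER BY rating DESC
""").fetchall()

# 'LIMIT' and 'OFFSET' done manually, in the application.
per_page, page_no = 15, 0
total = len(rows)
page = rows[page_no * per_page:(page_no + 1) * per_page]
```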
Opentec does not warrant that this email and any attachments are error or virus free. --------------------------------------------------------------------
Hi Tom,

Thanks for your response, enlightening as always.

> Not at present --- temp tables are not different from real tables,
> except for some naming shenanigans. So creation of a temp table will
> involve some disk hits.

Ok, would it be a good idea to modify that for the future? Given that temp tables:
a) can't be seen by other connections,
b) are likely to be selected upon heavily close to creation time,
c) are likely to be short lived,
is there any reason to move them out to disk unless strictly necessary (aside from the fact that it may take a fair bit of re-engineering and the core developers have other more important and/or more interesting things to be getting on with)?

> Do you really *need* a temp table, as opposed to writing a
> sub-SELECT-in-the-FROM-clause? ISTM that that feature takes care
> of most of the simple notational reasons for wanting a temp table.

I have rewritten the query with such a subselect and it looks much more elegant than my previous solution:

====
SELECT rating, entry.*
FROM entry,
     (SELECT COUNT(entry_fti.id) AS rating, id
      FROM entry_fti
      WHERE (entry_fti.string ~ '^word1' OR entry_fti.string ~ '^word2')
      GROUP BY id
      [LIMIT X OFFSET Y]) vtable
WHERE vtable.id = entry.oid
ORDER BY rating DESC;
====

The only thing I'm missing now is the bit equivalent to the SELECT COUNT(*) FROM searchtemp. With the temporary table I was effectively trying to cache that subselect (without the LIMIT) and do two queries on it: one returning the total number of rows and one returning information relating to a LIMITed number of those rows. Without a temporary table (unless there's a cunning trick I haven't thought of), it seems I have a choice of:
a) Not knowing how many rows match the full text query.
b) Returning the entry details for all the rows even though I only want a small number of them.
c) Doing two queries on the entry_fti table.
It'll be interesting to see which of b), c), or using a temporary table is more expensive.
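[Editor's sketch of the sub-SELECT-in-FROM rewrite paired with option (c), a second count query, in Python/SQLite for illustration (invented toy data, LIKE in place of the prefix regex, entry_id in place of oid).]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE entry (entry_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE entry_fti (string TEXT, id INTEGER)")
cur.executemany("INSERT INTO entry VALUES (?, ?)",
                [(1, "one"), (2, "two"), (3, "three")])
cur.executemany("INSERT INTO entry_fti VALUES (?, ?)",
                [("word1x", 1), ("word2y", 1), ("word1z", 2), ("word2w", 3)])

# The sub-SELECT-in-FROM rewrite: the inner query does the rating,
# LIMIT/OFFSET trims it to one page, and the outer query fetches the
# entry details for just those rows.
page = cur.execute("""
    SELECT vtable.rating, entry.entry_id, entry.name
    FROM entry
    JOIN (SELECT COUNT(id) AS rating, id
          FROM entry_fti
          WHERE string LIKE 'word1%' OR string LIKE 'word2%'
          GROUP BY id
          ORDER BY rating DESC
          LIMIT 15 OFFSET 0) AS vtable
      ON vtable.id = entry.entry_id
    ORDER BY vtable.rating DESC
""").fetchall()

# Option (c): a second pass over entry_fti just for the pager total.
total = cur.execute("""
    SELECT COUNT(*) FROM
        (SELECT id FROM entry_fti
         WHERE string LIKE 'word1%' OR string LIKE 'word2%'
         GROUP BY id)
""").fetchone()[0]
```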
I imagine that could vary heavily with the contents of the tables and the number of concurrent searches that are occurring.

Thanks again.

--
Paul McGarry            mailto:paulm@opentec.com.au
Systems Integrator      http://www.opentec.com.au
Opentec Pty Ltd         http://www.iebusiness.com.au
6 Lyon Park Road        Phone: (02) 9870 4718
North Ryde NSW 2113     Fax:   (02) 9878 1755
> Hi Tom,
>
> Thanks for your response, enlightening as always.
>
> > Not at present --- temp tables are not different from real tables,
> > except for some naming shenanigans. So creation of a temp table will
> > involve some disk hits.
>
> Ok, would it be a good idea to modify that for the future? Given that
> temp tables:
> a) can't be seen by other connections,
> b) are likely to be selected upon heavily close to creation time,
> c) are likely to be short lived,
> is there any reason to move them out to disk unless strictly
> necessary (aside from that it may take a fair bit of
> re-engineering and the core developers have other more important
> and/or more interesting things to be getting on with)?

If the temp table doesn't fit in memory, we will have to put it in backing store somewhere, and a disk is the logical place, right? I don't see a huge advantage to putting it in memory. We could prevent WAL writes for temp tables. That would help.

--
  Bruce Momjian                     |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,  |  830 Blythe Avenue
  +  Christ can be your backup.     |  Drexel Hill, Pennsylvania 19026
Howdy,

> If the temp table doesn't fit in memory, we will have to put it in
> backing store somewhere, and a disk is the logical place, right? I
> don't see a huge advantage of putting it in memory. We could prevent
> WAL writes for temp tables. That would help.

Yes, if it won't fit in memory then clearly it needs to be put out to disk. I just thought that if it could stay in memory then there is no real reason to bother the disk at all. I do realise that it's probably vastly easier said than done, though; the more you start treating temp tables as a special case, the more complex looking after them will get.

Just idle speculation on my part, really. I just happen to be sitting next to a server with a very noisy disk which remains idle most of the time, as selects are far more common in my app than inserts/updates. The only exception to that is this temp table, which is being used fairly frequently and makes the thing grind away.

--
Paul McGarry            mailto:paulm@opentec.com.au
Systems Integrator      http://www.opentec.com.au
Opentec Pty Ltd         http://www.iebusiness.com.au
6 Lyon Park Road        Phone: (02) 9870 4718
North Ryde NSW 2113     Fax:   (02) 9878 1755
Paul McGarry <PaulM@opentec.com.au> writes:
> Can I avoid having the temp table written to disk

Not at present --- temp tables are not different from real tables, except for some naming shenanigans. So creation of a temp table will involve some disk hits.

Do you really *need* a temp table, as opposed to writing a sub-SELECT-in-the-FROM-clause? ISTM that that feature takes care of most of the simple notational reasons for wanting a temp table.

			regards, tom lane
(Maybe this thread is dead now, but here goes anyway.)

Suggestion: have an environment variable or a PostgreSQL parameter to indicate where to write the temp table. This way, you could easily direct it to a RAM disk (if small enough) or to a drive other than your main database. The default could be $TMP.

Best regards,
Chris