Обсуждение: experiments in query optimization
Hi everyone, I've been trying to reduce both memory usage and runtime for a query. Comments/suggestions gratefully received. Details are at http://bulldog.duhs.duke.edu/~faheem/snppy/opt.pdf See particularly Section 1 - Background and Discussion. If you want a text version, see http://bulldog.duhs.duke.edu/~faheem/snppy/opt.tex For background see http://bulldog.duhs.duke.edu/~faheem/snppy/diag.pdf (text version http://bulldog.duhs.duke.edu/~faheem/snppy/diag.tex) and http://bulldog.duhs.duke.edu/~faheem/snppy/snppy.pdf Please CC any replies to me at the above email address. Thanks. Regards, Faheem.
On Thu, Mar 25, 2010 at 3:57 PM, Faheem Mitha <faheem@email.unc.edu> wrote: > > Hi everyone, > > I've been trying to reduce both memory usage and runtime for a query. > Comments/suggestions gratefully received. Details are at > > http://bulldog.duhs.duke.edu/~faheem/snppy/opt.pdf > > See particularly Section 1 - Background and Discussion. > > If you want a text version, see > > http://bulldog.duhs.duke.edu/~faheem/snppy/opt.tex > > For background see > > http://bulldog.duhs.duke.edu/~faheem/snppy/diag.pdf (text version > http://bulldog.duhs.duke.edu/~faheem/snppy/diag.tex) and > http://bulldog.duhs.duke.edu/~faheem/snppy/snppy.pdf > > Please CC any replies to me at the above email address. Thanks. Didn't you (or someone) post about these queries before? It's not really too clear to me from reading this what specific questions you're trying to answer. One random thought: WHERE row_number() = 1 is not too efficient. Try using LIMIT or DISTINCT ON instead. If you're concerned about memory usage, try reducing work_mem; you've probably got it set to something huge. You might need to create some indices, too. ...Robert
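To sketch what the DISTINCT ON rewrite might look like (the table and column names here are invented placeholders, not taken from Faheem's schema):

```sql
-- row_number() approach: rank every row in the partition, then
-- discard all but one per group:
--   SELECT * FROM (
--     SELECT t.*, row_number() OVER
--         (PARTITION BY grp ORDER BY ts DESC) AS rn
--     FROM t) s
--   WHERE rn = 1;

-- DISTINCT ON keeps the first row per group directly, which the
-- planner can usually satisfy with a single sort:
SELECT DISTINCT ON (grp) *
FROM t
ORDER BY grp, ts DESC;
```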
On Mon, 29 Mar 2010, Robert Haas wrote: > On Thu, Mar 25, 2010 at 3:57 PM, Faheem Mitha <faheem@email.unc.edu> wrote: >> >> Hi everyone, >> >> I've been trying to reduce both memory usage and runtime for a query. >> Comments/suggestions gratefully received. Details are at >> >> http://bulldog.duhs.duke.edu/~faheem/snppy/opt.pdf >> >> See particularly Section 1 - Background and Discussion. >> >> If you want a text version, see >> >> http://bulldog.duhs.duke.edu/~faheem/snppy/opt.tex >> >> For background see >> >> http://bulldog.duhs.duke.edu/~faheem/snppy/diag.pdf (text version >> http://bulldog.duhs.duke.edu/~faheem/snppy/diag.tex) and >> http://bulldog.duhs.duke.edu/~faheem/snppy/snppy.pdf >> >> Please CC any replies to me at the above email address. Thanks. > > Didn't you (or someone) post about these queries before? I did write to the list about an earlier version of these queries, yes. In fact you replied to that message. > It's not really too clear to me from reading this what specific > questions you're trying to answer. Quote from opt.{tex/pdf}, Section 1: "If I have to I can use Section~\ref{ped_hybrid} and Section~\ref{tped_hybrid}, but I am left wondering why I get the performance I do out of the earlier versions. Specifically, why is Section~\ref{ped_bigjoin} so much slower than Section~\ref{ped_trunc}, and why does the memory usage in Section~\ref{ped_phenoout} blow up relative to Section~\ref{ped_bigjoin} and Section~\ref{ped_trunc}?" > One random thought: WHERE row_number() = 1 is not too efficient. > Try using LIMIT or DISTINCT ON instead. Possibly. However, the CTE that uses WHERE row_number() = 1 doesn't dominate the runtime or memory usage, so I'm not too concerned about it. > If you're concerned about memory usage, try reducing work_mem; you've > probably got it set to something huge. work_mem = 1 GB (see diag.{tex/pdf}). The point isn't that I'm using so much memory. 
Again, my question is, why are these changes affecting memory usage so drastically? > You might need to create some indices, too. Ok. To what purpose? This query picks up everything from the tables and the planner does table scans, so conventional wisdom, and indeed my experience, says that indexes are not going to be so useful. Regards, Faheem.
On Mon, Mar 29, 2010 at 2:31 PM, Faheem Mitha <faheem@email.unc.edu> wrote: >> It's not really too clear to me from reading this what specific >> questions you're trying to answer. > > Quote from opt.{tex/pdf}, Section 1: > > "If I have to I can use Section~\ref{ped_hybrid} and > Section~\ref{tped_hybrid}, but I am left wondering why I get the performance > I do out of the earlier versions. Specifically, why is > Section~\ref{ped_bigjoin} so much slower than Section~\ref{ped_trunc}, and > why does the memory usage in Section~\ref{ped_phenoout} blow up relative to > Section~\ref{ped_bigjoin} and Section~\ref{ped_trunc}?" Here and in the document, you refer to section numbers for the "hybrid" version but I don't see where you define what the "hybrid" version actually is. And the differences between your queries are not real clear either - first you say you took out pheno and sex because they weren't necessary, but then you decide to put them back. I don't know what that means. If they're not necessary, leave them out. >> One random thought: WHERE row_number() = 1 is not too efficient. >> Try using LIMIT or DISTINCT ON instead. > > Possibly. However, the CTE that uses > > WHERE row_number() = 1 > > doesn't dominate the runtime or memory usage, so I'm not too concerned > about it. Hmm, you might be right. >> If you're concerned about memory usage, try reducing work_mem; you've >> probably got it set to something huge. > > work_mem = 1 GB (see diag.{tex/pdf}). > > The point isn't that I'm using so much memory. Again, my question is, why > are these changes affecting memory usage so drastically? Well each sort or hash can use an amount of memory that is limited from above by work_mem. So if you write the query in a way that involves more sorts or hashes, each one can add up to 1GB to your memory usage, plus overhead. 
However, it doesn't look like any of your queries include 30 sorts or hashes, so I'm thinking that the RSS number probably also includes some of the shared memory that has been mapped into each backend's address space. RSS is not a terribly reliable number when dealing with shared memory; it's hard to say what that really means. >> You might need to create some indices, too. > > Ok. To what purpose? This query picks up everything from the tables and the > planner does table scans, so conventional wisdom and indeed my experience, > says that indexes are not going to be so useful. Well, a hash join is not usually the first thing that pops to mind when dealing with a table that has 825 million rows (geno). I don't know if a nested loop with inner-indexscan would be faster, but it would almost certainly use less memory. ...Robert
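To illustrate the plan shape being described (the index is hypothetical, and the column name anno_id is a guess based on schema details that appear later in the thread):

```sql
-- Hypothetical single-column index on the big table's join column:
CREATE INDEX geno_anno_id_idx ON geno (anno_id);

-- With it available, the planner could produce a plan shaped roughly
-- like this:
--
--   Nested Loop
--     ->  Seq Scan on dedup_patient_anno
--     ->  Index Scan using geno_anno_id_idx on geno
--           Index Cond: (geno.anno_id = dedup_patient_anno.id)
--
-- which probes the index once per outer row instead of first hashing
-- all 825 million geno rows into memory.
```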
On Mon, 29 Mar 2010, Robert Haas wrote: > On Mon, Mar 29, 2010 at 2:31 PM, Faheem Mitha <faheem@email.unc.edu> wrote: >>> It's not really too clear to me from reading this what specific >>> questions you're trying to answer. >> >> Quote from opt.{tex/pdf}, Section 1: >> >> "If I have to I can use Section~\ref{ped_hybrid} and >> Section~\ref{tped_hybrid}, but I am left wondering why I get the performance >> I do out of the earlier versions. Specifically, why is >> Section~\ref{ped_bigjoin} so much slower than Section~\ref{ped_trunc}, and >> why does the memory usage in Section~\ref{ped_phenoout} blow up relative to >> Section~\ref{ped_bigjoin} and Section~\ref{ped_trunc}?" > > Here and in the document, you refer to section numbers for the > "hybrid" version but I don't see where you define what the "hybrid" > version actually is. It is defined later in the file. I don't know if you are looking at the pdf, but if so, it is Section 2.4 (for the hybrid PED query). In the text file, I guess the easiest way would be to grep for the label ped_hybrid. > And the differences between your queries are not real clear either - > first you say you took out pheno and sex because they weren't necessary, > but then you decide to put them back. I don't know what that means. > If they're not necessary, leave them out. I don't see where I say that pheno and sex weren't necessary. In fact, the word 'necessary' does not appear in the opt document. I took them out to see how it would affect performance. Which it does, dramatically. I say "So, I decided to remove the joins to tables corresponding to the patient data, namely pheno and sex, and the runtime dropped to 150 min, while the memory stayed around 5G." Maybe I wasn't being sufficiently explicit here. Perhaps "So, I decided to remove the joins to tables corresponding to the patient data, namely pheno and sex, to see how it would affect performance..." would have been better. 
>>> One random thought: WHERE row_number() = 1 is not too efficient. >>> Try using LIMIT or DISTINCT ON instead. >> >> Possibly. However, the CTE that uses >> >> WHERE row_number() = 1 >> >> doesn't dominate the runtime or memory usage, so I'm not too concerned >> about it. > > Hmm, you might be right. > >>> If you're concerned about memory usage, try reducing work_mem; you've >>> probably got it set to something huge. >> >> work_mem = 1 GB (see diag.{tex/pdf}). >> >> The point isn't that I'm using so much memory. Again, my question is, why >> are these changes affecting memory usage so drastically? > > Well each sort or hash can use an amount of memory that is limited > from above by work_mem. So if you write the query in a way that > involves more sorts or hashes, each one can add up to 1GB to your > memory usage, plus overhead. However, it doesn't look like any of > your queries including 30 sorts or hashes, so I'm thinking that the > RSS number probably also includes some of the shared memory that has > been mapped into each backend's address space. RSS is not a terribly > reliable number when dealing with shared memory; it's hard to say what > that really means. >>> You might need to create some indices, too. >> Ok. To what purpose? This query picks up everything from the tables and the >> planner does table scans, so conventional wisdom and indeed my experience, >> says that indexes are not going to be so useful. > Well, a hash join is not usually the first thing that pops to mind when > dealing with a table that has 825 million rows (geno). I don't know if > a nested loop with inner-indexscan would be faster, but it would almost > certainly use less memory. Can you provide an illustration of what you mean? I don't know what a "nested loop with inner-indexscan" is in this context. Regards, Faheem.
Faheem Mitha <faheem@email.unc.edu> wrote: >> If you're concerned about memory usage, try reducing work_mem; >> you've probably got it set to something huge. > > work_mem = 1 GB (see diag.{tex/pdf}). > > The point isn't that I'm using so much memory. Again, my question > is, why are these changes affecting memory usage so drastically? Because the planner looks at a very wide variety of plans, some of which may use many allocations of work_mem size, and some of which don't. The costs are compared and the lowest cost one is chosen. If you are close to the "tipping point" then even a very small change might affect which is chosen. It pays to keep the work_mem setting sane so that unexpected plan changes don't cause problems. Look at the plans and their costs to get a feel for what's being chosen and why. Although it's a very bad idea to use these in production, you can often shift the plan to something you *think* would be better using the enable_* settings, to see what the planner thinks such a plan will cost and where it thinks the cost would be; that can help in tuning the settings. >> You might need to create some indices, too. > > Ok. To what purpose? This query picks up everything from the > tables and the planner does table scans, so conventional wisdom > and indeed my experience, says that indexes are not going to be so > useful. There are situations where scanning the entire table to build up a hash table is more expensive than using an index. Why not test it? -Kevin
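To make the enable_* experiment concrete (session-local only, never in production; the query below is just a stand-in, not one of the actual queries from the writeup):

```sql
BEGIN;
-- Make hash joins look prohibitively expensive for this transaction
-- only, forcing the planner onto its next-cheapest strategy:
SET LOCAL enable_hashjoin = off;

-- See what plan it falls back to, and what it thinks that plan costs:
EXPLAIN ANALYZE
SELECT count(*)
FROM geno
JOIN idlink ON geno.idlink_id = idlink.id;

ROLLBACK;  -- the SET LOCAL reverts automatically
```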
On Tue, 30 Mar 2010, Kevin Grittner wrote: > Faheem Mitha <faheem@email.unc.edu> wrote: > >>> If you're concerned about memory usage, try reducing work_mem; >>> you've probably got it set to something huge. >> >> work_mem = 1 GB (see diag.{tex/pdf}). >> >> The point isn't that I'm using so much memory. Again, my question >> is, why are these changes affecting memory usage so drastically? > > Because the planner looks at a very wide variety of plans, some of > which may use many allocations of work_mem size, and some of which > don't. The costs are compared and the lowest cost one is chosen. If > you are close to the "tipping point" then even a very small change > might affect which is chosen. It pays to keep the work_mem setting > sane so that unexpected plan changes don't cause problems. Sure, but define sane setting, please. I guess part of the point is that I'm trying to keep memory low, and it seems this is not part of the planner's priorities. That is, it does not take memory usage into consideration when choosing a plan. If that is wrong, let me know, but that is my understanding. > Look at the plans and their costs to get a feel for what's being > chosen and why. Although it's a very bad idea to use these in > production, you can often shift the plan to something you *think* > would be better using the enable_* settings, to see what the planner > thinks such a plan will cost and where it thinks the cost would be; > that can help in tuning the settings. Right. You mean to close off certain options to the planner using 'Planner Method Configuration'. I suppose one can also use 'Planner Cost Constants' to alter plan behaviour. I haven't tried changing these. >>> You might need to create some indices, too. >> >> Ok. To what purpose? This query picks up everything from the >> tables and the planner does table scans, so conventional wisdom >> and indeed my experience, says that indexes are not going to be so >> useful. 
> > There are situations where scanning the entire table to build up a > hash table is more expensive than using an index. Why not test it? Certainly, but I don't know what you and Robert have in mind, and I'm not experienced enough to make an educated guess. I'm open to specific suggestions. Regards, Faheem.
One thing which I haven't really mentioned in this thread or in my writeup is that the planner's estimate of the number of rows in geno is way off base some of the time. It is around 800 million, but the planner thinks it is 100 million. I don't know if this is significant or not, or what to do about it. E.g. in the ped_bigjoin EXPLAIN ANALYZE VERBOSE:

->  Sort  (cost=56855882.72..57144683.54 rows=115520330 width=42)
          (actual time=23027732.092..37113627.380 rows=823086774 loops=1)
      Output: (CASE WHEN (hapmap.geno.snpval_id = (-1)) THEN '0 0'::text
                    WHEN (hapmap.geno.snpval_id = 0) THEN (((dedup_patient_anno.allelea_id)::text || ' '::text) || (dedup_patient_anno.allelea_id)::text)
                    WHEN (hapmap.geno.snpval_id = 1) THEN (((dedup_patient_anno.allelea_id)::text || ' '::text) || (dedup_patient_anno.alleleb_id)::text)
                    WHEN (hapmap.geno.snpval_id = 2) THEN (((dedup_patient_anno.alleleb_id)::text || ' '::text) || (dedup_patient_anno.alleleb_id)::text)
                    ELSE NULL::text END),
            hapmap.geno.idlink_id, hapmap.geno.anno_id, pheno.patientid, pheno.phenotype, sex.code

Faheem.

On Tue, 30 Mar 2010, Kevin Grittner wrote: > Faheem Mitha <faheem@email.unc.edu> wrote: > >>> If you're concerned about memory usage, try reducing work_mem; >>> you've probably got it set to something huge. >> >> work_mem = 1 GB (see diag.{tex/pdf}). >> >> The point isn't that I'm using so much memory. Again, my question >> is, why are these changes affecting memory usage so drastically? > > Because the planner looks at a very wide variety of plans, some of > which may use many allocations of work_mem size, and some of which > don't. The costs are compared and the lowest cost one is chosen. If > you are close to the "tipping point" then even a very small change > might affect which is chosen. It pays to keep the work_mem setting > sane so that unexpected plan changes don't cause problems. > > Look at the plans and their costs to get a feel for what's being > chosen and why. 
Although it's a very bad idea to use these in > production, you can often shift the plan to something you *think* > would be better using the enable_* settings, to see what the planner > thinks such a plan will cost and where it thinks the cost would be; > that can help in tuning the settings. > >>> You might need to create some indices, too. >> >> Ok. To what purpose? This query picks up everything from the >> tables and the planner does table scans, so conventional wisdom >> and indeed my experience, says that indexes are not going to be so >> useful. > > There are situations where scanning the entire table to build up a > hash table is more expensive than using an index. Why not test it? > > -Kevin > >
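The bad row estimate mentioned above has a standard first-line remedy, not otherwise discussed in this thread: refresh the table's statistics. A sketch, using the hapmap.geno table from the schema dumps later in the thread:

```sql
-- Re-sample the table; this updates reltuples, which is where the
-- planner gets its total-row estimate for geno:
ANALYZE hapmap.geno;

-- If column-level estimates remain badly off, collect a larger sample
-- for the columns involved in the joins, then re-analyze:
ALTER TABLE hapmap.geno ALTER COLUMN anno_id SET STATISTICS 1000;
ANALYZE hapmap.geno;
```

Whether this changes the chosen plan depends on how close the costs are; it is cheap to try before restructuring queries.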
On Tue, Mar 30, 2010 at 12:30 PM, Faheem Mitha <faheem@email.unc.edu> wrote: > Sure, but define sane setting, please. I guess part of the point is that I'm > trying to keep memory low, and it seems this is not part of the planner's > priorities. That it, it does not take memory usage into consideration when > choosing a plan. If that it wrong, let me know, but that is my > understanding. I don't understand quite why you're confused here. We've already explained to you that the planner will not employ a plan that uses more than the amount of memory defined by work_mem for each sort or hash. Typical settings for work_mem are between 1MB and 64MB. 1GB is enormous. >>>> You might need to create some indices, too. >>> >>> Ok. To what purpose? This query picks up everything from the >>> tables and the planner does table scans, so conventional wisdom >>> and indeed my experience, says that indexes are not going to be so >>> useful. >> >> There are situations where scanning the entire table to build up a >> hash table is more expensive than using an index. Why not test it? > > Certainly, but I don't know what you and Robert have in mind, and I'm not > experienced enough to make an educated guess. I'm open to specific > suggestions. Try creating an index on geno on the columns that are being used for the join. ...Robert
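Concretely, that would be something like the following. The column names are the ones Faheem guesses in his reply, so treat them as assumptions:

```sql
-- Indexes on the geno columns used in the join conditions:
CREATE INDEX geno_idlink_id_idx ON geno (idlink_id);
CREATE INDEX geno_anno_id_idx   ON geno (anno_id);

-- On an 825-million-row table these builds will take a while; raising
-- maintenance_work_mem for the session can speed them up:
-- SET maintenance_work_mem = '512MB';
```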
On Tue, 30 Mar 2010, Faheem Mitha wrote: >>> work_mem = 1 GB (see diag.{tex/pdf}). > > Sure, but define sane setting, please. I guess part of the point is that I'm > trying to keep memory low You're trying to keep memory usage low, but you have work_mem set to 1GB? Matthew -- "Prove to thyself that all circuits that radiateth and upon which thou worketh are grounded, lest they lift thee to high-frequency potential and cause thee to radiate also. " -- The Ten Commandments of Electronics
On Wed, 31 Mar 2010, Matthew Wakeling wrote: > On Tue, 30 Mar 2010, Faheem Mitha wrote: >>>> work_mem = 1 GB (see diag.{tex/pdf}). >> >> Sure, but define sane setting, please. I guess part of the point is that >> I'm trying to keep memory low > > You're trying to keep memory usage low, but you have work_mem set to 1GB? I'm trying to keep both runtime and memory usage low. I assume that with lower levels of memory, the runtime would be longer, other things being equal. Regards, Faheem.
[If Kevin Grittner reads this, please fix your email address. I am getting bounces from your email address.] On Tue, 30 Mar 2010, Robert Haas wrote: > On Tue, Mar 30, 2010 at 12:30 PM, Faheem Mitha <faheem@email.unc.edu> wrote: >> Sure, but define sane setting, please. I guess part of the point is that I'm >> trying to keep memory low, and it seems this is not part of the planner's >> priorities. That it, it does not take memory usage into consideration when >> choosing a plan. If that it wrong, let me know, but that is my >> understanding. > > I don't understand quite why you're confused here. We've already > explained to you that the planner will not employ a plan that uses > more than the amount of memory defined by work_mem for each sort or > hash. > Typical settings for work_mem are between 1MB and 64MB. 1GB is enormous. I don't think I am confused. To be clear, when I said "it does not take memory usage into consideration' I was talking about overall memory usage. Let me summarize: The planner will choose the plan with the minimum total cost, with the constraint that the amount of memory used for each of certain steps is less than work_mem. In other words with k such steps it can use at most k(plan)*work_mem memory, where k(plan) denotes that k is a function of the plan. (I'm assuming here that memory is not shared between the different steps). However, k(plan)*work_mem is not itself bounded. I fail to see how reducing work_mem significantly would help me. This would mean that the current plans I am using would likely be ruled out, and I would be left with plans which, by definition, would have larger cost and so longer run times. The current runtimes are already quite long - for the PED query, the best I can do with work_mem=1 GB is 2 1/2 hrs, and that is after splitting the query into two pieces. 
I might actually be better off *increasing* the memory, since then the planner would have more flexibility to choose plans where the individual steps might require more memory, but the overall memory sum might be lower. >>>>> You might need to create some indices, too. >>>> >>>> Ok. To what purpose? This query picks up everything from the >>>> tables and the planner does table scans, so conventional wisdom >>>> and indeed my experience, says that indexes are not going to be so >>>> useful. >>> >>> There are situations where scanning the entire table to build up a >>> hash table is more expensive than using an index. Why not test it? >> >> Certainly, but I don't know what you and Robert have in mind, and I'm not >> experienced enough to make an educated guess. I'm open to specific >> suggestions. > > Try creating an index on geno on the columns that are being used for the join. Ok, I'll try that. I guess the cols in question on geno are idlink_id and anno_id. I thought that I already had indexes on them, but no. Maybe I had indexes, but removed them. If I understand the way this works, if you request, say an INNER JOIN, the planner can choose different ways/algorithms to do this, as in http://en.wikipedia.org/wiki/Join_(SQL)#Nested_loops . It may choose a hash join, or a nested loop join, or something else, based on cost. If the indexes don't exist, that may make the nested loop join more expensive, and so tip the balance in favor of using a hash join. However, I have no way to control which option it chooses, short of disabling eg. the hash join option, which is not an option for production usage anyway. Correct? Regards, Faheem.
On Wed, Mar 31, 2010 at 6:10 AM, Faheem Mitha <faheem@email.unc.edu> wrote: > > [If Kevin Grittner reads this, please fix your email address. I am getting > bounces from your email address.] > > On Tue, 30 Mar 2010, Robert Haas wrote: > >> On Tue, Mar 30, 2010 at 12:30 PM, Faheem Mitha <faheem@email.unc.edu> >> wrote: >>> >>> Sure, but define sane setting, please. I guess part of the point is that >>> I'm >>> trying to keep memory low, and it seems this is not part of the planner's >>> priorities. That it, it does not take memory usage into consideration >>> when >>> choosing a plan. If that it wrong, let me know, but that is my >>> understanding. >> >> I don't understand quite why you're confused here. We've already >> explained to you that the planner will not employ a plan that uses >> more than the amount of memory defined by work_mem for each sort or >> hash. > >> Typical settings for work_mem are between 1MB and 64MB. 1GB is enormous. > > I don't think I am confused. To be clear, when I said "it does not take > memory usage into consideration' I was talking about overall memory usage. > Let me summarize: > > The planner will choose the plan with the minimum total cost, with the > constraint that the number of memory used for each of certain steps is less > than work_mem. In other words with k such steps it can use at most > > k(plan)*work_mem > > memory where k(plan) denotes that k is a function of the plan. (I'm assuming > here that memory is not shared between the different steps). However, > k(plan)*work_mem is not itself bounded. I fail to see how reducing work_mem > significantly would help me. This would mean that the current plans I am > using would likely be ruled out, and I would be left with plans which, by > definition, would have larger cost and so longer run times. The current > runtimes are already quite long - for the PED query, the best I can do with > work_mem=1 GB is 2 1/2 hrs, and that is after splitting the query into two > pieces. 
> > I might actually be better off *increasing* the memory, since then the > planner would have more flexibility to choose plans where the individual > steps might require more memory, but the overall memory sum might be lower. OK, your understanding is correct. >>>>>> You might need to create some indices, too. >>>>> >>>>> Ok. To what purpose? This query picks up everything from the >>>>> tables and the planner does table scans, so conventional wisdom >>>>> and indeed my experience, says that indexes are not going to be so >>>>> useful. >>>> >>>> There are situations where scanning the entire table to build up a >>>> hash table is more expensive than using an index. Why not test it? >>> >>> Certainly, but I don't know what you and Robert have in mind, and I'm not >>> experienced enough to make an educated guess. I'm open to specific >>> suggestions. >> >> Try creating an index on geno on the columns that are being used for the >> join. > > Ok, I'll try that. I guess the cols in question on geno are idlink_id and > anno_id. I thought that I already had indexes on them, but no. Maybe I had > indexes, but removed them. > > If I understand the way this works, if you request, say an INNER JOIN, the > planner can choose different ways/algorithms to do this, as in > http://en.wikipedia.org/wiki/Join_(SQL)#Nested_loops . It may choose a hash > join, or an nested loop join or something else, based on cost. If the > indexes don't exist that may make the inner loop join more expensive, so tip > the balance in favor of using a hash join. However, I have no way to control > which option it chooses, short of disabling eg. the hash join option, which > is not an option for production usage anyway. Correct? Yep. ...Robert
On Wed, 31 Mar 2010, Faheem Mitha wrote: > On Tue, 30 Mar 2010, Robert Haas wrote: > >> On Tue, Mar 30, 2010 at 12:30 PM, Faheem Mitha <faheem@email.unc.edu> >>>>>> You might need to create some indices, too. >>>>> >>>>> Ok. To what purpose? This query picks up everything from the >>>>> tables and the planner does table scans, so conventional wisdom >>>>> and indeed my experience, says that indexes are not going to be so >>>>> useful. >>>> >>>> There are situations where scanning the entire table to build up a >>>> hash table is more expensive than using an index. Why not test it? >>> >>> Certainly, but I don't know what you and Robert have in mind, and I'm not >>> experienced enough to make an educated guess. I'm open to specific >>> suggestions. >> >> Try creating an index on geno on the columns that are being used for the >> join. > > Ok, I'll try that. I guess the cols in question on geno are idlink_id and > anno_id. I thought that I already had indexes on them, but no. Maybe I had > indexes, but removed them. Looking at this more closely, idlink_id and anno_id are primary keys, so already have indexes on them, so my understanding (from the docs) is there is no purpose in creating them. That's why I removed the indexes that were there (back last August, actually, according to my logs). Anyway, it doesn't look like there is anything I can do here. Does anyone have additions or corrections to this? Regards, Faheem.
On Thu, Apr 1, 2010 at 7:46 AM, Faheem Mitha <faheem@email.unc.edu> wrote:
Looking at this more closely, idlink_id and anno_id are primary keys, so already have indexes on them, so my understanding (from the docs) is there is no purpose in creating them. That's why I removed the indexes that were there (back last August, actually, according to my logs). Anyway, doesn't look there is anything I can do here. Does anyone have additions or corrections to this?
When you do a join, you typically have a foreign key in one table referencing a primary key in another table. While designating a foreign key does put a constraint on the key to ensure referential integrity, it does not put an index on the column that is being designated as a foreign key. If I understand correctly, the scan done as the inner loop of the nested loop scan for the join is going to be your foreign key column, not your primary key column. Thus, if you have no index on the foreign key column, you will be forced to do a sequential table scan to do the join. In that case the hash-based join will almost certainly be faster (especially for such a large number of rows). If you put an index on the foreign key, then the inner scan can be an index scan and that might turn out to be faster than building the hash indexes on all the table rows.
Somebody can correct me if I'm wrong.
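The distinction above can be shown with a minimal example; the table names here are hypothetical, not Faheem's schema:

```sql
-- Declaring a foreign key enforces referential integrity, but it does
-- NOT create an index on the referencing column:
CREATE TABLE parent (id integer PRIMARY KEY);
CREATE TABLE child (
    id        integer PRIMARY KEY,
    parent_id integer REFERENCES parent(id)
);

-- If the join's inner scan is to use an index on the referencing
-- side, that index must be created explicitly:
CREATE INDEX child_parent_id_idx ON child (parent_id);
```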
--
Eliot Gable
"We do not inherit the Earth from our ancestors: we borrow it from our children." ~David Brower
"I decided the words were too conservative for me. We're not borrowing from our children, we're stealing from them--and it's not even considered to be a crime." ~David Brower
"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not live to eat.) ~Marcus Tullius Cicero
Hi Eliot, Thanks for the comment. On Thu, 1 Apr 2010, Eliot Gable wrote: > On Thu, Apr 1, 2010 at 7:46 AM, Faheem Mitha <faheem@email.unc.edu> wrote: > Looking at this more closely, idlink_id and anno_id are primary keys, so > already have indexes on them, so my understanding (from the docs) is > there is no purpose in creating them. That's why I removed the indexes > that were there (back last August, actually, according to my logs). > Anyway, doesn't look there is anything I can do here. Does anyone have > additions or corrections to this? > When you do a join, you typically have a foreign key in one table > referencing a primary key in another table. While designating a foreign > key does put a constraint on the key to ensure referential integrity, it > does not put an index on the column that is being designated as a > foreign key. If I understand correctly, the scan done as the inner loop > of the nested loop scan for the join is going to be your foreign key > column, not your primary key column. Thus, if you have no index on the > foreign key column, you will be forced to do a sequential table scan to > do the join. In that case the hash-based join will almost certainly be > faster (especially for such a large number of rows). If you put an index > on the foreign key, then the inner scan can be an index scan and that > might turn out to be faster than building the hash indexes on all the > table rows. > Somebody can correct me if I'm wrong. I had set the foreign keys in question (on the geno table) to be primary keys. This is because this setup is basically a glorified spreadsheet, and I don't want more than one cell corresponding to a particular tuple of idlink.id and anno.id (the conceptual rows and cols). Since a primary key defines an index, I thought putting indexes on idlink_id and anno_id was redundant. 
However, it looks like (unsurprisingly) the index corresponding to the
primary key is across both columns, which may not be what is wanted for
the aforesaid join, i.e.

ALTER TABLE ONLY geno
    ADD CONSTRAINT geno_pkey PRIMARY KEY (idlink_id, anno_id);

(As a side comment, with respect to the indexes on the other side of the
joins: in one case we have idlink.id = geno.idlink_id, and idlink.id is a
primary key too. In the other, namely geno.anno_id =
dedup_patient_anno.id, dedup_patient_anno is a CTE, so there is no index
on dedup_patient_anno.id. But maybe indexes aren't needed there.)

Here is the join:

SELECT decode_genotype(geno.snpval_id, %(allelea)s, %(alleleb)s) AS g,
       geno.idlink_id, geno.anno_id
FROM geno
INNER JOIN dedup_patient_anno ON geno.anno_id = dedup_patient_anno.id
INNER JOIN idlink ON geno.idlink_id = idlink.id
ORDER BY idlink_id, anno_id

Here is the table dump.

****************************************************************

--
-- Name: geno; Type: TABLE; Schema: hapmap; Owner: snp; Tablespace:
--

CREATE TABLE geno (
    idlink_id integer NOT NULL,
    anno_id integer NOT NULL,
    snpval_id integer NOT NULL
)
WITH (autovacuum_enabled=true);

ALTER TABLE hapmap.geno OWNER TO snp;

--
-- Name: geno_pkey; Type: CONSTRAINT; Schema: hapmap; Owner: snp; Tablespace:
--

ALTER TABLE ONLY geno
    ADD CONSTRAINT geno_pkey PRIMARY KEY (idlink_id, anno_id);  (!!!!)
--
-- Name: geno_anno_id_fkey; Type: FK CONSTRAINT; Schema: hapmap; Owner: snp
--

ALTER TABLE ONLY geno
    ADD CONSTRAINT geno_anno_id_fkey FOREIGN KEY (anno_id)
    REFERENCES anno(id) ON UPDATE CASCADE ON DELETE CASCADE;

--
-- Name: geno_idlink_id_fkey; Type: FK CONSTRAINT; Schema: hapmap; Owner: snp
--

ALTER TABLE ONLY geno
    ADD CONSTRAINT geno_idlink_id_fkey FOREIGN KEY (idlink_id)
    REFERENCES idlink(id) ON UPDATE CASCADE ON DELETE CASCADE;

--
-- Name: geno_snpval_id_fkey; Type: FK CONSTRAINT; Schema: hapmap; Owner: snp
--

ALTER TABLE ONLY geno
    ADD CONSTRAINT geno_snpval_id_fkey FOREIGN KEY (snpval_id)
    REFERENCES snpval(val) ON UPDATE CASCADE ON DELETE CASCADE;

*************************************************************************

So, should I add indexes on the individual foreign key cols idlink_id and
anno_id after all?

Regards, Faheem.

> --
> Eliot Gable
>
> "We do not inherit the Earth from our ancestors: we borrow it from our
> children." ~David Brower
> "I decided the words were too conservative for me. We're not borrowing
> from our children, we're stealing from them--and it's not even
> considered to be a crime." ~David Brower

Nice quotes.

> "Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live;
> not live to eat.) ~Marcus Tullius Cicero
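If the single-column foreign-key indexes turn out to be worth testing, a
minimal sketch follows (the index names are illustrative, not taken from
the schema dump above):

```sql
-- Hypothetical single-column indexes on the geno foreign keys.
-- CONCURRENTLY builds the index without taking a long exclusive lock
-- on this large table (note it cannot run inside a transaction block).
CREATE INDEX CONCURRENTLY geno_anno_id_idx ON geno (anno_id);
CREATE INDEX CONCURRENTLY geno_idlink_id_idx ON geno (idlink_id);
```

Note that the composite primary key on (idlink_id, anno_id) can already
serve lookups on its leading column idlink_id, so geno_idlink_id_idx is
likely redundant; geno_anno_id_idx, on the non-leading column, is the one
worth measuring with EXPLAIN ANALYZE before and after.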
On Thu, Apr 1, 2010 at 2:15 PM, Faheem Mitha <faheem@email.unc.edu> wrote:
> I had set the foreign keys in question (on the geno table) to be primary
> keys. This is because this setup is basically a glorified spreadsheet, and I
> don't want more than one cell corresponding to a particular tuple of
> idlink.id and anno.id (the conceptual rows and cols). Since a primary key
> defines an index, I thought putting indexes on idlink_id and anno_id was
> redundant. However, it looks like (unsurprisingly) the index corresponding
> to the primary key is across both columns, which may not be what is wanted
> for the aforesaid join

Actually it is what is wanted - that is good.

> So, should I add indexes on the individual foreign key cols idlink_id
> and anno_id after all?

I doubt that would help. The bottom line may be that you're dealing with
hundreds of millions of rows here, so things are going to take a long
time. Of course you can always get more/faster memory, a bigger I/O
subsystem, faster processors... and it could be that with detailed study
there are optimizations that could be done even without spending money,
but I think I'm about tapped out on what I can do over an Internet
mailing list.

...Robert
On Thu, 1 Apr 2010, Robert Haas wrote:

> On Thu, Apr 1, 2010 at 2:15 PM, Faheem Mitha <faheem@email.unc.edu> wrote:
>> I had set the foreign keys in question (on the geno table) to be
>> primary keys. This is because this setup is basically a glorified
>> spreadsheet, and I don't want more than one cell corresponding to a
>> particular tuple of idlink.id and anno.id (the conceptual rows and
>> cols). Since a primary key defines an index, I thought putting indexes
>> on idlink_id and anno_id was redundant. However, it looks like
>> (unsurprisingly) the index corresponding to the primary key is across
>> both columns, which may not be what is wanted for the aforesaid join
>
> Actually it is what is wanted - that is good.

I see.

>> So, should I add indexes on the individual foreign key cols idlink_id
>> and anno_id after all?
>
> I doubt that would help.

You're sure of this?

> The bottom line may be that you're dealing with hundreds of millions of
> rows here, so things are going to take a long time. Of course you can
> always get more/faster memory, a bigger I/O subsystem, faster
> processors... and it could be that with detailed study there are
> optimizations that could be done even without spending money, but I
> think I'm about tapped out on what I can do over an Internet mailing
> list.

Thanks for your assistance, Robert. It's been educational.

Regards, Faheem.
On Thu, 1 Apr 2010, Eliot Gable wrote:

> On Thu, Apr 1, 2010 at 3:01 PM, Faheem Mitha <faheem@email.unc.edu> wrote:
>
>> So, should I add indexes on the individual foreign key cols idlink_id
>> and anno_id after all?
>
>> I doubt that would help.
>
>> You're sure of this?
>
> It is always best to test and be certain.

Fair enough. I may also try disabling hash joins and see what happens...

Regards, Faheem.
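Disabling hash joins for a single session can be sketched as follows. This
is a diagnostic toggle only, not a production setting, and the query shown
is a simplified stand-in for the real one (the decode_genotype call and
the dedup_patient_anno CTE from earlier in the thread are omitted):

```sql
-- Per-session planner toggle: the planner will prefer merge or
-- nested-loop joins while this is off.
SET enable_hashjoin = off;

EXPLAIN ANALYZE
SELECT geno.snpval_id, geno.idlink_id, geno.anno_id
FROM geno
INNER JOIN idlink ON geno.idlink_id = idlink.id
ORDER BY idlink_id, anno_id;

-- Restore the default afterwards.
RESET enable_hashjoin;
```

Comparing the EXPLAIN ANALYZE output with and without the toggle shows
whether the hash join is actually where the time and memory go.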