Discussion: queries with lots of UNIONed relations
I was recently asked to look into why one particular set of queries was taking a long time. The queries are all of the same form: they select the UNION of a few columns from around 100 tables. The query in question was taking some 7-8 minutes to run. On a whim, I changed the query from this form:

SELECT a, b FROM FOO_a WHERE <conditions>
UNION
SELECT a, b FROM FOO_b WHERE <conditions>
...

to:

SELECT DISTINCT a, b FROM FOO_a WHERE <conditions>
UNION
SELECT DISTINCT a, b FROM FOO_b WHERE <conditions>
...

and the query time dropped to under a minute.

In the former case, the query plan was a bitmap heap scan for each table. Those results were then Appended, Sorted, Uniqued, Sorted again, and returned. In the latter, each table's results were run through HashAggregate before being Appended.

The total number of result rows is in the 500K range. Each table holds approximately 150K matching rows (though this can vary a bit).

What I'm asking is this: since adding DISTINCT to each participating member of the UNION query reduced the total number of appended rows, is there some sort of heuristic that PostgreSQL could use to do this automatically? The 12x speedup was quite nice.

-- Jon
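The effect can be sketched with a toy model (pure Python with made-up data, not PostgreSQL's executor): de-duplicating each branch before appending shrinks the stream that the final UNION de-duplication has to process, while the answer stays identical.

```python
import random

random.seed(0)

# Hypothetical stand-in for the ~100 FOO_* tables: each "table" yields
# rows with heavy duplication, loosely mimicking the real workload.
tables = [[(random.randrange(1000), "x") for _ in range(10_000)]
          for _ in range(5)]

# Without per-branch DISTINCT: append everything, de-duplicate once at the top.
appended_plain = [row for t in tables for row in t]
result_plain = set(appended_plain)

# With per-branch DISTINCT: de-duplicate each branch first, so far fewer
# rows reach the final de-duplication step.
appended_distinct = [row for t in tables for row in set(t)]
result_distinct = set(appended_distinct)

assert result_plain == result_distinct   # the answer is identical either way
print(len(appended_plain), len(appended_distinct))
```

With duplicate-heavy branches, the second appended list is an order of magnitude smaller, which is exactly the effect the rewritten query exploits.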
Jon Nelson <jnelson+pgsql@jamponi.net> writes:
> In the former case, the query plan was a bitmap heap scan for each
> table. Then those results were Appended, Sorted, Uniqued, Sorted
> again, and then returned.
>
> In the latter, before Appending, each table's results were run through
> HashAggregate.

Probably the reason it did that is that each individual de-duplication looked like it would fit in work_mem, but a single de-duplication didn't. Consider raising work_mem, at least for this one query.

regards, tom lane
On Thu, Jan 13, 2011 at 11:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Probably the reason it did that is that each individual de-duplication
> looked like it would fit in work_mem, but a single de-duplication
> didn't. Consider raising work_mem, at least for this one query.

I raised work_mem to as high as 512MB (SET LOCAL work_mem = '512MB', within the transaction). Nice. Instead of 7-10 minutes the result is now about a minute (the same as with individual de-duplication).

Your comment regarding "each individual de-duplication looked like it would fit in work_mem" doesn't really make sense to me; maybe I'm misunderstanding you. What I'm asking is this: can PostgreSQL apply a de-duplication to each member of a UNION (as I did with SELECT DISTINCT) in order to reduce the total number of rows that need to be de-duplicated once all of the rows have been Appended?

The results of the various plans/tweaks:

work_mem = 16MB, no DISTINCT, run time 7-10 minutes:
  Unique ( Sort ( Append ( lots of Bitmap Heap Scans ) ) )

work_mem = 16MB, with DISTINCT, run time ~1 minute:
  HashAggregate ( Append ( lots of HashAggregate ( Bitmap Heap Scan ) ) )

work_mem = 64kB, with DISTINCT, run time *15+ minutes*:
  Unique ( Sort ( Append ( lots of HashAggregate ( Bitmap Heap Scan ) ) ) )

What I take from this:

1. If the result set fits in work_mem, hash aggregate is wicked fast -- about a jillion times faster than Unique + Sort.
2. It would be nifty if PostgreSQL could be taught, in a UNION, to de-duplicate each contributory relation so as to reduce the total set of rows that needs to be re-de-duplicated. It's extra work, true, and maybe there are some tricks here, but it seems to make a big difference: it keeps the total result set small enough that hash aggregation might apply.

NOTE: I have to keep work_mem really low as a global setting on this machine, because other queries involving the same tables (such as those using UNION ALL for SUM() or GROUP BY operations) cause the machine to run out of memory. Indeed, even with work_mem at 1MB I run the machine out of memory if I don't explicitly disable hashagg for some queries. Can anything be done about that?

-- Jon
Jon Nelson <jnelson+pgsql@jamponi.net> writes:
> Your comment regarding "each individual de-duplication looked like it
> would fit in work_mem" doesn't really make sense, exactly. Maybe I'm
> misunderstanding you.

Yeah. What I was suggesting was to NOT add the DISTINCT's, but instead raise work_mem high enough so you get just one HashAggregation step at the top level. (Although I think this only works in 8.4 and up.) That should be faster than two levels of de-duplication.

regards, tom lane
On Thu, Jan 13, 2011 at 2:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah. What I was suggesting was to NOT add the DISTINCT's, but instead
> raise work_mem high enough so you get just one HashAggregation step at
> the top level. (Although I think this only works in 8.4 and up.)
> That should be faster than two levels of de-duplication.

Gave it a try -- performance either way doesn't seem to change, although the final set that has to undergo de-duplication is rather larger without DISTINCT, so I still run the risk of not getting hash aggregation. Since having the DISTINCT doesn't seem to hurt, and it avoids potentially significant pain, I'll keep it.

I still think that having UNION de-duplicate each contributory relation is a beneficial thing to consider -- especially if PostgreSQL thinks the uniqueness is not very high.

Thanks!

-- Jon
On Thu, Jan 13, 2011 at 3:12 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
> I still think that having UNION do de-duplication of each contributory
> relation is a beneficial thing to consider -- especially if postgresql
> thinks the uniqueness is not very high.

This might be worth a TODO.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jan 13, 2011 at 3:12 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
>> I still think that having UNION do de-duplication of each contributory
>> relation is a beneficial thing to consider -- especially if postgresql
>> thinks the uniqueness is not very high.
> This might be worth a TODO.

I don't believe there is any case where hashing each individual relation is a win compared to hashing them all together. If the optimizer were smart enough to be considering the situation as a whole, it would always do the latter.

regards, tom lane
On Thu, Jan 13, 2011 at 5:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't believe there is any case where hashing each individual relation
> is a win compared to hashing them all together. If the optimizer were
> smart enough to be considering the situation as a whole, it would always
> do the latter.

You might be right, but I'm not sure. Suppose that there are 100 inheritance children, and each has 10,000 distinct values, but none of them are common between the tables. In that situation, de-duplicating each individual table requires a hash table that can hold 10,000 entries. But de-duplicating everything at once requires a hash table that can hold 1,000,000 entries.

Or am I all wet?

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Jan 13, 2011 at 5:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> You might be right, but I'm not sure. Suppose that there are 100
> inheritance children, and each has 10,000 distinct values, but none of
> them are common between the tables. In that situation, de-duplicating
> each individual table requires a hash table that can hold 10,000
> entries. But de-duplicating everything at once requires a hash table
> that can hold 1,000,000 entries.
>
> Or am I all wet?

Yeah, I'm all wet, because you'd still have to re-de-duplicate at the end. But then why did the OP get a speedup? *scratches head*

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 1/13/2011 4:42 PM, Robert Haas wrote:
> Yeah, I'm all wet, because you'd still have to re-de-duplicate at the
> end. But then why did the OP get a speedup? *scratches head*

Because it all fit in memory and didn't swap to disk?

-Andy
On Thu, Jan 13, 2011 at 5:47 PM, Andy Colson <andy@squeakycode.net> wrote:
> Because it all fit in memory and didn't swap to disk?

Doesn't make sense. The re-de-duplication at the end should use the same amount of memory regardless of whether the individual relations have already been de-duplicated.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 1/13/2011 4:49 PM, Robert Haas wrote:
> Doesn't make sense. The re-de-duplication at the end should use the
> same amount of memory regardless of whether the individual relations
> have already been de-duplicated.

Unless I missed something in the thread:

  distinctList + distinctList + ... -> [fit in mem] -> last distinct -> [fit in mem]

vs.

  fullList + fullList + ... -> [swapped to disk] -> last distinct -> [fit in mem]

-Andy
On Thu, Jan 13, 2011 at 4:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Doesn't make sense. The re-de-duplication at the end should use the
> same amount of memory regardless of whether the individual relations
> have already been de-duplicated.

I don't believe that to be true. Assume 100 tables, each of which produces 10,000 rows from this query, and furthermore assume 3,000 duplicates per table.

Without DISTINCT: uniqify over 100 * 10,000 = 1,000,000 rows
With DISTINCT:    uniqify over 100 * (10,000 - 3,000) = 700,000 rows

That's 300,000 rows times (say) 64 bytes/row, or roughly 18.3MB. Not a lot, but more than the work_mem of 16MB.

Or maybe *I'm* all wet?

-- Jon
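Spelling out that arithmetic (the 64-byte row width is Jon's assumption, and the figures are his hypothetical scenario, not measurements):

```python
# Jon's back-of-the-envelope calculation for per-branch DISTINCT savings.
tables = 100
rows_per_table = 10_000
dupes_per_table = 3_000
bytes_per_row = 64  # assumed average row width

without_distinct = tables * rows_per_table                   # rows reaching final uniqify
with_distinct = tables * (rows_per_table - dupes_per_table)  # after per-branch dedup
saved_rows = without_distinct - with_distinct
saved_mb = saved_rows * bytes_per_row / (1024 ** 2)

print(without_distinct, with_distinct, round(saved_mb, 1))   # 1000000 700000 18.3
```

The ~18.3MB difference straddles the 16MB work_mem boundary, which is why one plan hash-aggregates and the other falls back to sort-based de-duplication.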
Robert Haas <robertmhaas@gmail.com> writes:
> You might be right, but I'm not sure. Suppose that there are 100
> inheritance children, and each has 10,000 distinct values, but none of
> them are common between the tables. In that situation, de-duplicating
> each individual table requires a hash table that can hold 10,000
> entries. But deduplicating everything at once requires a hash table
> that can hold 1,000,000 entries.
>
> Or am I all wet?

If you have enough memory to de-dup them individually, you surely have enough to de-dup all at once. It is not possible for a single hashtable to have worse memory consumption than N hashtables followed by a union hashtable, and in fact if there are no common values then the latter eats twice as much space because every value appears twice in two different hashtables.

regards, tom lane
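Tom's accounting can be checked directly with a small sketch (toy Python sets standing in for hashtables, with disjoint per-table value sets, which is the worst case he describes): every value occupies an entry in its per-table hashtable and again in the union hashtable, so the two-level scheme peaks at exactly double the entries.

```python
# Ten disjoint "child tables" of 1,000 distinct values each.
tables = [set(range(i * 1_000, (i + 1) * 1_000)) for i in range(10)]

lower_entries = sum(len(t) for t in tables)   # the N per-table hashtables
union_table = set().union(*tables)            # the top-level union hashtable
two_level_peak = lower_entries + len(union_table)

single_pass = len(union_table)                # one hashtable does the same job

assert two_level_peak == 2 * single_pass      # exactly double for disjoint inputs
print(single_pass, two_level_peak)            # 10000 20000
```

If values overlap between tables the factor drops below two, but the two-level scheme never uses less than the single hashtable.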
Robert Haas <robertmhaas@gmail.com> writes:
> Yeah, I'm all wet, because you'd still have to re-de-duplicate at the
> end. But then why did the OP get a speedup? *scratches head*

He was reporting that 2 levels of hashing was faster than sort+uniq (with the sorts swapping to disk, no doubt). One level of hashing should be faster yet, but maybe not by enough to be obvious as long as you don't start to swap.

regards, tom lane
On Thu, Jan 13, 2011 at 5:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If you have enough memory to de-dup them individually, you surely have
> enough to de-dup all at once. It is not possible for a single hashtable
> to have worse memory consumption than N hashtables followed by a union
> hashtable, and in fact if there are no common values then the latter eats
> twice as much space because every value appears twice in two different
> hashtables.

If everything were available up-front, sure. However -- and please correct me if I'm wrong -- doesn't PostgreSQL work in a fairly linear fashion, moving from table to table and performing a series of operations on each? That seems to be what the plan indicates. Compare:

  for each table LOOP
    scan table for result rows, append to results
  END LOOP
  hash / sort + unique results

versus:

  for each table LOOP
    scan table for result rows, append to table-results
    hash / sort + unique table-results, append to results
  END LOOP
  hash / sort + unique results

In the former case, all of the result rows from all tables are appended together before the de-duplication process can start. In the latter case, only enough memory for a single table's result set is needed at a time for de-duplication, and it need only be allocated for that table. Is that not how this works?

-- Jon
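The two loops above can be sketched as Python functions (a toy model tracking how many rows each strategy holds at its peak, with hypothetical function names, not executor code):

```python
def union_then_dedup(tables):
    """Append every row from every table, then de-duplicate once.
    Peak working set: the full duplicate-laden appended list."""
    results = []
    for table in tables:
        results.extend(table)              # full append, duplicates and all
    peak = len(results)
    return set(results), peak

def dedup_then_union(tables):
    """De-duplicate each table first, then append and de-duplicate the
    (much smaller) combined stream."""
    results = []
    peak = 0
    for table in tables:
        table_results = set(table)         # per-table de-duplication
        peak = max(peak, len(table_results))
        results.extend(table_results)
    return set(results), max(peak, len(results))

# Four identical tables, each with 1,000 rows over only 50 distinct values.
tables = [[v % 50 for v in range(1000)] for _ in range(4)]
a, peak_a = union_then_dedup(tables)
b, peak_b = dedup_then_union(tables)
assert a == b                              # identical final result
print(peak_a, peak_b)                      # 4000 200
```

With duplicate-heavy inputs the second strategy's peak row count is a small fraction of the first's, which is the intuition behind Jon's proposal; Tom's counter-argument in the next message is about the hashtable entries themselves, which this row-count model does not capture.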
Jon Nelson <jnelson+pgsql@jamponi.net> writes:
> However, and please correct me if I'm wrong, but doesn't postgresql
> work in a fairly linear fashion, moving from table to table performing
> a series of operations on each?

Doing a single sort+uniq works like that. But the alternate plan you are proposing we should consider involves building all the lower hashtables, and then reading from them to fill the upper hashtable. Max memory consumption *is* worst case here. Remember HashAggregate is incapable of swapping to disk (and if it did, you wouldn't be nearly as pleased with its performance).

regards, tom lane
On 1/13/2011 5:41 PM, Robert Haas wrote:
> You might be right, but I'm not sure. Suppose that there are 100
> inheritance children, and each has 10,000 distinct values, but none of
> them are common between the tables. In that situation, de-duplicating
> each individual table requires a hash table that can hold 10,000
> entries. But deduplicating everything at once requires a hash table
> that can hold 1,000,000 entries.
>
> Or am I all wet?

Have you considered using Google's map-reduce framework for things like this? Union and group functions look like ideal candidates for such a thing. I am not sure whether map-reduce can be married to a relational database, but I must say I was impressed with the speed of MongoDB. I am not suggesting that PostgreSQL should sacrifice its ACID compliance for speed, but Mongo sure does look like a speeding bullet. On the other hand, the algorithms that have long been parallelized are precisely the sort/merge and hash algorithms used for UNION and GROUP BY. This is what I have in mind: http://labs.google.com/papers/mapreduce.html

-- Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com
On 14.01.11 00:26, Tom Lane wrote:
> I don't believe there is any case where hashing each individual relation
> is a win compared to hashing them all together. If the optimizer were
> smart enough to be considering the situation as a whole, it would always
> do the latter.

How about cases when the individual relations are already sorted? That would mean they can be de-duplicated quickly and in a streaming manner. Even a partial sort order may help, because you would only need to de-duplicate groups whose sorted fields are equal, which takes much less memory and is far more streaming-friendly. And if all the individual de-duplications are streaming and sorted the same way, you can simply do a merge on top.

Best regards, Vitalii Tymchyshyn.
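The merge-on-top idea can be sketched with Python's standard library (a toy model, not PostgreSQL's MergeAppend/Unique machinery): given already-sorted input streams, a k-way merge plus adjacent-duplicate elision produces the UNION in one streaming pass, holding only one value per stream in memory.

```python
import heapq
from itertools import groupby

def streaming_union(sorted_inputs):
    """Merge already-sorted input streams and emit each value once.
    Memory stays O(number of streams), never O(total rows)."""
    merged = heapq.merge(*sorted_inputs)       # k-way merge of sorted iterators
    for value, _group in groupby(merged):      # elide adjacent duplicates
        yield value

streams = [iter([1, 3, 5, 7]), iter([2, 3, 6]), iter([1, 4, 7, 9])]
print(list(streaming_union(streams)))          # [1, 2, 3, 4, 5, 6, 7, 9]
```

Because both `heapq.merge` and `groupby` are lazy, nothing is materialized: the union is produced row by row, which is exactly the streaming property being argued for.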
On Thu, Jan 13, 2011 at 6:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Doing a single sort+uniq works like that. But the alternate plan you
> are proposing we should consider involves building all the lower
> hashtables, and then reading from them to fill the upper hashtable.
> Max memory consumption *is* worst case here. Remember HashAggregate
> is incapable of swapping to disk (and if it did, you wouldn't be nearly
> as pleased with its performance).

That's not exactly what I'm proposing -- but the confusion is probably due to my lack of understanding of some of the underlying details of how PostgreSQL works. I had assumed that the result of a HashAggregate, or any other de-duplication process, was a table-like structure.

Regarding being pleased with hash aggregate: I am! -- except when it goes crazy and eats all of the memory in the machine. I'd trade a bit of performance loss for not using up all of the memory and crashing.

However, maybe I'm misunderstanding how SELECT DISTINCT works internally. In the case where a hashtable is used, does PostgreSQL use a table-like structure or does it remain a hashtable in memory? If it's a hashtable, couldn't the hashtable be built on the fly rather than only after all of the underlying tuples are available? I'd love a bit more explanation as to how this works.

Another example of where this might be useful: I'm currently running a SELECT DISTINCT query over some 500 million rows (120 contributory tables). I expect a de-duplicated row count of well under 10% of that 500 million, probably below 1%. The plan as it stands is to execute a series of sequential scans, appending the rows from each contributory table and then aggregating them. If the expected distinctness of each contributory subquery is, say, 5%, then instead of aggregating over 500 million tuples the aggregation would take place over 25 million. In this case, that is a savings of approximately 10 gigabytes. Yes, the same amount of data still has to be scanned, but the amount of data that needs to be stored (in memory or on disk) for the final de-duplication is much smaller.

-- Jon
On Fri, Jan 14, 2011 at 2:11 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
> That's not exactly what I'm proposing -- but the confusion is probably
> due to my lack of understanding of some of the underlying details of
> how PostgreSQL works. I had assumed that the result of a HashAggregate
> or any other de-duplication process was a table-like structure.

And I assumed wrong, I think. I dug into the code (nodeHash.c and others) and I think I now understand why HashAggregate works only in certain circumstances, and I understand your comments a bit better. Basically, HashAggregate doesn't stream unique nodes the way nodeUnique.c does. nodeUnique emits nodes and elides subsequent identical ones, which is why it relies on the input being sorted. HashAggregate works only on entire input sets at once, and nodeHash.c doesn't really emit nodes at all.

This makes me wonder whether nodeHash.c and nodeHashjoin.c couldn't be modified to output nodes in a streaming fashion. The memory requirements would be no worse than now.

Does PostgreSQL support any sort of merge sort? If it did, then when the hashtable started consuming too much memory it could be dumped to a temporary file and cleared, with each successive hashtable's output directed to a further temporary file, and then a merge sort could be performed over all of the temporary files (with Unique on top to effect the UNION operation).

-- Jon
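The spill-and-merge scheme Jon describes can be sketched in a few lines of Python (a toy model of the proposal, with hypothetical names; real spilled runs would go to temp files rather than in-memory lists): hash-aggregate until a memory cap is hit, spill a sorted run, start over, and finally merge the sorted runs while eliding adjacent duplicates, nodeUnique-style.

```python
import heapq
from itertools import groupby

def external_dedup(rows, max_entries):
    """De-duplicate an arbitrary stream under a memory cap: hash-aggregate
    until the table reaches max_entries, then 'spill' a sorted run (a list
    standing in for a temp file) and start a fresh table; finally merge the
    runs and stream out unique values by eliding adjacent duplicates."""
    runs, table = [], set()
    for row in rows:
        table.add(row)
        if len(table) >= max_entries:
            runs.append(sorted(table))       # spill a sorted run
            table = set()
    if table:
        runs.append(sorted(table))           # final partial run
    for value, _group in groupby(heapq.merge(*runs)):
        yield value                          # streaming unique over the merge

data = [9, 1, 4, 1, 9, 2, 6, 2, 6, 5, 3, 5]
print(list(external_dedup(data, max_entries=4)))   # [1, 2, 3, 4, 5, 6, 9]
```

Note the trade-off this makes explicit: the output comes out sorted rather than in hash order, and each spilled value may appear in several runs, so the merge step still has to see every spilled copy once -- but peak in-memory state is bounded by `max_entries` plus one value per run.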