On 1/13/2011 5:41 PM, Robert Haas wrote:
> You might be right, but I'm not sure. Suppose that there are 100
> inheritance children, and each has 10,000 distinct values, but none of
> them are common between the tables. In that situation, de-duplicating
> each individual table requires a hash table that can hold 10,000
> entries. But deduplicating everything at once requires a hash table
> that can hold 1,000,000 entries.
>
> Or am I all wet?
>
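(To make the arithmetic above concrete, here is a rough Python sketch of the two de-duplication strategies being compared; the value layout is invented purely for illustration — the point is only the peak hash-table size of each approach.)

```python
# Toy model of Robert's scenario: 100 inheritance children, each with
# 10,000 distinct values, and no values shared between children.

NUM_CHILDREN = 100
VALUES_PER_CHILD = 10_000

def child_values(child):
    # Hypothetical layout: child i holds values [i*10000, (i+1)*10000).
    return range(child * VALUES_PER_CHILD, (child + 1) * VALUES_PER_CHILD)

# Strategy 1: de-duplicate each child separately, then append the results.
# Peak hash-table size is only 10,000 entries.
per_child_peak = 0
deduped_parts = []
for child in range(NUM_CHILDREN):
    seen = set(child_values(child))          # one small hash table per child
    per_child_peak = max(per_child_peak, len(seen))
    deduped_parts.append(seen)

# Strategy 2: de-duplicate everything in one pass.
# Peak hash-table size is 100 * 10,000 = 1,000,000 entries.
global_seen = set()
for child in range(NUM_CHILDREN):
    global_seen.update(child_values(child))  # one big hash table

print(per_child_peak)    # 10000
print(len(global_seen))  # 1000000
```

The catch, of course, is that strategy 1 only works as a complete de-duplication when the children really share no values; otherwise a final merge pass over the per-child results is still needed.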
Have you considered using Google's map-reduce framework for things like
that? UNION and GROUP BY operations look like ideal candidates for such a
thing. I am not sure whether map-reduce can be married to a relational
database, but I must say that I was impressed with the speed of MongoDB.
I am not suggesting that PostgreSQL should sacrifice its ACID compliance
for speed, but Mongo sure does look like a speeding bullet.
On the other hand, the algorithms that have been parallelized for a long
time are precisely the sort/merge and hash algorithms used for UNION and
GROUP BY. This is what I have in mind:
http://labs.google.com/papers/mapreduce.html
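(For what it's worth, the pattern in that paper boils down to something like the toy single-process sketch below. This is only meant to show why GROUP BY and de-duplication map onto the model so naturally — the function names and record layout here are my own invention, and the real framework is of course distributed.)

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Minimal in-memory map-reduce: map, group by key, reduce."""
    # Map phase: each record emits zero or more (key, value) pairs.
    intermediate = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            intermediate[key].append(value)
    # The "shuffle" is implicit in the dict grouping; reduce phase:
    return {key: reducer(key, values)
            for key, values in intermediate.items()}

rows = ["a", "b", "a", "c", "b", "a"]

# GROUP BY key / COUNT(*) expressed as map-reduce:
counts = map_reduce(rows, lambda r: [(r, 1)], lambda k, vs: sum(vs))
# counts == {"a": 3, "b": 2, "c": 1}

# SELECT DISTINCT (the de-duplication case discussed above) falls
# out of the same machinery: the grouping itself does the work.
distinct = sorted(map_reduce(rows, lambda r: [(r, None)],
                             lambda k, vs: k))
# distinct == ["a", "b", "c"]
```

Both the map phase and the per-key reduce phase can run on independent workers, which is exactly the parallelism opportunity for hash-based UNION/GROUP BY.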
--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com