Quoting Bruce Momjian <pgman@candle.pha.pa.us>:
>
> Is there a TODO anywhere in this discussion? If so, please let me
> know.
>
Umm... I don't think so. I'm not clear on what TODO means yet. 'Up for
consideration'? If a "TODO" means committing to do it, I would prefer to
follow up on a remote-schema (federated server) project first.
...
> > If there were room for improvement, (and I didn't see it in the source)
> > it would be the logic to:
> >
> > - swap inner and outer inputs (batches) when the original inner turned
> > out to be too large for memory, and the corresponding outer did not. If
> > you implement that anyway (complicates the loops) then it's no trouble
> > to just hash the smaller of the two, every time; saves some CPU.
> >
> > - recursively partition batches where both inner and outer input batch
> > ends up being too large for memory, too; or where the required number of
> > batch output buffers alone is too large for working RAM. This is only
> > for REALLY big inputs.
> >
> > Note that you don't need a bad hash function to get skewed batch sizes;
> > you only need a skew distribution of the values being hashed.