Re: [HACKERS] Mariposa
From | Ross J. Reedstrom
Subject | Re: [HACKERS] Mariposa
Date |
Msg-id | 19990802172354.B17969@wallace.ece.rice.edu
In reply to | Re: [HACKERS] Selectivity estimates paper, and Mariposa (Bruce Momjian <maillist@candle.pha.pa.us>)
Responses | Re: [HACKERS] Mariposa (Tom Lane <tgl@sss.pgh.pa.us>)
          | Re: [HACKERS] Mariposa (Bruce Momjian <maillist@candle.pha.pa.us>)
List | pgsql-hackers
On Mon, Aug 02, 1999 at 04:44:10PM -0400, Bruce Momjian wrote:
> We still have a directory called tioga which is also related to
> Mariposa. Basically, at the time, no one understood the academic stuff,
> and we had tons of bugs in general areas. We just didn't see any reason
> to keep around unusual features while our existing code was so poorly
> maintained from Berkeley.

The right thing to do, I concur. Get the basics stable and working well,
_then_ tack on the interesting stuff :-) A common complaint about us
academics: we only want to do the interesting stuff.

> The mariposa remote access features looked like they were heavily done
> in the executor directory. This makes sense assuming they wanted the
> access to be done remotely. They also tried to fix some things while
> doing Mariposa. A few of those fixes have been added over the years.

Right. As best I've been able to make out so far, in Mariposa a query
passes through the regular parser and single-site optimizer; the selected
plan tree is then handed to a 'fragmenter' to break the work up into
chunks, which go to a 'broker' that uses a microeconomic 'bid' process
to parcel them out to both local and remote executors. The results from
each site then go through a local 'coordinator', which merges the result
sets and hands them back to the original client. Whew!

It's interesting to compare the theory describing the workings of
Mariposa (such as the paper in VLDB) with the code. For the fragmenter,
the paper describes an essentially rational decomposition of the plan,
while the code applies non-deterministic but tuneable methods (lots of
calls to random and comparisons against user-specified odds ratios).

It strikes me as a bit odd to optimize the plan for a single site, then
break it all apart again. My thoughts on this are to implement two new
node types: one for a remote table, and one which represents access to
a remote table.
Remote tables would have host info in them, and would always be added to
the plan with a remote-access node directly above them. Remote-access
nodes would be separate from their remote tables, to allow the
communications cost to be slid up the plan tree and merged with other
remote-access nodes talking to the same server. This should maintain
the order-agnostic nature of the optimizer. The executor will need to
build SQL statements from the sub-plans and submit them via standard
network db access client libraries.

First step: create a remote-table node, and teach the executor how to
get info from it. Later, add the separable remote-access node.

How insane does this sound now? Am I still a mad scientist? (...always!)

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005