Re: [HACKERS] CFH: Mariposa, distributed DB

From: Ross J. Reedstrom
Subject: Re: [HACKERS] CFH: Mariposa, distributed DB
Date:
Msg-id: 20000207165759.A25647@rice.edu
In response to: Re: [HACKERS] CFH: Mariposa, distributed DB  (Don Baccus <dhogaza@pacifier.com>)
Responses: Re: [HACKERS] CFH: Mariposa, distributed DB  (Don Baccus <dhogaza@pacifier.com>)
List: pgsql-hackers

Seems there was more than just going back to the Berkeley site that
reminded me of Mariposa. A principal piece of new functionality in
Mariposa is the ability to 'fragment' a class, based on a user-defined
partitioning function. The example used is a widgets class, which is
partitioned on the 'location' field (i.e., the warehouse the widget is
stored in):

CREATE TABLE widgets (
    part_no     int4,
    location    char16,
    on_hand     int4,
    on_order    int4,
    commited    int4
) PARTITION ON LOCATION USING btchar16cmp;

Then, the table is filled with tuples, all containing locations of either
'Miami' or 'New York'.

SELECT * from widgets; 

works as expected.

Later, this table is fragmented:

SPLIT FRAGMENT widgets INTO widgets_mi, widgets_ny AT 'Miami';

Now, the original table widgets is _empty_: all the tuples with location <=
'Miami' go to widgets_mi, and those with location > 'Miami' go to widgets_ny.

SELECT * from widgets; 

Still returns all the tuples! So, this works sort of the way Chris Bitmead
has implemented subclasses: widgets_mi and widgets_ny are subclasses of
the widgets class, so selects return everything below. The difference is
that only PARTITIONed classes can be FRAGMENTed.
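
Presumably a SELECT against one of the fragments directly returns just
that fragment's tuples (my inference from the split semantics above, not
something I've verified):

SELECT * FROM widgets_mi;   -- only the location <= 'Miami' tuples
SELECT * FROM widgets_ny;   -- only the location >  'Miami' tuples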

The distributed part comes in with the MOVE FRAGMENT command. This
transfers the 'master' copy of a fragment to the designated host, so future
access to that FRAGMENT will go over the network.
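
For illustration, I'd expect the usage to look something like the
following (the host-designation clause and hostname are my guess from the
manual's prose description of MOVE FRAGMENT, not verbatim syntax):

MOVE FRAGMENT widgets_ny TO 'ny-host.example.com';
-- the master copy of widgets_ny now lives on the remote site;
-- queries here that touch that fragment go over the network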

There's also a COPY FRAGMENT command that sets up a local cache of a
fragment, with a periodic update time. These copies may be either
READONLY or (the default) READ/WRITE. It seems updates are timed only (a
simple extension would be to implement write-through behavior).
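
Again, guessing at syntax (only the COPY FRAGMENT command name and the
READONLY option come from the manual's description; the target-site and
update-interval clauses below are my own assumptions):

COPY FRAGMENT widgets_ny TO 'local-site' READONLY UPDATE EVERY '10 minutes';
-- read-only cached copy of widgets_ny, refreshed on a timer;
-- leave off READONLY to get the (default) read/write copy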

All this is coming from the Mariposa User's Manual, which is an extended
version of the Postgres95 User's Manual.

As to latest vs. best effort: one defines a BidCurve, whose dimensions are
Cost and Time. A flat curve should get you the latest data. And since the
DataBroker and Bidder are both implemented as Tcl scripts, it would be
possible to define a bid policy that only buys the latest data, regardless
of how long it's going to take.

Oh, BTW, yes that does put _two_ interpreted Tcl scripts on the execution
path for every query. Wonder what _that'll_ do for execution time. However,
it's like planning/optimization time, in that it's spent per query, rather
than per tuple.

Ross
-- 
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005


On Mon, Feb 07, 2000 at 02:19:56PM -0800, Don Baccus wrote:
> At 12:04 AM 2/8/00 +0200, Hannu Krosing wrote:
> 
> >The site to go to for information was determined by an auction where each site 
> >offered speed and cost for looking up the data. Usually they also didn't 
> >guarantee the latest data, just a "best effort".
> 
> I just glanced at the website.  They explicitly mention that they don't
> require global synchronization, because it would slow down response time
> for many things (with thousands of servers, that sounds like an
> understatement).  
> 
> So, yes, it would appear they don't guarantee the latest data.
> 

