Thread: Standard replication interface?


Standard replication interface?

From
Greg Copeland
Date:
Reading about the pgmonitor thread and mention of gborg made me wonder
about replication and ready ability to uniformly monitor it.  Just as
pg_stat* tables exist to allow for statistic gathering and monitoring in
a uniform fashion, it occurred to me that a predefined set of views
and/or tables for all replication implementations may be worthwhile.
That way, no matter what replication method/tool is being used, as long
as it conforms to the defined replication interfaces, generic monitoring
tools can be used to keep an eye on things.

Think this has any merit?

Greg Copeland






Re: Standard replication interface?

From
Tom Lane
Date:
Greg Copeland <greg@CopelandConsulting.Net> writes:
> ... it occurred to me that a predefined set of views
> and/or tables for all replication implementations may be worthwhile.

Do we understand replication well enough to define such a set of views?
I sure don't ...
        regards, tom lane


Re: Standard replication interface?

From
Greg Copeland
Date:
Well, that's a different issue.  ;)

I initially wanted to get feedback to see if anyone else thought the
concept might hold some merit.

I take it from your answer you think it might...but are scratching your
head wondering exactly what it entails...

Greg


On Wed, 2002-08-14 at 22:47, Tom Lane wrote:
> Greg Copeland <greg@CopelandConsulting.Net> writes:
> > ... it occurred to me that a predefined set of views
> > and/or tables for all replication implementations may be worthwhile.
>
> Do we understand replication well enough to define such a set of views?
> I sure don't ...
>
>             regards, tom lane


Re: Standard replication interface?

From
Andrew Sullivan
Date:
On Wed, Aug 14, 2002 at 10:15:32PM -0500, Greg Copeland wrote:

> Reading about the pgmonitor thread and mention of gborg made me wonder
> about replication and ready ability to uniformly monitor it.  Just as
> pg_stat* tables exist to allow for statistic gathering and monitoring in
> a uniform fashion, it occurred to me that a predefined set of views
> and/or tables for all replication implementations may be worthwhile. 
> That way, no matter what replication method/tool is being used, as long
> as it conforms to the defined replication interfaces, generic monitoring
> tools can be used to keep an eye on things.

That sounds like the cart is before the horse.  You need to know what
sort of replication scheme you might ever have before you could
know the statistics that you might want to know.

There are different sorts of replication schemes under consideration. 
For instance, rserv uses an asynchronous master/slave approach, which
relies on slaves that are almost dumb as chickens.  (Not quite. 
There is some data about the state of replication in the slave
database; but most of it is in the master.)  Postgres-R, on the other
hand, contemplates a distributed model wherein different database
machines participate in a pool.

So for rserv-style replication, you want to know (for instance)
average slave-update times, and whether slaves are getting behind,
and by how much, and such.  Balancing of inserts, however, is not
relevant, because you can't do that.

Postgres-R will have the opposite need: you'll want to know what sort
of load balancing you're getting, but time-to-replicate is not
relevant, because a commit on one machine is necessarily a commit
everywhere (that's why it's "eager" replication).

You probably could design a set of statistics that would cover all
cases, but only after you know what the cases were.
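
To make the "cover all cases" idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the class `ReplicationStats`, its field names, and the sample numbers are hypothetical and do not come from rserv or Postgres-R. It shows one record type in which a statistic that does not apply to a given scheme is simply left unset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicationStats:
    """Hypothetical union of statistics; None means 'not applicable'."""
    scheme: str
    # Meaningful for an rserv-style asynchronous master/slave setup only.
    slave_lag_seconds: Optional[float] = None
    # Meaningful for a Postgres-R-style eager, pooled setup only.
    writes_per_node: Optional[dict] = None


def applicable_fields(stats):
    """Return the names of the statistics this scheme actually reports."""
    return sorted(
        name for name, value in vars(stats).items()
        if name != "scheme" and value is not None
    )


# Made-up sample values, purely for illustration.
rserv_like = ReplicationStats("async-master-slave", slave_lag_seconds=12.5)
pgr_like = ReplicationStats(
    "eager-multi-master", writes_per_node={"node_a": 410, "node_b": 395}
)

print(applicable_fields(rserv_like))  # ['slave_lag_seconds']
print(applicable_fields(pgr_like))    # ['writes_per_node']
```

The sketch also shows the cost described above: every new scheme adds fields that are empty for all the others, so the set can only be settled once the schemes are known.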

A

-- 
----
Andrew Sullivan                               87 Mowat Avenue 
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110



Re: Standard replication interface?

From
Greg Copeland
Date:
On Thu, 2002-08-15 at 09:47, Andrew Sullivan wrote:
> On Wed, Aug 14, 2002 at 10:15:32PM -0500, Greg Copeland wrote:
> > That way, no matter what replication method/tool is being used, as long
> > as it conforms to the defined replication interfaces, generic monitoring
> > tools can be used to keep an eye on things.
>
> That sounds like the cart is before the horse.  You need to know what
> sort of replication scheme you might ever have before you could
> know the statistics that you might want to know.

Hmmm.  Never heard of an inquiry for interest in a concept as putting
the cart before the horse.  Considering this is pretty much how things
get developed in the real world, I'm not sure what you feel is so
special about replication.

First step is always identify the need.  I'm attempting to do so.  Not
sure what you'd consider the first step to be but I can assure you,
regardless of this concept seeing the light of day, it is the first
step.  The horse is correctly positioned in front of the cart.

I also stress that I'm talking about a statistical replication
interface.  It occurred to me that you might have been confused on this
matter.  That is, a set of tables and views will allow for the
replication process to be uniformly *monitored*.  I am not talking about
a set of interfaces through which all manner of replication must perform
its job (interface with databases for replication).

>
> There are different sorts of replication schemes under consideration.

Yep.  Thus it would seemingly be ideal to have a specification which
different implementations would seek to implement.  Off the top of my
head and for starters, a table and/or view which can be queried and
returns the tables that are being replicated sounds good to me.  Same
thing for the list of databases, the servers involved and their
associated role (master, slave, peer).
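
As a rough illustration of what such views could look like, here is a small Python sketch that uses the standard-library `sqlite3` module as a stand-in for PostgreSQL. Everything named in it is an assumption made for the example: the view names `repl_tables` and `repl_nodes`, the two made-up tools, and their private tables. No existing replication project defines these.

```python
import sqlite3


def make_impl_a():
    """Hypothetical master/slave tool with its own bookkeeping table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE a_subscriptions (tbl TEXT, slave_host TEXT);
        INSERT INTO a_subscriptions VALUES
            ('orders', 'slave1'), ('customers', 'slave1');
        -- The agreed-upon views, mapped onto this tool's private schema.
        CREATE VIEW repl_tables AS
            SELECT DISTINCT tbl AS table_name FROM a_subscriptions;
        CREATE VIEW repl_nodes AS
            SELECT 'master0' AS host, 'master' AS role
            UNION
            SELECT DISTINCT slave_host, 'slave' FROM a_subscriptions;
    """)
    return db


def make_impl_b():
    """Hypothetical peer-to-peer tool with a completely different layout."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE b_group (member TEXT);
        CREATE TABLE b_replicated (relname TEXT);
        INSERT INTO b_group VALUES ('node_a'), ('node_b');
        INSERT INTO b_replicated VALUES ('orders');
        CREATE VIEW repl_tables AS
            SELECT relname AS table_name FROM b_replicated;
        CREATE VIEW repl_nodes AS
            SELECT member AS host, 'peer' AS role FROM b_group;
    """)
    return db


def report(db):
    """Generic monitor: it touches only the two agreed-upon views."""
    tables = [row[0] for row in db.execute(
        "SELECT table_name FROM repl_tables ORDER BY table_name")]
    nodes = db.execute(
        "SELECT host, role FROM repl_nodes ORDER BY host").fetchall()
    return tables, nodes


for make in (make_impl_a, make_impl_b):
    print(report(make()))
```

The same `report` function works against both tools because it never looks at their private tables, which is the whole point of agreeing on the view names and columns.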

Without such a concept, there will be no standardized way to monitor
your replication.  As such, chances are one of two things will happen.
One, a single replication method will be championed and fair tools will
develop to support it, while all others are treated as bastards.  Two,
quality tools to monitor replication will never materialize because each
monitoring method is specific to one type of implementation.
Resources will constantly be spread amongst a variety of well-meaning
projects.


--Greg



Re: Standard replication interface?

From
Neil Conway
Date:
Andrew Sullivan <andrew@libertyrms.info> writes:
> On Wed, Aug 14, 2002 at 10:15:32PM -0500, Greg Copeland wrote:
> > Reading about the pgmonitor thread and mention of gborg made me wonder
> > about replication and ready ability to uniformly monitor it.  Just as
> > pg_stat* tables exist to allow for statistic gathering and monitoring in
> > a uniform fashion, it occurred to me that a predefined set of views
> > and/or tables for all replication implementations may be worthwhile. 
> > That way, no matter what replication method/tool is being used, as long
> > as it conforms to the defined replication interfaces, generic monitoring
> > tools can be used to keep an eye on things.
> 
> That sounds like the cart is before the horse.

That's exactly what I was going to say -- I'd prefer that any
interested parties concentrate on producing a *really good*
replication implementation, which might eventually be integrated into
PostgreSQL itself.

Producing a "generic API" for something that really doesn't need
genericity sounds like a waste of time, IMHO.

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC



Re: Standard replication interface?

From
Greg Copeland
Date:
On Thu, 2002-08-15 at 09:53, Neil Conway wrote:
> That's exactly what I was going to say -- I'd prefer that any
> interested parties concentrate on producing a *really good*
> replication implementation, which might eventually be integrated into
> PostgreSQL itself.
>
> Producing a "generic API" for something that really doesn't need
> genericity sounds like a waste of time, IMHO.
>
> Cheers,
>
> Neil


Somehow I get the impression that I've been completely misunderstood.
Somehow, people seem to have only read the subject and skipped the body
explaining the concept.

In what way would providing a generic interface to *monitor* be a "waste
of time"?  In what way would that prevent someone from "producing a
*really good* replication implementation"?  I utterly fail to see the
connection.

Regards,
Greg Copeland


Re: Standard replication interface?

From
Neil Conway
Date:
Greg Copeland <greg@CopelandConsulting.Net> writes:
> In what way would providing a generic interface to *monitor* be a
> "waste of time"?

As I said -- I don't really see the need for a bunch of replication
implementations, and therefore I don't see the need for a generic API
to make the whole mess (slightly) more manageable.

> In what way would that prevent someone from "producing a *really
> good* replication implementation"?

It wouldn't -- it's just that if/when such an implementation exists
and everyone who needs replication is using it, a "generic monitoring
API" would be pointless.

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC



Re: Standard replication interface?

From
Greg Copeland
Date:
> As I said -- I don't really see the need for a bunch of replication
> implementations, and therefore I don't see the need for a generic API
> to make the whole mess (slightly) more manageable.

I see.  So the intention of the core developers is to have one and only
one replication solution?

Greg




Re: Standard replication interface?

From
Neil Conway
Date:
Greg Copeland <greg@CopelandConsulting.Net> writes:
> > As I said -- I don't really see the need for a bunch of replication
> > implementations, and therefore I don't see the need for a generic API
> > to make the whole mess (slightly) more manageable.
> 
> I see.  So the intention of the core developers is to have one and only
> one replication solution?

Not being a core developer, I can't comment on their intentions.

That said, I _personally_ don't see the need for more than one or two
replication implementations. You might need more than one if you
wanted to do both lazy and eager replication, for example. But you
certainly don't need 5 or 6 or however many implementations exist at
the moment.

I think the reason there are a lot of different implementations at the
moment is that each one has some pretty serious problems. So rather
than trying to reduce the problem by making it slightly easier for the
different replication solutions to inter-operate, I think it's a
better idea to solve the problem outright by improving one of the
existing replication projects to the point at which it is ready for
widespread production usage.

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC



Re: Standard replication interface?

From
Greg Copeland
Date:
On Thu, 2002-08-15 at 13:18, Neil Conway wrote:
> That said, I _personally_ don't see the need for more than one or two
> replication implementations. You might need more than one if you
> wanted to do both lazy and eager replication, for example. But you
> certainly don't need 5 or 6 or however many implementations exist at
> the moment.

Fair enough.  Thank you for offering a complete explanation.

Your argument certainly made sense.  I wasn't aware of any single
serious effort underway which sought to finally put replication to bed,
let alone integrate it into the core code base.

Sign,
Greg Copeland

Re: Standard replication interface?

From
cbbrowne@cbbrowne.com
Date:
> > As I said -- I don't really see the need for a bunch of replication
> > implementations, and therefore I don't see the need for a generic API
> > to make the whole mess (slightly) more manageable.
> 
> I see.  So the intention of the core developers is to have one and only
> one replication solution?

If the various "solutions" may be folded down into a smaller set of programs, 
perhaps, ultimately, into _one_ program, that would surely be easier to 
manage, in the codebase, than having five or six such programs.

If one program can do the job that needs to be done, and it has not been 
_clearly_ established that that is _not_ possible, then I'd think it rather 
silly to have a bunch of "replication solutions" that need to be updated any 
time a relevant change goes into the database engine.

I'd be surprised if, in the end, there truly _needed_ to be more than about 
two approaches.

Should the team plan to _have_ a mess?  I'd think not.
--
(concatenate 'string "cbbrowne" "@ntlug.org")
http://cbbrowne.com/info/linuxdistributions.html
"We don't understand the  software, and sometimes we don't  understand
the hardware, but we can *see* the blinking lights!"  -- Unknown




Re: Standard replication interface?

From
Tom Lane
Date:
Neil Conway <nconway@klamath.dyndns.org> writes:
> Greg Copeland <greg@CopelandConsulting.Net> writes:
>> I see.  So the intention of the core developers is to have one and only
>> one replication solution?

> Not being a core developer, I can't comment on their intentions.

Well, I am, but I'm only speaking for myself here:

I think there's definitely a need for at least two replication
implementations: sync and async.  The space of requirements is wide
enough that there's not a one-size-fits-all solution.  You might care
to look at Darren Johnson's OSCON slides for more about this:
http://conferences.oreillynet.com/cs/os2002/view/e_sess/3280
I think there is room for several replication solutions for Postgres
(three or four, maybe).

It's difficult to say what will wind up in our core distribution.
A tightly linked implementation like Postgres-R is really impractical
as an add-on: you need enough mods of the core code that it'd be a
nightmare to try to maintain if it's not integrated into the regular
CVS tree.  So assuming that the Postgres-R project gets to the point
of usefulness, I'd vote in favor of integrating it.  On the other hand,
it's possible to do good stuff without touching the core code at all
(cf. PostgreSQL Inc's rserv) and in that case there may or may not be
any interest in integrating the code.  It's really gonna depend mostly
on the wishes of the people who develop the replication solutions,
I think.

I can foresee a time when there are one or two replication solutions
that are included in the base distribution and others are available
separately.  In fact, counting contrib/rserv that more or less describes
the state of affairs today.  What we need is more work on the available
solutions to improve their quality and general usefulness.

As for the point at hand: I'm fairly dubious that a common monitoring
API will be very useful, considering how different the possible
replication approaches are.  If Greg can prove me wrong, fine.  But
I don't want to see us artificially constraining replication solutions
by insisting that they meet some prespecified API.
        regards, tom lane


Re: Standard replication interface?

From
Greg Copeland
Date:
On Thu, 2002-08-15 at 15:36, Tom Lane wrote:
> Well, I am, but I'm only speaking for myself here:
>

Fair enough.

> I think there is room for several replication solutions for Postgres
> (three or four, maybe).

If the ideal solution count is merely one, with a maybe on two, then I
tend to concur that any specification along these lines would *mostly*
be a waste.  On the other hand, if we can count three or more possible
replication solutions, IMHO, there seemingly would be merit in providing
some sort of de facto monitoring interface.

Seems the current difficulty is forecasting the future in this regard.
Perhaps other core developers would care to chime in and share their
vision?

> CVS tree.  So assuming that the Postgres-R project gets to the point
> of usefulness, I'd vote in favor of integrating it.  On the other hand,

I guess I should ask.  Do the developers foresee immediate usability
from this project or are we looking at something that's a year+ away?  I
don't think I have a problem helping guide what could be an interim
solution if the interim window were large enough.  In theory, monitoring
tools developed between now and the closing of the window could largely
continue to function without change.  That, of course, assumes that even
the end-run solutions would implement the interface as well.

The return on such a concept is that it allows generic monitoring tools
to mature while providing value now and in the future.  The end result
should be a stronger, more powerful tool base which matures while other
technologies are still being developed.

Another question along this line is, once something rolls into a core
position, does that obsolete all other existing implementations or
merely become the de facto in a bag of solutions?  Tom seems to hint at
the latter.  If the answer is the former then that seemingly argues not
to worry about this...unless the window for usefulness and/or inclusion
is rather large.

> As for the point at hand: I'm fairly dubious that a common monitoring
> API will be very useful, considering how different the possible

Well, all replication scenarios have a lot in common.  They should,
after all, they are all doing the same thing.  Since the different
strategies for accomplishing replication are well understood, it seems
well within reason to assume that someone can put their brain around
this.

I can also imagine that the specification includes requirements as well
as optional facilities.  Certainly capability queries would further iron
out any gaps between differing solutions/implementations.
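
A minimal Python sketch of what "required plus optional facilities, with capability queries" might mean in practice. All names here (`ReplicationMonitor`, `AsyncTool`, `EagerTool`, the method names, the sample values) are hypothetical and chosen only for illustration; they do not describe any existing tool.

```python
class ReplicationMonitor:
    """Hypothetical interface: one required facility plus optional ones."""

    OPTIONAL = ("lag_seconds", "load_per_node")

    def replicated_tables(self):
        """Required: every implementation must list replicated tables."""
        raise NotImplementedError

    def lag_seconds(self):
        """Optional: only makes sense for asynchronous schemes."""
        raise NotImplementedError

    def load_per_node(self):
        """Optional: only makes sense for pooled, multi-node schemes."""
        raise NotImplementedError

    def capabilities(self):
        """Capability query: which optional facilities are overridden."""
        return {
            name for name in self.OPTIONAL
            if getattr(type(self), name)
            is not getattr(ReplicationMonitor, name)
        }


class AsyncTool(ReplicationMonitor):
    def replicated_tables(self):
        return ["orders"]

    def lag_seconds(self):
        return 3.0  # made-up sample value


class EagerTool(ReplicationMonitor):
    def replicated_tables(self):
        return ["orders"]

    def load_per_node(self):
        return {"node_a": 0.5, "node_b": 0.5}  # made-up sample values


def summarize(tool):
    """Generic monitor: ask for capabilities before calling optional parts."""
    summary = {"tables": tool.replicated_tables()}
    for name in sorted(tool.capabilities()):
        summary[name] = getattr(tool, name)()
    return summary


print(summarize(AsyncTool()))
print(summarize(EagerTool()))
```

A monitoring tool written this way never calls a facility the implementation has not declared, so differing solutions can expose different subsets without breaking the tool.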

> replication approaches are.  If Greg can prove me wrong, fine.  But
> I don't want to see us artificially constraining replication solutions
> by insisting that they meet some prespecified API.

Hmmm.  I'm not sure how it would act as a constraining force.  To me,
this implies that any such specification would fail to evolve and could
not be revised based on feedback.  IMO, most specifications are regarded
as living documents.  While I can see that some specifications are set
in stone, I certainly am not so bold as to assert my crystal ball even
came with batteries.  ;)  That is, I assume some level of revision to an
initial specification would be required following real-world use.


Regards,
Greg Copeland



Re: Standard replication interface?

From
Tom Lane
Date:
Greg Copeland <greg@copelandconsulting.net> writes:
> I guess I should ask.  Do the developers foresee immediate usability
> from [Postgres-R] or are we looking at something that's a year+ away?

Darren Johnson would be the man to answer that, but from what he said
at OSCON it sounded like we'd be seeing something useful by the end of
the year, with all the usual caveats about time actually being available
to work on it.

>> As for the point at hand: I'm fairly dubious that a common monitoring
>> API will be very useful, considering how different the possible

> Well, all replication scenarios have a lot in common.  They should,
> after all, they are all doing the same thing.

The end goal is approximately the same, but the mechanisms are totally
different, and that means that what you want to monitor is totally
different.

Perhaps the problem is that you're using the wrong word, and that what
you would like to standardize is not monitoring but administrative
functions.  For example, I'd classify selecting tables to be replicated
as an admin task.  Monitoring to me means something like "how much data
is in the queue to be pushed out to slave X?", which is a question that
already presupposes a heck of a lot about the implementation.

I could agree with a set of guidelines that say stuff like "if your
mechanism is capable of selecting individual tables to replicate,
then here's the preferred way to control that feature."  But I'm not
sure that there's enough common functionality for monitoring (in the
above sense) to be worth standardizing.
        regards, tom lane