Discussion: Some newbie questions

Some newbie questions

From
M2Y
Date:
Hello,

Could you plz answer the following questions of a newbie:

What is a good way to start understanding the backend (postgres) code? Is
there any documentation available especially for developers?

What is the commit log and why is it needed?

Why does a replication solution need log shipping, and why can't we just
ship the transaction statements to a standby node?

to be continued  ... ;)

Thanks,
Srinivas


Re: Some newbie questions

From
Shane Ambler
Date:
M2Y wrote:
> Hello,
> 
> Could you plz answer the following questions of a newbie:
> 
> What is a good way to start understanding backend(postgres) code? Is 
> there any documentation available especially for developers?

Most of the developer info is within comments in the code itself.
Another place to start is http://www.postgresql.org/developer/coding

> What is commit log and why it is needed?

To achieve ACID (Atomicity, Consistency, Isolation, Durability).
The changes needed to complete a transaction are saved to the commit log
and flushed to disk, then the data files are changed. If the power goes
out during the data file modifications the commit log can be used to
complete the changes without losing any data.
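
The ordering described above (log the change, flush it to disk, then modify the data) can be sketched with a toy key/value store. This is only an illustration, not PostgreSQL's implementation: `ToyStore`, its methods, and the JSON log format are all hypothetical, and the "data files" are just an in-memory dict.

```python
import json
import os
import tempfile


class ToyStore:
    """Toy key/value store: log the change, flush it, then touch the data."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}  # stands in for the data files (hypothetical)

    def log_change(self, key, value):
        # Step 1: append the change to the log and force it to disk.
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def apply_change(self, key, value):
        # Step 2: only after the log is safely on disk, modify the data.
        self.data[key] = value

    def recover(self):
        # After a crash, rebuild the data by replaying every logged change.
        self.data = {}
        with open(self.log_path) as log:
            for line in log:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]


def demo():
    path = os.path.join(tempfile.mkdtemp(), "toy.log")
    store = ToyStore(path)
    store.log_change("a", 1)
    store.apply_change("a", 1)
    store.log_change("b", 2)  # logged and flushed...
    # ...then the "power goes out" before apply_change("b", 2) runs.
    crashed = ToyStore(path)  # a fresh process with empty in-memory data
    crashed.recover()
    return crashed.data
```

In this sketch the change to "b" survives the simulated crash because it reached the log before the data was touched.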

> Why does a replication solution need log shipping and why cant we 
> just ship the transaction statements to a standby node?

Depends on what you wish to achieve. They are two ways to a similar
solution.
Log shipping is part of the core code with plans to make the duplicate
server be able to satisfy select queries.
Statement based replication is offered by other options such as slony.

Each has advantages and disadvantages. Transaction logs are part of
normal operation and can be copied to another server in the background
without adding load or delays to the master server.

Statement based replication has the added complexity of waiting for the
slaves to duplicate the transaction and handling errors from a slave
applying the transaction. Such solutions also tend to have restrictions
when it comes to replicating DDL changes, since they are implemented as
triggers that run on INSERT/UPDATE, not on CREATE/ALTER TABLE.



-- 

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz


Re: Some newbie questions

From
M2Y
Date:
Thanks Shane for your response...

On Sep 7, 11:52 pm, pgsql@Sheeky.Biz (Shane Ambler) wrote:
> > What is a good way to start understanding backend(postgres) code? Is
> > there any documentation available especially for developers?
>
> Most of the developer info is within comments in the code itself.
> Another place to start is http://www.postgresql.org/developer/coding
>
I have seen this link. But I am looking (or hoping) for a design doc
or technical doc that details what is happening under the hood, as it
would save a lot of time in catching up with the mainstream.

> > What is commit log and why it is needed?
>
> To achieve ACID (Atomic, Consistent, Isolatable, Durable)
> The changes needed to complete a transaction are saved to the commit log
> and flushed to disk, then the data files are changed. If the power goes
> out during the data file modifications the commit log can be used to
> complete the changes without losing any data.

This, I think, is the transaction log, or XLog. My question is about CLog,
in which there are two bits for each transaction that denote the
status of the transaction. Since there is XLog, from which we can determine
what changes we have to redo and undo, what is the need for this CLog?

>
> > Why does a replication solution need log shipping and why cant we
> > just ship the transaction statements to a standby node?
>
> Depends on what you wish to achieve. They are two ways to a similar
> solution.
> Log shipping is part of the core code with plans to make the duplicate
> server be able to satisfy select queries.
> Statement based replication is offered by other options such as slony.
>
> Each has advantages and disadvantages. Transaction logs are part of
> normal operation and can be copied to another server in the background
> without adding load or delays to the master server.
>
> Statement based replication has added complexity of waiting for the
> slaves to duplicate the transaction and handling errors from a slave
> applying the transaction. They also tend to have restrictions when it
> comes to replicating DDL changes - implemented as triggers run from
> INSERT/UPDATE not from CREATE/ALTER TABLE.

I agree. Assuming that both master and backup are running the same
version of the server and both are in sync, why can't we just send the
command statements to the standby in the main backend loop (before parsing)
and let the standby ignore SELECT-type statements?

I am a beginner ... plz forgive my ignorance and plz provide some
clarity so that I can understand the system better.

Thanks,
Srinivas


Re: Some newbie questions

From
Tom Lane
Date:
M2Y <mailtoyahoo@gmail.com> writes:
> On Sep 7, 11:52 pm, pgsql@Sheeky.Biz (Shane Ambler) wrote:
>> Most of the developer info is within comments in the code itself.
>> Another place to start is http://www.postgresql.org/developer/coding
>> 
> I have seen this link. But, I am looking(or hoping) for any design doc
> or technical doc which details what is happening under the hoods as it
> will save a lot of time to catchup the main stream.

Well, you should certainly not neglect
http://developer.postgresql.org/pgdocs/postgres/internals.html

Also note that many subtrees of the source code contain README files
with assorted overview material.
        regards, tom lane


Re: Some newbie questions

From
Alvaro Herrera
Date:
M2Y wrote:

> On Sep 7, 11:52 pm, pgsql@Sheeky.Biz (Shane Ambler) wrote:
> > > What is a good way to start understanding backend(postgres) code? Is
> > > there any documentation available especially for developers?

> > > What is commit log and why it is needed?
> >
> > To achieve ACID (Atomic, Consistent, Isolatable, Durable)
> > The changes needed to complete a transaction are saved to the commit log
> > and flushed to disk, then the data files are changed. If the power goes
> > out during the data file modifications the commit log can be used to
> > complete the changes without losing any data.
> 
> This, I think, is transaction log or XLog. My question is about CLog
> in which two bits are there for each transaction which will denote the
> status of transaction. Since there is XLog from which we can determine
> what changes we have to redo and undo, what is the need for this CLog.

That's correct -- what Shane is describing is the transaction log
(usually known here as WAL).  However, this xlog is write-only (except in
the case of a crash); clog is read-write, and must be fast to query
since it's used very frequently to determine visibility of each tuple.
Perhaps what you need to read is the chapter on our MVCC implementation,
which relies heavily on clog.
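
As a rough illustration of why a compact, quickly queryable status map is useful, here is a sketch of a two-bits-per-transaction array. Everything in it is an assumption made for the example: the `ToyClog` class, the status encoding, and especially `tuple_visible`, which ignores snapshots and everything else a real visibility check has to consider.

```python
# Illustrative status codes; the encoding is an assumption of this sketch.
IN_PROGRESS, COMMITTED, ABORTED = 0b00, 0b01, 0b10
BITS_PER_XACT = 2
XACTS_PER_BYTE = 8 // BITS_PER_XACT  # 4 transactions per byte


class ToyClog:
    """Packed array holding a 2-bit status per transaction id."""

    def __init__(self, max_xacts):
        size = (max_xacts + XACTS_PER_BYTE - 1) // XACTS_PER_BYTE
        self.bits = bytearray(size)  # every transaction starts IN_PROGRESS

    def set_status(self, xid, status):
        byte, slot = divmod(xid, XACTS_PER_BYTE)
        shift = slot * BITS_PER_XACT
        cleared = self.bits[byte] & ~(0b11 << shift) & 0xFF
        self.bits[byte] = cleared | (status << shift)

    def get_status(self, xid):
        byte, slot = divmod(xid, XACTS_PER_BYTE)
        return (self.bits[byte] >> (slot * BITS_PER_XACT)) & 0b11


def tuple_visible(clog, inserting_xid):
    # Grossly simplified: a tuple counts as visible only if the
    # transaction that inserted it is marked committed.
    return clog.get_status(inserting_xid) == COMMITTED
```

Looking up a transaction's status here is a single index and shift, with no need to scan a sequential log of changes.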

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Some newbie questions

From
Greg Smith
Date:
On Sun, 7 Sep 2008, M2Y wrote:

> Why does a replication solution need log shipping and why cant we just
> ship the transaction statements to a standby node?

Here's one of the classic examples of why that doesn't work:

create table x (d decimal);
insert into x values (random());

If you execute those same statements on two different nodes, they will end 
up with different values for the random number and therefore the nodes 
won't match anymore.  A similar issue shows up if you use functions that 
check the current system time, that will be slightly different between the 
two:  even if the clocks are perfectly synced, by the time the standby 
receives the transaction it will be later than the original.
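
The divergence can be simulated in a few lines. This is a sketch under stated assumptions, not how any real replication system is written: the function names are hypothetical, each list stands in for table x on one node, and the two seeds stand in for the independent random state of the two servers.

```python
import random


def run_statement(table, rng):
    # Stands in for: insert into x values (random());
    value = rng.random()
    table.append(value)
    return value


def statement_shipping(seed_primary, seed_standby):
    # Each node re-executes the statement with its own random state.
    primary, standby = [], []
    run_statement(primary, random.Random(seed_primary))
    run_statement(standby, random.Random(seed_standby))
    return primary, standby


def log_shipping(seed_primary):
    # Only the primary executes the statement; the standby applies
    # the resulting value taken from the shipped log.
    primary, standby = [], []
    value = run_statement(primary, random.Random(seed_primary))
    log = [value]
    standby.extend(log)
    return primary, standby
```

With statement shipping the two lists end up holding different numbers, while with log shipping the standby holds exactly the value the primary stored.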

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD