Re: Parallell Optimizer

From Hannu Krosing
Subject Re: Parallell Optimizer
Date
Msg-id 51B7B012.9050805@2ndQuadrant.com
In reply to Re: Parallell Optimizer  (Tatsuo Ishii <ishii@postgresql.org>)
Responses Re: Parallell Optimizer  (Tatsuo Ishii <ishii@postgresql.org>)
List pgsql-hackers
On 06/12/2013 01:01 AM, Tatsuo Ishii wrote:
>>>> Please explain what you mean by the word "true" used here.
>>> In other words, "eager replication".
>> Do you mean something along these lines:
>>
>> "Most synchronous or eager replication solutions do conflict prevention,
>> while asynchronous solutions have to do conflict resolution. For instance,
>> if a record is changed on two nodes simultaneously, an eager replication
>> system would detect the conflict before confirming the commit and abort
>> one of the transactions. A lazy replication system would allow both
>> transactions to commit and run a conflict resolution during
>> resynchronization. "
>>
>> ?
> No, I'm not talking about conflict resolution.
>
> From http://www.cs.cmu.edu/~natassa/courses/15-823/F02/papers/replication.pdf:
> ----------------------------------------------
> Eager or Lazy Replication?
>  Eager replication:
>  keep all replicas synchronized by updating all
>  replicas in a single transaction
OK, so you are talking about distributed transactions?

In our current master-slave replication, how would it be different from
the current synchronous replication?

Or does it make sense only in the case of multimaster replication?

The main problems with "keep all replicas synchronized by updating all
replicas in a single transaction" are performance and reliability.

That is, write performance is necessarily lower than on a single server,
because every commit has to wait for all replicas, and the failure of a
single replica brings down the whole cluster.
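
To make the reliability point concrete, here is a toy sketch of "update all
replicas in a single transaction". The Replica class and eager_write() are
hypothetical in-memory stand-ins invented for illustration; none of this is
PostgreSQL or BDR API. It only shows that a write succeeds when every replica
accepts it, and that one unavailable replica blocks the write everywhere.

```python
class Replica:
    """Hypothetical in-memory replica (illustration only, not a real API)."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up          # False simulates a failed or unreachable node
        self.data = {}        # committed state
        self.pending = None   # change that is prepared but not yet committed

    def prepare(self, key, value):
        # A replica that is down cannot accept the change.
        if not self.up:
            return False
        self.pending = (key, value)
        return True

    def commit(self):
        key, value = self.pending
        self.data[key] = value
        self.pending = None

    def rollback(self):
        self.pending = None


def eager_write(replicas, key, value):
    """Apply the write on all replicas, or on none of them."""
    prepared = []
    for replica in replicas:
        if not replica.prepare(key, value):
            # One failure aborts the whole transaction on every node.
            for p in prepared:
                p.rollback()
            return False
        prepared.append(replica)
    for replica in prepared:
        replica.commit()
    return True
```

With two healthy replicas eager_write() returns True and both hold the new
value. Mark either one as down and every later write returns False, which is
the "single replica brings down the whole cluster" behaviour in miniature.
The loop over all replicas before commit is also where the extra write
latency comes from.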

>
>  Lazy replication:
>  asynchronously propagate replica updates to
>  other nodes after replicating transaction commits
> ----------------------------------------------
>
> Parallel query execution needs to assume that each node synchronized
> in a commit, otherwise the summary of each query result executed on
> each node is meaningless.
>
>> IMO it is possible to do this "easily" once BDR has reached the state
>> where you can do streaming apply. That is, you replay actions on other
>> hosts as they are logged, not after the transaction commits. Doing it
>> this way, you can wait for any action to successfully complete a full
>> circle before committing it at the source.
>>
>> Currently the main missing part for doing this is autonomous transactions.
>> It can in theory be done by opening an extra backend for each incoming
>> transaction, but you would need a really big number of backends and you
>> would also have extra overhead from interprocess communication.
> Thanks for your thoughts about conflict resolution in BDR.
>
> BTW, if we seriously think about implementing parallel query
> execution, we need to find a way to distribute data among the nodes,
> which requires partial copies of tables. I think that would be a big
> challenge for WAL-based replication.
Moving partial query results around is a completely different problem from
replication.

We should not mix these.

If, on the other hand, we think about sharding (that is, having a table
partitioned between nodes), then this can be done in BDR.
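
For readers unfamiliar with the term, a minimal sketch of what "a table
partitioned between nodes" means. The function names and the hash-based
routing rule are assumptions made up for this illustration; they say nothing
about how BDR itself would place rows.

```python
import zlib


def shard_for(key, nodes):
    """Pick the node that owns `key` (hypothetical routing rule)."""
    # crc32 gives the same result in every process, unlike Python's
    # built-in hash() for strings, so all clients agree on the owner.
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def partition_rows(rows, nodes, key_column):
    """Group rows (dicts) by the node that owns each row's key."""
    shards = {node: [] for node in nodes}
    for row in rows:
        owner = shard_for(str(row[key_column]), nodes)
        shards[owner].append(row)
    return shards
```

Each row lives on exactly one node, so no node holds the full table. That is
a data placement question, separate from shipping partial query results
between nodes at execution time.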

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



