Re: BUG #13910: synchronous replication not sync data occasionally

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: BUG #13910: synchronous replication not sync data occasionally
Дата
Msg-id CAEepm=1gJJRNKuSK9Tn7dy=XTuEOV2p8zgKDXFnsMG2bUMbAKA@mail.gmail.com
обсуждение исходный текст
Ответ на BUG #13910: synchronous replication not sync data occasionally  (wangzilong@yeezon.com)
Список pgsql-bugs
On Thu, Feb 4, 2016 at 1:45 PM,  <wangzilong@yeezon.com> wrote:
> The following bug has been logged on the website:
>
> Bug reference:      13910
> Logged by:          chris
> Email address:      wangzilong@yeezon.com
> PostgreSQL version: 9.5.0
> Operating system:   CentOS 6
> Description:
>
> I am suffering this problem since postgresql v9.3.
> Most of the time, synchronous replication works great, occasionally, the
> standby is not synchronous.
>
> Demo Code:
> env: ruby 2.2.2, centos 6, postgresql95-devel
> require 'sequel'
> require 'pg'
>
> Sequel::Model.plugin(:schema)
> Sequel::Model.plugin(:validation_helpers)
> Sequel::Model.raise_on_save_failure = true
> Sequel::Model.db = DB = Sequel.postgres(:host => '192.168.2.126',
>                                          :user => 'test',
>                                          :password => '123456',
>                                          :database => 'test',
>                                          :pool_timeout => 30,
>
> :servers=>{:read_only=>{:host=>'192.168.2.113'}})
> / *
> CREATE TABLE public.test_data
> (
>   id integer NOT NULL DEFAULT nextval('test_data_id_seq'::regclass),
>   field_one text,
>   field_two text,
>   field_three text,
>   field_four text,
>   field_five text,
>   CONSTRAINT test_data_pkey PRIMARY KEY (id)
> )
> WITH (
>   OIDS=FALSE
> );
> ALTER TABLE public.test_data
>   OWNER TO test;
> * /
> class TestData < Sequel::Model(:test_data)
> end
> (1..10000).each do |a|
>   id = ''
>   #DB.transaction do
>     n = TestData.new(:field_one => 'hello')
>     n.save
>     id = n.id
>   #end
>   d = TestData.first(:id => id)
>   if d.nil?
>     puts "#{id} first nil catch"
>     (1..100).each do |i|
>       t = TestData.first(:id => id)
>       if t.nil?
>         puts "#{id} time:#{i.to_s} nil catch"
>       else
>         break
>       end
>     end
>   end
> end
>
> puts 'done'
>
> If write transaction is synchronous to standby, as the document shows, the
> program should output 'done'.
> But, there always 1 or 2 record, are not synchronous. Retry them several
> times could get correct data.
> There is no error in program or postgresql.
>
> The program shows, master does not confirm the write transaction synchronous
> to standby, and returns the result. It's not correct according to the
> document.

I don't speak Ruby but I think you are saying that you are committing
a transaction on the primary and testing if you can see it immediately
afterwards on a hot standby that you have configured for synchronous
replication.  Postgres's existing replication doesn't support that:
synchronous replication means that COMMIT won't return control to you
until the data is on disk in the WAL on the standby, so you can't lose
it, but that doesn't mean it's visible to queries yet.  It becomes
visible slightly later when the recovery process applies the WAL
records.  That means that it's useful for making sure that you don't
tell a customer that they've successfully bought $1 million worth of
shares before you have written down that fact in two data centres, but
it isn't useful for load balancing reads that require up to date
information.

This is a problem that I'm interested in, and have written a proposal
to address over here (though this is just one of several ways to solve
the problem):

https://commitfest.postgresql.org/9/487/

Today if you want to guarantee visibility of a transaction you've
heard about (because you just did it and it returned control, or
because some other process told you about it), you might consider
something like this:

1.  After you commit on the primary, ask for the latest LSN position
with pg_current_xlog_location().

2.  Before reading anything on the standby, ask for the latest applied
LSN position with pg_last_xlog_replay_location().  If the location
reported in 2 is < the location reported in 1, then wait for a bit and
try step 2 again.

--
Thomas Munro
http://www.enterprisedb.com

В списке pgsql-bugs по дате отправления:

Предыдущее
От: wangzilong@yeezon.com
Дата:
Сообщение: BUG #13910: synchronous replication not sync data occasionally
Следующее
От: hein@bitechsystems.co.za
Дата:
Сообщение: BUG #13909: String concat error with CITEXT after 9.5.0 upgrade.