Hot Standby conflict resolution handling

Поиск
Список
Период
Сортировка
От Pavan Deolasee
Тема Hot Standby conflict resolution handling
Дата
Msg-id CABOikdP48wU561h5FvWL_qvvAz=GAiAjmMsfk30YVXU_MqED0Q@mail.gmail.com
обсуждение исходный текст
Ответы Re: Hot Standby conflict resolution handling
Список pgsql-hackers
<br />I was trying some simple queries on a Hot Standby with streaming replication.<br /><br />On standby, I do
this:<br/>postgres=# begin transaction isolation level repeatable read;<br />BEGIN<br />postgres=# explain verbose
selectcount(b) from test WHERE a > 100000;<br /><br clear="all" />On master, I insert a bunch of tuples in the table
andrun VACUUM ANALYZE.<br />postgres=# INSERT INTO test VALUES (generate_series(110001,120000), 'foo', 1);<br />INSERT
010000<br />postgres=# VACUUM ANALYZE test;<br /> VACUUM<br /><br />After max_standby_streaming_delay, the standby
startscancelling the queries. I get an error like this on the standby:<br />postgres=# explain verbose select count(b)
fromtest WHERE a > 100000;<br />FATAL:  terminating connection due to conflict with recovery<br /> DETAIL:  User
querymight have needed to see row versions that must be removed.<br />HINT:  In a moment you should be able to
reconnectto the database and repeat your command.<br />server closed the connection unexpectedly<br />     This
probablymeans the server terminated abnormally<br />    before or while processing the request.<br />The connection to
theserver was lost. Attempting reset: Succeeded.<br /><br />So I've couple questions/concerns here<br /><br />1. Why to
throwa FATAL error here ? A plain ERROR should be enough to abort the transaction. There are four places in
ProcessInterrupts()where we throw these kind of errors and three of them are FATAL.<br /><br />2911             if
(DoingCommandRead)<br/> 2912                 ereport(FATAL,<br />2913                        
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),<br/>2914                          errmsg("terminating connection due to
conflictwith recovery"),<br /> 2915                          errdetail_recovery_conflict(),<br />2916                 
errhint("Ina moment you should be able to reconnect to the"<br />2917                          " database and repeat
yourcommand.")));<br /> 2918             else         <br />2919                 ereport(ERROR,<br
/>2920                        (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),<br />2921                  errmsg("canceling
statementdue to conflict with recovery"),<br /> 2922                          errdetail_recovery_conflict()));<br /><br
/>Inthis particular test case, the backend is DoingCommandRead and that forces a FATAL error. I'm not sure why is that
required. And even if its necessary, IMHO we should add a comment explaining that. In the other two places where we
throwFATAL, one looks legitimate, but I'm not sure about the other.<br /><br />2836         else if
(RecoveryConflictPending&& RecoveryConflictRetryable)<br />2837         {<br />2838            
pgstat_report_recovery_conflict(RecoveryConflictReason);<br/>2839             ereport(FATAL,<br />
2840                    (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),<br />2841               errmsg("terminating
connectiondue to conflict with recovery"),<br />2842                      errdetail_recovery_conflict()));<br />
2843        }<br />2844         else if (RecoveryConflictPending)<br />2845         {<br />2846             /*
Currentlythere is only one non-retryable recovery conflict */<br />2847             Assert(RecoveryConflictReason ==
PROCSIG_RECOVERY_CONFLICT_DATABASE);<br/> 2848             pgstat_report_recovery_conflict(RecoveryConflictReason);<br
/>2849            ereport(FATAL,<br />2850                     (errcode(ERRCODE_DATABASE_DROPPED),<br
/>2851              errmsg("terminating connection due to conflict with recovery"),<br /> 2852                     
errdetail_recovery_conflict()));<br/>2853         }<br /><br />AFAICS the first of these should be ereport(ERROR).
Otherwiseirrespective of whether RecoveryConflictRetryable is true or false, we will always ereport(FATAL).<br /><br
/>2.For my test, the error message itself looks wrong because I did not actually remove any rows on the master. VACUUM
probablymarked a bunch of pages as all-visible and that should have triggered a conflict on the standby in order to
supportindex-only scans. IMHO we should  improve the error message to avoid any confusion. Or we can add a new
ProcSignalReasonto differentiate between a cancel due to clean up vs visibilitymap_set() operation.<br /><br />BTW, my
currentset up is using 9.2.1, but I don't see any code changes in the master. So I would assume the issues will exist
theretoo.<br /><br />Thanks,<br />Pavan<br /><br />-- <br />Pavan Deolasee<br /><a
href="http://www.linkedin.com/in/pavandeolasee"target="_blank">http://www.linkedin.com/in/pavandeolasee</a><br /> 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Craig Ringer
Дата:
Сообщение: Master fails to build without ldap headers
Следующее
От: Hannu Krosing
Дата:
Сообщение: Re: ALTER TABLE ... NOREWRITE option