Re: Inconsistent error handling in START_REPLICATION command

Поиск
Список
Период
Сортировка
От Shulgin, Oleksandr
Тема Re: Inconsistent error handling in START_REPLICATION command
Дата
Msg-id CACACo5T+Ri140A3GqpB6SrivXD1cSHhtaymvst1J8nOZxgKKGg@mail.gmail.com
обсуждение исходный текст
Ответ на Inconsistent error handling in START_REPLICATION command  ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
Ответы Re: Inconsistent error handling in START_REPLICATION command  ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
Re: Inconsistent error handling in START_REPLICATION command  (Craig Ringer <craig@2ndquadrant.com>)
Список pgsql-hackers
On Tue, Jan 5, 2016 at 10:39 AM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote:
Hackers,

It looks like there's an inconsistency in error handling during START_REPLICATION command of replication protocol:

$ psql postgres://localhost/psycopg2test?replication=database
psql (9.6devel)
Type "help" for help.

psycopg2test=# IDENTIFY_SYSTEM;
      systemid       | timeline |  xlogpos  |    dbname    
---------------------+----------+-----------+--------------
 6235978519197579707 |        1 | 0/2CE0F78 | psycopg2test
(1 row)

psycopg2test=# START_REPLICATION SLOT "TEST1" LOGICAL 0/0 ("invalid" "value");
ERROR:  syntax error

1) syntax errors are reported and client can retry with corrected command:

psycopg2test=# START_REPLICATION SLOT "TEST1" LOGICAL 0/0 ("invalid" 'value');
ERROR:  replication slot name "TEST1" contains invalid character
HINT:  Replication slot names may only contain lower case letters, numbers, and the underscore character.

2) further errors are reported and we can retry:

psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0 ("invalid" 'value');
ERROR:  replication slot "test1" does not exist

psycopg2test=# CREATE_REPLICATION_SLOT "test1" LOGICAL "test_decoding";
 slot_name | consistent_point | snapshot_name | output_plugin 
-----------+------------------+---------------+---------------
 test1     | 0/2CF5340        | 0000088C-1    | test_decoding
(1 row)

psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0 ("invalid" 'value');
unexpected PQresultStatus: 8

The last command results in the following output sent to the server log:

ERROR:  option "invalid" = "value" is unknown
CONTEXT:  slot "test1", output plugin "test_decoding", in the startup callback

But the client has no way to learn about the error, nor can it restart with correct one (the server has entered COPY protocol mode):

psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0;
PQexec not allowed during COPY BOTH

I recall Craig Rigner mentioning this issue in context of the pglogical_output plugin, but I thought that was something to do with the startup packet.  The behavior above doesn't strike me as very consistent, we should be able to detect and report errors during output plugin startup and let the client retry with the corrected command as we do for syntax or other errors.

I didn't look in the code yet, but if someone knows off top of the head the reason to this behavior, I'd be glad to learn it.

Looks like the attached patch fixes it for me:

psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0 ("invalid" 'value');
ERROR:  option "invalid" = "value" is unknown
CONTEXT:  slot "test1", output plugin "test_decoding", in the startup callback

psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0;
unexpected PQresultStatus: 8

There was a similar problem with PHYSICAL replication--if the requested start LSN is in the future, the error is not reported to the client and connection is unusable:

psycopg2test=# START_REPLICATION SLOT "test2" PHYSICAL 1FF0/0;
unexpected PQresultStatus: 8

psycopg2test=# START_REPLICATION SLOT "test2" PHYSICAL 0/0;
PQexec not allowed during COPY BOTH

After the patch:

psycopg2test=# START_REPLICATION SLOT "test2" PHYSICAL 1FF0/0;
ERROR:  requested starting point 1FF0/0 is ahead of the WAL flush position of this server 0/2DE6C28

psycopg2test=# START_REPLICATION SLOT "test2" PHYSICAL 0/0;
unexpected PQresultStatus: 8


Actually, would it make sense to move the block that sends CopyBothResponse message to the WalSndLoop() instead?  I think you can't get any closer than that...

--
Alex

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Christoph Berg
Дата:
Сообщение: Re: Building pg_xlogdump reproducibly
Следующее
От: Haribabu Kommi
Дата:
Сообщение: Re: Function and view to retrieve WAL receiver status