Re: [HACKERS] patch proposal

Поиск
Список
Период
Сортировка
От Venkata B Nagothi
Тема Re: [HACKERS] patch proposal
Дата
Msg-id CAEyp7J95sMKJam8KeMytDCsR53B5pCsvvbxTRZH8ZWuEBECheA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] patch proposal  (David Steele <david@pgmasters.net>)
Список pgsql-hackers

On Tue, Jan 31, 2017 at 6:49 AM, David Steele <david@pgmasters.net> wrote:
On 1/27/17 3:19 AM, Venkata B Nagothi wrote:

> I will be adding the tests in
> src/test/recovery/t/003_recovery_targets.pl
> <http://003_recovery_targets.pl>. My tests would look more or less
> similar to recovery_target_action. I do not see any of the tests added
> for the parameter recovery_target_action ? Anyways, i will work on
> adding tests for recovery_target_incomplete.

Do you know when those will be ready?

Apologies for the delayed response. Actually, I am working through to add the tests for both the patches and started with adding the tests to recovery_target_incomplete parameter.
I have hit an issue while adding the tests which i have explained below.
 
>
>
>     4) I'm not convinced that fatal errors by default are the way to go.
>     It's a pretty big departure from what we have now.
>
>
> I have changed the code to generate ERRORs instead of FATALs. Which is
> in the patch recoveryStartPoint.patch

I think at this point in recovery any ERROR at all will probably be
fatal.  Once you have some tests in place we'll know for sure.

Sure. I can add the tests for the first patch.

Either way, the goal would be to keep recovery from proceeding and let
the user adjust their targets.  Since this is a big behavioral change
which will need buy in from the community.

Sure. Agreed.
 
>     Also, I don't think it's very intuitive how recovery_target_incomplete
>     works.  For instance, if I set recovery_target_incomplete=promote and
>     set recovery_target_time, but pick a backup after the specified time,
>     then there will be an error "recovery_target_time %s is older..." rather
>     than a promotion.
>
>
> Yes, that is correct and that is the expected behaviour. Let me explain -

My point was that this combination of options could lead to confusion on
the part of users.  However, I'd rather leave that alone for the time
being and focus on the first patch.

Sure. 
 
> I have split the patch into two different
> pieces. One is to determine if the recovery can start at all and other
> patch deals with the incomplete recovery situation.

I think the first patch looks promising and I would rather work through
that before considering the second patch.  Once there are tests for the
first patch I will complete my review.

Sure. Will submit the first patch along with tests once i can get through the issue i am facing.

The issue i am facing while writing the tests

Below is the problem i am facing while adding the tests to 003_recovery_targets.pl

1. Test case : Test the Incomplete recovery with parameters restore_command, recovery_target_xid and recovery_target_incomplete ='shutdown' (option 'pause' and 'promote' work fine)

2. Expected test Result : Standby node successfully shuts-down if the recovery process reaches mid-way and does not reach the intended recovery target due to unavailability of WALs

3. All good. Everything works as expected. The problem is that, the test exits with the exit code 256, bails out and shows-up as "pg_ctl failed".

Is there anyway, i can ensure the test does not bail-out and exits with the exit code '0' when you do "pg_ctl start" and the server starts and shuts-down for any reason ?

If you look at the log file, pg_ctl actually starts successfully and then shuts down cleanly because intended recovery target is not reached. The messages in the log file are also appropriate. 
Since it is a clean-start and clean-shutdown (which means the test is successful), the test should return a success, which is not happening.

Below is the log (a test for the second patch). I believe Perl generates an error code 256 because pg_ctl responds with an error - "pg_ctl: could not start the server". 
I would need my test case to return a success in this scenario.  Any suggestions would be great.

[dba@buildhost ~]$  /data/PostgreSQL-Repo/postgresql/tmp_install/disk1/pginstall/pgsql-10-dev-master-xlog-2-wal/bin/pg_ctl -D /data/PostgreSQL-Repo/postgresql/src/test/recovery/tmp_check/data_pitr_3_U_oA/pgdata start
waiting for server to start....2017-02-22 12:06:41.773 AEDT [23858] LOG:  database system was interrupted while in recovery at log time 2017-02-15 16:40:41 AEDT
2017-02-22 12:06:41.773 AEDT [23858] HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2017-02-22 12:06:41.773 AEDT [23858] LOG:  starting point-in-time recovery to XID 559
2017-02-22 12:06:41.782 AEDT [23858] LOG:  restored log file "000000010000000000000002" from archive
2017-02-22 12:06:41.784 AEDT [23858] LOG:  redo starts at 0/2000028
2017-02-22 12:06:41.794 AEDT [23858] LOG:  restored log file "000000010000000000000003" from archive
2017-02-22 12:06:41.814 AEDT [23858] LOG:  restored log file "000000010000000000000004" from archive
2017-02-22 12:06:41.835 AEDT [23858] LOG:  restored log file "000000010000000000000005" from archive
2017-02-22 12:06:41.858 AEDT [23858] LOG:  restored log file "000000010000000000000006" from archive
2017-02-22 12:06:41.879 AEDT [23858] LOG:  restored log file "000000010000000000000007" from archive
2017-02-22 12:06:41.891 AEDT [23858] LOG:  consistent recovery state reached at 0/8000000
2017-02-22 12:06:41.891 AEDT [23857] LOG:  database system is ready to accept read only connections
cp: cannot stat ‘/data/PostgreSQL-Repo/postgresql/src/test/recovery/tmp_check/data_master_cyOw/archives/000000010000000000000008’: No such file or directory
2017-02-22 12:06:41.893 AEDT [23858] LOG:  recovery has reached end-of-the-wal and has not reached the recovery target yet
2017-02-22 12:06:41.893 AEDT [23858] HINT:  This could be due to corrupt or missing WAL files.
        All the WAL files needed for the recovery must be available to proceed to the recovery target Or you might need to choose an earlier recovery target.
2017-02-22 12:06:41.893 AEDT [23858] LOG:  shutdown at end-of-the-wal
2017-02-22 12:06:41.893 AEDT [23857] LOG:  startup process (PID 23858) exited with exit code 2
2017-02-22 12:06:41.893 AEDT [23857] LOG:  terminating any other active server processes
2017-02-22 12:06:41.894 AEDT [23857] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.

I will be facing the same problem while adding the tests for the first patch. In the first patch, the recovery will not proceed and database will shutdown if the specified recovery target is not legitimate.

To summarise, i would require the tests to exit with an exit code "0" when the pg_ctl could not the start the server or shuts down the server when "pg_ctl start" is issued in certain recovery scenarios.

Thanks,

Venkata B N
Database Consultant

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] Speedup twophase transactions
Следующее
От: Venkata B Nagothi
Дата:
Сообщение: Re: [HACKERS] patch proposal