Thread: [GENERAL] pg_rewind copy so much data

[GENERAL] pg_rewind copy so much data

From:
Hung Phan
Date:
Hi,

I have tested pg_rewind (ver 9.5) with the following scenario:

- one master and one slave (total size of each server is more than 4GB)
- set wal_log_hints=on and restart both
- stop master, promote slave
- start old master again (now two servers have diverged)
- stop old master, run pg_rewind with progress option

The pg_rewind ran successfully, but I saw it copied more than 4GB (4265891 kB copied). So I wonder: there was only a very minor difference between the two servers, so why did pg_rewind copy almost all the data of the new master?
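
For reference, the flow can be scripted roughly like this (a sketch only; the data directories, port and connection string below are placeholders, not the ones actually used):

    # enable hint-bit logging on both nodes, then restart them
    echo "wal_log_hints = on" >> /path/to/master/postgresql.conf
    echo "wal_log_hints = on" >> /path/to/slave/postgresql.conf
    pg_ctl -D /path/to/master restart
    pg_ctl -D /path/to/slave restart

    # stop the master and promote the slave
    pg_ctl -D /path/to/master stop
    pg_ctl -D /path/to/slave promote

    # start the old master so that the timelines diverge, then stop it cleanly
    pg_ctl -D /path/to/master start
    pg_ctl -D /path/to/master stop

    # rewind the old master against the promoted slave
    pg_rewind --target-pgdata=/path/to/master \
              --source-server="host=localhost port=5433 user=postgres" \
              --progress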

Regards,

Hung Phan

Re: [GENERAL] pg_rewind copy so much data

From:
Michael Paquier
Date:
On Wed, Sep 13, 2017 at 12:41 PM, Hung Phan <hungphan227@gmail.com> wrote:
> I have tested pg_rewind (ver 9.5) with the following scenario:
>
> - one master and one slave (total size of each server is more than 4GB)
> - set wal_log_hints=on and restart both
> - stop master, promote slave
> - start old master again (now two servers have diverged)
> - stop old master, run pg_rewind with progress option

That's a good flow. Don't forget to run a manual checkpoint after
promotion to update the control file of the promoted standby, so that
pg_rewind is able to identify the timeline difference between the
source and the target servers.
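
For example (a sketch; the host, port and user below are placeholders for the promoted standby's connection settings):

    # force a checkpoint on the promoted standby so that its control file
    # reflects the new timeline before pg_rewind looks at it
    psql -h localhost -p 5433 -U postgres -c "CHECKPOINT;"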

> The pg_rewind ran successfully, but I saw it copied more than 4GB (4265891 kB
> copied). So I wonder: there was only a very minor difference between the two
> servers, so why did pg_rewind copy almost all the data of the new master?

Without knowing exactly what has been registered to be copied from
the active source to the target, it is hard to draw a conclusion. But
my bet here is that you left the target server online long enough that
a bunch of its blocks got updated, causing more relation blocks to be
copied from the source because more effort is needed to re-sync it.
That's only an assumption without clear numbers, numbers that could be
found using the --debug messages of pg_rewind.
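
For example, something along these lines (paths and connection settings are placeholders) keeps the per-file details for later analysis:

    # run the rewind with --debug and keep the output; --dry-run (-n) can be
    # added to only report what would be copied without changing the target
    pg_rewind --target-pgdata=/path/to/old_master \
              --source-server="host=localhost port=5433 user=postgres" \
              --progress --debug > rewind_debug.log 2>&1
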
-- 
Michael

Re: [GENERAL] pg_rewind copy so much data

From:
Hung Phan
Date:
Hi,

Thanks for your response. I have just replayed the master/slave switchover once again:

- one master and one slave (total size of each server is more than 4GB). Currently the last log line of the slave is "started streaming WAL from primary at 2/D6000000 on timeline 10".

- stop the master; the slave shows the logs below:
          replication terminated by primary server
          End of WAL reached on timeline 10 at 2/D69304D0
          Invalid record length at 2/D69304D0
          could not connect to primary server

- promote the slave:
          receive promote request
          redo done at 2/D6930460
          selected new timeline ID: 11
          archive recovery complete
          MultiXact member wraparound protections are now enabled
          database system is ready to accept connections
          autovacuum launcher started

- start and stop the old master, then run pg_rewind (all steps are executed immediately after promoting the slave). Logs of pg_rewind:
          servers diverged at WAL position 2/D69304D0 on timeline 10
          rewinding from last common checkpoint at 2/D6930460 on timeline 10
          reading source file list
          reading target file list
          reading WAL in target
          need to copy 4168 MB (total source directory is 4186 MB)
          4268372/4268372 kB (100%) copied
          creating backup label and updating control file
          syncing target data directory
          Done!

If I run pg_rewind with the debug option, it just shows an additional bunch of files copied in directories like base or pg_tblspc. I claim that there is no data inserted or modified since the first step. The only difference between the two servers is caused by restarting the old master.

Thanks and Regards,

Hung Phan



On Wed, Sep 13, 2017 at 10:48 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Wed, Sep 13, 2017 at 12:41 PM, Hung Phan <hungphan227@gmail.com> wrote:
> I have tested pg_rewind (ver 9.5) with the following scenario:
>
> - one master and one slave (total size of each server is more than 4GB)
> - set wal_log_hints=on and restart both
> - stop master, promote slave
> - start old master again (now two servers have diverged)
> - stop old master, run pg_rewind with progress option

That's a good flow. Don't forget to run a manual checkpoint after
promotion to update the control file of the promoted standby, so that
pg_rewind is able to identify the timeline difference between the
source and the target servers.

> The pg_rewind ran successfully, but I saw it copied more than 4GB (4265891 kB
> copied). So I wonder: there was only a very minor difference between the two
> servers, so why did pg_rewind copy almost all the data of the new master?

Without knowing exactly what has been registered to be copied from
the active source to the target, it is hard to draw a conclusion. But
my bet here is that you left the target server online long enough that
a bunch of its blocks got updated, causing more relation blocks to be
copied from the source because more effort is needed to re-sync it.
That's only an assumption without clear numbers, numbers that could be
found using the --debug messages of pg_rewind.
--
Michael

Re: [GENERAL] pg_rewind copy so much data

From:
Michael Paquier
Date:
On Wed, Sep 13, 2017 at 2:21 PM, Hung Phan <hungphan227@gmail.com> wrote:
> If I run pg_rewind with the debug option, it just shows an additional bunch of
> files copied in directories like base or pg_tblspc. I claim that there is no
> data inserted or modified since the first step. The only difference between
> the two servers is caused by restarting the old master.

That's exactly the kind of data that would allow us to conclude where
the data comes from, and you are not showing it, so it is not possible
to draw a clear conclusion. What matters are the file paths, the offset
in the file where a chunk of data is taken, and the chunk size. If you
see, for example, a bunch of 8kB blocks being copied, this means that
some extra load ran on the master before it was stopped and before
running the rewind. Are you sure that you don't have large log files,
for example?
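
As an illustration, a rough way to aggregate such output per file and see which files account for most of the copied bytes (assuming the debug output was saved to a file named rewind_debug.log, a name used here only for the example):

    # sum the "received chunk" sizes per file and show the largest consumers
    grep 'received chunk for file' rewind_debug.log \
      | awk -F'"' '{ n = split($3, a, " "); sizes[$2] += a[n] }
                   END { for (f in sizes) printf "%12d  %s\n", sizes[f], f }' \
      | sort -rn | head -20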

I would also be curious to know which exact version of Postgres 9.5 you
are using.
-- 
Michael

Re: [GENERAL] pg_rewind copy so much data

From:
Hung Phan
Date:
I use ver 9.5.3. I have just run it again and got the debug log. It is very long, so I attach it to this mail.

On Wed, Sep 13, 2017 at 5:57 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Wed, Sep 13, 2017 at 2:21 PM, Hung Phan <hungphan227@gmail.com> wrote:
> If I run pg_rewind with the debug option, it just shows an additional bunch of
> files copied in directories like base or pg_tblspc. I claim that there is no
> data inserted or modified since the first step. The only difference between
> the two servers is caused by restarting the old master.

That's exactly the kind of data that would allow us to conclude where
the data comes from, and you are not showing it, so it is not possible
to draw a clear conclusion. What matters are the file paths, the offset
in the file where a chunk of data is taken, and the chunk size. If you
see, for example, a bunch of 8kB blocks being copied, this means that
some extra load ran on the master before it was stopped and before
running the rewind. Are you sure that you don't have large log files,
for example?

I would also be curious to know which exact version of Postgres 9.5 you
are using.
--
Michael

Attachments

Re: [GENERAL] pg_rewind copy so much data

From:
Hung Phan
Date:
I use ver 9.5.3. I have just run it again and got the debug log. It seems that I cannot send such a big file to the postgresql mailing list, so I copy some parts here:

fetched file "global/pg_control", length 8192
fetched file "pg_xlog/0000000D.history", length 475
servers diverged at WAL position 2/D69820C8 on timeline 12
rewinding from last common checkpoint at 2/D6982058 on timeline 12
backup_label.old (COPY)
base/1/1247_fsm (COPY)
base/1/1247_vm (COPY)
base/1/1249_fsm (COPY)
base/1/1249_vm (COPY)
base/1/1255_fsm (COPY)
base/1/1255_vm (COPY)
base/1/1259_fsm (COPY)
base/1/1259_vm (COPY)
base/1/13125_fsm (COPY)
base/1/13125_vm (COPY)
base/1/13130_fsm (COPY)
base/1/13130_vm (COPY)
base/1/13135_fsm (COPY)
base/1/13135_vm (COPY)
base/1/13140_fsm (COPY)
base/1/13140_vm (COPY)
base/1/13145_fsm (COPY)
base/1/13145_vm (COPY)
base/1/13150_fsm (COPY)
base/1/13150_vm (COPY)
base/1/1417_vm (COPY)
base/1/1418_vm (COPY)
base/1/2328_vm (COPY)
base/1/2336_vm (COPY)
base/1/2600_fsm (COPY)


global/3592_vm (COPY)
global/4060_vm (COPY)
global/6000_vm (COPY)
global/pg_control (COPY)
global/pg_filenode.map (COPY)
global/pg_internal.init (COPY)
pg_clog/0000 (COPY)
pg_clog/0001 (COPY)
pg_clog/0002 (COPY)
pg_hba.conf (COPY)
pg_ident.conf (COPY)
pg_multixact/members/0000 (COPY)
pg_multixact/offsets/0000 (COPY)
pg_notify/0000 (COPY)
pg_stat_tmp/db_0.stat (COPY)
pg_stat_tmp/db_13294.stat (COPY)
pg_stat_tmp/global.stat (COPY)
pg_subtrans/0025 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/112 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/113 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/1247 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/1247_fsm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/1247_vm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/1249 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/1249_fsm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/1249_vm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/1255 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/1255_fsm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/1255_vm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/1259 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/1259_fsm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/1259_vm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13125 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13125_fsm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13125_vm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13127 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13129 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13130 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13130_fsm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13130_vm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13132 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13134 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13135 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13135_fsm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13135_vm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13137 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13139 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13140 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13140_fsm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13140_vm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13142 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13144 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13145 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13145_fsm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13145_vm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13147 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13149 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13150 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13150_fsm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13150_vm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13152 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13154 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13155 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13157 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/13159 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/1417 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/1417_vm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/1418 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/1418_vm (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/16401 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/16422 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/16440 (COPY)



received chunk for file "base/1/2603_fsm", offset 0, size 24576
received chunk for file "base/1/2603_vm", offset 0, size 8192
received chunk for file "base/1/2605_fsm", offset 0, size 24576
received chunk for file "base/1/2605_vm", offset 0, size 8192
received chunk for file "base/1/2606_fsm", offset 0, size 24576
received chunk for file "base/1/2606_vm", offset 0, size 8192
received chunk for file "base/1/2607_fsm", offset 0, size 24576
received chunk for file "base/1/2607_vm", offset 0, size 8192
received chunk for file "base/1/2608_fsm", offset 0, size 24576
received chunk for file "base/1/2608_vm", offset 0, size 8192
received chunk for file "base/1/2609_fsm", offset 0, size 24576
received chunk for file "base/1/2609_vm", offset 0, size 8192
received chunk for file "base/1/2610_fsm", offset 0, size 24576
received chunk for file "base/1/2610_vm", offset 0, size 8192
received chunk for file "base/1/2612_fsm", offset 0, size 24576
received chunk for file "base/1/2612_vm", offset 0, size 8192
received chunk for file "base/1/2615_fsm", offset 0, size 24576
received chunk for file "base/1/2615_vm", offset 0, size 8192
received chunk for file "base/1/2616_fsm", offset 0, size 24576
received chunk for file "base/1/2616_vm", offset 0, size 8192



received chunk for file "pg_tblspc/16386/PG_9.5_201510051/16387/16651", offset 0, size 188416
received chunk for file "pg_tblspc/16386/PG_9.5_201510051/16387/16653", offset 0, size 188416
received chunk for file "pg_tblspc/16386/PG_9.5_201510051/16387/16655", offset 0, size 8192
received chunk for file "pg_tblspc/16386/PG_9.5_201510051/16387/16661", offset 0, size 8192
received chunk for file "pg_tblspc/16386/PG_9.5_201510051/16387/16663", offset 0, size 8192
received chunk for file "pg_tblspc/16386/PG_9.5_201510051/16387/16665", offset 0, size 1000000
received chunk for file "pg_tblspc/16386/PG_9.5_201510051/16387/16665", offset 1000000, size 1000000
received chunk for file "pg_tblspc/16386/PG_9.5_201510051/16387/16665", offset 2000000, size 1000000
received chunk for file "pg_tblspc/16386/PG_9.5_201510051/16387/16665", offset 3000000, size 1000000
received chunk for file "pg_tblspc/16386/PG_9.5_201510051/16387/16665", offset 4000000, size 1000000
received chunk for file "pg_tblspc/16386/PG_9.5_201510051/16387/16665", offset 5000000, size 1000000
received chunk for file "pg_tblspc/16386/PG_9.5_201510051/16387/16665", offset 6000000, size 1000000
received chunk for file "pg_tblspc/16386/PG_9.5_201510051/16387/16665", offset 7000000, size 1000000
received chunk for file "pg_tblspc/16386/PG_9.5_201510051/16387/16665", offset 8000000



On Wed, Sep 13, 2017 at 5:57 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Wed, Sep 13, 2017 at 2:21 PM, Hung Phan <hungphan227@gmail.com> wrote:
> If I run pg_rewind with the debug option, it just shows an additional bunch of
> files copied in directories like base or pg_tblspc. I claim that there is no
> data inserted or modified since the first step. The only difference between
> the two servers is caused by restarting the old master.

That's exactly the kind of data that would allow us to conclude where
the data comes from, and you are not showing it, so it is not possible
to draw a clear conclusion. What matters are the file paths, the offset
in the file where a chunk of data is taken, and the chunk size. If you
see, for example, a bunch of 8kB blocks being copied, this means that
some extra load ran on the master before it was stopped and before
running the rewind. Are you sure that you don't have large log files,
for example?

I would also be curious to know which exact version of Postgres 9.5 you
are using.
--
Michael

Re: [GENERAL] pg_rewind copy so much data

From:
Michael Paquier
Date:
On Fri, Sep 15, 2017 at 2:57 PM, Hung Phan <hungphan227@gmail.com> wrote:
> [...]

Please do not top-post. This breaks the logic of the thread.

> I use ver 9.5.3.

You should update to the latest minor version available; there have
been quite a few bug fixes in Postgres since 9.5.3.

> I have just run it again and got the debug log. It is very long, so I attach it to this mail.

In this case the LSN where the promoted standby and the rewound node
diverged is clear:
servers diverged at WAL position 2/D69820C8 on timeline 12
rewinding from last common checkpoint at 2/D6982058 on timeline 12
The last segment on timeline 13 is 0000000D00000002000000E0, which may
be a recycled segment, but still that's up to 160MB worth of data...

And from what I can see, a lot of the data comes from WAL segments of
past timelines, close to 1.3GB. The rest comes more or less entirely
from relation files in a tablespace other than the default one, with
the tables whose relfilenodes are 16665 and 16683 covering the largest
part of it. What is strange to begin with is that there are so many
segments from past timelines; those should not stick around.
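
As a side note, one way to check whether old-timeline segments are piling up is to count the segments per timeline in pg_xlog on the target before rewinding (a sketch; $PGDATA stands for the old master's data directory):

    # the first 8 hex digits of a WAL segment name are its timeline ID;
    # counting segments per timeline shows how much old-timeline WAL remains
    ls "$PGDATA"/pg_xlog | grep -E '^[0-9A-F]{24}$' | cut -c1-8 | sort | uniq -c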

Could you check if the relfilenodes 16665 and 16683 exist on the source
server but do *not* exist on the target server? When issuing a rewind,
a relation file that exists on both sides is not copied wholesale (see
process_source_file in filemap.c); only a set of blocks is registered.
Based on what comes from your log file, the file is being copied from
the source to the target, not its blocks:
pg_tblspc/16386/PG_9.5_201510051/16387/16665 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/16665.1 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/16665_fsm (COPY)
And this leads to an increase of the data included in what is rewound.
So aren't you, for example, re-creating a new database after the standby
is promoted, or something like that?
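
For what it's worth, a quick way to check would be something like the sketch below, using the tablespace OID 16386 visible in the paths above (the host, port, user and database name are placeholders; connect to the database that owns the files, OID 16387 in the paths):

    # on the promoted source, resolve the filenodes back to relations
    psql -h localhost -p 5433 -U postgres -d mydb -c \
      "SELECT pg_filenode_relation(16386, 16665), pg_filenode_relation(16386, 16683);"

    # on the old master (rewind target), check whether the files exist at all
    ls /path/to/old_master/pg_tblspc/16386/PG_9.5_201510051/16387/ \
      | grep -E '^(16665|16683)'
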
-- 
Michael

Re: [GENERAL] pg_rewind copy so much data

From:
Hung Phan
Date:
Sorry for the late reply. I have checked those relfilenodes and they do exist.

I used a tablespace to store data, and it seems that pg_rewind copied everything in the tablespace. Today I found an article posted by you (Michael Paquier) in which you said that there was no tablespace support. If so, is there any way to work around it?

Regards,

Hung Phan



On Fri, Sep 15, 2017 at 1:55 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Fri, Sep 15, 2017 at 2:57 PM, Hung Phan <hungphan227@gmail.com> wrote:
> [...]

Please do not top-post. This breaks the logic of the thread.

> I use ver 9.5.3.

You should update to the latest minor version available; there have
been quite a few bug fixes in Postgres since 9.5.3.

> I have just run it again and got the debug log. It is very long, so I attach it to this mail.

In this case the LSN where the promoted standby and the rewound node
diverged is clear:
servers diverged at WAL position 2/D69820C8 on timeline 12
rewinding from last common checkpoint at 2/D6982058 on timeline 12
The last segment on timeline 13 is 0000000D00000002000000E0, which may
be a recycled segment, but still that's up to 160MB worth of data...

And from what I can see, a lot of the data comes from WAL segments of
past timelines, close to 1.3GB. The rest comes more or less entirely
from relation files in a tablespace other than the default one, with
the tables whose relfilenodes are 16665 and 16683 covering the largest
part of it. What is strange to begin with is that there are so many
segments from past timelines; those should not stick around.

Could you check if the relfilenodes 16665 and 16683 exist on the source
server but do *not* exist on the target server? When issuing a rewind,
a relation file that exists on both sides is not copied wholesale (see
process_source_file in filemap.c); only a set of blocks is registered.
Based on what comes from your log file, the file is being copied from
the source to the target, not its blocks:
pg_tblspc/16386/PG_9.5_201510051/16387/16665 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/16665.1 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/16665_fsm (COPY)
And this leads to an increase of the data included in what is rewound.
So aren't you, for example, re-creating a new database after the standby
is promoted, or something like that?
--
Michael

Re: [GENERAL] pg_rewind copy so much data

From:
Michael Paquier
Date:
On Fri, Sep 29, 2017 at 1:06 PM, Hung Phan <hungphan227@gmail.com> wrote:
> I used a tablespace to store data, and it seems that pg_rewind copied
> everything in the tablespace. Today I found an article posted by you (Michael
> Paquier) in which you said that there was no tablespace support. If so, is
> there any way to work around it?

I guess you mean this article:
http://paquier.xyz/postgresql-2/postgres-module-highlight-pg_rewind-to-recycle-a-postgres-master-into-a-slave/
It is from 2013 and refers to the first iterations of the tool.
Tablespaces are now fully supported in the upstream version, as well
as in the version on github for 9.3 and 9.4. See this commit, which
added support for tablespaces:
https://github.com/vmware/pg_rewind/commit/19fb09cdcac397048f7d723c037fe6a10299a278

FWIW, I use it daily, and pg_rewind is able to correctly detect page
deltas even on non-default tablespaces. Really.
-- 
Michael

Re: [GENERAL] pg_rewind copy so much data

From:
Hung Phan
Date:
Thanks for your help. Do you have any more ideas about my case? I cannot find any reason for it. If pg_rewind just examines the WAL for blocks changed since the common checkpoint, why did it copy all the data? As I understand it, the information in the WAL about changes cannot cover all the data in the database.

On Fri, Sep 29, 2017 at 12:06 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Fri, Sep 29, 2017 at 1:06 PM, Hung Phan <hungphan227@gmail.com> wrote:
> I used a tablespace to store data, and it seems that pg_rewind copied
> everything in the tablespace. Today I found an article posted by you (Michael
> Paquier) in which you said that there was no tablespace support. If so, is
> there any way to work around it?

I guess you mean this article:
http://paquier.xyz/postgresql-2/postgres-module-highlight-pg_rewind-to-recycle-a-postgres-master-into-a-slave/
It is from 2013 and refers to the first iterations of the tool.
Tablespaces are now fully supported in the upstream version, as well
as in the version on github for 9.3 and 9.4. See this commit, which
added support for tablespaces:
https://github.com/vmware/pg_rewind/commit/19fb09cdcac397048f7d723c037fe6a10299a278

FWIW, I use it daily, and pg_rewind is able to correctly detect page
deltas even on non-default tablespaces. Really.
--
Michael

Re: [GENERAL] pg_rewind copy so much data

From:
Michael Paquier
Date:
On Fri, Sep 29, 2017 at 6:22 PM, Hung Phan <hungphan227@gmail.com> wrote:
> Thanks for your help. Do you have any more ideas about my case? I cannot
> find any reason for it. If pg_rewind just examines the WAL for blocks changed
> since the common checkpoint, why did it copy all the data? As I understand it,
> the information in the WAL about changes cannot cover all the data in the
> database.

The only thing I have in mind that can create this much data with so
little WAL is a CREATE DATABASE using an existing database as a
template. Based on the information you are giving here, this is the
best guess I can make.
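
In other words, something like the sketch below, run on the promoted node before the rewind, generates gigabytes of new relation files while writing only a handful of WAL records, so pg_rewind has to copy those files in full (the host, port, user and database names are made up for illustration):

    # copying a multi-GB template database is WAL-logged as just a few records
    # describing a directory-level copy, not block by block, so pg_rewind must
    # transfer the new database's relation files wholesale
    psql -h localhost -p 5433 -U postgres -c "CREATE DATABASE app_copy TEMPLATE app_db;"
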
-- 
Michael