Re: [ADMIN] Strange times in WAL files in archive directory (9.3)

Поиск
Список
Период
Сортировка
От Achilleas Mantzios
Тема Re: [ADMIN] Strange times in WAL files in archive directory (9.3)
Дата
Msg-id 58873609.4030004@matrix.gatewaynet.com
обсуждение исходный текст
Ответ на [ADMIN] Strange times in WAL files in archive directory (9.3)  (Achilleas Mantzios <achill@matrix.gatewaynet.com>)
Ответы Re: [ADMIN] Strange times in WAL files in archive directory (9.3)
Список pgsql-admin
On 24/01/2017 11:12, Achilleas Mantzios wrote:
> Hello,
> we are running PostgreSQL 9.3 on a new VM running Linux 3.16.0-4-amd64 #1 SMP, Debian 8.6, and we setup WAL archiving
yesterday.We were planing to keep a window of 15 days worth of WAL files, so we  
> thought of writing a script to delete files older than 15 days. The settings used were rather common :
> archive_mode = on
> archive_command = '/usr/bin/scp %p sma:/smadb/pgsql/pitr/%f'
> archive_timeout=240
> However, this morning, to my surprise I saw this by doing ls on the archive directory :
> # ls -l /smadb/pgsql/pitr/ | less
> total 12680296
> -rw------- 1 postgres postgres 16777216 Jan 24 03:44 000000020000070100000000
> -rw------- 1 postgres postgres 16777216 Jan 24 03:50 000000020000070100000001
> -rw------- 1 postgres postgres 16777216 Jan 24 03:54 000000020000070100000002
> -rw------- 1 postgres postgres 16777216 Jan 24 04:00 000000020000070100000003
> -rw------- 1 postgres postgres 16777216 Jan 24 04:00 000000020000070100000004
> -rw------- 1 postgres postgres 16777216 Jan 24 04:04 000000020000070100000005
> -rw------- 1 postgres postgres 16777216 Jan 24 04:09 000000020000070100000006
> -rw------- 1 postgres postgres 16777216 Jan 24 04:14 000000020000070100000007
> -rw------- 1 postgres postgres 16777216 Jan 24 04:20 000000020000070100000008
> -rw------- 1 postgres postgres 16777216 Jan 24 04:20 000000020000070100000009
> -rw------- 1 postgres postgres 16777216 Jan 24 04:25 00000002000007010000000A
> -rw------- 1 postgres postgres 16777216 Jan 24 04:29 00000002000007010000000B
> -rw------- 1 postgres postgres 16777216 Jan 24 04:34 00000002000007010000000C
> -rw------- 1 postgres postgres 16777216 Jan 24 04:39 00000002000007010000000D
> -rw------- 1 postgres postgres 16777216 Jan 24 04:39 00000002000007010000000E
> -rw------- 1 postgres postgres 16777216 Jan 24 04:44 00000002000007010000000F
> ...........
> -rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000701000000EF
> -rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000701000000F0
> -rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000701000000F1  <------
> -rw------- 1 postgres postgres 16777216 Jan 23 13:04 0000000200000701000000F2 <------ time jumps
> -rw------- 1 postgres postgres 16777216 Jan 23 13:05 0000000200000701000000F3
> -rw------- 1 postgres postgres 16777216 Jan 23 13:06 0000000200000701000000F4
>
> So IMHO either the naming of the WAL filename got damaged somehow or the one of the servers got wrong time or
somethingI miss. "Jan 23 13:04" is indeed the time that we started the server for WAL  
> archiving.
> By sorting by time I get :
> ls -ltr /smadb/pgsql/pitr/ | less
> -rw------- 1 postgres postgres 16777216 Jan 23 13:04 0000000200000701000000F2
> -rw------- 1 postgres postgres 16777216 Jan 23 13:05 0000000200000701000000F3
> -rw------- 1 postgres postgres 16777216 Jan 23 13:06 0000000200000701000000F4
> -rw------- 1 postgres postgres 16777216 Jan 23 13:07 0000000200000701000000F5
> -rw------- 1 postgres postgres 16777216 Jan 23 13:08 0000000200000701000000F6
> -rw------- 1 postgres postgres 16777216 Jan 23 13:09 0000000200000701000000F7
> -rw------- 1 postgres postgres 16777216 Jan 23 13:09 0000000200000701000000F8
> -rw------- 1 postgres postgres 16777216 Jan 23 13:10 0000000200000701000000F9
> -rw------- 1 postgres postgres 16777216 Jan 23 13:11 0000000200000701000000FA
> -rw------- 1 postgres postgres 16777216 Jan 23 13:12 0000000200000701000000FB
> -rw------- 1 postgres postgres 16777216 Jan 23 13:13 0000000200000701000000FC
> ........
> -rw------- 1 postgres postgres 16777216 Jan 24 03:15 0000000200000702000000F9
> -rw------- 1 postgres postgres 16777216 Jan 24 03:19 0000000200000702000000FA
> -rw------- 1 postgres postgres 16777216 Jan 24 03:23 0000000200000702000000FB
> -rw------- 1 postgres postgres 16777216 Jan 24 03:27 0000000200000702000000FC
> -rw------- 1 postgres postgres 16777216 Jan 24 03:31 0000000200000702000000FD
> -rw------- 1 postgres postgres 16777216 Jan 24 03:35 0000000200000702000000FE
> -rw------- 1 postgres postgres 16777216 Jan 24 03:39 0000000200000702000000FF
> -rw------- 1 postgres postgres 16777216 Jan 24 03:43 000000020000070300000000
> -rw------- 1 postgres postgres 16777216 Jan 24 03:44 000000020000070100000000 <----- problem begins
> -rw------- 1 postgres postgres 16777216 Jan 24 03:47 000000020000070300000001
> -rw------- 1 postgres postgres 16777216 Jan 24 03:50 000000020000070100000001
> -rw------- 1 postgres postgres 16777216 Jan 24 03:51 000000020000070300000002
> -rw------- 1 postgres postgres 16777216 Jan 24 03:54 000000020000070100000002
> ........ (same pattern goes on 0000000200000703 mixing with 0000000200000701 )
> -rw------- 1 postgres postgres 16777216 Jan 24 09:42 0000000200000701000000EE
> -rw------- 1 postgres postgres 16777216 Jan 24 09:45 0000000200000703000000EF
> -rw------- 1 postgres postgres 16777216 Jan 24 09:47 0000000200000703000000F0
> -rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000703000000F1
> -rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000701000000EF
> -rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000701000000F0
> -rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000701000000F1 <----- problem ends
> -rw------- 1 postgres postgres 16777216 Jan 24 09:50 0000000200000703000000F2
> -rw------- 1 postgres postgres 16777216 Jan 24 09:52 0000000200000703000000F3
> -rw------- 1 postgres postgres 16777216 Jan 24 09:56 0000000200000703000000F4
>
> Any thoughts about any explanation on this?
>
By scrolling up the terminal on the database server (cause the said file is gone now from pg_xlog) I saw this :
$ ls -ltr ~/data/pg_xlog/0000000200000701000000F0
-rw------- 1 postgres postgres 16777216 Jan 23 13:01 /var/lib/pgsql/data/pg_xlog/0000000200000701000000F0
$ md5sum ~/data/pg_xlog/0000000200000701000000F0
8ce1917c86d3c9e41455ec13659c2689 /var/lib/pgsql/data/pg_xlog/0000000200000701000000F0

In the archive's host I see this :
# ls -ltr /smadb/pgsql/pitr/0000000200000701000000F0
-rw------- 1 postgres postgres 16777216 Jan 24 09:48 /smadb/pgsql/pitr/0000000200000701000000F0
# md5sum /smadb/pgsql/pitr/0000000200000701000000F0
8ce1917c86d3c9e41455ec13659c2689 /smadb/pgsql/pitr/0000000200000701000000F0

Same file, different mod times, so, this makes me think that maybe there was some flushing that the postgresql server
did,prior to deleting or recycling some old WAL files. Also I see this : 
# ls -ltr /smadb/pgsql/pitr/0000000200000701000000*
-rw------- 1 postgres postgres 16777216 Jan 23 13:04 /smadb/pgsql/pitr/0000000200000701000000F2
-rw------- 1 postgres postgres 16777216 Jan 23 13:05 /smadb/pgsql/pitr/0000000200000701000000F3
-rw------- 1 postgres postgres 16777216 Jan 23 13:06 /smadb/pgsql/pitr/0000000200000701000000F4
-rw------- 1 postgres postgres 16777216 Jan 23 13:07 /smadb/pgsql/pitr/0000000200000701000000F5
-rw------- 1 postgres postgres 16777216 Jan 23 13:08 /smadb/pgsql/pitr/0000000200000701000000F6
-rw------- 1 postgres postgres 16777216 Jan 23 13:09 /smadb/pgsql/pitr/0000000200000701000000F7
-rw------- 1 postgres postgres 16777216 Jan 23 13:09 /smadb/pgsql/pitr/0000000200000701000000F8
-rw------- 1 postgres postgres 16777216 Jan 23 13:10 /smadb/pgsql/pitr/0000000200000701000000F9
-rw------- 1 postgres postgres 16777216 Jan 23 13:11 /smadb/pgsql/pitr/0000000200000701000000FA
-rw------- 1 postgres postgres 16777216 Jan 23 13:12 /smadb/pgsql/pitr/0000000200000701000000FB
-rw------- 1 postgres postgres 16777216 Jan 23 13:13 /smadb/pgsql/pitr/0000000200000701000000FC
-rw------- 1 postgres postgres 16777216 Jan 23 13:14 /smadb/pgsql/pitr/0000000200000701000000FD
-rw------- 1 postgres postgres 16777216 Jan 23 13:15 /smadb/pgsql/pitr/0000000200000701000000FE
-rw------- 1 postgres postgres 16777216 Jan 23 13:16 /smadb/pgsql/pitr/0000000200000701000000FF
-rw------- 1 postgres postgres 16777216 Jan 24 03:44 /smadb/pgsql/pitr/000000020000070100000000 <---- time jump
-rw------- 1 postgres postgres 16777216 Jan 24 03:50 /smadb/pgsql/pitr/000000020000070100000001
-rw------- 1 postgres postgres 16777216 Jan 24 03:54 /smadb/pgsql/pitr/000000020000070100000002
-rw------- 1 postgres postgres 16777216 Jan 24 04:00 /smadb/pgsql/pitr/000000020000070100000003
-rw------- 1 postgres postgres 16777216 Jan 24 04:00 /smadb/pgsql/pitr/000000020000070100000004
......
-rw------- 1 postgres postgres 16777216 Jan 24 09:42 /smadb/pgsql/pitr/0000000200000701000000EE
-rw------- 1 postgres postgres 16777216 Jan 24 09:48 /smadb/pgsql/pitr/0000000200000701000000EF
-rw------- 1 postgres postgres 16777216 Jan 24 09:48 /smadb/pgsql/pitr/0000000200000701000000F0
-rw------- 1 postgres postgres 16777216 Jan 24 09:48 /smadb/pgsql/pitr/0000000200000701000000F1

So this makes me believe that all the files before 0000000200000701000000F2 were present before WAL archiving was
setup.So I want to ask : 

1) Do you find this behavior normal?
2) From now on, can we assume a monotonic relation between file names and creation times in order to base our
maintenancescripts on this? 

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt



В списке pgsql-admin по дате отправления:

Предыдущее
От: Achilleas Mantzios
Дата:
Сообщение: [ADMIN] Strange times in WAL files in archive directory (9.3)
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [ADMIN] Strange times in WAL files in archive directory (9.3)