Re: [ADMIN] Too many WAL file created

Поиск
Список
Период
Сортировка
От Albe Laurenz
Тема Re: [ADMIN] Too many WAL file created
Дата
Msg-id A737B7A37273E048B164557ADEF4A58B53A086B1@ntex2010i.host.magwien.gv.at
обсуждение исходный текст
Ответ на [ADMIN] Too many WAL file created  (Gabriel Fernández Martínez<gfernandez@hotelbeds.com>)
Ответы Re: [ADMIN] Too many WAL file created
Список pgsql-admin
Gabriel Fernández Martínez wrote:
> I am working in a Project who use PostgreSQL, and our application crash time to time due a lack of
> space because the px_xlog has consume all the available space.
> 
> I have discarded a problem with the archiving process, and the main problem appear to be related with
> the amount of WAL files created per minute. I can show that we are creating more than two WAL files
> per minute like you can see:
> 
> [DEV] [root@ip-10-175-17-84 /data/node1/pg_xlog]# ls -lrt
> total 82208
> -rw-------. 1 postgres postgres 16777216 mar 10 14:15 000000010000001C000000D7
> -rw-------. 1 postgres postgres 16777216 mar 10 14:15 000000010000001C000000D8
> -rw-------. 1 postgres postgres 16777216 mar 10 14:15 000000010000001C000000D9
> -rw-------. 1 postgres postgres 16777216 mar 10 14:16 000000010000001C000000DA
> drwx------. 2 postgres postgres   286720 mar 10 14:16 archive_status
> -rw-------. 1 postgres postgres 16777216 mar 10 14:17 000000010000001C000000DB
> But I can identify the transaction responsible for this growing.
> 
> WAL parameters with non-default values in place in the database are:
> wal_level = 'archive'
> archive_mode = on
> archive_command = '/usr/bin/true'

You are right that you have discarded the archiving problem, because no archives
are written any more.

What is your backup strategy? If you are using anything else than "pg_dump",
this move has probably destroyed your backup process.

> In order to try to identify the transaction responsible for this behavior I have try to follow two
> approach:
> 
> 1º Increase database login to “all”, I collect 97.906 selects in two minutes but no
> insert/update/delete, log parameters in use has been:
>
> log_destination = 'syslog,csvlog'
> logging_collector = on                  # Enable capturing of stderr and csvlog
> log_directory = 'pg_log'                # directory where log files are written,
> log_filename = 'postgresql-%Y-%m-%d.log'
> log_line_prefix = '< %m >'                      # special values:
> log_timezone = 'Europe/Madrid'
> log_statement = 'ddl'

You should set the parameter "log_statement" to "all", then everything will be logged.

The way it is now, only statements that change tables and other objects will be logged.

>                 an example of the selct capture could be:
> 2017-03-10 13:59:57.319 CET,"app","app",11919,"10.175.17.84:33895",58c2a02f.2e8f,1,"SELECT",2017-03-10
> 13:46:39 CET,4/4352895,0,LOG,00000,"execute S_1: SELECT
>         id,
>         locked_ts
>         FROM table
>         WHERE
>         id = $1","parameters: $1 = 'TOPIC_CONSUMER'",,,,,,,,""

A SELECT statement like will not cause a change to the database, and WAL is only
written when something in the database has changed.
So SELECT statements like this one cannot be the cause of your problem.

I have two suggestions:

1) Change "log_statement" to "all" and activate the changes with "pg_ctl reload".
   That will log all statements.
   You should find data modifying statements in there.
   Sometimes a SELECT statement can also change data, particularly when a function
   is called that modifies the database.

2) There might be a long running statement that gets logged only once, but runs for
   hours or days on end.  To find such a statement, run a query like

      SELECT datname, usename, query_start, query
      FROM pg_stat_activity
      WHERE state <> 'idle';

  If you find an INSERT, UPDATE or DELETE statement that has been running for a long
  time, you have found your problem.

3) Set "log_autovacuum_min_duration" to 0 and see if anything gets logged.
   It may be that there is an autovacuum background process that is busy cleaning up
   old rows in a large table.
   If you are on PostgreSQL 9.6, you can use the "pg_stat_progress_vacuum" view to see
   currently running autovacuum processes, on older versions you have to look at the
   processes on the operating system level.

> 2º Use the pg_xlogdump, and I gather something like that:
> 
> rmgr: Heap        len (rec/tot):     14/   240, tx:   54704944, lsn: 1C/B9000038, prev 1C/B8FFFFD8,
> desc: UPDATE off 1 xmax 54704944 ; new off 13 xmax 0, blkref #0: rel 16431/598016/598033 blk 2709
> rmgr: Btree       len (rec/tot):      2/    72, tx:   54704944, lsn: 1C/B9000128, prev 1C/B9000038,
> desc: INSERT_LEAF off 26, blkref #0: rel 16431/598016/598040 blk 3560
> rmgr: Btree       len (rec/tot):      2/    72, tx:   54704944, lsn: 1C/B9000170, prev 1C/B9000128,
> desc: INSERT_LEAF off 103, blkref #0: rel 16431/598016/598042 blk 727
> rmgr: Btree       len (rec/tot):      2/    64, tx:   54704944, lsn: 1C/B90001B8, prev 1C/B9000170,
> desc: INSERT_LEAF off 268, blkref #0: rel 16431/598016/598043 blk 1441
> rmgr: Transaction len (rec/tot):      8/    34, tx:   54704944, lsn: 1C/B90001F8, prev 1C/B90001B8,
> desc: COMMIT 2017-03-10 13:58:57.494289 CET
> rmgr: Heap2       len (rec/tot):      8/    56, tx:          0, lsn: 1C/B9000220, prev 1C/B90001F8,
> desc: CLEAN remxid 54702701, blkref #0: rel 16431/598016/598033 blk 15966
> rmgr: Heap        len (rec/tot):     14/   241, tx:   54704945, lsn: 1C/B9000258, prev 1C/B9000220,
> desc: UPDATE off 33 xmax 54704945 ; new off 8 xmax 0, blkref #0: rel 16431/598016/598033 blk 15966
> rmgr: Btree       len (rec/tot):      2/  3589, tx:   54704945, lsn: 1C/B9000350, prev 1C/B9000258,
> desc: INSERT_LEAF off 3, blkref #0: rel 16431/598016/598040 blk 1567 FPW
> rmgr: Btree       len (rec/tot):      2/    72, tx:   54704945, lsn: 1C/B9001158, prev 1C/B9000350,
> desc: INSERT_LEAF off 103, blkref #0: rel 16431/598016/598042 blk 727
> rmgr: Btree       len (rec/tot):      2/    64, tx:   54704945, lsn: 1C/B90011A0, prev 1C/B9001158,
> desc: INSERT_LEAF off 269, blkref #0: rel 16431/598016/598043 blk 1441
> rmgr: Transaction len (rec/tot):      8/    34, tx:   54704945, lsn: 1C/B90011E0, prev 1C/B90011A0,
> desc: COMMIT 2017-03-10 13:58:57.546523 CET

That tells you that the table where the changes are taking place is base/598016/598033 in the
PostgreSQL data directory.  You can use the "oid2name" utility to find out which table that is.

Yours,
Laurenz Albe

В списке pgsql-admin по дате отправления:

Предыдущее
От: Poul Kristensen
Дата:
Сообщение: [ADMIN] The same role for multiple databases
Следующее
От: John Scalia
Дата:
Сообщение: Re: [ADMIN] The same role for multiple databases