Обсуждение: Don't removes/recycles WAL files at all

Поиск
Список
Период
Сортировка

Don't removes/recycles WAL files at all

От
Michael Govorun
Дата:
Please, help.

versions: 7.3.3, 7.3.4

Server don't recycles or removes WAL files at all, generating 12-20 files every
day.

-rw-------    1 postgres postgres 16777216 Jul 29 01:54 0000000000000000
-rw-------    1 postgres postgres 16777216 Jul 29 01:55 0000000000000001
-rw-------    1 postgres postgres 16777216 Jul 29 02:15 0000000000000002
-rw-------    1 postgres postgres 16777216 Jul 29 06:28 0000000000000003
-rw-------    1 postgres postgres 16777216 Jul 29 09:26 0000000000000004
-rw-------    1 postgres postgres 16777216 Jul 29 11:35 0000000000000005
-rw-------    1 postgres postgres 16777216 Jul 29 13:25 0000000000000006
-rw-------    1 postgres postgres 16777216 Jul 29 15:05 0000000000000007
-rw-------    1 postgres postgres 16777216 Jul 29 16:17 0000000000000008
-rw-------    1 postgres postgres 16777216 Jul 29 17:42 0000000000000009
-rw-------    1 postgres postgres 16777216 Jul 29 19:18 000000000000000A
-rw-------    1 postgres postgres 16777216 Jul 29 21:19 000000000000000B
-rw-------    1 postgres postgres 16777216 Jul 29 23:11 000000000000000C
-rw-------    1 postgres postgres 16777216 Jul 30 01:10 000000000000000D
-rw-------    1 postgres postgres 16777216 Jul 30 03:51 000000000000000E
-rw-------    1 postgres postgres 16777216 Jul 30 04:00 000000000000000F
-rw-------    1 postgres postgres 16777216 Jul 30 04:00 0000000000000010
-rw-------    1 postgres postgres 16777216 Jul 30 06:10 0000000000000011
-rw-------    1 postgres postgres 16777216 Jul 30 08:24 0000000000000012
-rw-------    1 postgres postgres 16777216 Jul 30 09:45 0000000000000013
-rw-------    1 postgres postgres 16777216 Jul 30 11:08 0000000000000014
-rw-------    1 postgres postgres 16777216 Jul 30 12:10 0000000000000015
-rw-------    1 postgres postgres 16777216 Jul 30 13:21 0000000000000016
-rw-------    1 postgres postgres 16777216 Jul 30 14:17 0000000000000017
-rw-------    1 postgres postgres 16777216 Jul 30 15:24 0000000000000018
-rw-------    1 postgres postgres 16777216 Jul 30 16:28 0000000000000019
-rw-------    1 postgres postgres 16777216 Jul 30 17:34 000000000000001A
-rw-------    1 postgres postgres 16777216 Jul 30 18:45 000000000000001B
-rw-------    1 postgres postgres 16777216 Jul 30 20:00 000000000000001C
-rw-------    1 postgres postgres 16777216 Jul 30 21:10 000000000000001D
-rw-------    1 postgres postgres 16777216 Jul 30 21:48 000000000000001E



I have all settings as defaults in postgresql.conf:

#checkpoint_segments = 3        # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300       # range 30-3600, in seconds
#
#commit_delay = 0               # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000
#
#fsync = true
#wal_sync_method = fsync        # the default varies across platforms:
#                               # fsync, fdatasync, open_sync, or open_datasync
#wal_debug = 0                  # range 0-16


--
Michael Govorun

Re: Don't removes/recycles WAL files at all

От
Richard Huxton
Дата:
On Wednesday 30 July 2003 18:54, Michael Govorun wrote:
> Please, help.
>
> versions: 7.3.3, 7.3.4
>
> Server don't recycles or removes WAL files at all, generating 12-20 files
> every day.
>
> -rw-------    1 postgres postgres 16777216 Jul 29 01:54 0000000000000000
> -rw-------    1 postgres postgres 16777216 Jul 29 01:55 0000000000000001
[...etc]

First thing I'd look for is if there is an old uncommitted transaction lurking
around somewhere. If the transaction doesn't commit, the WAL will need to
keep that data around.

--
  Richard Huxton
  Archonet Ltd

Re: Don't removes/recycles WAL files at all

От
Michael Govorun
Дата:
Richard Huxton <dev@archonet.com> writes:

> On Wednesday 30 July 2003 18:54, Michael Govorun wrote:
>> Please, help.
>>
>> versions: 7.3.3, 7.3.4
>>
>> Server don't recycles or removes WAL files at all, generating 12-20 files
>> every day.
>>
>> -rw-------    1 postgres postgres 16777216 Jul 29 01:54 0000000000000000
>> -rw-------    1 postgres postgres 16777216 Jul 29 01:55 0000000000000001
> [...etc]
>
> First thing I'd look for is if there is an old uncommitted transaction lurking
> around somewhere. If the transaction doesn't commit, the WAL will need to
> keep that data around.

I have several projects/databases running on this server. How can I
know about who (or at least what db) gives me uncommited transactions?

Also it seems like vulnerability. Every uncommited transaction takes
16Mb of disk space. It frustrates.

--
Michael Govorun            __o
                         _`\<,_
         .............. (_)/'(_) ...................

I do not get viruses because I do not use MS software. If you use
Outlook then please do not put my email address in your address-book
so that WHEN you get a virus it won't use my address in the From:
field.

Re: Don't removes/recycles WAL files at all

От
Richard Huxton
Дата:
On Wednesday 30 July 2003 20:05, Michael Govorun wrote:
> Richard Huxton <dev@archonet.com> writes:
> > On Wednesday 30 July 2003 18:54, Michael Govorun wrote:
> >> Please, help.
> >>
> >> versions: 7.3.3, 7.3.4
> >>
> >> Server don't recycles or removes WAL files at all, generating 12-20
> >> files every day.
> >>
> >> -rw-------    1 postgres postgres 16777216 Jul 29 01:54 0000000000000000
> >> -rw-------    1 postgres postgres 16777216 Jul 29 01:55 0000000000000001
> >
> > [...etc]
> >
> > First thing I'd look for is if there is an old uncommitted transaction
> > lurking around somewhere. If the transaction doesn't commit, the WAL will
> > need to keep that data around.
>
> I have several projects/databases running on this server. How can I
> know about who (or at least what db) gives me uncommited transactions?

Hmm - have a look at the "Monitoring Activity" section of the Administrators
manual - between "ps" and "pg_stat_xxx" tables hopefully you can spot it.

> Also it seems like vulnerability. Every uncommited transaction takes
> 16Mb of disk space. It frustrates.

I could be completely wrong mind you - I've CCd Tom Lane since he should be
able to say if I'm talking rubbish.

Tom - is this a TODO?

--
  Richard Huxton
  Archonet Ltd

Re: Don't removes/recycles WAL files at all

От
Tom Lane
Дата:
Richard Huxton <dev@archonet.com> writes:
>> Server don't recycles or removes WAL files at all, generating 12-20
>> files every day.

There's something broken there.  Check to see why checkpoint operations
are not completing (look in the postmaster log for relevant error
messages).  Open transactions should not affect this.

            regards, tom lane

Re: Don't removes/recycles WAL files at all

От
Michael Govorun
Дата:
Richard Huxton <dev@archonet.com> writes:

> Hmm - have a look at the "Monitoring Activity" section of the Administrators
> manual - between "ps" and "pg_stat_xxx" tables hopefully you can spot it.

Thanks for reply. I'll dig into it tomorrow.

--
Michael Govorun

Re: Don't removes/recycles WAL files at all

От
Michael Govorun
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:

>>> Server don't recycles or removes WAL files at all, generating 12-20
>>> files every day.
>
> There's something broken there.  Check to see why checkpoint operations
> are not completing (look in the postmaster log for relevant error
> messages).  Open transactions should not affect this.

I get such periodical errors in my log when user connects to proftpd
server which uses posgresql for authentication:

003-07-31 00:21:10 [14889]  LOG:  query: select login from accounts where login = 'testuser'
2003-07-31 00:21:10 [14889]  LOG:  query: select homedir from accounts where login = 'testuser'
2003-07-31 00:21:10 [14889]  LOG:  query: select uid from accounts where login = 'testuser'
2003-07-31 00:21:10 [14889]  LOG:  query: select gid from accounts where login = 'testuser'
2003-07-31 00:21:10 [14889]  LOG:  query: select shell from accounts where login = 'testuser'
2003-07-31 00:21:12 [14889]  LOG:  query: select password from accounts where login = 'testuser'  limit 1
2003-07-31 00:21:12 [14889]  LOG:  query: select login from accounts where 1
2003-07-31 00:21:12 [14889]  ERROR:  Argument of WHERE must be type boolean, not type integer
2003-07-31 00:21:12 [14889]  LOG:  statement: select login from accounts where 1
2003-07-31 00:21:12 [14889]  LOG:  query: select login from accounts where 1
2003-07-31 00:21:12 [14889]  ERROR:  Argument of WHERE must be type boolean, not type integer
2003-07-31 00:21:12 [14889]  LOG:  statement: select login from accounts where 1
2003-07-31 00:21:12 [14889]  LOG:  query: update accounts set login_count = login_count + 1 where login = 'testuser'

and when user disconnects:

2003-07-31 00:21:14 [14889]  LOG:  SSL SYSCALL error: EOF detected
2003-07-31 00:21:14 [14889]  LOG:  pq_recvbuf: unexpected EOF on client connection

No other errors messages in my log.

I have old version of proftpd installed in my system. I'm going to
install new version today. But I can delay this if needed (if it's
postgres bug) and look into proftpd sources.

--
Michael Govorun

Re: Don't removes/recycles WAL files at all

От
Michael Govorun
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:

>>> Server don't recycles or removes WAL files at all, generating 12-20
>>> files every day.
>
> There's something broken there.  Check to see why checkpoint operations
> are not completing (look in the postmaster log for relevant error
> messages).  Open transactions should not affect this.

I've found answer! Postgresql have been compiled with aggressive
optimizations:
gcc version egcs-2.91.66
CFLAGS='-O9 -funroll-loops -ffast-math -malign-double -mcpu=pentiumpro
-march=pentiumpro -fomit-frame-pointer -fno-exceptions'

After recompiling server with -O2 WAL recycling properly. But I got
many errors after installing recompiled server and had to dump
database with "optimized" postgres and restore with -O2 version.

--
Michael Govorun

Re: Don't removes/recycles WAL files at all

От
Tom Lane
Дата:
Michael Govorun <mike@shpion.ru> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> There's something broken there.  Check to see why checkpoint operations
>> are not completing (look in the postmaster log for relevant error
>> messages).  Open transactions should not affect this.

> No other errors messages in my log.

Maybe you have a checkpoint process that's just stuck, and is not
getting anything done yet is preventing the postmaster from launching
new checkpoints?  Use ps to look for postmaster child processes that
have been around a long time.

I seem to recall a similar report of a stuck checkpoint a few weeks ago
... check the archives.

            regards, tom lane

Re: Don't removes/recycles WAL files at all

От
Tom Lane
Дата:
Michael Govorun <mike@shpion.ru> writes:
> I've found answer! Postgresql have been compiled with aggressive
> optimizations:
> gcc version egcs-2.91.66
> CFLAGS='-O9 -funroll-loops -ffast-math -malign-double -mcpu=pentiumpro
> -march=pentiumpro -fomit-frame-pointer -fno-exceptions'

> After recompiling server with -O2 WAL recycling properly. But I got
> many errors after installing recompiled server and had to dump
> database with "optimized" postgres and restore with -O2 version.

Yeah, adding or removing -malign-double would change the layout of our
on-disk data structures, so it's no surprise you'd need to dump and
reload for that.

Also, -ffast-math is known to be Evil.  I thought we had a check to
prevent that (looks ...) yeah we do; how'd you get by this check in
timestamp.c?

/*
 * gcc's -ffast-math switch breaks routines that expect exact results from
 * expressions like timeval / 3600, where timeval is double.
 */
#ifdef __FAST_MATH__
#error -ffast-math is known to break this code
#endif

As for the other stuff, it theoretically should work, but isn't
egcs-2.91.66 rather old and buggy?  Perhaps a newer gcc would work
better.

            regards, tom lane

Re: Don't removes/recycles WAL files at all

От
Michael Govorun
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Yeah, adding or removing -malign-double would change the layout of our
> on-disk data structures, so it's no surprise you'd need to dump and
> reload for that.
>
> Also, -ffast-math is known to be Evil.  I thought we had a check to
> prevent that (looks ...) yeah we do; how'd you get by this check in
> timestamp.c?
>
> /*
>  * gcc's -ffast-math switch breaks routines that expect exact results from
>  * expressions like timeval / 3600, where timeval is double.
>  */
> #ifdef __FAST_MATH__
> #error -ffast-math is known to break this code
> #endif

Thanks for info, but I did not see this error.

> As for the other stuff, it theoretically should work, but isn't
>egcs-2.91.66 rather old and buggy?  Perhaps a newer gcc would work
>better.

Yes, it's old redhat 6.2 and whole system for a long time needs
upgrade.

--
Michael Govorun