Обсуждение: Publish autovacuum informations

Поиск

Список

Период

Сортировка

Publish autovacuum informations

От

Guillaume Lelarge

Дата:

29 декабря 2014 г., 18:55:42

Hey,

There are times where I would need more informations on the autovacuum processes.

I'd love to know what each worker is currently doing. I can get something like this from the pg_stat_activity view but it doesn't give me as much informations as the WorkerInfoData struct.

I'd also love to have more informations on the contents of the tables list (how many tables still to process, which table next, what kind of processing they'll get, etc... kinda what you have in the autovac_table struct).

All in all, I want to get informations that are typically stored in shared memory, handled by the autovacuum launcher and autovacuum workers. I first thought I could get that by writing some C functions embedded in an extension. But it doesn't seem to me I can access this part of the shared memory from a C function. If I'm wrong, I'd love to get a pointer on how to do this.

Otherwise, I wonder what would be more welcome: making the shared memory structs handles available outside of the autovacuum processes (and then build an extension to decode the informations I need), or adding functions in core to get access to this information (in that case, no need for an extension)?

Thanks.

Regards.

--

Guillaume.

http://blog.guillaume.lelarge.info

http://www.dalibo.com

Re: Publish autovacuum informations

От

Tom Lane

Дата:

29 декабря 2014 г., 19:03:49

Guillaume Lelarge <guillaume@lelarge.info> writes:
> All in all, I want to get informations that are typically stored in shared
> memory, handled by the autovacuum launcher and autovacuum workers. I first
> thought I could get that by writing some C functions embedded in an
> extension. But it doesn't seem to me I can access this part of the shared
> memory from a C function. If I'm wrong, I'd love to get a pointer on how to
> do this.

> Otherwise, I wonder what would be more welcome: making the shared memory
> structs handles available outside of the autovacuum processes (and then
> build an extension to decode the informations I need), or adding functions
> in core to get access to this information (in that case, no need for an
> extension)?

Either one of those approaches would cripple our freedom to change those
data structures; which we've done repeatedly in the past and will surely
want to do again.  So I'm pretty much -1 on exposing them.
        regards, tom lane

Re: Publish autovacuum informations

От

Guillaume Lelarge

Дата:

29 декабря 2014 г., 19:07:41

2014-12-29 17:03 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:

Guillaume Lelarge <guillaume@lelarge.info> writes:
> All in all, I want to get informations that are typically stored in shared
> memory, handled by the autovacuum launcher and autovacuum workers. I first
> thought I could get that by writing some C functions embedded in an
> extension. But it doesn't seem to me I can access this part of the shared
> memory from a C function. If I'm wrong, I'd love to get a pointer on how to
> do this.

> Otherwise, I wonder what would be more welcome: making the shared memory
> structs handles available outside of the autovacuum processes (and then
> build an extension to decode the informations I need), or adding functions
> in core to get access to this information (in that case, no need for an
> extension)?

Either one of those approaches would cripple our freedom to change those
data structures; which we've done repeatedly in the past and will surely
want to do again. So I'm pretty much -1 on exposing them.

I don't see how that's going to deny us the right to change any structs. If they are in-core functions, we'll just have to update them. If they are extension functions, then the developer of those functions would simply need to update his code.

--

Guillaume.

http://blog.guillaume.lelarge.info

http://www.dalibo.com

Re: Publish autovacuum informations

От

Robert Haas

Дата:

31 декабря 2014 г., 19:20:00

On Mon, Dec 29, 2014 at 11:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Either one of those approaches would cripple our freedom to change those
> data structures; which we've done repeatedly in the past and will surely
> want to do again.  So I'm pretty much -1 on exposing them.

We could instead add a view of this information to core --
pg_stat_autovacuum, or whatever.

But to be honest, I'm more in favor of Guillaume's proposal.  I will
repeat my recent assertion that we -- you in particular -- are too
reluctant to expose internal data structures to authors of C
extensions, and that this is developer-hostile.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Publish autovacuum informations

От

Tom Lane

Дата:

31 декабря 2014 г., 20:46:23

Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Dec 29, 2014 at 11:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Either one of those approaches would cripple our freedom to change those
>> data structures; which we've done repeatedly in the past and will surely
>> want to do again.  So I'm pretty much -1 on exposing them.

> We could instead add a view of this information to core --
> pg_stat_autovacuum, or whatever.

> But to be honest, I'm more in favor of Guillaume's proposal.  I will
> repeat my recent assertion that we -- you in particular -- are too
> reluctant to expose internal data structures to authors of C
> extensions, and that this is developer-hostile.

Well, the core question there is whether we have a policy of not breaking
extension-visible APIs.  While we will very often do things like adding
parameters to existing functions, I think we've tended to refrain from
making wholesale semantic revisions to exposed data structures.

I'd be all right with putting the data structure declarations in a file
named something like autovacuum_private.h, especially if it carried an
annotation that "if you depend on this, don't be surprised if we break
your code in future".
        regards, tom lane

Re: Publish autovacuum informations

От

Noah Misch

Дата:

02 января 2015 г., 01:17:47

On Wed, Dec 31, 2014 at 12:46:17PM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Mon, Dec 29, 2014 at 11:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Either one of those approaches would cripple our freedom to change those
> >> data structures; which we've done repeatedly in the past and will surely
> >> want to do again.  So I'm pretty much -1 on exposing them.
> 
> > We could instead add a view of this information to core --
> > pg_stat_autovacuum, or whatever.
> 
> > But to be honest, I'm more in favor of Guillaume's proposal.  I will
> > repeat my recent assertion that we -- you in particular -- are too
> > reluctant to expose internal data structures to authors of C
> > extensions, and that this is developer-hostile.
> 
> Well, the core question there is whether we have a policy of not breaking
> extension-visible APIs.

No, we have no policy restricting backend C API changes in major releases.
Though this message is old enough to enroll in first grade, I know of no
policy decision supplanting it:
http://www.postgresql.org/message-id/8706.1230569070@sss.pgh.pa.us

> While we will very often do things like adding
> parameters to existing functions, I think we've tended to refrain from
> making wholesale semantic revisions to exposed data structures.

True.  I especially look to avoid changes that will cause extensions to build
and run, yet silently misbehave at runtime.  For example, had I reviewed the
pg_policy patch, I would have examined whether an unmodified 9.4 extension
might let a user bypass relation policy.  I oppose most header reorganization,
which breaks builds in exchange for insubstantial benefits.  I don't wish to
extend that anywhere near to the point of saying, "Your C function can't use
struct foo, because exposing struct foo in a header file would imply freezing
it."  Desire for backend API stability should not drive us to reject new
functionality.

> I'd be all right with putting the data structure declarations in a file
> named something like autovacuum_private.h, especially if it carried an
> annotation that "if you depend on this, don't be surprised if we break
> your code in future".

Such an annotation would be no more true than it is for the majority of header
files.  If including it makes you feel better, I don't object.

nm

Re: Publish autovacuum informations

От

Jim Nasby

Дата:

02 января 2015 г., 20:22:15

On 1/1/15, 4:17 PM, Noah Misch wrote:
>> I'd be all right with putting the data structure declarations in a file
>> >named something like autovacuum_private.h, especially if it carried an
>> >annotation that "if you depend on this, don't be surprised if we break
>> >your code in future".
> Such an annotation would be no more true than it is for the majority of header
> files.  If including it makes you feel better, I don't object.

We need to be careful with that. Starting to segregate things into _private headers implies that stuff in non-private
headers*is* locked down. We'd need to set clear expectations.

I do think more clarity would be good here. Right now the only distinction we have is things like SPI are spelled out
inthe docs. Other than that, the there really isn't anything to indicate how safe it is to rely on what's in the
headers.For example, I've got some code that's looking at fcinfo->flinfo->fn_expr, and I have no idea how likely that
isto get broken. Since it's a parse node, my guess is "likely", but I'm just guessing.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: Publish autovacuum informations

От

Robert Haas

Дата:

05 января 2015 г., 19:40:26

On Wed, Dec 31, 2014 at 12:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'd be all right with putting the data structure declarations in a file
> named something like autovacuum_private.h, especially if it carried an
> annotation that "if you depend on this, don't be surprised if we break
> your code in future".

Works for me.  I am not in general surprised when we do things that
break my code, or anyway, the code that I'm responsible for
maintaining.  But I think it makes sense to segregate this into a
separate header file so that we are clear that it is only exposed for
the benefit of extension authors, not so that other things in the core
system can touch it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Publish autovacuum informations

От

Guillaume Lelarge

Дата:

05 января 2015 г., 19:44:18

2015-01-05 17:40 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:

On Wed, Dec 31, 2014 at 12:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'd be all right with putting the data structure declarations in a file
> named something like autovacuum_private.h, especially if it carried an
> annotation that "if you depend on this, don't be surprised if we break
> your code in future".

Works for me. I am not in general surprised when we do things that
break my code, or anyway, the code that I'm responsible for
maintaining. But I think it makes sense to segregate this into a
separate header file so that we are clear that it is only exposed for
the benefit of extension authors, not so that other things in the core
system can touch it.

I'm fine with that too. I'll try to find some time to work on that.

Thanks.

--

Guillaume.

http://blog.guillaume.lelarge.info

http://www.dalibo.com

Re: Publish autovacuum informations

От

Guillaume Lelarge

Дата:

04 июня 2015 г., 23:10:12

2015-01-05 17:44 GMT+01:00 Guillaume Lelarge <guillaume@lelarge.info>:

2015-01-05 17:40 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
On Wed, Dec 31, 2014 at 12:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'd be all right with putting the data structure declarations in a file
> named something like autovacuum_private.h, especially if it carried an
> annotation that "if you depend on this, don't be surprised if we break
> your code in future".

Works for me. I am not in general surprised when we do things that
break my code, or anyway, the code that I'm responsible for
maintaining. But I think it makes sense to segregate this into a
separate header file so that we are clear that it is only exposed for
the benefit of extension authors, not so that other things in the core
system can touch it.

I'm fine with that too. I'll try to find some time to work on that.

So I took a look at this this week. I discovered, with the help of a coworker, that I can already use the AutoVacuumShmem pointer and read the struct. Unfortunately, it doesn't give me as much details as I would have liked. The list of databases and tables aren't in shared memory. They are local to the process that uses them. Putting them in shared memory (if at all possible) would imply a much bigger patch than I was willing to write right now.

Thanks anyway for the help.

--

Guillaume.

http://blog.guillaume.lelarge.info

http://www.dalibo.com

Re: Publish autovacuum informations

От

Julien Rouhaud

Дата:

29 февраля 2016 г., 21:05:07

On 04/06/2015 22:10, Guillaume Lelarge wrote:
> 2015-01-05 17:44 GMT+01:00 Guillaume Lelarge <guillaume@lelarge.info
> <mailto:guillaume@lelarge.info>>:
>
>     2015-01-05 17:40 GMT+01:00 Robert Haas <robertmhaas@gmail.com
>     <mailto:robertmhaas@gmail.com>>:
>
>         On Wed, Dec 31, 2014 at 12:46 PM, Tom Lane <tgl@sss.pgh.pa.us
>         <mailto:tgl@sss.pgh.pa.us>> wrote:
>         > I'd be all right with putting the data structure declarations in a file
>         > named something like autovacuum_private.h, especially if it carried an
>         > annotation that "if you depend on this, don't be surprised if we break
>         > your code in future".
>
>         Works for me.  I am not in general surprised when we do things that
>         break my code, or anyway, the code that I'm responsible for
>         maintaining.  But I think it makes sense to segregate this into a
>         separate header file so that we are clear that it is only
>         exposed for
>         the benefit of extension authors, not so that other things in
>         the core
>         system can touch it.
>
>
>     I'm fine with that too. I'll try to find some time to work on that.
>
>
> So I took a look at this this week. I discovered, with the help of a
> coworker, that I can already use the AutoVacuumShmem pointer and read
> the struct. Unfortunately, it doesn't give me as much details as I would
> have liked. The list of databases and tables aren't in shared memory.
> They are local to the process that uses them. Putting them in shared
> memory (if at all possible) would imply a much bigger patch than I was
> willing to write right now.
>
> Thanks anyway for the help.
>
>

Sorry to revive such an old thread.

I think some hooks in the autovacuum could be enough to have good
insight without exposing private structure.

Please find attached a patch that adds some hooks to the autovacuum, and
as an example a quick proof of concept extension that use them and allow
to see what are the autovacuum worker todo list, skipped tables and so on.

I'm not really sure about which information should be provided, so I'm
open to any suggestion to improve this.
--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Вложения

Re: Publish autovacuum informations

От

Fabrízio de Royes Mello

Дата:

29 февраля 2016 г., 22:20:35

On Mon, Feb 29, 2016 at 3:04 PM, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
>
> On 04/06/2015 22:10, Guillaume Lelarge wrote:
> > 2015-01-05 17:44 GMT+01:00 Guillaume Lelarge <guillaume@lelarge.info
> > <mailto:guillaume@lelarge.info>>:
> >
> > 2015-01-05 17:40 GMT+01:00 Robert Haas <robertmhaas@gmail.com
> > <mailto:robertmhaas@gmail.com>>:
> >
> > On Wed, Dec 31, 2014 at 12:46 PM, Tom Lane <tgl@sss.pgh.pa.us
> > <mailto:tgl@sss.pgh.pa.us>> wrote:
> > > I'd be all right with putting the data structure declarations in a file
> > > named something like autovacuum_private.h, especially if it carried an
> > > annotation that "if you depend on this, don't be surprised if we break
> > > your code in future".
> >
> > Works for me. I am not in general surprised when we do things that
> > break my code, or anyway, the code that I'm responsible for
> > maintaining. But I think it makes sense to segregate this into a
> > separate header file so that we are clear that it is only
> > exposed for
> > the benefit of extension authors, not so that other things in
> > the core
> > system can touch it.
> >
> >
> > I'm fine with that too. I'll try to find some time to work on that.
> >
> >
> > So I took a look at this this week. I discovered, with the help of a
> > coworker, that I can already use the AutoVacuumShmem pointer and read
> > the struct. Unfortunately, it doesn't give me as much details as I would
> > have liked. The list of databases and tables aren't in shared memory.
> > They are local to the process that uses them. Putting them in shared
> > memory (if at all possible) would imply a much bigger patch than I was
> > willing to write right now.
> >
> > Thanks anyway for the help.
> >
> >
>
> Sorry to revive such an old thread.
>
> I think some hooks in the autovacuum could be enough to have good
> insight without exposing private structure.
>

Interesting idea...

> Please find attached a patch that adds some hooks to the autovacuum, and
> as an example a quick proof of concept extension that use them and allow
> to see what are the autovacuum worker todo list, skipped tables and so on.
>
> I'm not really sure about which information should be provided, so I'm
> open to any suggestion to improve this.
>

I have a look at the patch and it's compile without warning and without regression.

But something goes wrong when installing the extension:

fabrizio@bagual:~/Downloads/pg_stat_autovacuum
$ pg_config
BINDIR = /data/home/fabrizio/pgsql/bin
DOCDIR = /data/home/fabrizio/pgsql/share/doc
HTMLDIR = /data/home/fabrizio/pgsql/share/doc
INCLUDEDIR = /data/home/fabrizio/pgsql/include
PKGINCLUDEDIR = /data/home/fabrizio/pgsql/include
INCLUDEDIR-SERVER = /data/home/fabrizio/pgsql/include/server
LIBDIR = /data/home/fabrizio/pgsql/lib
PKGLIBDIR = /data/home/fabrizio/pgsql/lib
LOCALEDIR = /data/home/fabrizio/pgsql/share/locale
MANDIR = /data/home/fabrizio/pgsql/share/man
SHAREDIR = /data/home/fabrizio/pgsql/share
SYSCONFDIR = /data/home/fabrizio/pgsql/etc
PGXS = /data/home/fabrizio/pgsql/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/home/fabrizio/pgsql' '--enable-cassert' '--enable-coverage' '--enable-tap-tests' '--enable-depend'
CC = gcc
CPPFLAGS = -DFRONTEND -D_GNU_SOURCE
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -fprofile-arcs -ftest-coverage
CFLAGS_SL = -fpic
LDFLAGS = -L../../src/common -Wl,--as-needed -Wl,-rpath,'/home/fabrizio/pgsql/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lz -lreadline -lrt -lcrypt -ldl -lm
VERSION = PostgreSQL 9.6devel

fabrizio@bagual:~/Downloads/pg_stat_autovacuum
$ make USE_PGXS=1 install
/bin/mkdir -p '/data/home/fabrizio/pgsql/lib'
/bin/mkdir -p '/data/home/fabrizio/pgsql/share/extension'
/bin/mkdir -p '/data/home/fabrizio/pgsql/share/extension'
/bin/mkdir -p '/data/home/fabrizio/pgsql/share/doc/extension'
/usr/bin/install -c -m 755 pg_stat_autovacuum.so '/data/home/fabrizio/pgsql/lib/pg_stat_autovacuum.so'
/usr/bin/install -c -m 644 .//pg_stat_autovacuum.control '/data/home/fabrizio/pgsql/share/extension/'
/usr/bin/install -c -m 644 .//pg_stat_autovacuum--0.0.1.sql '/data/home/fabrizio/pgsql/share/extension/'
/usr/bin/install -c -m 644 '/data/home/fabrizio/pgsql/share/doc/extension/'
/usr/bin/install: missing destination file operand after ‘/data/home/fabrizio/pgsql/share/doc/extension/’
Try '/usr/bin/install --help' for more information.
make: *** [install] Error 1

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello

Re: Publish autovacuum informations

От

Julien Rouhaud

Дата:

29 февраля 2016 г., 22:39:09

On 29/02/2016 20:20, Fabrízio de Royes Mello wrote:
>
> On Mon, Feb 29, 2016 at 3:04 PM, Julien Rouhaud
> <julien.rouhaud@dalibo.com <mailto:julien.rouhaud@dalibo.com>> wrote:
>>
>> On 04/06/2015 22:10, Guillaume Lelarge wrote:
>> > 2015-01-05 17:44 GMT+01:00 Guillaume Lelarge <guillaume@lelarge.info
> <mailto:guillaume@lelarge.info>
>> > <mailto:guillaume@lelarge.info <mailto:guillaume@lelarge.info>>>:
>> >
>> >     2015-01-05 17:40 GMT+01:00 Robert Haas <robertmhaas@gmail.com
> <mailto:robertmhaas@gmail.com>
>> >     <mailto:robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>>>:
>> >
>> >         On Wed, Dec 31, 2014 at 12:46 PM, Tom Lane
> <tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>
>> >         <mailto:tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>>> wrote:
>> >         > I'd be all right with putting the data structure
> declarations in a file
>> >         > named something like autovacuum_private.h, especially if
> it carried an
>> >         > annotation that "if you depend on this, don't be surprised
> if we break
>> >         > your code in future".
>> >
>> >         Works for me.  I am not in general surprised when we do
> things that
>> >         break my code, or anyway, the code that I'm responsible for
>> >         maintaining.  But I think it makes sense to segregate this
> into a
>> >         separate header file so that we are clear that it is only
>> >         exposed for
>> >         the benefit of extension authors, not so that other things in
>> >         the core
>> >         system can touch it.
>> >
>> >
>> >     I'm fine with that too. I'll try to find some time to work on that.
>> >
>> >
>> > So I took a look at this this week. I discovered, with the help of a
>> > coworker, that I can already use the AutoVacuumShmem pointer and read
>> > the struct. Unfortunately, it doesn't give me as much details as I would
>> > have liked. The list of databases and tables aren't in shared memory.
>> > They are local to the process that uses them. Putting them in shared
>> > memory (if at all possible) would imply a much bigger patch than I was
>> > willing to write right now.
>> >
>> > Thanks anyway for the help.
>> >
>> >
>>
>> Sorry to revive such an old thread.
>>
>> I think some hooks in the autovacuum could be enough to have good
>> insight without exposing private structure.
>>
>
> Interesting idea...
>

Thanks for looking at it!

>
>> Please find attached a patch that adds some hooks to the autovacuum, and
>> as an example a quick proof of concept extension that use them and allow
>> to see what are the autovacuum worker todo list, skipped tables and so on.
>>
>> I'm not really sure about which information should be provided, so I'm
>> open to any suggestion to improve this.
>>
>
> I have a look at the patch and it's compile without warning and without
> regression.
>
> But something goes wrong when installing the extension:
>
> [...]
> /usr/bin/install -c -m 644  '/data/home/fabrizio/pgsql/share/doc/extension/'
> /usr/bin/install: missing destination file operand after
> ‘/data/home/fabrizio/pgsql/share/doc/extension/’
> Try '/usr/bin/install --help' for more information.
> make: *** [install] Error 1
>

Oups, I'm not really sure what I removed and shouldn't have. I attached
v2 of the extension with a "standard" Makefile, which I just tested and
works fine.

> Regards,
>
> --
> Fabrízio de Royes Mello
> Consultoria/Coaching PostgreSQL
>>> Timbira: http://www.timbira.com.br
>>> Blog: http://fabriziomello.github.io
>>> Linkedin: http://br.linkedin.com/in/fabriziomello
>>> Twitter: http://twitter.com/fabriziomello
>>> Github: http://github.com/fabriziomello


--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Вложения

pg_stat_autovacuum_v2.tgz

Re: Publish autovacuum informations

От

Michael Paquier

Дата:

01 марта 2016 г., 09:50:42

On Tue, Mar 1, 2016 at 4:38 AM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> On 29/02/2016 20:20, Fabrízio de Royes Mello wrote:
>>
>> On Mon, Feb 29, 2016 at 3:04 PM, Julien Rouhaud
>> <julien.rouhaud@dalibo.com <mailto:julien.rouhaud@dalibo.com>> wrote:
>>>
>>> On 04/06/2015 22:10, Guillaume Lelarge wrote:
>>> > 2015-01-05 17:44 GMT+01:00 Guillaume Lelarge <guillaume@lelarge.info
>> <mailto:guillaume@lelarge.info>
>>> > <mailto:guillaume@lelarge.info <mailto:guillaume@lelarge.info>>>:
>>> >
>>> >     2015-01-05 17:40 GMT+01:00 Robert Haas <robertmhaas@gmail.com
>> <mailto:robertmhaas@gmail.com>
>>> >     <mailto:robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>>>:
>>> >
>>> >         On Wed, Dec 31, 2014 at 12:46 PM, Tom Lane
>> <tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>
>>> >         <mailto:tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>>> wrote:
>>> >         > I'd be all right with putting the data structure
>> declarations in a file
>>> >         > named something like autovacuum_private.h, especially if
>> it carried an
>>> >         > annotation that "if you depend on this, don't be surprised
>> if we break
>>> >         > your code in future".
>>> >
>>> >         Works for me.  I am not in general surprised when we do
>> things that
>>> >         break my code, or anyway, the code that I'm responsible for
>>> >         maintaining.  But I think it makes sense to segregate this
>> into a
>>> >         separate header file so that we are clear that it is only
>>> >         exposed for
>>> >         the benefit of extension authors, not so that other things in
>>> >         the core
>>> >         system can touch it.
>>> >
>>> >
>>> >     I'm fine with that too. I'll try to find some time to work on that.
>>> >
>>> >
>>> > So I took a look at this this week. I discovered, with the help of a
>>> > coworker, that I can already use the AutoVacuumShmem pointer and read
>>> > the struct. Unfortunately, it doesn't give me as much details as I would
>>> > have liked. The list of databases and tables aren't in shared memory.
>>> > They are local to the process that uses them. Putting them in shared
>>> > memory (if at all possible) would imply a much bigger patch than I was
>>> > willing to write right now.
>>> >
>>> > Thanks anyway for the help.
>>> >
>>> >
>>>
>>> Sorry to revive such an old thread.
>>>
>>> I think some hooks in the autovacuum could be enough to have good
>>> insight without exposing private structure.

Instead of introducing 4 new hooks, which do not represent a general
use actually, why don't you expose a portion of this information in
shared memory as mentioned upthread? This sounds like a good approach
to me. Your extension could then scan them as needed and put that on
view or a function. This information is now private in the autovacuum
processes, exposing them would allow plugin authors to do a bunch of
fancy things I think, in a more flexible way than those hooks. And
there is no need to add more hooks should the structure of the
autovacuum code change for a reason or another in the future.
--
Michael

Re: Publish autovacuum informations

От

Julien Rouhaud

Дата:

01 марта 2016 г., 14:44:44

On 01/03/2016 07:50, Michael Paquier wrote:
> On Tue, Mar 1, 2016 at 4:38 AM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>> On 29/02/2016 20:20, Fabrízio de Royes Mello wrote:
>>>
>>> On Mon, Feb 29, 2016 at 3:04 PM, Julien Rouhaud
>>> <julien.rouhaud@dalibo.com <mailto:julien.rouhaud@dalibo.com>> wrote:
>>>>
>>>> On 04/06/2015 22:10, Guillaume Lelarge wrote:
>>>>> 2015-01-05 17:44 GMT+01:00 Guillaume Lelarge <guillaume@lelarge.info
>>> <mailto:guillaume@lelarge.info>
>>>>> <mailto:guillaume@lelarge.info <mailto:guillaume@lelarge.info>>>:
>>>>>
>>>>>     2015-01-05 17:40 GMT+01:00 Robert Haas <robertmhaas@gmail.com
>>> <mailto:robertmhaas@gmail.com>
>>>>>     <mailto:robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>>>:
>>>>>
>>>>>         On Wed, Dec 31, 2014 at 12:46 PM, Tom Lane
>>> <tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>
>>>>>         <mailto:tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>>> wrote:
>>>>>         > I'd be all right with putting the data structure
>>> declarations in a file
>>>>>         > named something like autovacuum_private.h, especially if
>>> it carried an
>>>>>         > annotation that "if you depend on this, don't be surprised
>>> if we break
>>>>>         > your code in future".
>>>>>
>>>>>         Works for me.  I am not in general surprised when we do
>>> things that
>>>>>         break my code, or anyway, the code that I'm responsible for
>>>>>         maintaining.  But I think it makes sense to segregate this
>>> into a
>>>>>         separate header file so that we are clear that it is only
>>>>>         exposed for
>>>>>         the benefit of extension authors, not so that other things in
>>>>>         the core
>>>>>         system can touch it.
>>>>>
>>>>>
>>>>>     I'm fine with that too. I'll try to find some time to work on that.
>>>>>
>>>>>
>>>>> So I took a look at this this week. I discovered, with the help of a
>>>>> coworker, that I can already use the AutoVacuumShmem pointer and read
>>>>> the struct. Unfortunately, it doesn't give me as much details as I would
>>>>> have liked. The list of databases and tables aren't in shared memory.
>>>>> They are local to the process that uses them. Putting them in shared
>>>>> memory (if at all possible) would imply a much bigger patch than I was
>>>>> willing to write right now.
>>>>>
>>>>> Thanks anyway for the help.
>>>>>
>>>>>
>>>>
>>>> Sorry to revive such an old thread.
>>>>
>>>> I think some hooks in the autovacuum could be enough to have good
>>>> insight without exposing private structure.
> 
> Instead of introducing 4 new hooks, which do not represent a general
> use actually, why don't you expose a portion of this information in
> shared memory as mentioned upthread? This sounds like a good approach
> to me. Your extension could then scan them as needed and put that on
> view or a function. This information is now private in the autovacuum
> processes, exposing them would allow plugin authors to do a bunch of
> fancy things I think, in a more flexible way than those hooks. And
> there is no need to add more hooks should the structure of the
> autovacuum code change for a reason or another in the future.
> 

I thought exposing private structures could be a blocking issue.  I
tried to see what could be done using hooks, and one thing I like is
that we can compute the process time of each relation, or even aggregate
some statistics.  Having the vacuum time is something that we can
actually only obtain by setting log_autovacuum_min_duration and parsing
the logs, and I don't think it would be possible to do this by just
exposing current private structure.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Re: Publish autovacuum informations

От

Fabrízio de Royes Mello

Дата:

01 марта 2016 г., 15:47:45

On Tue, Mar 1, 2016 at 8:44 AM, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
>
> On 01/03/2016 07:50, Michael Paquier wrote:
> > On Tue, Mar 1, 2016 at 4:38 AM, Julien Rouhaud
> > <julien.rouhaud@dalibo.com> wrote:
> >> On 29/02/2016 20:20, Fabrízio de Royes Mello wrote:
> >>>
> >>> On Mon, Feb 29, 2016 at 3:04 PM, Julien Rouhaud
> >>> <julien.rouhaud@dalibo.com <mailto:julien.rouhaud@dalibo.com>> wrote:
> >>>>
> >>>> On 04/06/2015 22:10, Guillaume Lelarge wrote:
> >>>>> 2015-01-05 17:44 GMT+01:00 Guillaume Lelarge <guillaume@lelarge.info
> >>> <mailto:guillaume@lelarge.info>
> >>>>> <mailto:guillaume@lelarge.info <mailto:guillaume@lelarge.info>>>:
> >>>>>
> >>>>> 2015-01-05 17:40 GMT+01:00 Robert Haas <robertmhaas@gmail.com
> >>> <mailto:robertmhaas@gmail.com>
> >>>>> <mailto:robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>>>:
> >>>>>
> >>>>> On Wed, Dec 31, 2014 at 12:46 PM, Tom Lane
> >>> <tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>
> >>>>> <mailto:tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>>> wrote:
> >>>>> > I'd be all right with putting the data structure
> >>> declarations in a file
> >>>>> > named something like autovacuum_private.h, especially if
> >>> it carried an
> >>>>> > annotation that "if you depend on this, don't be surprised
> >>> if we break
> >>>>> > your code in future".
> >>>>>
> >>>>> Works for me. I am not in general surprised when we do
> >>> things that
> >>>>> break my code, or anyway, the code that I'm responsible for
> >>>>> maintaining. But I think it makes sense to segregate this
> >>> into a
> >>>>> separate header file so that we are clear that it is only
> >>>>> exposed for
> >>>>> the benefit of extension authors, not so that other things in
> >>>>> the core
> >>>>> system can touch it.
> >>>>>
> >>>>>
> >>>>> I'm fine with that too. I'll try to find some time to work on that.
> >>>>>
> >>>>>
> >>>>> So I took a look at this this week. I discovered, with the help of a
> >>>>> coworker, that I can already use the AutoVacuumShmem pointer and read
> >>>>> the struct. Unfortunately, it doesn't give me as much details as I would
> >>>>> have liked. The list of databases and tables aren't in shared memory.
> >>>>> They are local to the process that uses them. Putting them in shared
> >>>>> memory (if at all possible) would imply a much bigger patch than I was
> >>>>> willing to write right now.
> >>>>>
> >>>>> Thanks anyway for the help.
> >>>>>
> >>>>>
> >>>>
> >>>> Sorry to revive such an old thread.
> >>>>
> >>>> I think some hooks in the autovacuum could be enough to have good
> >>>> insight without exposing private structure.
> >
> > Instead of introducing 4 new hooks, which do not represent a general
> > use actually, why don't you expose a portion of this information in
> > shared memory as mentioned upthread? This sounds like a good approach
> > to me. Your extension could then scan them as needed and put that on
> > view or a function. This information is now private in the autovacuum
> > processes, exposing them would allow plugin authors to do a bunch of
> > fancy things I think, in a more flexible way than those hooks. And
> > there is no need to add more hooks should the structure of the
> > autovacuum code change for a reason or another in the future.
> >
>
> I thought exposing private structures could be a blocking issue. I
> tried to see what could be done using hooks, and one thing I like is
> that we can compute the process time of each relation, or even aggregate
> some statistics. Having the vacuum time is something that we can
> actually only obtain by setting log_autovacuum_min_duration and parsing
> the logs, and I don't think it would be possible to do this by just
> exposing current private structure.
>

We understood (IMHO is an interesting idea) but as Michael said hooks is for a general purpose. So can you demonstrate other use cases for this new hooks?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello

Re: Publish autovacuum informations

От

Julien Rouhaud

Дата:

01 марта 2016 г., 17:37:37

On 01/03/2016 13:47, Fabrízio de Royes Mello wrote:
> On Tue, Mar 1, 2016 at 8:44 AM, Julien Rouhaud
> <julien.rouhaud@dalibo.com <mailto:julien.rouhaud@dalibo.com>> wrote:
>>
>> On 01/03/2016 07:50, Michael Paquier wrote:
>> > On Tue, Mar 1, 2016 at 4:38 AM, Julien Rouhaud
>> > <julien.rouhaud@dalibo.com <mailto:julien.rouhaud@dalibo.com>> wrote:
>> >> On 29/02/2016 20:20, Fabrízio de Royes Mello wrote:
>> >>>
>> >>> On Mon, Feb 29, 2016 at 3:04 PM, Julien Rouhaud
>> >>> <julien.rouhaud@dalibo.com <mailto:julien.rouhaud@dalibo.com>
> <mailto:julien.rouhaud@dalibo.com <mailto:julien.rouhaud@dalibo.com>>>
> wrote:
>> >>>>
>> >>>> I think some hooks in the autovacuum could be enough to have good
>> >>>> insight without exposing private structure.
>> >
>> > Instead of introducing 4 new hooks, which do not represent a general
>> > use actually, why don't you expose a portion of this information in
>> > shared memory as mentioned upthread? This sounds like a good approach
>> > to me. Your extension could then scan them as needed and put that on
>> > view or a function. This information is now private in the autovacuum
>> > processes, exposing them would allow plugin authors to do a bunch of
>> > fancy things I think, in a more flexible way than those hooks. And
>> > there is no need to add more hooks should the structure of the
>> > autovacuum code change for a reason or another in the future.
>> >
>>
>> I thought exposing private structures could be a blocking issue.  I
>> tried to see what could be done using hooks, and one thing I like is
>> that we can compute the process time of each relation, or even aggregate
>> some statistics.  Having the vacuum time is something that we can
>> actually only obtain by setting log_autovacuum_min_duration and parsing
>> the logs, and I don't think it would be possible to do this by just
>> exposing current private structure.
>>
> 
> We understood (IMHO is an interesting idea) but as Michael said hooks is
> for a general purpose. So can you demonstrate other use cases for this
> new hooks?
> 

I can think of several usage.  First, since the hook will always be
called, an extension will see all the activity a worker is doing when
exposing private structure will always be some kind of sampling.  Then,
you can have other information that wouldn't be available just by
exposing private structure.  For instance knowing a VACUUM isn't
performed by the worker (either because another worker is already
working on it or because it isn't needed anymore). IIRC there was a
discussion about concurrency issue in this case. We can also know if the
maintenance was cancelled due to lock not obtained fast enough.
Finally, as long as the hooks aren't use, they don't have any overhead.I agree that all this is for monitoring
purpose.

I'm not sure what are the fancy things that Michael had in mind with
exposing the private structure.  Michael, was it something like having
the ability to change some of these data through an extension?

> Regards,
> 
> --
> Fabrízio de Royes Mello
> Consultoria/Coaching PostgreSQL
>>> Timbira: http://www.timbira.com.br
>>> Blog: http://fabriziomello.github.io
>>> Linkedin: http://br.linkedin.com/in/fabriziomello
>>> Twitter: http://twitter.com/fabriziomello
>>> Github: http://github.com/fabriziomello


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Re: Publish autovacuum informations

От

Jim Nasby

Дата:

01 марта 2016 г., 22:29:42

On 3/1/16 8:37 AM, Julien Rouhaud wrote:
>> >
>> >We understood (IMHO is an interesting idea) but as Michael said hooks is
>> >for a general purpose. So can you demonstrate other use cases for this
>> >new hooks?
>> >
> I can think of several usage.  First, since the hook will always be
> called, an extension will see all the activity a worker is doing when
> exposing private structure will always be some kind of sampling.  Then,

I think that's pretty key. If you wanted to create an extension that 
logs vacuums (which would be great, since current state of the art is 
logs + pgBadger), you'd want to gather your data about what the vacuum 
did as the vacuum was ending.

I can certainly see cases where you don't care about that and just want 
what's in shared memory, but that would only be useful for monitoring 
what's happening real-time, not for knowing what final results are.

BTW, I think as much of this as possible should also work for regular 
vacuums.

> you can have other information that wouldn't be available just by
> exposing private structure.  For instance knowing a VACUUM isn't
> performed by the worker (either because another worker is already
> working on it or because it isn't needed anymore). IIRC there was a
> discussion about concurrency issue in this case. We can also know if the
> maintenance was cancelled due to lock not obtained fast enough.
> Finally, as long as the hooks aren't use, they don't have any overhead.
>   I agree that all this is for monitoring purpose.
>
> I'm not sure what are the fancy things that Michael had in mind with
> exposing the private structure.  Michael, was it something like having
> the ability to change some of these data through an extension?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: Publish autovacuum informations

От

Julien Rouhaud

Дата:

02 марта 2016 г., 00:03:21

On 01/03/2016 20:29, Jim Nasby wrote:
> On 3/1/16 8:37 AM, Julien Rouhaud wrote:
>>> >
>>> >We understood (IMHO is an interesting idea) but as Michael said
>>> hooks is
>>> >for a general purpose. So can you demonstrate other use cases for this
>>> >new hooks?
>>> >
>> I can think of several usage.  First, since the hook will always be
>> called, an extension will see all the activity a worker is doing when
>> exposing private structure will always be some kind of sampling.  Then,
> 
> I think that's pretty key. If you wanted to create an extension that
> logs vacuums (which would be great, since current state of the art is
> logs + pgBadger), you'd want to gather your data about what the vacuum
> did as the vacuum was ending.
> 

Indeed these information are missing. I guess that'd be possible by
adding (or moving) a hook in lazy_vacuum_rel() that provide access to
part or all of the LVRelStats and rusage informations.

> I can certainly see cases where you don't care about that and just want
> what's in shared memory, but that would only be useful for monitoring
> what's happening real-time, not for knowing what final results are.
> 
> BTW, I think as much of this as possible should also work for regular
> vacuums.
> 

You mean for database wide vacuum?

>> you can have other information that wouldn't be available just by
>> exposing private structure.  For instance knowing a VACUUM isn't
>> performed by the worker (either because another worker is already
>> working on it or because it isn't needed anymore). IIRC there was a
>> discussion about concurrency issue in this case. We can also know if the
>> maintenance was cancelled due to lock not obtained fast enough.
>> Finally, as long as the hooks aren't use, they don't have any overhead.
>>   I agree that all this is for monitoring purpose.
>>
>> I'm not sure what are the fancy things that Michael had in mind with
>> exposing the private structure.  Michael, was it something like having
>> the ability to change some of these data through an extension?


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Re: Publish autovacuum informations

От

Jim Nasby

Дата:

02 марта 2016 г., 00:30:51

On 3/1/16 3:02 PM, Julien Rouhaud wrote:
> You mean for database wide vacuum?

I mean manual vacuum. Some hooks and stats would apply only to autovac 
obviously (and it'd be nice to get visibility into the scheduling 
decisions both daemons are making). But as much as possible things 
should be done in vacuum.c/lazyvacuum.c so it works for manual vacuums 
as well.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: Publish autovacuum informations

От

Michael Paquier

Дата:

02 марта 2016 г., 09:30:57

On Tue, Mar 1, 2016 at 11:37 PM, Julien Rouhaud wrote:
> I'm not sure what are the fancy things that Michael had in mind with
> exposing the private structure.  Michael, was it something like having
> the ability to change some of these data through an extension?

I was referring to you here :)
I have witnessed already many fancy things coming out of brain, and I
have no doubt you could make something out of just a sampling of data.
Jokes apart, what I mean with fancy things here is putting the
sampling data in a shape that a user suits to him. It would be easy to
exploit that for example in a background worker that scans
periodically the shared memory, or have an extension that represents
this shared memory data into something that can be queried at SQL
level. Now, the main use case that I have with the data available in
shared memory is more or less:
- represent the current shared memory as SQL
- Scan this data, reshape it and report it elsewhere, say a background
worker printing out a file.

Now, take your set of hooks... There are 4 hooks here:
- autovacuum_list_tables_hook, to do something with the list of tables
a worker has collected, at the moment they have been collected.
- autovacuum_end_table_hook, to do something when knowing that a table
is skipped or cancelled
- autovacuum_begin_table_hook, to trigger something when a relation is
beginning to be processed.
- autovacuum_database_finished_hook, to trigger something once a
database is done with its processing.

The only use cases that I have in mind here for those hooks, which
would help in decision-making to tune autovacuum-related parameters
would be the following:
- Log entries regarding those operations, then why not introducing a
GUC parameter that is an on/off switch, like log_autovacuum (this is
not a per-relation parameter), defaulting to off. Those could be used
by pgbadger.
- Have system statistics with a new system relation like
pg_stat_autovacuum, and make this information available to user.
Are there other things that you think could make use those hooks? Your
extension just does pg_stat_autovacuum and emulates the existing
pg_stat_* facility when gathering information about the global
autovacuum statistics. So it seems to me that those hooks are not that
necessary, and that this may help in tuning a system, particularly the
number of relations skipped would be interesting to have.

The stats could have a delay, the point being to have hints on how
autovacuum workers are sorting things out. In short, I am doubtful
about the need of hooks in those code paths, the thing that we should
try to do instead is to improve native solutions to give user more
information regarding how autovacuum works, which help in tuning it.
-- 
Michael

Re: Publish autovacuum informations

От

Julien Rouhaud

Дата:

02 марта 2016 г., 19:48:52

On 02/03/2016 07:30, Michael Paquier wrote:
> On Tue, Mar 1, 2016 at 11:37 PM, Julien Rouhaud wrote:
>> I'm not sure what are the fancy things that Michael had in mind with
>> exposing the private structure.  Michael, was it something like having
>> the ability to change some of these data through an extension?
> 
> I was referring to you here :)
> I have witnessed already many fancy things coming out of brain, and I
> have no doubt you could make something out of just a sampling of data.

:D

> Jokes apart, what I mean with fancy things here is putting the
> sampling data in a shape that a user suits to him. It would be easy to
> exploit that for example in a background worker that scans
> periodically the shared memory, or have an extension that represents
> this shared memory data into something that can be queried at SQL
> level. Now, the main use case that I have with the data available in
> shared memory is more or less:
> - represent the current shared memory as SQL
> - Scan this data, reshape it and report it elsewhere, say a background
> worker printing out a file.
> 
> Now, take your set of hooks... There are 4 hooks here:
> - autovacuum_list_tables_hook, to do something with the list of tables
> a worker has collected, at the moment they have been collected.
> - autovacuum_end_table_hook, to do something when knowing that a table
> is skipped or cancelled
> - autovacuum_begin_table_hook, to trigger something when a relation is
> beginning to be processed.
> - autovacuum_database_finished_hook, to trigger something once a
> database is done with its processing.
> 
> The only use cases that I have in mind here for those hooks, which
> would help in decision-making to tune autovacuum-related parameters
> would be the following:
> - Log entries regarding those operations, then why not introducing a
> GUC parameter that is an on/off switch, like log_autovacuum (this is
> not a per-relation parameter), defaulting to off. Those could be used
> by pgbadger.

This would be nice, but the point if this proposal is to be able to have
this available at SQL level. (but big +1 on the feature)

> - Have system statistics with a new system relation like
> pg_stat_autovacuum, and make this information available to user.
> Are there other things that you think could make use those hooks? Your
> extension just does pg_stat_autovacuum and emulates the existing
> pg_stat_* facility when gathering information about the global
> autovacuum statistics. So it seems to me that those hooks are not that
> necessary, and that this may help in tuning a system, particularly the
> number of relations skipped would be interesting to have.
> 
> The stats could have a delay, the point being to have hints on how
> autovacuum workers are sorting things out. In short, I am doubtful
> about the need of hooks in those code paths, the thing that we should
> try to do instead is to improve native solutions to give user more
> information regarding how autovacuum works, which help in tuning it.
> 

Good point, I don't see a lot of information available with this hooks
that a native system statistics couldn't offer. To have the same amount
of information, I think we'd need a pg_stat_autovacuum view that shows a
realtime insight of the workers, and also add some aggregated counters
to PgStat_StatTabEntry. I wonder if adding counters to
PgStat_StatTabEntry would be accepted though.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Re: Publish autovacuum informations

От

Jim Nasby

Дата:

03 марта 2016 г., 02:48:16

On 3/2/16 10:48 AM, Julien Rouhaud wrote:
> Good point, I don't see a lot of information available with this hooks
> that a native system statistics couldn't offer. To have the same amount
> of information, I think we'd need a pg_stat_autovacuum view that shows a
> realtime insight of the workers, and also add some aggregated counters
> to PgStat_StatTabEntry. I wonder if adding counters to
> PgStat_StatTabEntry would be accepted though.

I would also really like to see a means of logging (auto)vacuum activity 
in the database itself. We figured out how to do that with 
pg_stat_statements, which was a lot harder... it seems kinda silly not 
to offer that for vacuum. Hooks plus shared memory data should allow for 
that (the only tricky bit is the hook would need to start and then 
commit a transaction, but that doesn't seem onerous).

I think the shared memory structures should be done as well. Having that 
real-time info is also valuable.

I don't see too much point in adding stuff to the stats system for this.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: Publish autovacuum informations

От

Kyotaro HORIGUCHI

Дата:

03 марта 2016 г., 12:55:12

Hello,

At Wed, 2 Mar 2016 17:48:06 -0600, Jim Nasby <Jim.Nasby@BlueTreble.com> wrote in <56D77BB6.6080606@BlueTreble.com>
> On 3/2/16 10:48 AM, Julien Rouhaud wrote:
> > Good point, I don't see a lot of information available with this hooks
> > that a native system statistics couldn't offer. To have the same
> > amount
> > of information, I think we'd need a pg_stat_autovacuum view that shows
> > a
> > realtime insight of the workers, and also add some aggregated counters
> > to PgStat_StatTabEntry. I wonder if adding counters to
> > PgStat_StatTabEntry would be accepted though.
> 
> I would also really like to see a means of logging (auto)vacuum
> activity in the database itself. We figured out how to do that with
> pg_stat_statements, which was a lot harder... it seems kinda silly not
> to offer that for vacuum. Hooks plus shared memory data should allow
> for that (the only tricky bit is the hook would need to start and then
> commit a transaction, but that doesn't seem onerous).
> 
> I think the shared memory structures should be done as well. Having
> that real-time info is also valuable.
> 
> I don't see too much point in adding stuff to the stats system for
> this.

I wonder why there haven't been discussions so far on what kind
of information we want by this feature. For example I'd be happy
to see the time of last autovacuum trial and the cause if it has
been skipped for every table. Such information would (maybe)
naturally be shown in pg_stat_*_tables.

=====
=# select relid, last_completed_autovacuum, last_completed_autovacv_status, last_autovacuum_trial,
last_autovacuum_resultfrom pg_stat_user_tables;
 
-[ RECORD 1 ]-----------------+------
relid                         | 16390
last_completed_autovacuum     | 2016-03-01 01:25:00.349074+09
last_completed_autovac_status | Completed in 4 seconds. Scanned 434 pages, skipped 23 pages
last_autovacuum_trial         | 2016-03-03 17:33:04.004322+09
last_autovac_traial_status    | Canceled by PID 2355. Processed 144/553 pages.
-[ RECORD 2 ]----------+------
...
last_autovacuum_trial         | 2016-03-03 07:25:00.349074+09
last_autovac_traial_status    | Completed in 4 seconds. Scanned 434 pages, skipped 23 pages
-[ RECORD 3 ]----------+------
...
last_autovacuum_trial         | 2016-03-03 17:59:12.324454+09
last_autovac_trial_status     | Processing by PID 42334, 564 / 32526 pages done.
-[ RECORD 4 ]----------+------
...
last_autovacuum_trial         | 2016-03-03 17:59:12.324454+09
last_autovac_trial_status     | Skipped by dead-tuple threashold.
=====

Apart from the appropriateness of the concrete shape, it would be
done by extending the current stats system and needs modification
of some other parts but the hooks and WorkerInfoData is not
enough. This might be a business of Rahila's "VACUUM Progress
Checker" and it convers some real-time info.

https://commitfest.postgresql.org/9/545/

On the other hand, it would be in another place and needs another
method if we want a history like the current autovacuum
completion logs (at debug3..) of 100 latest invocation of
autovacuum worker. Anyway the WorkerInfoData is not enough.


What kind of information we (will) want to have?


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Publish autovacuum informations

От

Julien Rouhaud

Дата:

05 марта 2016 г., 00:52:08

On 03/03/2016 10:54, Kyotaro HORIGUCHI wrote:
> Hello,
> 
> At Wed, 2 Mar 2016 17:48:06 -0600, Jim Nasby <Jim.Nasby@BlueTreble.com> wrote in <56D77BB6.6080606@BlueTreble.com>
>> On 3/2/16 10:48 AM, Julien Rouhaud wrote:
>>> Good point, I don't see a lot of information available with this hooks
>>> that a native system statistics couldn't offer. To have the same
>>> amount
>>> of information, I think we'd need a pg_stat_autovacuum view that shows
>>> a
>>> realtime insight of the workers, and also add some aggregated counters
>>> to PgStat_StatTabEntry. I wonder if adding counters to
>>> PgStat_StatTabEntry would be accepted though.
>>
>> I would also really like to see a means of logging (auto)vacuum
>> activity in the database itself. We figured out how to do that with
>> pg_stat_statements, which was a lot harder... it seems kinda silly not
>> to offer that for vacuum. Hooks plus shared memory data should allow
>> for that (the only tricky bit is the hook would need to start and then
>> commit a transaction, but that doesn't seem onerous).
>>
>> I think the shared memory structures should be done as well. Having
>> that real-time info is also valuable.
>>
>> I don't see too much point in adding stuff to the stats system for
>> this.
> 
> I wonder why there haven't been discussions so far on what kind
> of information we want by this feature. For example I'd be happy
> to see the time of last autovacuum trial and the cause if it has
> been skipped for every table. Such information would (maybe)
> naturally be shown in pg_stat_*_tables.
> 
> =====
> =# select relid, last_completed_autovacuum, last_completed_autovacv_status, last_autovacuum_trial,
last_autovacuum_resultfrom pg_stat_user_tables;
 
> -[ RECORD 1 ]-----------------+------
> relid                         | 16390
> last_completed_autovacuum     | 2016-03-01 01:25:00.349074+09
> last_completed_autovac_status | Completed in 4 seconds. Scanned 434 pages, skipped 23 pages
> last_autovacuum_trial         | 2016-03-03 17:33:04.004322+09
> last_autovac_traial_status    | Canceled by PID 2355. Processed 144/553 pages.
> -[ RECORD 2 ]----------+------
> ...
> last_autovacuum_trial         | 2016-03-03 07:25:00.349074+09
> last_autovac_traial_status    | Completed in 4 seconds. Scanned 434 pages, skipped 23 pages
> -[ RECORD 3 ]----------+------
> ...
> last_autovacuum_trial         | 2016-03-03 17:59:12.324454+09
> last_autovac_trial_status     | Processing by PID 42334, 564 / 32526 pages done.
> -[ RECORD 4 ]----------+------
> ...
> last_autovacuum_trial         | 2016-03-03 17:59:12.324454+09
> last_autovac_trial_status     | Skipped by dead-tuple threashold.
> =====
> 
> Apart from the appropriateness of the concrete shape, it would be
> done by extending the current stats system and needs modification
> of some other parts but the hooks and WorkerInfoData is not
> enough. This might be a business of Rahila's "VACUUM Progress
> Checker" and it convers some real-time info.
> 
> https://commitfest.postgresql.org/9/545/
> 
> On the other hand, it would be in another place and needs another
> method if we want a history like the current autovacuum
> completion logs (at debug3..) of 100 latest invocation of
> autovacuum worker. Anyway the WorkerInfoData is not enough.
> 
> 
> What kind of information we (will) want to have?
> 

Very good suggestion.

I think the most productive way to work on this is to start a wiki page
to summarize what's the available information, what we should store and
how to represent it.

I'll update this thread as soon as I'll have a first draft finished.

> 
> regards,
> 


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Re: Publish autovacuum informations

От

Michael Paquier

Дата:

05 марта 2016 г., 01:34:28

On Sat, Mar 5, 2016 at 6:52 AM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> Very good suggestion.
>
> I think the most productive way to work on this is to start a wiki page
> to summarize what's the available information, what we should store and
> how to represent it.
>
> I'll update this thread as soon as I'll have a first draft finished.

New design discussions are a little bit late for 9.6 I am afraid :(
Perhaps we should consider this patch as returned with feedback for
the time being? The hook approach is not something I'd wish for if we
can improve in-core facility that would help user to decide better how
to tune autovacuum parameters. The VACUUM progress facility covers a
different need by helping to track how long a scan is still going to
take. What we want here is something that would run on top of that.
Logs at least may be helpful for things like pgbadger.
-- 
Michael

Re: Publish autovacuum informations

От

Julien Rouhaud

Дата:

05 марта 2016 г., 03:21:18

On 04/03/2016 23:34, Michael Paquier wrote:
> On Sat, Mar 5, 2016 at 6:52 AM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>> Very good suggestion.
>>
>> I think the most productive way to work on this is to start a wiki page
>> to summarize what's the available information, what we should store and
>> how to represent it.
>>
>> I'll update this thread as soon as I'll have a first draft finished.
> 
> New design discussions are a little bit late for 9.6 I am afraid :(
> Perhaps we should consider this patch as returned with feedback for
> the time being? The hook approach is not something I'd wish for if we
> can improve in-core facility that would help user to decide better how
> to tune autovacuum parameters.

Yes, it's clearly not suited for the final commitfest. I just closed the
patch as "returned with feedback".

I'll work on the feedbacks I already had to document a wiki page, and
wait for this commitfest to be more or less finished before starting a
new thread on autovacuum instrumentation design.


> The VACUUM progress facility covers a
> different need by helping to track how long a scan is still going to
> take. What we want here is something that would run on top of that.
> Logs at least may be helpful for things like pgbadger.
> 



-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Re: Publish autovacuum informations

От

Michael Paquier

Дата:

05 марта 2016 г., 03:32:14

On Sat, Mar 5, 2016 at 9:21 AM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> On 04/03/2016 23:34, Michael Paquier wrote:
>> New design discussions are a little bit late for 9.6 I am afraid :(
>> Perhaps we should consider this patch as returned with feedback for
>> the time being? The hook approach is not something I'd wish for if we
>> can improve in-core facility that would help user to decide better how
>> to tune autovacuum parameters.
>
> Yes, it's clearly not suited for the final commitfest. I just closed the
> patch as "returned with feedback".
>
> I'll work on the feedbacks I already had to document a wiki page, and
> wait for this commitfest to be more or less finished before starting a
> new thread on autovacuum instrumentation design.

OK, thanks.
-- 
Michael

Re: Publish autovacuum informations

От

Jim Nasby

Дата:

19 марта 2016 г., 03:12:17

On 3/3/16 3:54 AM, Kyotaro HORIGUCHI wrote:
> I wonder why there haven't been discussions so far on what kind
> of information we want by this feature. For example I'd be happy
> to see the time of last autovacuum trial and the cause if it has
> been skipped for every table. Such information would (maybe)
> naturally be shown in pg_stat_*_tables.
>
> =====
> =# select relid, last_completed_autovacuum, last_completed_autovacv_status, last_autovacuum_trial,
last_autovacuum_resultfrom pg_stat_user_tables;
 
> -[ RECORD 1 ]-----------------+------
> relid                         | 16390
> last_completed_autovacuum     | 2016-03-01 01:25:00.349074+09
> last_completed_autovac_status | Completed in 4 seconds. Scanned 434 pages, skipped 23 pages
> last_autovacuum_trial         | 2016-03-03 17:33:04.004322+09
> last_autovac_traial_status    | Canceled by PID 2355. Processed 144/553 pages.
> -[ RECORD 2 ]----------+------
> ...
> last_autovacuum_trial         | 2016-03-03 07:25:00.349074+09
> last_autovac_traial_status    | Completed in 4 seconds. Scanned 434 pages, skipped 23 pages
> -[ RECORD 3 ]----------+------
> ...
> last_autovacuum_trial         | 2016-03-03 17:59:12.324454+09
> last_autovac_trial_status     | Processing by PID 42334, 564 / 32526 pages done.
> -[ RECORD 4 ]----------+------
> ...
> last_autovacuum_trial         | 2016-03-03 17:59:12.324454+09
> last_autovac_trial_status     | Skipped by dead-tuple threashold.
> =====

I kinda like where you're going here, but I certainly don't think the 
stats system is the way to do it. Stats bloat is already a problem on 
bigger systems. More important, I don't think having just the last 
result is very useful. If you've got a vacuum problem, you want to see 
history, especially history of the vacuum runs themselves.

The good news is that vacuum is a very low-frequency operation, so it 
has none of the concerns that the generic stats system does. I think it 
would be reasonable to provide event triggers that fire on every 
launcher loop, after a worker has built it's "TODO list", and after 
every (auto)vacuum.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: Publish autovacuum informations

От

Julien Rouhaud

Дата:

31 марта 2016 г., 00:09:41

On 19/03/2016 01:11, Jim Nasby wrote:
> On 3/3/16 3:54 AM, Kyotaro HORIGUCHI wrote:
>> I wonder why there haven't been discussions so far on what kind
>> of information we want by this feature. For example I'd be happy
>> to see the time of last autovacuum trial and the cause if it has
>> been skipped for every table. Such information would (maybe)
>> naturally be shown in pg_stat_*_tables.
>>
>> =====
>> =# select relid, last_completed_autovacuum,
>> last_completed_autovacv_status, last_autovacuum_trial,
>> last_autovacuum_result from pg_stat_user_tables;
>> -[ RECORD 1 ]-----------------+------
>> relid                         | 16390
>> last_completed_autovacuum     | 2016-03-01 01:25:00.349074+09
>> last_completed_autovac_status | Completed in 4 seconds. Scanned 434
>> pages, skipped 23 pages
>> last_autovacuum_trial         | 2016-03-03 17:33:04.004322+09
>> last_autovac_traial_status    | Canceled by PID 2355. Processed
>> 144/553 pages.
>> -[ RECORD 2 ]----------+------
>> ...
>> last_autovacuum_trial         | 2016-03-03 07:25:00.349074+09
>> last_autovac_traial_status    | Completed in 4 seconds. Scanned 434
>> pages, skipped 23 pages
>> -[ RECORD 3 ]----------+------
>> ...
>> last_autovacuum_trial         | 2016-03-03 17:59:12.324454+09
>> last_autovac_trial_status     | Processing by PID 42334, 564 / 32526
>> pages done.
>> -[ RECORD 4 ]----------+------
>> ...
>> last_autovacuum_trial         | 2016-03-03 17:59:12.324454+09
>> last_autovac_trial_status     | Skipped by dead-tuple threashold.
>> =====
> 
> I kinda like where you're going here, but I certainly don't think the
> stats system is the way to do it. Stats bloat is already a problem on
> bigger systems. More important, I don't think having just the last
> result is very useful. If you've got a vacuum problem, you want to see
> history, especially history of the vacuum runs themselves.
> 
> The good news is that vacuum is a very low-frequency operation, so it
> has none of the concerns that the generic stats system does. I think it
> would be reasonable to provide event triggers that fire on every
> launcher loop, after a worker has built it's "TODO list", and after
> every (auto)vacuum.

The main issue I see with an event trigger based solution is that you'll
always have to create them and the needed objects on every database.

Another issue is that both of these approach are not intended to give a
global overview but per-database statistics. I'd prefer a global overview.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Re: Publish autovacuum informations

От

Michael Paquier

Дата:

31 марта 2016 г., 03:42:20

On Thu, Mar 31, 2016 at 6:09 AM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> On 19/03/2016 01:11, Jim Nasby wrote:
>> On 3/3/16 3:54 AM, Kyotaro HORIGUCHI wrote:
>>> I wonder why there haven't been discussions so far on what kind
>>> of information we want by this feature. For example I'd be happy
>>> to see the time of last autovacuum trial and the cause if it has
>>> been skipped for every table. Such information would (maybe)
>>> naturally be shown in pg_stat_*_tables.
>>>
>>> =====
>>> =# select relid, last_completed_autovacuum,
>>> last_completed_autovacv_status, last_autovacuum_trial,
>>> last_autovacuum_result from pg_stat_user_tables;
>>> -[ RECORD 1 ]-----------------+------
>>> relid                         | 16390
>>> last_completed_autovacuum     | 2016-03-01 01:25:00.349074+09
>>> last_completed_autovac_status | Completed in 4 seconds. Scanned 434
>>> pages, skipped 23 pages
>>> last_autovacuum_trial         | 2016-03-03 17:33:04.004322+09
>>> last_autovac_traial_status    | Canceled by PID 2355. Processed
>>> 144/553 pages.
>>> -[ RECORD 2 ]----------+------
>>> ...
>>> last_autovacuum_trial         | 2016-03-03 07:25:00.349074+09
>>> last_autovac_traial_status    | Completed in 4 seconds. Scanned 434
>>> pages, skipped 23 pages
>>> -[ RECORD 3 ]----------+------
>>> ...
>>> last_autovacuum_trial         | 2016-03-03 17:59:12.324454+09
>>> last_autovac_trial_status     | Processing by PID 42334, 564 / 32526
>>> pages done.
>>> -[ RECORD 4 ]----------+------
>>> ...
>>> last_autovacuum_trial         | 2016-03-03 17:59:12.324454+09
>>> last_autovac_trial_status     | Skipped by dead-tuple threashold.
>>> =====
>>
>> I kinda like where you're going here, but I certainly don't think the
>> stats system is the way to do it. Stats bloat is already a problem on
>> bigger systems. More important, I don't think having just the last
>> result is very useful. If you've got a vacuum problem, you want to see
>> history, especially history of the vacuum runs themselves.
>>
>> The good news is that vacuum is a very low-frequency operation, so it
>> has none of the concerns that the generic stats system does. I think it
>> would be reasonable to provide event triggers that fire on every
>> launcher loop, after a worker has built it's "TODO list", and after
>> every (auto)vacuum.
>
> The main issue I see with an event trigger based solution is that you'll
> always have to create them and the needed objects on every database.

Which has surely a performance impact as those are row-based. I have
seen complains regarding the fact that those objects can be easily
forgotten...

> Another issue is that both of these approach are not intended to give a
> global overview but per-database statistics. I'd prefer a global overview.

That's important, autovacuum GUC parameters, like the number of
workers, are system-wide.
-- 
Michael