Re: bgworker / SPI docs patches

Поиск
Список
Период
Сортировка
От Alvaro Herrera
Тема Re: bgworker / SPI docs patches
Дата
Msg-id 20150729165015.GC2437@postgresql.org
обсуждение исходный текст
Ответ на bgworker / SPI docs patches  (Craig Ringer <craig@2ndquadrant.com>)
Список pgsql-docs
I wrote this a very long time ago but apparently it was eaten by a spam
filter before it reached the lists.  Resending.

Craig Ringer wrote:

> From c73d1303cb3474ccc799eeda252eeef1fc3d9026 Mon Sep 17 00:00:00 2001
> From: Craig Ringer <craig@2ndquadrant.com>
> Date: Mon, 16 Feb 2015 18:17:48 +1300
> Subject: [PATCH 1/4] Add xreflabel to most chapter entries

So I think this is pretty neat as a general idea, but there was
opposition in that printed materials would no longer have a useful
pointer.  If we could tweak the output so that it says something like
"Chapter 33, Herding Elephants" instead of just "Chapter 33" (current)
or just "Herding Elephants" (your proposal), I think the whole patch
would be great.  Now I don't really know how to do this; at the very
least we will need to edit stylesheet.dsl and stylesheet.xsl, I think.

So dbl1en.dsl defines en-xref-strings for chapters like

    (list (normalize "chapter")     (if %chapter-autolabel%
                        "&Chapter; %n"
                        "the &chapter; called %t"))

if we could change that to, say,
    "&Chapter; %n (“%t”)"
which I think would render as
Chapter 33 (“Herding Elephants”)

we'd be done in the DSSSL side.  No idea how to actually do it, though
... Peter, any thoughts?

In XSL, there's xref-number which is presumably what we use today, but
there's also xref-number-and-title.  I think it's just a matter of
changing xrefstyle.

> From 6e2103b211aee44780c6106e307ea4a561d11fde Mon Sep 17 00:00:00 2001
> From: Craig Ringer <craig@2ndquadrant.com>
> Date: Mon, 16 Feb 2015 18:17:59 +1300
> Subject: [PATCH 2/4] Document how to build with the website style

+1 for this.


> diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
> index 6eada92..6437ab1 100644
> --- a/doc/src/sgml/xfunc.sgml
> +++ b/doc/src/sgml/xfunc.sgml
> @@ -3307,6 +3307,15 @@ CREATE FUNCTION make_array(anyelement) RETURNS anyarray
>     <sect2 id="xfunc-shmem-lwlocks" xreflabel="shared memory and LWLocks in user-defined C functions">
>      <title>Shared Memory and LWLocks</title>
>
> +    <indexterm zone="xfunc-shmem-lwlocks">
> +     <primary>Shared memory in extensions</primary>
> +    </indexterm>

I would do this as
  <primary>shared memory</primary><secondary>in extensions</secondary>
instead.

> +    <indexterm zone="xfunc-shmem-lwlocks">
> +     <primary>LWLocks (extensions)</primary>
> +     <secondary>Lightweight locks (extensions)</secondary>
> +    </indexterm>

I think this is two entries,
  <indexterm>
    <primary>lightweight locks</primary>
    <secondary>in extensions</secondary>
  </indexterm>

  <indexterm>
    <primary>lwlocks</primary>
    <see>lightweight locks</see>
  </indexterm>

> From 04e1cbe49268e22886e2b30fe412fef0cb9dcc65 Mon Sep 17 00:00:00 2001
> From: Craig Ringer <craig@2ndquadrant.com>
> Date: Mon, 16 Feb 2015 18:37:19 +1300
> Subject: [PATCH 4/4] Document SPI use from bgworkers in greater detail

This is the meat of the submission, isn't it.

> diff --git a/doc/src/sgml/bgworker.sgml b/doc/src/sgml/bgworker.sgml
> index ef28f72..d0e7dc0 100644
> --- a/doc/src/sgml/bgworker.sgml
> +++ b/doc/src/sgml/bgworker.sgml
> @@ -34,18 +34,21 @@
>    <productname>PostgreSQL</> is started by including the module name in
>    <varname>shared_preload_libraries</>.  A module wishing to run a background
>    worker can register it by calling
> +  <indexterm><primary>RegisterBackgroundWorker</primary></indexterm>
>    <function>RegisterBackgroundWorker(<type>BackgroundWorker *worker</type>)</function>
>    from its <function>_PG_init()</>.  Background workers can also be started
>    after the system is up and running by calling the function
> +  <indexterm><primary>RegisterDynamicBackgroundWorker</primary></indexterm>
>    <function>RegisterDynamicBackgroundWorker(<type>BackgroundWorker
>    *worker, BackgroundWorkerHandle **handle</type>)</function>.  Unlike
>    <function>RegisterBackgroundWorker</>, which can only be called from within
>    the postmaster, <function>RegisterDynamicBackgroundWorker</function> must be
> -  called from a regular backend.
> +  called from a regular backend (which might be another background worker).
>   </para>

I don't think we typically attach <indexterm> tags within paragraphs;
most commonly they are outside <para> or other block items.  I imagine
this works fine, but it's probably best to continue tradition unless
there's a specific reason to do otherwise.

>    <para>
>     <structfield>bgw_flags</> is a bitwise-or'd bit mask indicating the
> -   capabilities that the module wants.  Possible values are
> -   <literal>BGWORKER_SHMEM_ACCESS</literal> (requesting shared memory access)
> -   and <literal>BGWORKER_BACKEND_DATABASE_CONNECTION</literal> (requesting the
> -   ability to establish a database connection, through which it can later run
> -   transactions and queries). A background worker using
> -   <literal>BGWORKER_BACKEND_DATABASE_CONNECTION</literal> to connect to
> -   a database must also attach shared memory using
> -   <literal>BGWORKER_SHMEM_ACCESS</literal>, or worker start-up will fail.
> +   capabilities that the module wants.  Possible values are:
> +   <variablelist>

Using a <variablelist> rather than prose seems much better to me.


>    <para>
> -   <structfield>bgw_main</structfield> is a pointer to the function to run when
> -   the process is started.  This function must take a single argument of type
> -   <type>Datum</> and return <type>void</>.
> -   <structfield>bgw_main_arg</structfield> will be passed to it as its only
> -   argument.  Note that the global variable <literal>MyBgworkerEntry</literal>
> -   points to a copy of the <structname>BackgroundWorker</structname> structure
> -   passed at registration time. <structfield>bgw_main</structfield> may be
> -   NULL; in that case, <structfield>bgw_library_name</structfield> and
> -   <structfield>bgw_function_name</structfield> will be used to determine
> -   the entry point.  This is useful for background workers launched after
> -   postmaster startup, where the postmaster does not have the requisite
> -   library loaded.
> +   <structfield>bgw_main</structfield> specifies the function to run when the
> +   background worker is started. If non-NULL,
> +   <structfield>bgw_main</structfield> will take precedence over
> +   <structfield>bgw_library_name</structfield> and
> +   <structfield>bgw_function_name</structfield>.
> +   <warning>
> +    <para>
> +     Use of this field is deprecated. It should be set to
> +     <literal>NULL</literal> then <structfield>bgw_library_name</structfield>
> +     and <structfield>bgw_function_name</structfield> should be used instead.
> +    </para>
> +    <para>
> +     Specifying the background worker entry point using a function pointer will
> +     not work correctly on Windows (or with <literal>EXEC_BACKEND</literal>
> +     defined). It may also malfunction when used to point to a function in a
> +     dynamically loaded shared library used by a dynamic background worker.
> +    </para>
> +   </warning>
>    </para>

Interesting.  Given the drawbacks of the old mechanism, I am okay with this change.
In fact, why don't we remove bgw_main completely?

> +  <para>
>     <structfield>bgw_notify_pid</structfield> is the PID of a PostgreSQL
>     backend process to which the postmaster should send <literal>SIGUSR1</>
> -   when the process is started or exits.  It should be 0 for workers registered
> -   at postmaster startup time, or when the backend registering the worker does
> +   when the process is started or exits, causing the notified process's latch
> +   to be set.  It should be 0 for workers registered
> +   at postmaster startup time, when the backend registering the worker does
>     not wish to wait for the worker to start up.  Otherwise, it should be
>     initialized to <literal>MyProcPid</>.
>    </para>

I'm not sure about documenting that SIGUSR1 will set a process' latch.
That can be changed in that process' signal handler.  Maybe this is fine
if we add the word "typically" somewhere.

> diff --git a/doc/src/sgml/example-worker-spi.sgml b/doc/src/sgml/example-worker-spi.sgml
> new file mode 100644
> index 0000000..63d8f3d
> --- /dev/null
> +++ b/doc/src/sgml/example-worker-spi.sgml
> @@ -0,0 +1,26 @@
> +<!-- doc/src/sgml/example-worker-spi.sgml -->
> +
> +<sect1 id="example-worker-spi" xreflabel="worker-spi">
> + <title>worker-spi example</title>
> +
> + <indexterm zone="example-worker-spi">
> +  <primary>Background worker with SPI</primary>
> +  <secondary>SPI in background worker</secondary>
> + </indexterm>

I don't think this module really needs a doc page, but okay.  In any
case, I think these <indexterm> aren't good; one of them should be split
in <primary><secondary> and the other one should probably be a <see>.

> @@ -4092,6 +4097,143 @@ INSERT INTO a SELECT * FROM a;
>    </para>
>   </sect1>
>
> + <sect1 id="spi-bgworker" xreflabel="SPI in background workers">
> +  <title>Using the SPI from background workers</title>
> +
> +  <para>
> +   Using the SPI from <link linkend="bgworker">background workers</>
> +   requires some additional steps that are not required from SPI-using
> +   procedures invoked via SQL from normal user backends. When using the SPI
> +   from an SQL-callable function it is safe to assume that a transaction is
> +   already open, there is already a snapshot, and that PostgreSQL's statistics
> +   have been updated. None of these things may be assumed for use of the SPI
> +   from a bgworker.
> +  </para>
> +
> +  <para>
> +   To safely use the SPI from a bgworker you must ensure that:
> +  <itemizedlist>
> +   <listitem><para>A transaction is open (in progress)</para></listitem>
> +   <listitem><para>The SPI is connected</para></listitem>
> +   <listitem><para>There is an active snapshot or <parameter>read_only</parameter> is set to
<literal>false</literal>in SPI calls</para></listitem> 
> +  </itemizedlist>
> +   You should also update the statement start and end timestamps for use in
> +   <literal>pg_stat_activity</> using <function>pgstat_report_activity</>.
> +  </para>
> +
> +  <para>
> +   Attempting to use the SPI without an open transaction or using the SPI
> +   in <parameter>read_only</parameter> mode without setting a snapshot will
> +   generally <emphasis>not</emphasis> cause obvious failures or, in a
> +   <literal>--enable-cassert</literal> build, assertions. Instead it will
> +   usually appear to work for simple cases but may produce incorrect
> +   results or be unstable. Users of background workers should generally
> +   prefix SPI use with
> +<programlisting>
> +Assert(IsTransactionState());
> +</programlisting>
> +   if the function does not its self establish a transaction.
> +  </para>

+1 to the general idea of adding this section.  "its self"?

> +  <para>
> +   Typical SPI use from a background worker involves setting up a transaction,
> +   setting the statement start timestamp, connecting to the SPI, updating
> +   pg_stat_activity, establishing a snapshot, running the query/queries,
> +   then performing matching cleanup actions. For example:
> +  </para>

Can't we just refer to the source code of worker_spi?  That's the whole
point of that module.  Maybe we can add lots more comment to that, to
explain the necessity of each line there.

> +  <para>
> +   In more complex cases the function may need to check if a transaction is
> +   already open with <function>IsTransactionState()</function> and only perform
> +   transaction management if there is no current transaction. If the SPI may
> +   already connected by the caller the function should use
> +   <function>SPI_push_conditional</function> before <function>SPI_connect</function> and
> +   <function>SPI_pop_conditional</function> after <function>SPI_finish</function>. Snapshot
> +   state may also be conditionally handled. The resulting procedure will
> +   look more like:

If we need a more complex example in worker_spi, let's add one.

> diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
> index 6437ab1..39e6a15 100644
> --- a/doc/src/sgml/xfunc.sgml
> +++ b/doc/src/sgml/xfunc.sgml
> @@ -3357,6 +3357,60 @@ if (!ptr)
>  }
>  </programlisting>
>      </para>
> +
> +    <para>
> +     Extensions registered to run at startup
> +     using <xref linkend="guc-shared-preload-libraries"> should instead install a
> +     <literal>shmem_startup_hook</> from <function>_PG_init</>.  This will be
> +     invoked once shared memory is ready but before user backends or background
> +     workers get launched. The hook function invoked should use a pattern like
> +     that shown for shared memory initialization above then call the next hook
> +     function, e.g. declare:
> +<programlisting>
> +static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
> +</programlisting>
> +     then from <function>_PG_init</>, after any
> +     <function>RequestAddinShmemSpace</> and <function>RequestAddinLWLocks</>
> +     install the hook:
> +<programlisting>
> +     prev_shmem_startup_hook = shmem_startup_hook;
> +     shmem_startup_hook = my_worker_shmem_startup;
> +</programlisting>
> +     where <function>my_worker_shmem_startup</function> is a <type>void</> function
> +     with <type>void</> arguments that calls the next hook (if any):
> +<programlisting>
> +     if (prev_shmem_startup_hook != NULL)
> +         prev_shmem_startup_hook();
> +</programlisting>
> +     then calls <function>ShmemInitStruct</> per the example shown above. An
> +     example of this usage can be found in the initialisation of the <xref
> +     linkend="pgstatstatements"> extension.
> +    </para>

No problem with this.


> +     <caution>
> +      <title>Shared memory is zeroed on postmaster restart</title>
> +      <para>
> +       If the postmaster restarts - usually due to the crash of a backend
> +       process like a user backend or a background worker - then it zeroes
> +       shared memory on restart then re-invokes any shared memory setup hooks
> +       that have been installed by extensions. Extensions using shared memory
> +       must be prepared for this.
> +      </para>
> +      <para>
> +       Background workers are not unregistered if the postmaster restarts
> +       due to the crash of a background worker or user backend. This can lead
> +       to background workers accessing shared memory that has been reinitialized.
> +       Care must be taken to ensure that initialisation of shared memory produces
> +       stable results when re-run or workers must detect the postmaster restart
> +       using a generation counter and exit. In particular if parameters are being
> +       passed to a bgworker via shared memory the background worker
> +       <structfield>bgw_main_arg</structfield> should always be an index into an
> +       array allocated in shared memory, rather than a pointer, and repeated
> +       initialisations of the shared memory segment must produce the same range
> +       of valid indexes in the same order.
> +      </para>
> +     </caution>
> +
>     </sect2>

Hm.  Aren't all bgworkers also restarted if postmaster power-cycles
everything due to a crash in any process?  ISTM that they should be, and
if not, that's a bug.

Please use — for long dashes, but I think the whole <caution>
block should go away.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


В списке pgsql-docs по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: bgworker / SPI docs patches
Следующее
От: Robert Haas
Дата:
Сообщение: Re: bgworker / SPI docs patches