Обсуждение: Logical Decoding Callbacks
I'm trying to write a logical decoding plugin and I'm seeing some interesting behavior around the startup callback. When I use psql and the built-in SQL functions (like pg_logical_slot_peek_changes) to use my module I see the startup_cb get called. I have written my own streaming replication client and for some reason when I use my client the startup_cb does not get called, but I can still see the changes just fine. The reason why the startup_cb is particularly important in my case, is I'm trying to set two dynamic OIDs for PostGIS geometry and geography types. I've tried setting them both as globals and as part of the output_plugin_private data.
			
		Any idea what's wrong here? Did I miss something in the streaming replication documentation?
Thanks in advance.
Xavier
On Tue, Feb 10, 2015 at 9:52 AM, Xavier Stevens wrote: > I'm trying to write a logical decoding plugin and I'm seeing some > interesting behavior around the startup callback. When I use psql and the > built-in SQL functions (like pg_logical_slot_peek_changes) to use my module > I see the startup_cb get called. I have written my own streaming replication > client and for some reason when I use my client the startup_cb does not get > called, but I can still see the changes just fine. The reason why the > startup_cb is particularly important in my case, is I'm trying to set two > dynamic OIDs for PostGIS geometry and geography types. I've tried setting > them both as globals and as part of the output_plugin_private data. > > Any idea what's wrong here? Did I miss something in the streaming > replication documentation? Not sure, I would imagine that your problem is in the manipulation of the memory contexts related to the decoder plugin and the way your variables are saved. The documentation states that startup_cb is called each time a replication slot is created or when a slot is asked to stream changes, be it through the psql interface or using the replication protocol: http://www.postgresql.org/docs/devel/static/logicaldecoding-output-plugin.html#LOGICALDECODING-OUTPUT-PLUGIN-CALLBACKS Just in case, I have just done a quick test with pg_recvlogical on latest HEAD of REL9_4_STABLE and the startup_cb gets called: DEBUG: received replication command: IDENTIFY_SYSTEM DEBUG: received replication command: START_REPLICATION SLOT "slot" LOGICAL 0/0 LOG: called startup_cb_wrapper Note that on 9.5 (master HEAD at c619c23) I am seeing an assertion failure. What is the server version you are using? I would imagine 9.4 but let's be sure. Regards, -- Michael
Hi, On 2015-02-09 16:52:59 -0800, Xavier Stevens wrote: > I'm trying to write a logical decoding plugin and I'm seeing some > interesting behavior around the startup callback. When I use psql and the > built-in SQL functions (like pg_logical_slot_peek_changes) to use my module > I see the startup_cb get called. I have written my own streaming > replication client and for some reason when I use my client the startup_cb > does not get called, but I can still see the changes just fine. The reason > why the startup_cb is particularly important in my case, is I'm trying to > set two dynamic OIDs for PostGIS geometry and geography types. I've tried > setting them both as globals and as part of the output_plugin_private data. > > Any idea what's wrong here? Did I miss something in the streaming > replication documentation? Hm, that's odd. The startup callback should (and is, I used it just a couple hours ago) definitely be called when using the streaming replication protocol. You're saying that the callback is not called when you do START_LOGICAL_REPLICATION ... ? Can you reproduce the problem with test_decoding? Because that'd actually crash if no startup callback were registered... Can you show your code/a testcase? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2015-02-10 11:01:08 +0900, Michael Paquier wrote: > Just in case, I have just done a quick test with pg_recvlogical on > latest HEAD of REL9_4_STABLE and the startup_cb gets called: > DEBUG: received replication command: IDENTIFY_SYSTEM > DEBUG: received replication command: START_REPLICATION SLOT "slot" LOGICAL 0/0 > LOG: called startup_cb_wrapper > Note that on 9.5 (master HEAD at c619c23) I am seeing an assertion > failure. What is the server version you are using? I would imagine 9.4 > but let's be sure. Uh, which plugin and what assertion? What's the backtrace? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Feb 10, 2015 at 5:59 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2015-02-10 11:01:08 +0900, Michael Paquier wrote:
>> Just in case, I have just done a quick test with pg_recvlogical on
>> latest HEAD of REL9_4_STABLE and the startup_cb gets called:
>> DEBUG: received replication command: IDENTIFY_SYSTEM
>> DEBUG: received replication command: START_REPLICATION SLOT "slot" LOGICAL 0/0
>> LOG: called startup_cb_wrapper
>> Note that on 9.5 (master HEAD at c619c23) I am seeing an assertion
>> failure. What is the server version you are using? I would imagine 9.4
>> but let's be sure.
>
> Uh, which plugin and what assertion? What's the backtrace?
I am raising that on a new thread on -hackers, that's not related to the report here. Thanks for reminding.> On 2015-02-10 11:01:08 +0900, Michael Paquier wrote:
>> Just in case, I have just done a quick test with pg_recvlogical on
>> latest HEAD of REL9_4_STABLE and the startup_cb gets called:
>> DEBUG: received replication command: IDENTIFY_SYSTEM
>> DEBUG: received replication command: START_REPLICATION SLOT "slot" LOGICAL 0/0
>> LOG: called startup_cb_wrapper
>> Note that on 9.5 (master HEAD at c619c23) I am seeing an assertion
>> failure. What is the server version you are using? I would imagine 9.4
>> but let's be sure.
>
> Uh, which plugin and what assertion? What's the backtrace?
-- 
Michael
Michael
Sorry to raise the issue on startup_cb. I added a whole bunch of logging statements and I was only running the section of code I wanted when the startup callback had options.
This now gets me to the next issue I encounter. In my output plugin, I'm trying to use the SPI interface to query about PostGIS OIDs in the startup callback. Just calling SPI_connect() seems to be causing a segfault.
This is the last thing I see in the logs before the segfault occurs: 
On Tue, Feb 10, 2015 at 4:18 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Tue, Feb 10, 2015 at 5:59 PM, Andres Freund <andres@2ndquadrant.com> wrote:I am raising that on a new thread on -hackers, that's not related to the report here. Thanks for reminding.
> On 2015-02-10 11:01:08 +0900, Michael Paquier wrote:
>> Just in case, I have just done a quick test with pg_recvlogical on
>> latest HEAD of REL9_4_STABLE and the startup_cb gets called:
>> DEBUG: received replication command: IDENTIFY_SYSTEM
>> DEBUG: received replication command: START_REPLICATION SLOT "slot" LOGICAL 0/0
>> LOG: called startup_cb_wrapper
>> Note that on 9.5 (master HEAD at c619c23) I am seeing an assertion
>> failure. What is the server version you are using? I would imagine 9.4
>> but let's be sure.
>
> Uh, which plugin and what assertion? What's the backtrace?--
Michael
On 2015-02-10 10:33:41 -0800, Xavier Stevens wrote: > Sorry to raise the issue on startup_cb. I added a whole bunch of logging > statements and I was only running the section of code I wanted when the > startup callback had options. Heh. Just to make sure: You can pass options via replication protocol too. > This now gets me to the next issue I encounter. In my output plugin, I'm > trying to use the SPI interface to query about PostGIS OIDs in the startup > callback. Just calling SPI_connect() seems to be causing a segfault. > This is the last thing I see in the logs before the segfault occurs: > > https://github.com/xstevens/decoderbufs/blob/master/src/decoderbufs.c#L151 The problem likely is that in the startup callback you're neither guaranteed to be in a transaction, nor to have a snapshot set up. It'd generally be easier to analyze such problems if you provide a backtrace (e.g. by enabling core files). Another generally very adviseable thing to do when developing code running in the backend is to enable assertions (you may already do that...). You can lookup types much easier than that btw. C.f. TypenameGetTypid(typname) But note that both that and what you do would possibly fail if there's more than one geometry type around. You could either hardcode postgis' schema name and use namespaceId = LookupExplicitNamespace(schemaname, false); typoid = GetSysCacheOid2(TYPENAMENSP, PointerGetDatum(typname), ObjectIdGetDatum(namespaceId)); if (typoid == InvalidOid) elog(ERROR, "cache lookup failed for type %u", typoid); or be a bit more complex and lookup the postgis' extension's schema pg_extension.extnamespace first. Anyway, to run full queries in the startup callback you're going to have to do something like: if (!IsTransactionState()) { tx_started = true; StartTransactionCommand(); } PushActiveSnapshot(GetTransactionSnapshot()); /* do your stuff */ PopActiveSnapshot(); if (tx_started) CommitTransactionCommand(); Note that the begin, change, commit callbacks *do* run with a transaction and snapshot setup. But you can't run general SQL queries - only catalog tables (or tables marked as such) are accessible. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
There was no reason I needed to run full statements in this case. I just didn't know I could get the type ids like that.
On Tue, Feb 10, 2015 at 1:23 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2015-02-10 10:33:41 -0800, Xavier Stevens wrote:
> Sorry to raise the issue on startup_cb. I added a whole bunch of logging
> statements and I was only running the section of code I wanted when the
> startup callback had options.
Heh.
Just to make sure: You can pass options via replication protocol too.
> This now gets me to the next issue I encounter. In my output plugin, I'm
> trying to use the SPI interface to query about PostGIS OIDs in the startup
> callback. Just calling SPI_connect() seems to be causing a segfault.
> This is the last thing I see in the logs before the segfault occurs:
>
> https://github.com/xstevens/decoderbufs/blob/master/src/decoderbufs.c#L151
The problem likely is that in the startup callback you're neither
guaranteed to be in a transaction, nor to have a snapshot set up.
It'd generally be easier to analyze such problems if you provide a
backtrace (e.g. by enabling core files). Another generally very
adviseable thing to do when developing code running in the backend is to
enable assertions (you may already do that...).
You can lookup types much easier than that btw. C.f. TypenameGetTypid(typname)
But note that both that and what you do would possibly fail if there's
more than one geometry type around. You could either hardcode postgis'
schema name and use
namespaceId = LookupExplicitNamespace(schemaname, false);
typoid = GetSysCacheOid2(TYPENAMENSP, PointerGetDatum(typname), ObjectIdGetDatum(namespaceId));
if (typoid == InvalidOid)
elog(ERROR, "cache lookup failed for type %u", typoid);
or be a bit more complex and lookup the postgis' extension's schema
pg_extension.extnamespace first.
Anyway, to run full queries in the startup callback you're going to have
to do something like:
if (!IsTransactionState())
{
tx_started = true;
StartTransactionCommand();
}
PushActiveSnapshot(GetTransactionSnapshot());
/* do your stuff */
PopActiveSnapshot();
if (tx_started)
CommitTransactionCommand();
Note that the begin, change, commit callbacks *do* run with a
transaction and snapshot setup. But you can't run general SQL queries -
only catalog tables (or tables marked as such) are accessible.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services