Обсуждение: Unstable C Function
I'm running PG 9.5 on Win 10 64-bit. I'm compiling C under VS 2016.
I am forming a function that will evolve into a somewhat complex beast. To test out my initial efforts, the function accepts an array of int4 (this works fine and the code for processing it is not shown here). The function then grabs a few rows from a table, pushes the values into a FIFO, then pulls them out and renders the results. This approach is strategic to how the function will operate when complete.
The function works fine on first call, sometimes more, then either resets the connection or throws this on any further calls:
ERROR: cache lookup failed for type 0 SQL state: XX000
I noticed that removing references to the FIFO improves stability, but doesn't solve it. Here is my code as lean as I can get it for question purposes:
PG:
CREATE TABLE md_key
( id serial NOT NULL PRIMARY KEY, pid int4 NOT NULL, key integer NOT NULL, vals int4[]
);
…
CREATE OR REPLACE FUNCTION md_key_query(int4[]) RETURNS TABLE ( id int4, vals int4[]) AS E'\RoctPG', --abreviated for question 'md_key_query' LANGUAGE c IMMUTABLE STRICT;
…
select * from md_key_query(array[1,2,3,4]::int4[])
C:
PG_FUNCTION_INFO_V1(md_key_query);
typedef struct { Datum id; Datum vals; } MdKeyNode;
typedef struct fifoAry { MdKeyNode nodes[32]; struct fifoAry *next; int32 readpos; int32 writepos; } FifoAry;
typedef struct { FifoAry *fifo; FifoAry *tail; FifoAry *head; uint32 nodescount; Datum *retvals[2]; bool *retnulls[2]; } CtxArgs;
inline void push(CtxArgs *args, Datum id, Datum vals) { if (args->head->writepos == 32) args->head = args->head->next = (FifoAry*)palloc0(sizeof(FifoAry));
MdKeyNode *node = &(args->head->nodes[args->head->writepos++]); node->id = id; node->vals = vals; args->nodescount++; }
inline MdKeyNode* pop(CtxArgs *args)
{
// if (!args->nodescount)
// return NULL; if (args->tail->readpos == 32) args->tail = args->tail->next;
args->nodescount--;
return &(args->tail->nodes[args->tail->readpos++]);
}
// use STRICT in the caller wrapper to ensure a null isn't passed in
PGMODULEEXPORT Datum md_key_query(PG_FUNCTION_ARGS)
{ uint32 i; FuncCallContext *funcctx; HeapTuple tuple; MdKeyNode *node; CtxArgs *args;
if (SRF_IS_FIRSTCALL()) { funcctx = SRF_FIRSTCALL_INIT();
MemoryContext oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); ArrayType *a = PG_GETARG_ARRAYTYPE_P(0); Datum *in_datums; bool *in_nulls; bool fieldNull; SPITupleTable *tuptable = SPI_tuptable; int32 ret; uint32 proc;
if (get_call_result_type(fcinfo, NULL, &funcctx->tuple_desc) != TYPEFUNC_COMPOSITE) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("function returning record called in context that cannot accept type record")));
deconstruct_array(a, INT4OID, 4, true, 'i', &in_datums, &in_nulls, &ret);
if (!ret) PG_RETURN_NULL();
(SPI_connect();
// initialize and set the cross-call structure funcctx->user_fctx = args = (CtxArgs*)palloc0(sizeof(CtxArgs)); args->fifo = args->tail = args->head = (FifoAry*)palloc0(sizeof(FifoAry)); args->retvals = (Datum*)palloc(sizeof(Datum) * 2); args->retnulls = (bool*)palloc0(sizeof(bool) * 2);
BlessTupleDesc(funcctx->tuple_desc);
// do some work here
// this is simply a test to see if this function is behaving as expected ret = SPI_execute("select id, vals from public.md_key where vals is not null limit 64", true, 0);
if (ret <= 0) ereport(ERROR, (errcode(ERRCODE_SQL_ROUTINE_EXCEPTION), errmsg("could not execute SQL")));
proc = SPI_processed;
if (proc > 0) { TupleDesc tupdesc = SPI_tuptable->tupdesc; SPITupleTable *tuptable = SPI_tuptable;
for (i = 0; i < proc; i++) { tuple = tuptable->vals[i]; push(args, SPI_getbinval(tuple, tupdesc, 1, &fieldNull), SPI_getbinval(tuple, tupdesc, 2, &fieldNull)); } }
SPI_finish(); MemoryContextSwitchTo(oldcontext); }
funcctx = SRF_PERCALL_SETUP(); args = funcctx->user_fctx;
if (args->nodescount > 0) { node = pop(args); args->retvals[0] = node->id; args->retvals[1] = node->vals; tuple = heap_form_tuple(funcctx->tuple_desc, args->retvals, args->retnulls); SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuple)); } else { SRF_RETURN_DONE(funcctx); }
}
Ian Campbell <ianrc72@gmail.com> writes: > The function works fine on first call, sometimes more, then either resets > the connection or throws this on any further calls: > ERROR: cache lookup failed for type 0 SQL state: XX000 I think the core problem here is that you're dealing with pass-by-reference results from the SPI_execute() --- specifically, the int4[] "vals" values --- as if they were pass-by-value. You're just saving the Datums, which are only pointers, and expecting what they point to to still be good when you get around to doing heap_form_tuple with them. But in reality they stopped being good the moment you did SPI_finish(). The failures would be a lot less intermittent if you were testing in a debug build (with CLOBBER_FREED_MEMORY defined). The two basic approaches you could take that would work reliably are 1. Copy all the int4[] values into the multi_call_memory_ctx before doing SPI_finish. 2. Instead of using multi-call mode, form all the result tuples during a single call and return them in a tuplestore, so that all the work is done before you call SPI_finish. You wouldn't really need the FIFO data structure if you did it that way. There are some other things I could criticize here, like labeling the function IMMUTABLE when its results depend on table contents, but they probably aren't causing your crashes. regards, tom lane
Ian Campbell <ianrc72@gmail.com> writes:
> The function works fine on first call, sometimes more, then either resets
> the connection or throws this on any further calls:
> ERROR: cache lookup failed for type 0 SQL state: XX000
I think the core problem here is that you're dealing with
pass-by-reference results from the SPI_execute() --- specifically,
the int4[] "vals" values --- as if they were pass-by-value. You're
just saving the Datums, which are only pointers, and expecting what
they point to to still be good when you get around to doing
heap_form_tuple with them. But in reality they stopped being good
the moment you did SPI_finish().
The failures would be a lot less intermittent if you were testing in
a debug build (with CLOBBER_FREED_MEMORY defined).
The two basic approaches you could take that would work reliably are
1. Copy all the int4[] values into the multi_call_memory_ctx before
doing SPI_finish.
2. Instead of using multi-call mode, form all the result tuples during
a single call and return them in a tuplestore, so that all the work
is done before you call SPI_finish. You wouldn't really need the FIFO
data structure if you did it that way.
There are some other things I could criticize here, like labeling the
function IMMUTABLE when its results depend on table contents, but
they probably aren't causing your crashes.
regards, tom lane
Ian Campbell <ianrc72@gmail.com> writes: > Thanks for personally replying, Tom. I appreciate it. > You are correct. In the interim, I found the following change solved the > issue: > SPI_finish(); // move to here > SRF_RETURN_DONE(funcctx); That might work under light usage, but the problem with it is you're blocking any other function in the same query from using SPI, since you're leaving your own connection active when returning. Sooner or later that's gonna be a problem. regards, tom lane
Ian Campbell <ianrc72@gmail.com> writes:
> Thanks for personally replying, Tom. I appreciate it.
> You are correct. In the interim, I found the following change solved the
> issue:
> SPI_finish(); // move to here
> SRF_RETURN_DONE(funcctx);
That might work under light usage, but the problem with it is you're
blocking any other function in the same query from using SPI, since
you're leaving your own connection active when returning. Sooner
or later that's gonna be a problem.
regards, tom lane
Ian Campbell <ianrc72@gmail.com> writes: > OK, so SPI is only suitable for single-call functions, right? It's usable by strictly nested functions, but a multi-call SRF is more like a coroutine. > If another > function in the query attempted to use SPI, I assume there would be a > deadlock? No, the second arrival would get a SPI_ERROR_CONNECT failure from SPI_connect when there's already an open connection. For the nested- calls case you can prevent that with SPI_push/SPI_pop around a call that might wish to use SPI, but that fix doesn't work in a coroutine situation. regards, tom lane