Обсуждение: Convert pltcl from strings to objects

Поиск
Список
Период
Сортировка

Convert pltcl from strings to objects

От
Jim Nasby
Дата:
Currently, pl/tcl is implemented through nothing but string
manipulation. In other words, the C code is effectively creating a giant
string that the tcl interpreter must re-parse every time the function is
executed. Additionally, all arguments are treated as raw strings,
instead of the far more efficient internal tcl object types.

The biggest win comes with how pltcl interfaces with SPI result sets.
Currently, the entire chunk of tcl code that is executed for each result
row must be reparsed and recompiled from scratch. Now, the code is
compiled once and the bytecode is stored.

This work is sponsored by Flight Aware (http://flightaware.com/).
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Вложения

Re: Convert pltcl from strings to objects

От
Victor Wagner
Дата:
On Tue, 9 Feb 2016 16:23:21 -0600
Jim Nasby <Jim.Nasby@BlueTreble.com> wrote:

> Currently, pl/tcl is implemented through nothing but string 
> manipulation. In other words, the C code is effectively creating a
> giant string that the tcl interpreter must re-parse every time the
> function is executed. Additionally, all arguments are treated as raw
> strings, instead of the far more efficient internal tcl object types.
> 
> The biggest win comes with how pltcl interfaces with SPI result sets. 
> Currently, the entire chunk of tcl code that is executed for each
> result row must be reparsed and recompiled from scratch. Now, the
> code is compiled once and the bytecode is stored.
> 
> This work is sponsored by Flight Aware (http://flightaware.com/).

I've looked to your patch. As far as I know Tcl Object API it looks
quite good.

Of course, it breaks compatibility with pre-8.0 versions of Tcl, but it
is almost twenty years since Tcl 8.0 come out.

I think that checking for pre-8.0 Tcl and defining Tcl_StringResult in
this case, should be removed from the code, because there is no object
API in pre-8.0 Tcl anyway, and it doesn't worth effort to maintain
compatibility. (line 51 of pltcl.c). 

There is suspicious  place at the end of compile_pltcl_fuction function,
where you've put comment that old prodesc cannot be deallocated,
because it might be used by other call.

It seems that reference counting mechanism which Tcl already has, can
help here. Would there be serious performance impact if you'll use Tcl
list instead of C structure to store procedure description? 
If not, you can rely on Tcl_IncrRefCount/Tcl_DecrRefCount to free a
procedure description when last active call finishes.

Function pltcl_elog have a big switch case to convert enum logpriority,
local to this function to PostgreSql log levels.

It seems not a most readable solution, because this enumeration is
desined to be passed to Tcl_GetIndexFromObj, so it can be used and is
used to index an array (logpriorities array of string representation).
You can define simular array with PostgreSQL integer constant and
replace page-sized switch with just two lines - this array definition
and getting value from it by index

static CONST84 int
loglevels[]={DEBUG,LOG,INFO,NOTICE,WARNING,ERROR,FATAL};

....
Tcl_GetIndexFromObj(....

level=loglevels[priIndex];


It seems that you are losing some diagnostic information by
extracting just message field from ErrorData, which you do in 
pltcl_elog and pltcl_subtrans_abort.

Tcl has  mechanisms for passing around additional error information.
See Tcl_SetErrorCode and Tcl_AddErrorInfo functions

pltcl_process_SPI_result might benefit from providing SPI result code
in machine readable way via Tcl_SetErrorCode too.


In some cases you use Tcl_SetObjResult(interp, Tcl_NewStringObj("error
message",-1)) to report error with constant message, where in other
places Tcl_SetResult(interp,"error message",TCL_STATIC) left in place.

I see no harm in using old API with static messages, especially if
Tcl_AppendResult is used, but mixing two patterns above seems to be a
bit inconsistent.

As far as I can understand, using Tcl_StringObj to represent all
postgresql primitive (non-array) types and making no attempt to convert
tuple data into integer, boolean or double objects is design decision.

It really can spare few unnecessary type conversions. 

Thanks for your effort.

-- 




Re: Convert pltcl from strings to objects

От
Jim Nasby
Дата:
On 2/18/16 6:26 AM, Victor Wagner wrote:
> On Tue, 9 Feb 2016 16:23:21 -0600
> There is suspicious  place at the end of compile_pltcl_fuction function,
> where you've put comment that old prodesc cannot be deallocated,
> because it might be used by other call.
>
> It seems that reference counting mechanism which Tcl already has, can
> help here. Would there be serious performance impact if you'll use Tcl
> list instead of C structure to store procedure description?
> If not, you can rely on Tcl_IncrRefCount/Tcl_DecrRefCount to free a
> procedure description when last active call finishes.

Are you referring to this comment?

>         /************************************************************
>          * Add the proc description block to the hashtable.  Note we do not
>          * attempt to free any previously existing prodesc block.  This is
>          * annoying, but necessary since there could be active calls using
>          * the old prodesc.
>          ************************************************************/

That is not changed by the patch, and I don't think it's in the scope of 
this patch. I agree it would be nice to get rid of that and the related 
malloc() call, as well as what perm_fmgr_info() is doing, but those are 
unrelated to this work.

(BTW, I also disagree about using a Tcl list for prodesc. That's an 
object in a *Postgres* hash table; as such it needs to be managed by 
Postgres, not Tcl. AFAIK the Postgres ResourceOwner stuff would be the 
correct way to do that (but I'm not exactly an expert on that area).

> Function pltcl_elog have a big switch case to convert enum logpriority,
> local to this function to PostgreSql log levels.
>
> It seems not a most readable solution, because this enumeration is
> desined to be passed to Tcl_GetIndexFromObj, so it can be used and is
> used to index an array (logpriorities array of string representation).
> You can define simular array with PostgreSQL integer constant and
> replace page-sized switch with just two lines - this array definition
> and getting value from it by index
>
> static CONST84 int
> loglevels[]={DEBUG,LOG,INFO,NOTICE,WARNING,ERROR,FATAL};
>
> ....
> Tcl_GetIndexFromObj(....
>
> level=loglevels[priIndex];

Done.

> It seems that you are losing some diagnostic information by
> extracting just message field from ErrorData, which you do in
> pltcl_elog and pltcl_subtrans_abort.
>
> Tcl has  mechanisms for passing around additional error information.
> See Tcl_SetErrorCode and Tcl_AddErrorInfo functions
>
> pltcl_process_SPI_result might benefit from providing SPI result code
> in machine readable way via Tcl_SetErrorCode too.

Is there any backwards compatibility risk to these changes? Could having 
that new info break someone's existing code?

> In some cases you use Tcl_SetObjResult(interp, Tcl_NewStringObj("error
> message",-1)) to report error with constant message, where in other
> places Tcl_SetResult(interp,"error message",TCL_STATIC) left in place.
>
> I see no harm in using old API with static messages, especially if
> Tcl_AppendResult is used, but mixing two patterns above seems to be a
> bit inconsistent.

Switched everything to using the new API.

> As far as I can understand, using Tcl_StringObj to represent all
> postgresql primitive (non-array) types and making no attempt to convert
> tuple data into integer, boolean or double objects is design decision.
>
> It really can spare few unnecessary type conversions.

Yeah, that's on the list. But IMHO it's outside the scope of this patch.

> Thanks for your effort.

Thanks for the review!
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Convert pltcl from strings to objects

От
Victor Wagner
Дата:
On Mon, 22 Feb 2016 17:57:36 -0600
Jim Nasby <Jim.Nasby@BlueTreble.com> wrote:

> On 2/18/16 6:26 AM, Victor Wagner wrote:

> Are you referring to this comment?
> 
> >         /************************************************************
> >          * Add the proc description block to the
> > hashtable.  Note we do not
> >          * attempt to free any previously existing prodesc
> > block.  This is
> >          * annoying, but necessary since there could be
> > active calls using
> >          * the old prodesc.
> >          ************************************************************/  
> 
> That is not changed by the patch, and I don't think it's in the scope
> of this patch. I agree it would be nice to get rid of that and the
> related malloc() call, as well as what perm_fmgr_info() is doing, but
> those are unrelated to this work.

If it has been there, before, it is OK to leave it this way.
It was rather wild guess that use of referential counting which we now
have here, can solve this old problem.
> (BTW, I also disagree about using a Tcl list for prodesc. That's an 
> object in a *Postgres* hash table; as such it needs to be managed by 
> Postgres, not Tcl. AFAIK the Postgres ResourceOwner stuff would be
> the correct way to do that (but I'm not exactly an expert on that
> area).

As far as I understand ResourceOwner is completely different garbage
collection scheme, than reference counting. ResourceOwner mechanism lets
free everything associated with transaction when transaction is over.

Here we have another case. prodesc is a global thing. And it is shared
between different operations. Problem was that there is no partcular
owner, and we have to wait when last operation which deals with it
would finish. It looks like perfect job for reference counting.

Actually, it doesn't matter much, because it wastes some memory only on
procedure change, and typically in the heavy loaded production
environments server side code doesn't change too frequently.

> 
> > Function pltcl_elog have a big switch case to convert enum
> > logpriority, local to this function to PostgreSql log levels.
> >
> > It seems not a most readable solution, because this enumeration is
> > desined to be passed to Tcl_GetIndexFromObj, so it can 
[skip]
> 
> Done.

Glad that my suggestion was useful to you.

> 
> > It seems that you are losing some diagnostic information by
> > extracting just message field from ErrorData, which you do in
> > pltcl_elog and pltcl_subtrans_abort.
> >
> > Tcl has  mechanisms for passing around additional error information.
> > See Tcl_SetErrorCode and Tcl_AddErrorInfo functions
> >
> > pltcl_process_SPI_result might benefit from providing SPI result
> > code in machine readable way via Tcl_SetErrorCode too.  
> 
> Is there any backwards compatibility risk to these changes? Could
> having that new info break someone's existing code?

I don't think so. ErrorCode and ErrorInfo mechanisms present in the Tcl
for ages. As far as I remember, it predates Tcl_Obj API. Unfortunately
I've lost my copy of 2-nd edition of "Practical Programming on Tcl/Tk"
and have only 4-th edition handy, so it's quite hard to dig out Tcl 7.6
docs. 

Moreover. errorcode and errorinfo information travel behind the scenes
and cannot break any code which doesn't try to explicitely access it.


> > In some cases you use Tcl_SetObjResult(interp,
> > Tcl_NewStringObj("error message",-1)) to report error with constant
> > message, where in other places Tcl_SetResult(interp,"error
> > message",TCL_STATIC) left in place.
> >
> > I see no harm in using old API with static messages, especially if
> > Tcl_AppendResult is used, but mixing two patterns above seems to be
> > a bit inconsistent.  
> 
> Switched everything to using the new API.
> 
> > As far as I can understand, using Tcl_StringObj to represent all
> > postgresql primitive (non-array) types and making no attempt to
> > convert tuple data into integer, boolean or double objects is
> > design decision.
> >
> > It really can spare few unnecessary type conversions.  
> 
> Yeah, that's on the list. But IMHO it's outside the scope of this
> patch.

I've mean, that there is no need to be to aggressive in the type
conversion. You'll never know if Tcl code would actually access some
tuple elements or not. Let's leave conversion to the time of actual use
of values.
> 
> > Thanks for your effort.  
> 
> Thanks for the review!

Please, send updated patch to the list in this thread, so it would
appear in the commitfest and I can mark your patch as "ready for
committer".







--                                   Victor Wagner <vitus@wagner.pp.ru>



Re: Convert pltcl from strings to objects

От
Jim Nasby
Дата:
On 2/23/16 6:04 AM, Victor Wagner wrote:
> On Mon, 22 Feb 2016 17:57:36 -0600
> Jim Nasby <Jim.Nasby@BlueTreble.com> wrote:
>
>> On 2/18/16 6:26 AM, Victor Wagner wrote:
>> (BTW, I also disagree about using a Tcl list for prodesc. That's an
>> object in a *Postgres* hash table; as such it needs to be managed by
>> Postgres, not Tcl. AFAIK the Postgres ResourceOwner stuff would be
>> the correct way to do that (but I'm not exactly an expert on that
>> area).
>
> As far as I understand ResourceOwner is completely different garbage
> collection scheme, than reference counting. ResourceOwner mechanism lets
> free everything associated with transaction when transaction is over.

Ahh, right.

> Here we have another case. prodesc is a global thing. And it is shared
> between different operations. Problem was that there is no partcular
> owner, and we have to wait when last operation which deals with it
> would finish. It looks like perfect job for reference counting.

I've just tried to wrap my head around what's going on with prodesc and
failed... specifically, I don't understand this claim in the comment:

* Add the proc description block to the hashtable.  Note we do not
* attempt to free any previously existing prodesc block.  !!This is
* annoying, but necessary since there could be active calls using
* the old prodesc.!!

What else could be referencing it? I realize it's stored in
pltcl_proc_htab, but AFAICT that's backend-local. So I don't understand
what else could be referencing it.

I suspect this code predates conveniences that have been added to
Postgres over the years and that the whole way this stuff is being
cached should be overhauled to do whatever plpgsql does now...

> Actually, it doesn't matter much, because it wastes some memory only on
> procedure change, and typically in the heavy loaded production
> environments server side code doesn't change too frequently.

... but this is the reason it's maybe not a big priority. Though,
prodesc does appear to be a few hundred bytes, and this is something
that will affect *every* backend that has executed a procedure that then
gets modified.


> Moreover. errorcode and errorinfo information travel behind the scenes
> and cannot break any code which doesn't try to explicitely access it.

Yeah, Karl will be looking into this as part of a separate patch.


> I've mean, that there is no need to be to aggressive in the type
> conversion. You'll never know if Tcl code would actually access some
> tuple elements or not. Let's leave conversion to the time of actual use
> of values.

Will also be looked at as part of a separate patch.

> Please, send updated patch to the list in this thread, so it would
> appear in the commitfest and I can mark your patch as "ready for
> committer".

Done!
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Вложения

Re: Convert pltcl from strings to objects

От
Victor Wagner
Дата:
On Tue, 23 Feb 2016 17:14:26 -0600
Jim Nasby <Jim.Nasby@BlueTreble.com> wrote:

> On 2/23/16 6:04 AM, Victor Wagner wrote:

> 
> > Please, send updated patch to the list in this thread, so it would
> > appear in the commitfest and I can mark your patch as "ready for
> > committer".  
> 
> Done!

Nice job. I've marking the patch as "Ready for committer".




Re: Convert pltcl from strings to objects

От
Alvaro Herrera
Дата:
Jim Nasby wrote:

> >Here we have another case. prodesc is a global thing. And it is shared
> >between different operations. Problem was that there is no partcular
> >owner, and we have to wait when last operation which deals with it
> >would finish. It looks like perfect job for reference counting.
> 
> I've just tried to wrap my head around what's going on with prodesc and
> failed... specifically, I don't understand this claim in the comment:
> 
> * Add the proc description block to the hashtable.  Note we do not
> * attempt to free any previously existing prodesc block.  !!This is
> * annoying, but necessary since there could be active calls using
> * the old prodesc.!!
> 
> What else could be referencing it? I realize it's stored in pltcl_proc_htab,
> but AFAICT that's backend-local. So I don't understand what else could be
> referencing it.

Try to open a cursor that uses the function, fetch a few tuples from it;
then change the function and fetch more rows from the cursor.  I suppose
the open cursor could contain a reference to the function's prodesc.

Refcounting the prodesc would let it live until the cursor's closed,
then free it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Convert pltcl from strings to objects

От
Jim Nasby
Дата:
On 2/25/16 9:30 AM, Alvaro Herrera wrote:
> Jim Nasby wrote:
>
>>> Here we have another case. prodesc is a global thing. And it is shared
>>> between different operations. Problem was that there is no partcular
>>> owner, and we have to wait when last operation which deals with it
>>> would finish. It looks like perfect job for reference counting.
>>
>> I've just tried to wrap my head around what's going on with prodesc and
>> failed... specifically, I don't understand this claim in the comment:
>>
>> * Add the proc description block to the hashtable.  Note we do not
>> * attempt to free any previously existing prodesc block.  !!This is
>> * annoying, but necessary since there could be active calls using
>> * the old prodesc.!!
>>
>> What else could be referencing it? I realize it's stored in pltcl_proc_htab,
>> but AFAICT that's backend-local. So I don't understand what else could be
>> referencing it.
>
> Try to open a cursor that uses the function, fetch a few tuples from it;
> then change the function and fetch more rows from the cursor.  I suppose
> the open cursor could contain a reference to the function's prodesc.
>
> Refcounting the prodesc would let it live until the cursor's closed,
> then free it.

Hadn't thought about cursors; I suspect you're right about that. I 
wounder if other PLs would handle that correctly.

I'm also not sure how the reference would get decremented... via 
ResourceOwner somehow?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Convert pltcl from strings to objects

От
Tom Lane
Дата:
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> On 2/25/16 9:30 AM, Alvaro Herrera wrote:
>> Refcounting the prodesc would let it live until the cursor's closed,
>> then free it.

> I'm also not sure how the reference would get decremented... via 
> ResourceOwner somehow?

plpgsql already has a similar mechanism (see PLpgSQL_function.use_count)
which you could probably copy.  But I'd advise that this is a separate
matter to be addressed in a separate patch; it has little to do with the
nominal subject matter of this patch.
        regards, tom lane



Re: Convert pltcl from strings to objects

От
Jim Nasby
Дата:
On 2/29/16 9:57 AM, Tom Lane wrote:
> plpgsql already has a similar mechanism (see PLpgSQL_function.use_count)
> which you could probably copy.  But I'd advise that this is a separate
> matter to be addressed in a separate patch; it has little to do with the
> nominal subject matter of this patch.

Ahh, thanks for pointing that out.

Completely agree on it being a separate patch. Flight Aware is a big 
pltcl user as well as a contributor to the TCL community, so there's 
several more patches in the works. This would be one of them.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Convert pltcl from strings to objects

От
Tom Lane
Дата:
Victor Wagner <vitus@wagner.pp.ru> writes:
> On Mon, 22 Feb 2016 17:57:36 -0600
> Jim Nasby <Jim.Nasby@BlueTreble.com> wrote:
>> Is there any backwards compatibility risk to these changes? Could
>> having that new info break someone's existing code?

> I don't think so. ErrorCode and ErrorInfo mechanisms present in the Tcl
> for ages. As far as I remember, it predates Tcl_Obj API. Unfortunately
> I've lost my copy of 2-nd edition of "Practical Programming on Tcl/Tk"
> and have only 4-th edition handy, so it's quite hard to dig out Tcl 7.6
> docs. 

I've got some dead-tree Tcl 7.3 documentation that describes those
functions, so for sure we can rely on them if we're going to rely on
Tcl objects.


Speaking of which, I wonder if this isn't a good time to move the
goalposts a little further in terms of which Tcl versions we support.
Tcl 8.4 was released in 2002; does anybody really think that someone
would try to use Postgres 9.6+ with a Tcl older than that?  If we
just said "8.4 is the minimum", we could get rid of a number of #if's
in pltcl.c.  I also note the comment in front of one of those #if's:

/** Hack to override Tcl's builtin Notifier subsystem.  This prevents the* backend from becoming multithreaded, which
breaksall sorts of things.* ...* We can only fix this with Tcl >= 8.4, when Tcl_SetNotifier() appeared.*/
 

In other words, if you run PG with a pre-8.4 version of Tcl, you're at
very serious risk of breakage.

Also, AFAICT we have no buildfarm members testing anything older than
8.4, so it's pretty hypothetical whether it'd even still work at all.

So I propose that we remove those #if's in favor of something like this
near the top:

#if !HAVE_TCL_VERSION(8,4)
#error PostgreSQL only supports Tcl 8.4 or later.
#endif

If we don't do that, I'm at least going to put in a similar #error for
Tcl 8.0; but I really think we ought to just say 8.4 is the minimum.
        regards, tom lane



Re: Convert pltcl from strings to objects

От
Tom Lane
Дата:
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> [ pltcl_objects_2.patch ]

I've pushed this with some minor fixes, as well as the followup work
mentioned in this thread.
        regards, tom lane



Re: Convert pltcl from strings to objects

От
Jim Nasby
Дата:
On 3/2/16 12:32 PM, Tom Lane wrote:
> Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
>> [ pltcl_objects_2.patch ]
>
> I've pushed this with some minor fixes, as well as the followup work
> mentioned in this thread.

Awesome, thanks!

I've asked Karl's opinion on increasing the minimum TCL version, but I 
suspect that won't be an issue.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Convert pltcl from strings to objects

От
Jim Nasby
Дата:
On 3/1/16 5:06 PM, Tom Lane wrote:
> If we don't do that, I'm at least going to put in a similar #error for
> Tcl 8.0; but I really think we ought to just say 8.4 is the minimum.

Just confirmed that should be completely reasonable. I'll take a look at 
it in a few days if you don't beat me to it.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Convert pltcl from strings to objects

От
Tom Lane
Дата:
Jim Nasby <Jim.Nasby@bluetreble.com> writes:
> On 3/1/16 5:06 PM, Tom Lane wrote:
>> If we don't do that, I'm at least going to put in a similar #error for
>> Tcl 8.0; but I really think we ought to just say 8.4 is the minimum.

> Just confirmed that should be completely reasonable. I'll take a look at 
> it in a few days if you don't beat me to it.

Done already.
        regards, tom lane