Обсуждение: [PATCH] Fix conversion for Decimal arguments in plpython functions
Hi,
I've got a patch.
This is for a plpython enhancement.
There is an item at the TODO list http://wiki.postgresql.org/wiki/Todo#Server-Side_Languages
"Fix loss of information during conversion of numeric type to Python float"
This patch uses a decimal.Decimal type from Python standard library for the plpthon function numeric argument instead of float.
Patch contains changes in code, documentation and tests.
Most probably there is something wrong, as this is my first Postgres patch :)
thanks,
Szymon
Вложения
On 05/28/2013 04:41 PM, Szymon Guz wrote: > Hi, > I've got a patch. > > This is for a plpython enhancement. > > There is an item at the TODO list > http://wiki.postgresql.org/wiki/Todo#Server-Side_Languages > "Fix loss of information during conversion of numeric type to Python > float" > > This patch uses a decimal.Decimal type from Python standard library > for the plpthon function numeric argument instead of float. > > Patch contains changes in code, documentation and tests. > > Most probably there is something wrong, as this is my first Postgres > patch :) > Thanks for contributing. This patch applies cleanly against master and compiles with warnings plpy_main.c: In function ‘PLy_init_interp’: plpy_main.c:157:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] plpy_main.c:161:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] You can avoid this by moving the declaration of decimal and decimal_dict to be at the top of the function where mainmod is declared. Also in this function you've introduced places where it returns with an error (the PLy_elog(ERROR...) calls before decrementing the reference to mainmod. I think you can decrement the mainmod reference after the call to SetItemString before your changes that import the Decimal module. The patch works as expected, I am able to write python functions that take numerics as arguments and work with them. I can adjust the decimal context precision inside of my function. One concern I have is that this patch makes pl/python functions involving numerics more than 3 times as slow as before. create temp table b(a numeric); insert into b select generate_series(1,10000); create or replace function x(a numeric,b numeric) returns numeric as $$ if a==None: return b return a+b $$ language plpythonu; create aggregate sm(basetype=numeric, sfunc=x,stype=numeric); test=# select sm(a) from b; sm ---------- 50005000 (1 row) Time: 565.650 ms versus before the patch this was taking in the range of 80ms. Would it be faster to call numeric_send instead of numeric_out and then convert the sequence of Int16's to a tuple of digits that can be passed into the Decimal constructor? I think this is worth trying and testing, Documentation ================= Your patched version of the docs say PostgreSQL <type>real</type>, <type>double</type>, and <type>numeric</type> are converted to Python <type>Decimal</type>. This type is imported from<literal>decimal.Decimal</literal>. I don't think this is correct, as far as I can tell your not changing the behaviour for postgresql real and double types, they continue to use floating point. <listitem> <para> PostgreSQL <type>real</type> and <type>double</type>are converted to Python <type>float</type>. </para> </listitem> <listitem> <para> PostgreSQL <type>numeric</type> is converted to Python <type>Decimal</type>. This type is imported from <literal>decimal.Decimal</literal>. </para> </listitem> Maybe? Steve > thanks, > Szymon > > >
On 25 June 2013 05:16, Steve Singer <steve@ssinger.info> wrote:
One concern I have is that this patch makes pl/python functions involving numerics more than 3 times as slow as before.
create temp table b(a numeric);
insert into b select generate_series(1,10000);
create or replace function x(a numeric,b numeric) returns numeric as $$
if a==None:
return b
return a+b
$$ language plpythonu;
create aggregate sm(basetype=numeric, sfunc=x,stype=numeric);
test=# select sm(a) from b;
sm
----------
50005000
(1 row)
Time: 565.650 ms
versus before the patch this was taking in the range of 80ms.
Would it be faster to call numeric_send instead of numeric_out and then convert the sequence of Int16's to a tuple of digits that can be passed into the Decimal constructor? I think this is worth trying and testing,
Hi,
thanks for all the remarks.
I think I cannot do anything about speeding up the code. What I've found so far is:
I cannot use simple fields from NumericVar in my code, so to not waste time on something not sensible, I've tried to found out if using the tuple constructor for decimal.Decimal will be faster. I've changed the function to something like this:
static PyObject *
PLyDecimal_FromNumeric(PLyDatumToOb *arg, Datum d)
{
PyObject *digits = PyTuple_New(4);
PyTuple_SetItem(digits, 0, PyInt_FromLong(1));
PyTuple_SetItem(digits, 1, PyInt_FromLong(4));
PyTuple_SetItem(digits, 2, PyInt_FromLong(1));
PyTuple_SetItem(digits, 3, PyInt_FromLong(4));
PyObject *tuple = PyTuple_New(3);
PyTuple_SetItem(tuple, 0, PyInt_FromLong(1));
PyTuple_SetItem(tuple, 1, digits);
PyTuple_SetItem(tuple, 2, PyInt_FromLong(-3));
value = PyObject_CallFunctionObjArgs(PLy_decimal_ctor_global, tuple, NULL);
return value;
}
Yes, it returns the same value regardless the params. The idea is to call Python code like:
Decimal((0, (1, 4, 1, 4), -3))
which is simply:
Decimal('1.414')
Unfortunately this is not faster. It is as slow as it was with string constructor.
I've checked the speed of decimal.Decimal using pure python. For this I used a simple function, similar to yours:
def x(a, b):
if a is None:
return b
return a + b
I've run the tests using simple ints:
def test():
a = 0
for i in xrange(0, 10000):
a += x(a, i)
for a in xrange(1, 100):
test()
And later I've run the same function, but with converting the arguments to Decimals:
from decimal import Decimal
def x(a, b):
if a is None:
return b
return a + b
def test():
a = 0
for i in xrange(0, 10000):
a += x(Decimal(a), Decimal(i))
for a in xrange(1, 100):
test()
It was run 100 times for decreasing the impact of test initialization.
The results for both files are:
int: 0.697s
decimal: 38.859s
What gives average time for one function call of:
int: 69ms
decimal: 380ms
For me the problem is with slow code at Python's side, the Decimal constructors are pretty slow, and there is nothing I can do with that at the Postgres' side.
I will send patch with fixes later.
thanks,
Szymon
On 25 June 2013 05:16, Steve Singer <steve@ssinger.info> wrote:
Thanks for contributing.On 05/28/2013 04:41 PM, Szymon Guz wrote:Hi,
I've got a patch.
This is for a plpython enhancement.
There is an item at the TODO list http://wiki.postgresql.org/wiki/Todo#Server-Side_Languages
"Fix loss of information during conversion of numeric type to Python float"
This patch uses a decimal.Decimal type from Python standard library for the plpthon function numeric argument instead of float.
Patch contains changes in code, documentation and tests.
Most probably there is something wrong, as this is my first Postgres patch :)
This patch applies cleanly against master and compiles with warnings
plpy_main.c: In function ‘PLy_init_interp’:
plpy_main.c:157:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
plpy_main.c:161:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
You can avoid this by moving the declaration of decimal and decimal_dict to be at the top of the function where mainmod is declared.
Also in this function you've introduced places where it returns with an error (the PLy_elog(ERROR...) calls before decrementing the reference to mainmod. I think you can decrement the mainmod reference after the call to SetItemString before your changes that import the Decimal module.
The patch works as expected, I am able to write python functions that take numerics as arguments and work with them. I can adjust the decimal context precision inside of my function.
One concern I have is that this patch makes pl/python functions involving numerics more than 3 times as slow as before.
create temp table b(a numeric);
insert into b select generate_series(1,10000);
create or replace function x(a numeric,b numeric) returns numeric as $$
if a==None:
return b
return a+b
$$ language plpythonu;
create aggregate sm(basetype=numeric, sfunc=x,stype=numeric);
test=# select sm(a) from b;
sm
----------
50005000
(1 row)
Time: 565.650 ms
versus before the patch this was taking in the range of 80ms.
Would it be faster to call numeric_send instead of numeric_out and then convert the sequence of Int16's to a tuple of digits that can be passed into the Decimal constructor? I think this is worth trying and testing,
Documentation
=================
Your patched version of the docs say
PostgreSQL <type>real</type>, <type>double</type>, and <type>numeric</type> are converted to
Python <type>Decimal</type>. This type is imported from<literal>decimal.Decimal</literal>.
I don't think this is correct, as far as I can tell your not changing the behaviour for postgresql real and double types, they continue to use floating point.
<listitem>
<para>
PostgreSQL <type>real</type> and <type>double</type>are converted to
Python <type>float</type>.
</para>
</listitem>
<listitem>
<para>
PostgreSQL <type>numeric</type> is converted to
Python <type>Decimal</type>. This type is imported from
<literal>decimal.Decimal</literal>.
</para>
</listitem>
Hi,
I've attached a new patch. I've fixed all the problems you've found, except for the efficiency problem, which has been described in previous email.
thanks,
Szymon
Вложения
Concerning the efficiency problem, it should be noted that the latest 3.3 release of cpython introduces an "accelerator" for decimal data types, as a C-module. This module was previously available from the Python package index at: https://pypi.python.org/pypi/cdecimal/2.2
It may be overkill to try to include such a dependency, but the performance overhead from using decimal is really mitigated by this implementation.
It may be overkill to try to include such a dependency, but the performance overhead from using decimal is really mitigated by this implementation.
2013/6/25 Szymon Guz <mabewlun@gmail.com>
On 25 June 2013 05:16, Steve Singer <steve@ssinger.info> wrote:Thanks for contributing.On 05/28/2013 04:41 PM, Szymon Guz wrote:Hi,
I've got a patch.
This is for a plpython enhancement.
There is an item at the TODO list http://wiki.postgresql.org/wiki/Todo#Server-Side_Languages
"Fix loss of information during conversion of numeric type to Python float"
This patch uses a decimal.Decimal type from Python standard library for the plpthon function numeric argument instead of float.
Patch contains changes in code, documentation and tests.
Most probably there is something wrong, as this is my first Postgres patch :)
This patch applies cleanly against master and compiles with warnings
plpy_main.c: In function ‘PLy_init_interp’:
plpy_main.c:157:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
plpy_main.c:161:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
You can avoid this by moving the declaration of decimal and decimal_dict to be at the top of the function where mainmod is declared.
Also in this function you've introduced places where it returns with an error (the PLy_elog(ERROR...) calls before decrementing the reference to mainmod. I think you can decrement the mainmod reference after the call to SetItemString before your changes that import the Decimal module.
The patch works as expected, I am able to write python functions that take numerics as arguments and work with them. I can adjust the decimal context precision inside of my function.
One concern I have is that this patch makes pl/python functions involving numerics more than 3 times as slow as before.
create temp table b(a numeric);
insert into b select generate_series(1,10000);
create or replace function x(a numeric,b numeric) returns numeric as $$
if a==None:
return b
return a+b
$$ language plpythonu;
create aggregate sm(basetype=numeric, sfunc=x,stype=numeric);
test=# select sm(a) from b;
sm
----------
50005000
(1 row)
Time: 565.650 ms
versus before the patch this was taking in the range of 80ms.
Would it be faster to call numeric_send instead of numeric_out and then convert the sequence of Int16's to a tuple of digits that can be passed into the Decimal constructor? I think this is worth trying and testing,
Documentation
=================
Your patched version of the docs say
PostgreSQL <type>real</type>, <type>double</type>, and <type>numeric</type> are converted to
Python <type>Decimal</type>. This type is imported from<literal>decimal.Decimal</literal>.
I don't think this is correct, as far as I can tell your not changing the behaviour for postgresql real and double types, they continue to use floating point.
<listitem>
<para>
PostgreSQL <type>real</type> and <type>double</type>are converted to
Python <type>float</type>.
</para>
</listitem>
<listitem>
<para>
PostgreSQL <type>numeric</type> is converted to
Python <type>Decimal</type>. This type is imported from
<literal>decimal.Decimal</literal>.
</para>
</listitem>Hi,I've attached a new patch. I've fixed all the problems you've found, except for the efficiency problem, which has been described in previous email.thanks,Szymon
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Well, I really don't like the idea of such a dependency.
However it could be added as configuration option, so you could compile postgres with e.g. --with-cdecimal, and then it would be user dependent.
Maybe it is a good idea for another patch.
Concerning the efficiency problem, it should be noted that the latest 3.3 release of cpython introduces an "accelerator" for decimal data types, as a C-module. This module was previously available from the Python package index at: https://pypi.python.org/pypi/cdecimal/2.2
It may be overkill to try to include such a dependency, but the performance overhead from using decimal is really mitigated by this implementation.2013/6/25 Szymon Guz <mabewlun@gmail.com>--On 25 June 2013 05:16, Steve Singer <steve@ssinger.info> wrote:Thanks for contributing.On 05/28/2013 04:41 PM, Szymon Guz wrote:Hi,
I've got a patch.
This is for a plpython enhancement.
There is an item at the TODO list http://wiki.postgresql.org/wiki/Todo#Server-Side_Languages
"Fix loss of information during conversion of numeric type to Python float"
This patch uses a decimal.Decimal type from Python standard library for the plpthon function numeric argument instead of float.
Patch contains changes in code, documentation and tests.
Most probably there is something wrong, as this is my first Postgres patch :)
This patch applies cleanly against master and compiles with warnings
plpy_main.c: In function ‘PLy_init_interp’:
plpy_main.c:157:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
plpy_main.c:161:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
You can avoid this by moving the declaration of decimal and decimal_dict to be at the top of the function where mainmod is declared.
Also in this function you've introduced places where it returns with an error (the PLy_elog(ERROR...) calls before decrementing the reference to mainmod. I think you can decrement the mainmod reference after the call to SetItemString before your changes that import the Decimal module.
The patch works as expected, I am able to write python functions that take numerics as arguments and work with them. I can adjust the decimal context precision inside of my function.
One concern I have is that this patch makes pl/python functions involving numerics more than 3 times as slow as before.
create temp table b(a numeric);
insert into b select generate_series(1,10000);
create or replace function x(a numeric,b numeric) returns numeric as $$
if a==None:
return b
return a+b
$$ language plpythonu;
create aggregate sm(basetype=numeric, sfunc=x,stype=numeric);
test=# select sm(a) from b;
sm
----------
50005000
(1 row)
Time: 565.650 ms
versus before the patch this was taking in the range of 80ms.
Would it be faster to call numeric_send instead of numeric_out and then convert the sequence of Int16's to a tuple of digits that can be passed into the Decimal constructor? I think this is worth trying and testing,
Documentation
=================
Your patched version of the docs say
PostgreSQL <type>real</type>, <type>double</type>, and <type>numeric</type> are converted to
Python <type>Decimal</type>. This type is imported from<literal>decimal.Decimal</literal>.
I don't think this is correct, as far as I can tell your not changing the behaviour for postgresql real and double types, they continue to use floating point.
<listitem>
<para>
PostgreSQL <type>real</type> and <type>double</type>are converted to
Python <type>float</type>.
</para>
</listitem>
<listitem>
<para>
PostgreSQL <type>numeric</type> is converted to
Python <type>Decimal</type>. This type is imported from
<literal>decimal.Decimal</literal>.
</para>
</listitem>Hi,I've attached a new patch. I've fixed all the problems you've found, except for the efficiency problem, which has been described in previous email.thanks,Szymon
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 06/25/2013 06:42 AM, Szymon Guz wrote: > > Hi, > I've attached a new patch. I've fixed all the problems you've found, > except for the efficiency problem, which has been described in > previous email. > > thanks, > Szymon > This version of the patch addresses the issues I mentioned. Thanks for looking into seeing if the performance issue is with our conversions to strings or inherit with the python decimal type. I guess we (Postgresql) can't do much about it. A runtime switch to use cdecimal if it is available is a good idea, but I agree with you that could be a different patch. One minor thing I noticed in this round, PLy_elog(ERROR, "could not import module 'decimal'"); I think should have "decimal" in double-quotes. I think this patch is ready for a committer to look at it. Steve >
On 26 June 2013 01:40, Steve Singer <steve@ssinger.info> wrote:
On 06/25/2013 06:42 AM, Szymon Guz wrote:
Hi,
I've attached a new patch. I've fixed all the problems you've found, except for the efficiency problem, which has been described in previous email.
thanks,
Szymon
This version of the patch addresses the issues I mentioned. Thanks for looking into seeing if the performance issue is with our conversions to strings or inherit with the python decimal type. I guess we (Postgresql) can't do much about it. A runtime switch to use cdecimal if it is available is a good idea, but I agree with you that could be a different patch.
One minor thing I noticed in this round,
PLy_elog(ERROR, "could not import module 'decimal'");
I think should have "decimal" in double-quotes.
I think this patch is ready for a committer to look at it.
Steve
Hi Steve,
thanks for the review.
I was thinking about speeding up the Decimal conversion using the module you wrote about. What about trying to import it, if it fails, than trying to load decimal.Decimal? There will be no warning in logs, just additional information in documentation that it uses this module if it is available?
thanks,
Szymon
The v2 patch does not work for me: regression tests for plpython fail on the plpython_types test: every numeric is converted to None.
It seems the global decimal ctor is not initialized.
Please find two patches, to be applied on top of the v2 patch: one initializes the decimal ctor, the other uses cdecimal when possible.
Using the performance test from steve, on my machine:
- with cdecimal installed: ~84ms
- without cdecimal installed (standard decimal module): ~511ms
It seems the global decimal ctor is not initialized.
Please find two patches, to be applied on top of the v2 patch: one initializes the decimal ctor, the other uses cdecimal when possible.
Using the performance test from steve, on my machine:
- with cdecimal installed: ~84ms
- without cdecimal installed (standard decimal module): ~511ms
2013/6/26 Szymon Guz <mabewlun@gmail.com>
On 26 June 2013 01:40, Steve Singer <steve@ssinger.info> wrote:On 06/25/2013 06:42 AM, Szymon Guz wrote:
Hi,
I've attached a new patch. I've fixed all the problems you've found, except for the efficiency problem, which has been described in previous email.
thanks,
Szymon
This version of the patch addresses the issues I mentioned. Thanks for looking into seeing if the performance issue is with our conversions to strings or inherit with the python decimal type. I guess we (Postgresql) can't do much about it. A runtime switch to use cdecimal if it is available is a good idea, but I agree with you that could be a different patch.
One minor thing I noticed in this round,
PLy_elog(ERROR, "could not import module 'decimal'");
I think should have "decimal" in double-quotes.
I think this patch is ready for a committer to look at it.
SteveHi Steve,thanks for the review.I was thinking about speeding up the Decimal conversion using the module you wrote about. What about trying to import it, if it fails, than trying to load decimal.Decimal? There will be no warning in logs, just additional information in documentation that it uses this module if it is available?thanks,Szymon
Вложения
Thanks Steve, that's exactly what I wanted to send when you sent your patches :)
I need to figure out why that patch v2 worked for me, I think I made mess somewhere in my git repo and didn't create the patch properly. Sorry for that.
Patch is attached, I've also added information about cdecimal to plpython documentation.
I'm just wondering how to make integration tests to check when cdecimal is installed and when it is not.
thanks,
Szymon
On 26 June 2013 10:12, Ronan Dunklau <rdunklau@gmail.com> wrote:
The v2 patch does not work for me: regression tests for plpython fail on the plpython_types test: every numeric is converted to None.
It seems the global decimal ctor is not initialized.
Please find two patches, to be applied on top of the v2 patch: one initializes the decimal ctor, the other uses cdecimal when possible.
Using the performance test from steve, on my machine:
- with cdecimal installed: ~84ms
- without cdecimal installed (standard decimal module): ~511ms2013/6/26 Szymon Guz <mabewlun@gmail.com>On 26 June 2013 01:40, Steve Singer <steve@ssinger.info> wrote:On 06/25/2013 06:42 AM, Szymon Guz wrote:
Hi,
I've attached a new patch. I've fixed all the problems you've found, except for the efficiency problem, which has been described in previous email.
thanks,
Szymon
This version of the patch addresses the issues I mentioned. Thanks for looking into seeing if the performance issue is with our conversions to strings or inherit with the python decimal type. I guess we (Postgresql) can't do much about it. A runtime switch to use cdecimal if it is available is a good idea, but I agree with you that could be a different patch.
One minor thing I noticed in this round,
PLy_elog(ERROR, "could not import module 'decimal'");
I think should have "decimal" in double-quotes.
I think this patch is ready for a committer to look at it.
SteveHi Steve,thanks for the review.I was thinking about speeding up the Decimal conversion using the module you wrote about. What about trying to import it, if it fails, than trying to load decimal.Decimal? There will be no warning in logs, just additional information in documentation that it uses this module if it is available?thanks,Szymon
Вложения
You had a great idea, the time with cdecimal is really great, the difference on my machine is 64 ms vs 430 ms.
Szymon
It seems like you confused me with steve :)
The patch applies cleanly, and the regression tests pass on python2 when cdecimal is not installed. When it is, the type info returned for the converted numeric value is cdecimal.Decimal instead of decimal.Decimal.
The regression tests expected output have not been modified for python3, and as such they fail on the type conversions.
I am a bit confused with the use of PyModule_GetDict: shouldn't PyObj_GetAttrString be used directly instead ? Moreover, the reference count in the current implementation might be off: the reference count for the decimal module is never decreased, while the reference count to the module dict is, when the docs say it returns a borrowed reference.
Please find a patch that fixes both issues.
The patch applies cleanly, and the regression tests pass on python2 when cdecimal is not installed. When it is, the type info returned for the converted numeric value is cdecimal.Decimal instead of decimal.Decimal.
The regression tests expected output have not been modified for python3, and as such they fail on the type conversions.
I am a bit confused with the use of PyModule_GetDict: shouldn't PyObj_GetAttrString be used directly instead ? Moreover, the reference count in the current implementation might be off: the reference count for the decimal module is never decreased, while the reference count to the module dict is, when the docs say it returns a borrowed reference.
Please find a patch that fixes both issues.
2013/6/26 Szymon Guz <mabewlun@gmail.com>
Thanks Steve, that's exactly what I wanted to send when you sent your patches :)I need to figure out why that patch v2 worked for me, I think I made mess somewhere in my git repo and didn't create the patch properly. Sorry for that.Patch is attached, I've also added information about cdecimal to plpython documentation.I'm just wondering how to make integration tests to check when cdecimal is installed and when it is not.thanks,SzymonOn 26 June 2013 10:12, Ronan Dunklau <rdunklau@gmail.com> wrote:The v2 patch does not work for me: regression tests for plpython fail on the plpython_types test: every numeric is converted to None.
It seems the global decimal ctor is not initialized.
Please find two patches, to be applied on top of the v2 patch: one initializes the decimal ctor, the other uses cdecimal when possible.
Using the performance test from steve, on my machine:
- with cdecimal installed: ~84ms
- without cdecimal installed (standard decimal module): ~511ms2013/6/26 Szymon Guz <mabewlun@gmail.com>On 26 June 2013 01:40, Steve Singer <steve@ssinger.info> wrote:On 06/25/2013 06:42 AM, Szymon Guz wrote:
Hi,
I've attached a new patch. I've fixed all the problems you've found, except for the efficiency problem, which has been described in previous email.
thanks,
Szymon
This version of the patch addresses the issues I mentioned. Thanks for looking into seeing if the performance issue is with our conversions to strings or inherit with the python decimal type. I guess we (Postgresql) can't do much about it. A runtime switch to use cdecimal if it is available is a good idea, but I agree with you that could be a different patch.
One minor thing I noticed in this round,
PLy_elog(ERROR, "could not import module 'decimal'");
I think should have "decimal" in double-quotes.
I think this patch is ready for a committer to look at it.
SteveHi Steve,thanks for the review.I was thinking about speeding up the Decimal conversion using the module you wrote about. What about trying to import it, if it fails, than trying to load decimal.Decimal? There will be no warning in logs, just additional information in documentation that it uses this module if it is available?thanks,Szymon
Вложения
On 26 June 2013 12:04, Ronan Dunklau <rdunklau@gmail.com> wrote:
It seems like you confused me with steve :)
Hi Ronan,
Oh, yes. I'm sorry for that :)
The patch applies cleanly, and the regression tests pass on python2 when cdecimal is not installed. When it is, the type info returned for the converted numeric value is cdecimal.Decimal instead of decimal.Decimal.
The regression tests expected output have not been modified for python3, and as such they fail on the type conversions.
I am a bit confused with the use of PyModule_GetDict: shouldn't PyObj_GetAttrString be used directly instead ? Moreover, the reference count in the current implementation might be off: the reference count for the decimal module is never decreased, while the reference count to the module dict is, when the docs say it returns a borrowed reference.
Please find a patch that fixes both issues.
Thanks for the patch. I assume you generated that from clean trunk, and it includes all the changes (mine and yours) right?
I've checked the patch, everything looks great.
I've attached it to this email with changed name, just for consistent naming in commitfest app.
thanks,
Szymon
Вложения
On 6/26/13 7:03 AM, Szymon Guz wrote: > I've checked the patch, everything looks great. > I've attached it to this email with changed name, just for consistent > naming in commitfest app. Could the setup of the decimal.Decimal constructor be moved into PLyDecimal_FromNumeric() and kept in a static pointer? I'd rather not clutter up the main initialization routine.
<div dir="ltr">On 26 June 2013 21:59, Peter Eisentraut <span dir="ltr"><<a href="mailto:peter_e@gmx.net" target="_blank">peter_e@gmx.net</a>></span>wrote:<br /><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On 6/26/13 7:03AM, Szymon Guz wrote:<br /> > I've checked the patch, everything looks great.<br /> > I've attached it to thisemail with changed name, just for consistent<br /> > naming in commitfest app.<br /><br /></div>Could the setup ofthe decimal.Decimal constructor be moved into<br /> PLyDecimal_FromNumeric() and kept in a static pointer? I'd rathernot<br /> clutter up the main initialization routine.<br /><br /></blockquote></div><br /></div><div class="gmail_extra"style="style">OK, I will.</div></div>
On 26 June 2013 22:08, Szymon Guz <mabewlun@gmail.com> wrote:
On 26 June 2013 21:59, Peter Eisentraut <peter_e@gmx.net> wrote:On 6/26/13 7:03 AM, Szymon Guz wrote:Could the setup of the decimal.Decimal constructor be moved into
> I've checked the patch, everything looks great.
> I've attached it to this email with changed name, just for consistent
> naming in commitfest app.
PLyDecimal_FromNumeric() and kept in a static pointer? I'd rather not
clutter up the main initialization routine.
Attached patch has all changes against trunk code.
There is added a function for conversion from Postgres numeric to Python Decimal. The Decimal type is taken from cdecimal.Decimal, if it is available. It is an external library, quite fast, but may be not available. If it is not available, then decimal.Decimal will be used. It is in standard Python library, however it is rather slow.
The initialization is done in the conversion function, the pointer to a proper Decimal constructor is stored as static variable inside the function and is lazy initialized.
The documentation is updated.
Tests for python 2 and 3 have been added. They work only with standard decimal.Decimal, as the type is printed in the *.out files. I think there is nothing we can do with that now.
regards,
Szymon
Вложения
On 06/26/2013 04:47 PM, Szymon Guz wrote: > > > > > Attached patch has all changes against trunk code. > > There is added a function for conversion from Postgres numeric to > Python Decimal. The Decimal type is taken from cdecimal.Decimal, if it > is available. It is an external library, quite fast, but may be not > available. If it is not available, then decimal.Decimal will be used. > It is in standard Python library, however it is rather slow. > > The initialization is done in the conversion function, the pointer to > a proper Decimal constructor is stored as static variable inside the > function and is lazy initialized. > > The documentation is updated. > I've tested this version with python 2.7 with and without cdecimal and also with 3.3 that has the faster decimal performance. It seems fine. The v5 version of the patch makes only white-space changes to plpy_main.c you should excluded that from the patch if your making a new version (I have done this in the v6 version I'm attaching) > Tests for python 2 and 3 have been added. They work only with standard > decimal.Decimal, as the type is printed in the *.out files. I think > there is nothing we can do with that now. > > I think we should make test_type_conversion_numeric to do something that generates the same output in both cases. ie py.info(str(x)). I downside of having the test fail on installs with cdecimal installed is much greater than any benefit we get by ensuring that the type is really decimal. I've attached a v6 version of the patch that does this, do you agree with my thinking? Steve > regards, > Szymon > >
Вложения
On 27 June 2013 05:21, Steve Singer <steve@ssinger.info> wrote:
On 06/26/2013 04:47 PM, Szymon Guz wrote:I've tested this version with python 2.7 with and without cdecimal and also with 3.3 that has the faster decimal performance. It seems fine.
Attached patch has all changes against trunk code.
There is added a function for conversion from Postgres numeric to Python Decimal. The Decimal type is taken from cdecimal.Decimal, if it is available. It is an external library, quite fast, but may be not available. If it is not available, then decimal.Decimal will be used. It is in standard Python library, however it is rather slow.
The initialization is done in the conversion function, the pointer to a proper Decimal constructor is stored as static variable inside the function and is lazy initialized.
The documentation is updated.
The v5 version of the patch makes only white-space changes to plpy_main.c you should excluded that from the patch if your making a new version (I have done this in the v6 version I'm attaching)I think we should make test_type_conversion_numeric to do something that generates the same output in both cases. ieTests for python 2 and 3 have been added. They work only with standard decimal.Decimal, as the type is printed in the *.out files. I think there is nothing we can do with that now.
py.info(str(x)). I downside of having the test fail on installs with cdecimal installed is much greater than any benefit we get by ensuring that the type is really decimal.
I've attached a v6 version of the patch that does this, do you agree with my thinking?
Hi Steve,
thanks for the changes.
You're idea about common code for decimal and cdecimal is good, however not good enough. I like the idea of common code for decimal and cdecimal. But we need class name, not the value.
I've changed the code from str(x) to x.__class__.__name__ so the function prints class name (which is Decimal for both packages), not the value. We need to have the class name check. The value is returned by the function and is a couple of lines lower in the file.
patch is attached.
thanks,
Szymon
Вложения
On 06/27/2013 05:04 AM, Szymon Guz wrote: > On 27 June 2013 05:21, Steve Singer <steve@ssinger.info > <mailto:steve@ssinger.info>> wrote: > > On 06/26/2013 04:47 PM, Szymon Guz wrote: > > > > > > > Hi Steve, > thanks for the changes. > > You're idea about common code for decimal and cdecimal is good, > however not good enough. I like the idea of common code for decimal > and cdecimal. But we need class name, not the value. > > I've changed the code from str(x) to x.__class__.__name__ so the > function prints class name (which is Decimal for both packages), not > the value. We need to have the class name check. The value is returned > by the function and is a couple of lines lower in the file. > > patch is attached. > I think the value is more important than the name, I want to the tests to make sure that the conversion is actually converting properly. With your method of getting the class name without the module we can have both. The attached patch should print the value and the class name but not the module name. Steve > thanks, > Szymon > > > > > >
Вложения
On 28 June 2013 22:14, Steve Singer <steve@ssinger.info> wrote:
I think the value is more important than the name, I want to the tests to make sure that the conversion is actually converting properly. With your method of getting the class name without the module we can have both.
The attached patch should print the value and the class name but not the module name.
Steve
Hi Steve,
I agree, we can check both. This is quite a nice patch now, I've reviewed it, all tests pass, works as expected. I think it is ready for committing.
thanks,
Szymon
On Fri, Jun 28, 2013 at 5:14 PM, Steve Singer <steve@ssinger.info> wrote: > On 06/27/2013 05:04 AM, Szymon Guz wrote: >> >> On 27 June 2013 05:21, Steve Singer <steve@ssinger.info >> <mailto:steve@ssinger.info>> wrote: >> >> On 06/26/2013 04:47 PM, Szymon Guz wrote: >> >> >> >> >> >> >> Hi Steve, >> thanks for the changes. >> >> You're idea about common code for decimal and cdecimal is good, however >> not good enough. I like the idea of common code for decimal and cdecimal. >> But we need class name, not the value. >> >> I've changed the code from str(x) to x.__class__.__name__ so the function >> prints class name (which is Decimal for both packages), not the value. We >> need to have the class name check. The value is returned by the function and >> is a couple of lines lower in the file. >> >> patch is attached. >> > > I think the value is more important than the name, I want to the tests to > make sure that the conversion is actually converting properly. With your > method of getting the class name without the module we can have both. > > The attached patch should print the value and the class name but not the > module name. Why not forego checking of the type, and instead check the interface? plpy.info(x.as_tuple()) Should do. >>> d = decimal.Decimal((0,(3,1,4),-2)) >>> d.as_tuple() DecimalTuple(sign=0, digits=(3, 1, 4), exponent=-2) >>> d.as_tuple() == (0,(3,1,4),-2) True >>> d = decimal.Decimal("3.14") >>> d.as_tuple() DecimalTuple(sign=0, digits=(3, 1, 4), exponent=-2) >>> d.as_tuple() == (0,(3,1,4),-2) True >>>
On Fri, 2013-06-28 at 22:28 +0200, Szymon Guz wrote: > I agree, we can check both. This is quite a nice patch now, I've reviewed > it, all tests pass, works as expected. I think it is ready for committing. Committed. Very nice to get that finally fixed.
On Fri, 2013-06-28 at 17:29 -0300, Claudio Freire wrote: > Why not forego checking of the type, and instead check the interface? > > plpy.info(x.as_tuple()) > > Should do. > > >>> d = decimal.Decimal((0,(3,1,4),-2)) > >>> d.as_tuple() > DecimalTuple(sign=0, digits=(3, 1, 4), exponent=-2) I think that potentially exposes us to more version differences and such, and it doesn't really add much value in the test output.
On Fri, Jul 5, 2013 at 11:47 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > On Fri, 2013-06-28 at 17:29 -0300, Claudio Freire wrote: >> Why not forego checking of the type, and instead check the interface? >> >> plpy.info(x.as_tuple()) >> >> Should do. >> >> >>> d = decimal.Decimal((0,(3,1,4),-2)) >> >>> d.as_tuple() >> DecimalTuple(sign=0, digits=(3, 1, 4), exponent=-2) > > I think that potentially exposes us to more version differences and > such, and it doesn't really add much value in the test output. Why? That interface is documented, and Python guys aren't likely to change it, not even in alternative implementations. Besides, this is the "zen" way of checking. Python uses duck typing, so checking the actual class of stuff is frowned upon. The "Pythonic" way to write tests is to check the expected behavior of returned objects. In this case, that it has an as_tuple that returns the right thing. You could also check other interesting behavior if you want, including its string representation, that it can be converted to float, that two decimals can be operated upon and that they preserve the right amount of precision, etc..
<div dir="ltr"><br /><div class="gmail_extra"><br /><br /><div class="gmail_quote">On 28 June 2013 22:29, Claudio Freire<span dir="ltr"><<a href="mailto:klaussfreire@gmail.com" target="_blank">klaussfreire@gmail.com</a>></span> wrote:<br/><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class=""><divclass="h5">On Fri, Jun 28, 2013 at 5:14 PM, Steve Singer <<a href="mailto:steve@ssinger.info">steve@ssinger.info</a>>wrote:<br /> > On 06/27/2013 05:04 AM, Szymon Guz wrote:<br/> >><br /> >> On 27 June 2013 05:21, Steve Singer <<a href="mailto:steve@ssinger.info">steve@ssinger.info</a><br/> >> <mailto:<a href="mailto:steve@ssinger.info">steve@ssinger.info</a>>>wrote:<br /> >><br /> >> On 06/26/2013 04:47PM, Szymon Guz wrote:<br /> >><br /> >><br /> >><br /> >><br /> >><br /> >><br />>> Hi Steve,<br /> >> thanks for the changes.<br /> >><br /> >> You're idea about common code fordecimal and cdecimal is good, however<br /> >> not good enough. I like the idea of common code for decimal and cdecimal.<br/> >> But we need class name, not the value.<br /> >><br /> >> I've changed the code from str(x)to x.__class__.__name__ so the function<br /> >> prints class name (which is Decimal for both packages), notthe value. We<br /> >> need to have the class name check. The value is returned by the function and<br /> >>is a couple of lines lower in the file.<br /> >><br /> >> patch is attached.<br /> >><br /> ><br/> > I think the value is more important than the name, I want to the tests to<br /> > make sure that the conversionis actually converting properly. With your<br /> > method of getting the class name without the module we canhave both.<br /> ><br /> > The attached patch should print the value and the class name but not the<br /> > modulename.<br /><br /><br /></div></div>Why not forego checking of the type, and instead check the interface?<br /><br /><ahref="http://plpy.info" target="_blank">plpy.info</a>(x.as_tuple())<br /><br /> Should do.<br /><br /> >>> d = decimal.Decimal((0,(3,1,4),-2))<br /> >>> d.as_tuple()<br /> DecimalTuple(sign=0, digits=(3, 1, 4), exponent=-2)<br/> >>> d.as_tuple() == (0,(3,1,4),-2)<br /> True<br /> >>> d = decimal.Decimal("3.14")<br/> >>> d.as_tuple()<br /> DecimalTuple(sign=0, digits=(3, 1, 4), exponent=-2)<br /> >>>d.as_tuple() == (0,(3,1,4),-2)<br /> True<br /> >>><br /></blockquote></div><br /></div><div class="gmail_extra">Yea,however decimal and cdecimal have different outputs:</div><div class="gmail_extra"><br /></div><divclass="gmail_extra"><div class="gmail_extra">For decimal:</div><div class="gmail_extra"><br /></div><div class="gmail_extra">!INFO: DecimalTuple(sign=1, digits=(1, 0, 0), exponent=0)</div><div class="gmail_extra"><br /></div><divclass="gmail_extra">for cdecimal:</div><div class="gmail_extra"><br /></div><div class="gmail_extra">! INFO: DecimalTuple(sign=1, digits=(1, 0, 0), exponent=0L)</div></div></div>