Re: plpython SPI cursors

Поиск
Список
Период
Сортировка
От Steve Singer
Тема Re: plpython SPI cursors
Дата
Msg-id BLU0-SMTP4530C1907601DFB9AEAE778ECA0@phx.gbl
обсуждение исходный текст
Ответ на plpython SPI cursors  (Jan Urbański <wulczer@wulczer.org>)
Ответы Re: plpython SPI cursors  (Jan Urbański <wulczer@wulczer.org>)
Re: plpython SPI cursors  (Jan Urbański <wulczer@wulczer.org>)
Список pgsql-hackers
On 11-10-15 07:28 PM, Jan Urbański wrote:
> Hi,
>
> attached is a patch implementing the usage of SPI cursors in PL/Python.
> Currently when trying to process a large table in PL/Python you have
> slurp it all into memory (that's what plpy.execute does).
>
> J

I found a few bugs  (see my testing section below) that will need fixing 
+ a few questions about the code


Overview & Feature Review
-----------
This patch adds cursor support to plpython.  SPI cursors will allow
a plpython function to read process a large results set without having to
read it all into memory at once.  This is a good thing.  Without this
patch I think you could accomplish the same with using SQL DECLARE CURSOR
and SQL fetch.    This feature allows you to use a python cursor as
an iterator resulting in much cleaner python code than the SQL FETCH
approach.   I think the feature is worth having


Usability Review
---------------------- The patch adds the user methods
cursor=plpy.cursor(query_or_plan)
cursor.fetch(100)
cursor.close()

Do we like the name plpy.cursor or would we rather call it something like
plpy.execute_cursor(...) or plpy.cursor_open(...) or plpy.create_cursor(...)
Since we will be mostly stuck with the API once we release 9.2 this is worth
some opinions on.  I like cursor() but if anyone disagrees now is the time.

This patch does not provide a wrapper around SPI_cursor_move.  The patch
is useful without that and I don't see anything that preculdes someone else
adding that later if they see a need.


Documentation Review
---------------------

The patch includes documentation updates that describes the new feature.
The Database Access page doesn't provide a API style list of database access
functions like the plperl 
http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page 
does.  I think the organization of the perl page is
clearer than the python one and we should think about a doing some
documentaiton refactoring.  That should be done as a seperate patch and
shouldn't be a barrier to committing this one.




Code Review
----------------

in PLy_cursor_plan line 4080
+     PG_TRY();
+     {
+         Portal        portal;
+         char       *volatile nulls;
+         volatile int j;
+
+         if (nargs > 0)
+             nulls = palloc(nargs * sizeof(char));
+         else
+             nulls = NULL;
+
+         for (j = 0; j < nargs; j++)
+         {
+             PyObject   *elem;
I am probably not seeing a code path or misunderstanding something
about the setjmp/longjump usages but I don't see why nulls and j need to be
volatile here.

line 444 PLy_cursor(PyObject *self, PyObject *args)
+ {
+     char            *query;
+     PyObject        *plan;
+     PyObject           *planargs = NULL;
+
+     if (PyArg_ParseTuple(args, "s", &query))
+         return PLy_cursor_query(query);
+

Should query be freed with PyMem_free()



Testing
-----------

I tested both python 2.6 and 3 on a Linux system



create or replace function x() returns text as $$
cur=None
try:  with plpy.subtransaction():    cur=plpy.cursor('select generate_series(1,1000)')    rows=cur.fetch(10);
plpy.execute('selectf()')
 

except plpy.SPIError:  rows=cur.fetch(10);  return rows[0]['generate_series']
return 'none'
$$ language plpythonu;
select x();

crashes the backend
test=# select x();
The connection to the server was lost. Attempting reset: LOG:  server 
process (PID 3166) was terminated by signal 11: Segmentation fault

The below test gives me a strange error message:

create or replace function x1() returns text as $$
plan=None
try:  with plpy.subtransaction():    plpy.execute('CREATE TEMP TABLE z AS select generate_series(1,1000)')
plan=plpy.prepare('select* FROM z')    plpy.execute('select * FROM does_not_exist')
 
except plpy.SPIError, e:    cur=plpy.cursor(plan)    rows=cur.fetch(10)    return rows[0]['generate_series']
return '1'
$$ language plpythonu;
select x1();

test=# select x1()
test-# ;
ERROR:  TypeError: Expected sequence of 82187072 arguments, got 0: <NULL>
CONTEXT:  Traceback (most recent call last):      PL/Python function "x1", line 9, in <module>
cur=plpy.cursor(plan)   PL/Python function "x1"
 
STATEMENT:  select x1()


I was expecting an error from the function just a bit more useful one.


Performance
-------------------
I did not do any specific performance testing but I don't see this patch 
as having any impact to performance




В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Kevin Grittner"
Дата:
Сообщение: Re: testing ProcArrayLock patches
Следующее
От: Jan Urbański
Дата:
Сообщение: Re: plpython SPI cursors