Re: Caching Python modules

From: PostgreSQL - Hans-Jürgen Schönig
Subject: Re: Caching Python modules
Date:
Msg-id: 4854F53C-381A-4D22-B60E-18C997C43A85@cybertec.at
In reply to: Re: Caching Python modules (Jan Urbański <wulczer@wulczer.org>)
List: pgsql-hackers
On Aug 17, 2011, at 2:19 PM, Jan Urbański wrote:

> On 17/08/11 14:09, PostgreSQL - Hans-Jürgen Schönig wrote:
>> CREATE OR REPLACE FUNCTION textprocess.add_to_corpus(lang text, t text) RETURNS float4 AS $$
>>
>>        from SecondCorpus import SecondCorpus
>>        from SecondDocument import SecondDocument
>>
>> I am doing some intense text mining here.
>> The problem is: is it possible to cache those imported modules between function calls?
>> GD works nicely for variables, but can this actually be done with imported modules as well?
>> The import takes around 95% of the total time, so it is definitely something that should go away somehow.
>> I have checked the docs, but I am none the wiser.
>
> After a module is imported in a backend, it stays in the interpreter's
> sys.modules dictionary and importing it again will not cause the module
> Python code to be executed.
>
> As long as you are using the same backend you should be able to call
> add_to_corpus repeatedly and the import statements should take a long
> time only the first time you call them.
>
> This simple test demonstrates it:
>
> $ cat /tmp/slow.py
> import time
> time.sleep(5)
>
> $ PYTHONPATH=/tmp/ bin/postgres -p 5433 -D data/
> LOG:  database system was shut down at 2011-08-17 14:16:18 CEST
> LOG:  database system is ready to accept connections
>
> $ bin/psql -p 5433 postgres
> Timing is on.
> psql (9.2devel)
> Type "help" for help.
>
> postgres=# select slow();
> slow
> ------
>
> (1 row)
>
> Time: 5032.835 ms
> postgres=# select slow();
> slow
> ------
>
> (1 row)
>
> Time: 1.051 ms
>
> Cheers,
> Jan
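Jan's explanation can be reproduced outside PostgreSQL in plain Python. The following is a minimal sketch, not from the thread: it writes a throwaway module named `slow_demo` (a hypothetical stand-in for `/tmp/slow.py`, with the sleep shortened to 0.5 s) into a temporary directory and times two imports of it. Only the first import executes the module body; the second is answered from `sys.modules`, the same cache a long-lived backend's interpreter keeps between function calls.

```python
import importlib
import os
import sys
import tempfile
import time

# Write a hypothetical module whose top-level code is deliberately slow,
# mirroring /tmp/slow.py from the test above (sleep shortened to 0.5 s).
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "slow_demo.py"), "w") as f:
    f.write("import time\ntime.sleep(0.5)\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()  # make sure the new file is noticed


def timed_import(name):
    """Import a module by name and return (module, elapsed seconds)."""
    start = time.perf_counter()
    module = importlib.import_module(name)
    return module, time.perf_counter() - start


first_mod, first_secs = timed_import("slow_demo")    # executes the module body
second_mod, second_secs = timed_import("slow_demo")  # served from sys.modules

print(f"first import:  {first_secs:.3f} s")
print(f"second import: {second_secs:.6f} s")
print("same module object:", first_mod is second_mod)
```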




Hello Jan,

the code actually looks like this.
The first function is called once per backend. It builds some fairly large in-memory structures;
this takes around 2 seconds, but that is fine and not an issue.

-- setup the environment
CREATE OR REPLACE FUNCTION textprocess.setup_sentiment(pypath text, lang text) RETURNS void AS $$
       import sys
       sys.path.append(pypath)
       sys.path.append(pypath + "/external")

       from SecondCorpus import SecondCorpus
       import const

       GD['path_to_classes'] = pypath
       GD['corpus'] = SecondCorpus(lang)
       GD['lang'] = lang

       return
$$ LANGUAGE 'plpythonu' STABLE;
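The `setup_sentiment` pattern (build an expensive object once, keep it in `GD`) can also be written so that callers build it lazily on first use instead of relying on a separate setup call. A minimal sketch, with `GD` modelled as a plain dict and `FakeCorpus`/`get_corpus` as hypothetical stand-ins for `SecondCorpus` and a helper that does not appear in the thread:

```python
class FakeCorpus:
    """Hypothetical stand-in for SecondCorpus (expensive to construct)."""

    builds = 0  # counts constructions so the caching is observable

    def __init__(self, lang):
        FakeCorpus.builds += 1
        self.lang = lang


def get_corpus(GD, lang):
    """Return the corpus for lang, building it only on first use in this session."""
    key = "corpus:" + lang
    corpus = GD.get(key)
    if corpus is None:
        corpus = FakeCorpus(lang)
        GD[key] = corpus
    return corpus


GD = {}  # plain dict here; PL/Python supplies GD per backend session
first = get_corpus(GD, "en")
second = get_corpus(GD, "en")
print(first is second, FakeCorpus.builds)  # True 1
```

Keying the cache by language is a design choice of this sketch; the original stores a single `GD['corpus']`, which a later `setup_sentiment` call with another `lang` would overwrite. In PL/Python, `GD` is shared by all PL/Python functions in the same session, while `SD` is private to one function.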

This one is called more frequently:

-- add a document to the corpus
CREATE OR REPLACE FUNCTION textprocess.add_to_corpus(lang text, t text) RETURNS float4 AS $$
       from SecondCorpus import SecondCorpus
       from SecondDocument import SecondDocument

       doc1 = SecondDocument(GD['corpus'].senti_provider, lang, t)
       doc1.create_sentences()
       GD['corpus'].add_document(doc1)
       GD['corpus'].process()
       return doc1.total_score
$$ LANGUAGE 'plpythonu' STABLE;
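Since `GD` can hold any Python object, the literal answer to "can this be done with imported modules as well?" is yes: a class or module object can be stored in `GD` and reused. A minimal sketch, assuming a hypothetical helper `cached_attr` and using the standard-library `collections.OrderedDict` as a placeholder for `SecondDocument`:

```python
import importlib


def cached_attr(GD, module_name, attr_name):
    """Return module_name.attr_name, memoised in GD after the first lookup."""
    key = "attr:%s.%s" % (module_name, attr_name)
    value = GD.get(key)
    if value is None:
        # First call in this session: a normal import (which itself goes
        # through sys.modules), then keep a direct reference in GD.
        value = getattr(importlib.import_module(module_name), attr_name)
        GD[key] = value
    return value


GD = {}
OrderedDict = cached_attr(GD, "collections", "OrderedDict")
again = cached_attr(GD, "collections", "OrderedDict")
print(OrderedDict is again, sorted(GD))  # True ['attr:collections.OrderedDict']
```

In `add_to_corpus` the call would hypothetically be `cached_attr(GD, "SecondDocument", "SecondDocument")`, with the helper defined in the function body. Given Jan's explanation, a repeated `from ... import` is already only a `sys.modules` lookup, so the saving is likely small.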

The point here is: if I use the classes in a normal Python command-line program, this routine does not look like an issue;
creating the document object and doing the magic in there is not a problem.
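The comparison (fast on the command line, slow inside the backend) could be narrowed down by profiling the same call in both settings. A minimal sketch using the standard-library `cProfile` and `pstats` modules; `profile_call` and `fake_add_to_corpus` are hypothetical names, the latter standing in for the `SecondDocument`/`SecondCorpus` work. It assumes Python 3 (for example `plpython3u`); inside a PL/Python function the report string could be emitted with `plpy.notice()` and compared with the command-line run.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)  # most expensive calls first
    return result, stream.getvalue()


def fake_add_to_corpus(text):
    # Hypothetical stand-in for the real document/corpus processing.
    sentences = [s for s in text.split(".") if s.strip()]
    return float(len(sentences))


score, report = profile_call(fake_add_to_corpus, "One. Two. Three.")
print(score)
print(report)
```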

On the SQL side, this is already fairly heavy for some reason:
 funcid | schemaname  |    funcname     | calls | total_time | self_time | ?column?
--------+-------------+-----------------+-------+------------+-----------+----------
 235287 | textprocess | setup_sentiment |    54 |     100166 |    100166 |     1854
 235288 | textprocess | add_to_corpus   |   996 |     438909 |    438909 |      440
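The `?column?` figures match `total_time` divided by `calls` with integer division, which suggests the query computed a mean per call (the query itself is not shown, so this is an inference from the numbers). `pg_stat_user_functions` reports `total_time` and `self_time` in milliseconds. A short check of the arithmetic from the rows above:

```python
# Rows copied from the pg_stat_user_functions output above.
# Columns: funcname, calls, total_time (milliseconds).
rows = [
    ("setup_sentiment", 54, 100166),
    ("add_to_corpus", 996, 438909),
]

averages = {}
for funcname, calls, total_ms in rows:
    averages[funcname] = total_ms // calls  # integer mean, like ?column?
    print(f"{funcname:16} {total_ms / calls:8.1f} ms per call")
```

That is roughly 1.85 s per `setup_sentiment` call, in line with the estimate of about 2 seconds, and about 440 ms per `add_to_corpus` call. Since `self_time` equals `total_time` in both rows, presumably none of that time goes into other database functions called from these two.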

Looks like an afternoon with some more low-level tools :(.
Many thanks,
    Hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


