Discussion: is there a way to make this more efficient


is there a way to make this more efficient

From
Dan Sawyer
Date:
Below is a snippet of python/psycopg2 code. It is inefficient compared
with parallel logic that writes the data to a file and then updates the
table in postgres from the file. In a test database it takes 24 seconds
to update 100,000 records; the actual database is over 5,000,000
records. The database is on a solid state drive, so I would imagine it
would be far slower on a hard drive.

# init lines
     conn_string = "host='localhost' dbname='opace0421' user='dan' port=5432"
     print ("Connecting to database\n    ->%s" % (conn_string))
     conn = psycopg2.connect(conn_string)
     cursori = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
     cursoro = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
     work_mem = 2048
     cursori.execute('SET work_mem TO %s', (work_mem,))
     cursori.execute('select address_2, row_num from opace')
     i = 1
     while i != 100000:
         records = cursori.fetchone()
         record = records['address_2']
         rn = str(records['row_num'])

         # python code to create replacement string s

         cursoro.execute("UPDATE opace SET p_norm_add = %s WHERE row_num = %s",
                         (s, rn,))
         i = i+1
     conn.commit()



Re: is there a way to make this more efficient

From
Rory Campbell-Lange
Date:
You could:

1. cut out the transaction and continue along the same lines
2. select into a temporary table, calculating 's' at the same time,
   then update using the temporary table
3. do an inline update, i.e. use a single SQL statement to update opace
   directly

Since 2 and 3 can be done directly in psql, you don't need to do much
other than wait, assuming 's' isn't complicated to calculate.
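A minimal sketch of option 3, driven through psycopg2 rather than psql. The `upper(trim(...))` expression is only a hypothetical stand-in for whatever actually computes 's'; the real rule would go there, assuming it can be expressed in SQL:

```python
# Sketch of option 3: replace the row-by-row Python loop with one
# inline UPDATE. The SET expression is a hypothetical placeholder for
# the real normalization rule.
INLINE_UPDATE = """
    UPDATE opace
    SET p_norm_add = upper(trim(address_2))  -- stand-in for computing 's'
"""

def run_inline_update(conn):
    # conn is an open psycopg2 connection, as in the original snippet.
    with conn.cursor() as cur:
        cur.execute(INLINE_UPDATE)
    conn.commit()
```

The same statement can be pasted straight into psql, which is all option 3 really needs.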

By the way, I find the namedtuple cursor
(http://initd.org/psycopg/docs/extras.html#namedtuple-cursor)
much more convenient; it allows one to write something like
    record = records.address2
-- although I find your variable naming a bit confusing!

Rory

On 09/06/15, Dan Sawyer (dansawyer@earthlink.net) wrote:
> Below is a snippet of python/psycopg2 code. It is inefficient when compared
> with parallel logic that creates a file and then updates the table in
> postgres sql from the file. In a test database it takes 24 seconds to
> update 100,000 records. The actual database is over 5,000,000 records.
> [snip]


Re: is there a way to make this more efficient

From
Adrian Klaver
Date:
On 06/09/2015 08:08 AM, Dan Sawyer wrote:
> [snip]
>
>          cursoro.execute("UPDATE opace SET p_norm_add = %s WHERE row_num = %s",
>                          (s, rn,))

Build a list of dicts with s and rn and then use executemany():

http://initd.org/psycopg/docs/cursor.html

This separates the Python manipulations from the database operation.
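As a sketch, that pattern might look like this. The normalize() helper is a hypothetical stand-in for the "python code to create replacement string s" in the original:

```python
# Sketch of the executemany() approach: do all the Python work first,
# then hand one parameter list to the driver.
def normalize(addr):
    # Hypothetical stand-in for the real replacement-string logic.
    return addr.strip().upper()

# What cursori.fetchall() might return for (address_2, row_num):
rows = [(" 12 main st", 1), ("45 oak ave ", 2)]

# Build the list of dicts with s and rn...
params = [{"s": normalize(address_2), "rn": row_num}
          for address_2, row_num in rows]

# ...then issue every UPDATE in a single executemany() call:
# cursoro.executemany(
#     "UPDATE opace SET p_norm_add = %(s)s WHERE row_num = %(rn)s",
#     params)
```

Later psycopg2 releases also ship psycopg2.extras.execute_batch(), which takes the same statement-plus-parameter-list shape and reduces the round trips further.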

>          i = i+1
>      conn.commit()


--
Adrian Klaver
adrian.klaver@aklaver.com