On 06/09/2015 08:08 AM, Dan Sawyer wrote:
> Below is a snippet of python/psycopg2 code. It is inefficient compared
> with parallel logic that creates a file and then updates the table in
> PostgreSQL from the file. In a test database it takes 24 seconds to
> update 100,000 records; the actual database is over 5,000,000 records.
> The database is on a solid state drive, so I would imagine it would be
> very inefficient on a hard drive.
>
> # init lines
> import psycopg2
> import psycopg2.extras
>
> conn_string = "host='localhost' dbname='opace0421' user='dan' port=5432"
> print ("Connecting to database\n ->%s" % (conn_string))
> conn = psycopg2.connect(conn_string)
> cursori = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
> cursoro = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
> work_mem = 2048
> cursori.execute('SET work_mem TO %s', (work_mem,))
> cursori.execute('select address_2, row_num from opace')
> i = 1
> while i != 100000:
>     records = cursori.fetchone()
>     record = records['address_2']
>     rn = str(records['row_num'])
>
>     # python code to create replacement string s
>
>     cursoro.execute("UPDATE opace SET p_norm_add = %s WHERE row_num = %s",
>                     (s, rn,))
Build a list of dicts holding s and rn and then use executemany()
(see the sketch below the quoted code):
http://initd.org/psycopg/docs/cursor.html
This separates the Python manipulations from the database operation.
>     i = i + 1
> conn.commit()
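
A rough sketch of what I mean, untested; normalize() is just a
placeholder name for whatever Python code builds the replacement
string s:

import psycopg2
import psycopg2.extras

conn = psycopg2.connect(
    "host='localhost' dbname='opace0421' user='dan' port=5432")
cursori = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
cursoro = conn.cursor()

cursori.execute('select address_2, row_num from opace')

# First pass: do all the Python work up front, accumulating parameters.
params = []
for record in cursori:
    s = normalize(record['address_2'])  # placeholder for your string code
    params.append({'s': s, 'rn': record['row_num']})

# Second pass: one executemany() call instead of 100,000 execute() calls.
cursoro.executemany(
    "UPDATE opace SET p_norm_add = %(s)s WHERE row_num = %(rn)s",
    params)
conn.commit()

Iterating over cursori directly also drops the manual i counter and
stops cleanly at the end of the result set.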
--
Adrian Klaver
adrian.klaver@aklaver.com