Discussion: Out of Memory and Configuration Problems (Big Computer)


Out of Memory and Configuration Problems (Big Computer)

From
Tom Wilcox
Date:
Hi,

I am fairly new to Postgres and I have been using it with Python to develop a set of functions that operate on some moderately large tables (30 million rows with 13-15 columns).

I have spent some time messing with the configuration file to get PG to use indexes when planning queries, etc.

At the moment, I have a fairly fundamental, recurring problem, which is that a query has run out of memory. This particular query is a sequential-scan UPDATE operating on a table with approximately 27,778,000 rows:

UPDATE tbl SET f1 = COALESCE(f2,'') || ' ' || COALESCE(f3);

ERROR: out of memory
DETAIL: Failed on request of size 36
Run time (est): 6,000,000 ms

EXPLAIN shows the query plan as:

Seq Scan on tbl (cost=0.00..2088542.83 rows=59868855 width=128)

I have noticed that the "rows=59xxxxxx" estimate suggests that the plan expects roughly 2× the number of rows actually in the table. Perhaps I am writing poor SQL?

Can anyone suggest reasons why I might be running out of memory on such a simple query? Is it possible that, because it is executed as a transaction, the changes are not being committed until the query is complete and therefore the whole table is being stored in memory?

Also, can anyone give me any pointers for configuring Postgres to use ALL 96GB of RAM in my new machine? I would like to know it is using everything available... especially when it is possible to load an entire 30-million-row table into memory! I am currently using the default configuration from the standard installation.

Any help/suggestions are very much appreciated.

Cheers,
Tom

Re: Out of Memory and Configuration Problems (Big Computer)

From
Stephen Frost
Date:
* Tom Wilcox (hungrytom@googlemail.com) wrote:
> UPDATE tbl SET f1 = COALESCE(f2,'') || ' ' || COALESCE(f3);
>
> Can anyone suggest reasons why I might be running out of memory on such
> a simple query?

Do you have any triggers on that table?  Or FK's?

    Stephen

Attachments

Re: Out of Memory and Configuration Problems (Big Computer)

From
Bill Moran
Date:
In response to Tom Wilcox <hungrytom@googlemail.com>:

<snip>

> Also, can anyone give me any pointers for configuring postgres to use
> ALL 96GB of RAM in my new machine? I would like to know it was using
> everything available.. especially when it is possible to load an entire
> 30m row table into memory! I am currently using the default
> configuration from standard installation.
>
> Any help/suggestions are very much appreciated.

There is a LOT of information missing from this email, such as PG
version and the OS it's running on.

As a starter, though, the default config is not appropriate for most
workloads, and is certainly not optimal for a machine with 96G of RAM.

Start out by following any of the dozens of PG tuning howtos to get
your basic memory settings set up.  For example:
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
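
To make that concrete, here is a minimal sketch of the kinds of settings
those howtos walk through. The values below are illustrative assumptions
for a dedicated box, not recommendations:

-- Settings you can experiment with per-session (or set in postgresql.conf):
SET work_mem = '256MB';            -- per sort/hash step; be conservative,
                                   -- many can be active at once
SET maintenance_work_mem = '1GB';  -- used by VACUUM, CREATE INDEX, etc.
-- postgresql.conf only (restart required):
--   shared_buffers = 4GB          -- large values historically help less
--                                 -- on Windows; start modest
--   effective_cache_size = 64GB   -- planner hint only; allocates nothing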

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: Out of Memory and Configuration Problems (Big Computer)

From
Bill Moran
Date:
In response to Tom Wilcox <hungrytom@googlemail.com>:

> In addition, I have discovered that the update query that runs on each row
> of a 27million row table and fails with Out of memory error will work when
> limited to 1million rows, and in a much shorter period of time:
>
> EXPLAIN ANALYZE
>         UPDATE nlpg.match_data SET org = normalise(org) WHERE match_data_id
> < 1000000;
>
> "Index Scan using match_data_pkey1 on match_data  (cost=0.00..3285969.97
> rows=1147685 width=206) (actual time=0.280..18018.601 rows=999999 loops=1)"
> "  Index Cond: (match_data_id < 1000000)"
> "Total runtime: 182732.207 ms"
>
>
> Whereas this fails with Out of Memory:
>
> UPDATE nlpg.match_data SET org = normalise(org) WHERE match_data_id;

You're not liable to get shit for answers if you omit the mailing list from
the conversation, especially since I know almost nothing about tuning
PostgreSQL installed on Windows.

Are there multiple queries having this problem?  The original query didn't
have normalise() in it, and I would be highly suspicious that a custom
function may have a memory leak or other memory-intensive side-effects.
What is the code for that function?

For example, does:
UPDATE nlpg.match_data SET org = org WHERE match_data_id;
finish in a reasonable amount of time or exhibit the same out of memory
problem?
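
If that works, one follow-on experiment (a sketch, assuming match_data_id
runs roughly from 1 to 28 million) is to do the real update in key-range
batches, committing between them, so no single statement touches all the
rows at once:

UPDATE nlpg.match_data SET org = normalise(org)
    WHERE match_data_id >= 1000000 AND match_data_id < 2000000;
-- ...and so on for each 1M-row slice, driven from a script, with an
-- occasional VACUUM so the dead row versions each UPDATE leaves behind
-- get reclaimed.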

It'd be nice to see a \d on that table ... does it have any triggers or
cascading foreign keys?

And stop

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: Out of Memory and Configuration Problems (Big Computer)

From
Tom Wilcox
Date:
Oops. Sorry about that.

I am having this problem with multiple queries; however, I am confident that a fair number may involve the custom plpython "normalise" function, which I have written myself. I didn't think it would be complicated enough to produce a memory problem... here it is:

-- Normalises common address words (e.g. 'Ground' maps to 'grd')
CREATE OR REPLACE FUNCTION normalise(s text) RETURNS text AS $$
ADDR_FIELD_DELIM = ' '

# Returns distinct list without null or empty elements
def distinct_str(list):
    seen = set()
    return [x for x in list if x not in seen and not seen.add(x) and x!=None and len(x)>0]

# normalise common words in given address string
def normalise(match_data):
    if match_data==None: return ''
    import re
    # Tokenise
    toks = distinct_str(re.split(r'\s', match_data.lower()))
    out = ''
    for tok in toks:
        ## full word replace
        if tok == 'house' : out += 'hse'+ADDR_FIELD_DELIM
        elif tok == 'ground' : out += 'grd'+ADDR_FIELD_DELIM
        elif tok == 'gnd' : out += 'grd'+ADDR_FIELD_DELIM
        elif tok == 'front' : out += 'fnt'+ADDR_FIELD_DELIM
        elif tok == 'floor' : out += 'flr'+ADDR_FIELD_DELIM
        elif tok == 'floors' : out += 'flr'+ADDR_FIELD_DELIM
        elif tok == 'flrs' : out += 'flr'+ADDR_FIELD_DELIM
        elif tok == 'fl' : out += 'flr'+ADDR_FIELD_DELIM
        elif tok == 'basement' : out += 'bst'+ADDR_FIELD_DELIM
        elif tok == 'subbasement' : out += 'sbst'+ADDR_FIELD_DELIM
        elif tok == 'bsmt' : out += 'bst'+ADDR_FIELD_DELIM
        elif tok == 'lbst' : out += 'lower bst'+ADDR_FIELD_DELIM
        elif tok == 'street' : out += 'st'+ADDR_FIELD_DELIM
        elif tok == 'road' : out += 'rd'+ADDR_FIELD_DELIM
        elif tok == 'lane' : out += 'ln'+ADDR_FIELD_DELIM
        elif tok == 'rooms' : out += 'rm'+ADDR_FIELD_DELIM
        elif tok == 'room' : out += 'rm'+ADDR_FIELD_DELIM
        elif tok == 'no' : pass
        elif tok == 'number' : pass
        elif tok == 'and' : out += '&'+ADDR_FIELD_DELIM
        elif tok == 'rear' : out += 'rr'+ADDR_FIELD_DELIM
        elif tok == 'part' : out += 'pt'+ADDR_FIELD_DELIM
        elif tok == 'south' : out += 's'+ADDR_FIELD_DELIM
        elif tok == 'sth' : out += 's'+ADDR_FIELD_DELIM
        elif tok == 'north' : out += 'n'+ADDR_FIELD_DELIM
        elif tok == 'nth' : out += 'n'+ADDR_FIELD_DELIM
        elif tok == 'west' : out += 'w'+ADDR_FIELD_DELIM
        elif tok == 'wst' : out += 'w'+ADDR_FIELD_DELIM
        elif tok == 'east' : out += 'e'+ADDR_FIELD_DELIM
        elif tok == 'est' : out += 'e'+ADDR_FIELD_DELIM
        elif tok == 'first' : out += '1st'+ADDR_FIELD_DELIM
        elif tok == 'second' : out += '2nd'+ADDR_FIELD_DELIM
        elif tok == 'third' : out += '3rd'+ADDR_FIELD_DELIM
        elif tok == 'fourth' : out += '4th'+ADDR_FIELD_DELIM
        elif tok == 'fifth' : out += '5th'+ADDR_FIELD_DELIM
        elif tok == 'sixth' : out += '6th'+ADDR_FIELD_DELIM
        elif tok == 'seventh' : out += '7th'+ADDR_FIELD_DELIM
        elif tok == 'eighth' : out += '8th'+ADDR_FIELD_DELIM
        elif tok == 'ninth' : out += '9th'+ADDR_FIELD_DELIM
        elif tok == 'tenth' : out += '10th'+ADDR_FIELD_DELIM
        elif tok == 'eleventh' : out += '11th'+ADDR_FIELD_DELIM
        elif tok == 'twelfth' : out += '12th'+ADDR_FIELD_DELIM
        elif tok == 'thirteenth' : out += '13th'+ADDR_FIELD_DELIM
        elif tok == 'fourteenth' : out += '14th'+ADDR_FIELD_DELIM
        elif tok == 'fifteenth' : out += '15th'+ADDR_FIELD_DELIM
        elif tok == 'sixteenth' : out += '16th'+ADDR_FIELD_DELIM
        elif tok == 'seventeenth' : out += '17th'+ADDR_FIELD_DELIM
        elif tok == 'eighteenth' : out += '18th'+ADDR_FIELD_DELIM
        elif tok == 'nineteenth' : out += '19th'+ADDR_FIELD_DELIM
        elif tok == 'twentieth' : out += '20th'+ADDR_FIELD_DELIM
        # numbers 0 - 20
        elif tok == 'one' : out += '1'+ADDR_FIELD_DELIM
        elif tok == 'two' : out += '2'+ADDR_FIELD_DELIM
        elif tok == 'three' : out += '3'+ADDR_FIELD_DELIM
        elif tok == 'four' : out += '4'+ADDR_FIELD_DELIM
        elif tok == 'five' : out += '5'+ADDR_FIELD_DELIM
        elif tok == 'six' : out += '6'+ADDR_FIELD_DELIM
        elif tok == 'seven' : out += '7'+ADDR_FIELD_DELIM
        elif tok == 'eight' : out += '8'+ADDR_FIELD_DELIM
        elif tok == 'nine' : out += '9'+ADDR_FIELD_DELIM
        elif tok == 'ten' : out += '10'+ADDR_FIELD_DELIM
        elif tok == 'eleven' : out += '11'+ADDR_FIELD_DELIM
        elif tok == 'twelve' : out += '12'+ADDR_FIELD_DELIM
        elif tok == 'thirteen' : out += '13'+ADDR_FIELD_DELIM
        elif tok == 'fourteen' : out += '14'+ADDR_FIELD_DELIM
        elif tok == 'fifteen' : out += '15'+ADDR_FIELD_DELIM
        elif tok == 'sixteen' : out += '16'+ADDR_FIELD_DELIM
        elif tok == 'seventeen' : out += '17'+ADDR_FIELD_DELIM
        elif tok == 'eighteen' : out += '18'+ADDR_FIELD_DELIM
        elif tok == 'nineteen' : out += '19'+ADDR_FIELD_DELIM
        elif tok == 'twenty' : out += '20'+ADDR_FIELD_DELIM
        # town dictionary items
        elif tok == 'borough' : pass
        elif tok == 'city' : pass
        elif tok == 'of' : pass
        elif tok == 'the' : pass
        # a few extras (from looking at voa)
        elif tok == 'at' : pass
        elif tok == 'incl' : pass
        elif tok == 'inc' : pass
        else: out += tok+ADDR_FIELD_DELIM
    return out

return normalise(s)
$$ LANGUAGE plpythonu;
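
For reference, a quick sanity check of what the function produces (the
expected output here is inferred from reading the code; note the trailing
delimiter, and that pass-through words like 'of' and 'the' are dropped):

SELECT normalise('Ground Floor of the Old House');
-- => 'grd flr old hse '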


Here's the create script for the table from pgAdmin (I hope that will be good enough instead of \d, as I can't do that right now)...

-- Table: nlpg.match_data

-- DROP TABLE nlpg.match_data;

CREATE TABLE nlpg.match_data
(
  premise_id integer,
  usrn bigint,
  org text,
  sao text,
  "level" text,
  pao text,
  "name" text,
  street text,
  town text,
  pc postcode,
  postcode text,
  match_data_id integer NOT NULL DEFAULT nextval('nlpg.match_data_match_data_id_seq1'::regclass),
  addr_str text,
  tssearch_name tsvector,
  CONSTRAINT match_data_pkey1 PRIMARY KEY (match_data_id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE nlpg.match_data OWNER TO postgres;
ALTER TABLE nlpg.match_data ALTER COLUMN "name" SET STATISTICS 10000;


-- Index: nlpg.index_match_data_mid

-- DROP INDEX nlpg.index_match_data_mid;

CREATE INDEX index_match_data_mid
  ON nlpg.match_data
  USING btree
  (match_data_id);

-- Index: nlpg.index_match_data_pc

-- DROP INDEX nlpg.index_match_data_pc;

CREATE INDEX index_match_data_pc
  ON nlpg.match_data
  USING btree
  (pc);

-- Index: nlpg.index_match_data_pid

-- DROP INDEX nlpg.index_match_data_pid;

CREATE INDEX index_match_data_pid
  ON nlpg.match_data
  USING btree
  (premise_id);

-- Index: nlpg.index_match_data_tssearch_name

-- DROP INDEX nlpg.index_match_data_tssearch_name;

CREATE INDEX index_match_data_tssearch_name
  ON nlpg.match_data
  USING gin
  (tssearch_name);

-- Index: nlpg.index_match_data_usrn

-- DROP INDEX nlpg.index_match_data_usrn;

CREATE INDEX index_match_data_usrn
  ON nlpg.match_data
  USING btree
  (usrn);

As you can see, there are no FKs or triggers...

I am running: EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org;

However, as it should take around 90 minutes (if it scales linearly), I thought I would send this now and follow up with the results once it finishes. (It has taken 2 hours so far...)

Thanks very much for your help.

Tom

On 28 May 2010 17:54, "Bill Moran" <wmoran@potentialtech.com> wrote:

In response to Tom Wilcox <hungrytom@googlemail.com>:

> In addition, I have discovered that the update query that runs on each row
> of a 27million row ta...
You're not liable to get shit for answers if you omit the mailing list from
the conversation, especially since I know almost nothing about tuning
PostgreSQL installed on Windows.

Are there multiple queries having this problem?  The original query didn't
have normalise() in it, and I would be highly suspicious that a custom
function may have a memory leak or other memory-intensive side-effects.
What is the code for that function?

For example, does:
UPDATE nlpg.match_data SET org = org WHERE match_data_id;
finish in a reasonable amount of time or exhibit the same out of memory
problem?

It'd be nice to see a \d on that table ... does it have any triggers or
cascading foreign keys?

And stop

--

Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: Out of Memory and Configuration Problems (Big Computer)

From
Alban Hertroys
Date:
On 28 May 2010, at 20:39, Tom Wilcox wrote:

>     out = ''
>     for tok in toks:
>         ## full word replace
>         if tok == 'house' : out += 'hse'+ADDR_FIELD_DELIM
>         elif tok == 'ground' : out += 'grd'+ADDR_FIELD_DELIM
>         elif tok == 'gnd' : out += 'grd'+ADDR_FIELD_DELIM
>         elif tok == 'front' : out += 'fnt'+ADDR_FIELD_DELIM
>         elif tok == 'floor' : out += 'flr'+ADDR_FIELD_DELIM
>         elif tok == 'floors' : out += 'flr'+ADDR_FIELD_DELIM

Not that it would solve your problems, but you can write the above much more elegantly using a dictionary:

# normalize the token
try:
    out += {
        'house'  : 'hse',
        'ground' : 'grd',
        'gnd'    : 'grd',
        'front'  : 'fnt',
        'floor'  : 'flr',
        ...
    }[tok]
except KeyError:
    out += tok

# add a field delimiter if the token isn't among the exceptions for those
if tok not in ('borough', 'city', 'of', 'the', 'at', 'incl', 'inc'):
    out += ADDR_FIELD_DELIM

You should probably define those lists outside the for-loop though; I'm not sure the Python interpreter is smart enough
to declare those lists only once otherwise. The concept remains though.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


Re: Out of Memory and Configuration Problems (Big Computer)

From
Tom Wilcox
Date:
I ran this query:

EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org;

And I got this result:

"Seq Scan on match_data  (cost=0.00..9762191.68 rows=32205168 width=206) (actual time=76873.592..357450.519 rows=27777961 loops=1)"
"Total runtime: 8028212.367 ms"


On 28 May 2010 19:39, Tom Wilcox <hungrytom@googlemail.com> wrote:
Oops. Sorry about that.

I am having this problem with multiple queries; however, I am confident that a fair number may involve the custom plpython "normalise" function, which I have written myself. I didn't think it would be complicated enough to produce a memory problem... here it is:

-- Normalises common address words (e.g. 'Ground' maps to 'grd')
CREATE OR REPLACE FUNCTION normalise(s text) RETURNS text AS $$
ADDR_FIELD_DELIM = ' '

# Returns distinct list without null or empty elements
def distinct_str(list):
    seen = set()
    return [x for x in list if x not in seen and not seen.add(x) and x!=None and len(x)>0]

# normalise common words in given address string
def normalise(match_data):
    if match_data==None: return ''
    import re
    # Tokenise
    toks = distinct_str(re.split(r'\s', match_data.lower()))
    out = ''
    for tok in toks:
        ## full word replace
        if tok == 'house' : out += 'hse'+ADDR_FIELD_DELIM
        elif tok == 'ground' : out += 'grd'+ADDR_FIELD_DELIM
        elif tok == 'gnd' : out += 'grd'+ADDR_FIELD_DELIM
        elif tok == 'front' : out += 'fnt'+ADDR_FIELD_DELIM
        elif tok == 'floor' : out += 'flr'+ADDR_FIELD_DELIM
        elif tok == 'floors' : out += 'flr'+ADDR_FIELD_DELIM
        elif tok == 'flrs' : out += 'flr'+ADDR_FIELD_DELIM
        elif tok == 'fl' : out += 'flr'+ADDR_FIELD_DELIM
        elif tok == 'basement' : out += 'bst'+ADDR_FIELD_DELIM
        elif tok == 'subbasement' : out += 'sbst'+ADDR_FIELD_DELIM
        elif tok == 'bsmt' : out += 'bst'+ADDR_FIELD_DELIM
        elif tok == 'lbst' : out += 'lower bst'+ADDR_FIELD_DELIM
        elif tok == 'street' : out += 'st'+ADDR_FIELD_DELIM
        elif tok == 'road' : out += 'rd'+ADDR_FIELD_DELIM
        elif tok == 'lane' : out += 'ln'+ADDR_FIELD_DELIM
        elif tok == 'rooms' : out += 'rm'+ADDR_FIELD_DELIM
        elif tok == 'room' : out += 'rm'+ADDR_FIELD_DELIM
        elif tok == 'no' : pass
        elif tok == 'number' : pass
        elif tok == 'and' : out += '&'+ADDR_FIELD_DELIM
        elif tok == 'rear' : out += 'rr'+ADDR_FIELD_DELIM
        elif tok == 'part' : out += 'pt'+ADDR_FIELD_DELIM
        elif tok == 'south' : out += 's'+ADDR_FIELD_DELIM
        elif tok == 'sth' : out += 's'+ADDR_FIELD_DELIM
        elif tok == 'north' : out += 'n'+ADDR_FIELD_DELIM
        elif tok == 'nth' : out += 'n'+ADDR_FIELD_DELIM
        elif tok == 'west' : out += 'w'+ADDR_FIELD_DELIM
        elif tok == 'wst' : out += 'w'+ADDR_FIELD_DELIM
        elif tok == 'east' : out += 'e'+ADDR_FIELD_DELIM
        elif tok == 'est' : out += 'e'+ADDR_FIELD_DELIM
        elif tok == 'first' : out += '1st'+ADDR_FIELD_DELIM
        elif tok == 'second' : out += '2nd'+ADDR_FIELD_DELIM
        elif tok == 'third' : out += '3rd'+ADDR_FIELD_DELIM
        elif tok == 'fourth' : out += '4th'+ADDR_FIELD_DELIM
        elif tok == 'fifth' : out += '5th'+ADDR_FIELD_DELIM
        elif tok == 'sixth' : out += '6th'+ADDR_FIELD_DELIM
        elif tok == 'seventh' : out += '7th'+ADDR_FIELD_DELIM
        elif tok == 'eighth' : out += '8th'+ADDR_FIELD_DELIM
        elif tok == 'ninth' : out += '9th'+ADDR_FIELD_DELIM
        elif tok == 'tenth' : out += '10th'+ADDR_FIELD_DELIM
        elif tok == 'eleventh' : out += '11th'+ADDR_FIELD_DELIM
        elif tok == 'twelfth' : out += '12th'+ADDR_FIELD_DELIM
        elif tok == 'thirteenth' : out += '13th'+ADDR_FIELD_DELIM
        elif tok == 'fourteenth' : out += '14th'+ADDR_FIELD_DELIM
        elif tok == 'fifteenth' : out += '15th'+ADDR_FIELD_DELIM
        elif tok == 'sixteenth' : out += '16th'+ADDR_FIELD_DELIM
        elif tok == 'seventeenth' : out += '17th'+ADDR_FIELD_DELIM
        elif tok == 'eighteenth' : out += '18th'+ADDR_FIELD_DELIM
        elif tok == 'nineteenth' : out += '19th'+ADDR_FIELD_DELIM
        elif tok == 'twentieth' : out += '20th'+ADDR_FIELD_DELIM
        # numbers 0 - 20
        elif tok == 'one' : out += '1'+ADDR_FIELD_DELIM
        elif tok == 'two' : out += '2'+ADDR_FIELD_DELIM
        elif tok == 'three' : out += '3'+ADDR_FIELD_DELIM
        elif tok == 'four' : out += '4'+ADDR_FIELD_DELIM
        elif tok == 'five' : out += '5'+ADDR_FIELD_DELIM
        elif tok == 'six' : out += '6'+ADDR_FIELD_DELIM
        elif tok == 'seven' : out += '7'+ADDR_FIELD_DELIM
        elif tok == 'eight' : out += '8'+ADDR_FIELD_DELIM
        elif tok == 'nine' : out += '9'+ADDR_FIELD_DELIM
        elif tok == 'ten' : out += '10'+ADDR_FIELD_DELIM
        elif tok == 'eleven' : out += '11'+ADDR_FIELD_DELIM
        elif tok == 'twelve' : out += '12'+ADDR_FIELD_DELIM
        elif tok == 'thirteen' : out += '13'+ADDR_FIELD_DELIM
        elif tok == 'fourteen' : out += '14'+ADDR_FIELD_DELIM
        elif tok == 'fifteen' : out += '15'+ADDR_FIELD_DELIM
        elif tok == 'sixteen' : out += '16'+ADDR_FIELD_DELIM
        elif tok == 'seventeen' : out += '17'+ADDR_FIELD_DELIM
        elif tok == 'eighteen' : out += '18'+ADDR_FIELD_DELIM
        elif tok == 'nineteen' : out += '19'+ADDR_FIELD_DELIM
        elif tok == 'twenty' : out += '20'+ADDR_FIELD_DELIM
        # town dictionary items
        elif tok == 'borough' : pass
        elif tok == 'city' : pass
        elif tok == 'of' : pass
        elif tok == 'the' : pass
        # a few extras (from looking at voa)
        elif tok == 'at' : pass
        elif tok == 'incl' : pass
        elif tok == 'inc' : pass
        else: out += tok+ADDR_FIELD_DELIM
    return out

return normalise(s)
$$ LANGUAGE plpythonu;


Here's the create script for the table from pgAdmin (I hope that will be good enough instead of \d, as I can't do that right now)...

-- Table: nlpg.match_data

-- DROP TABLE nlpg.match_data;

CREATE TABLE nlpg.match_data
(
  premise_id integer,
  usrn bigint,
  org text,
  sao text,
  "level" text,
  pao text,
  "name" text,
  street text,
  town text,
  pc postcode,
  postcode text,
  match_data_id integer NOT NULL DEFAULT nextval('nlpg.match_data_match_data_id_seq1'::regclass),
  addr_str text,
  tssearch_name tsvector,

  CONSTRAINT match_data_pkey1 PRIMARY KEY (match_data_id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE nlpg.match_data OWNER TO postgres;
ALTER TABLE nlpg.match_data ALTER COLUMN "name" SET STATISTICS 10000;


-- Index: nlpg.index_match_data_mid

-- DROP INDEX nlpg.index_match_data_mid;

CREATE INDEX index_match_data_mid
  ON nlpg.match_data
  USING btree
  (match_data_id);

-- Index: nlpg.index_match_data_pc

-- DROP INDEX nlpg.index_match_data_pc;

CREATE INDEX index_match_data_pc
  ON nlpg.match_data
  USING btree
  (pc);

-- Index: nlpg.index_match_data_pid

-- DROP INDEX nlpg.index_match_data_pid;

CREATE INDEX index_match_data_pid
  ON nlpg.match_data
  USING btree
  (premise_id);

-- Index: nlpg.index_match_data_tssearch_name

-- DROP INDEX nlpg.index_match_data_tssearch_name;

CREATE INDEX index_match_data_tssearch_name
  ON nlpg.match_data
  USING gin
  (tssearch_name);

-- Index: nlpg.index_match_data_usrn

-- DROP INDEX nlpg.index_match_data_usrn;

CREATE INDEX index_match_data_usrn
  ON nlpg.match_data
  USING btree
  (usrn);

As you can see, there are no FKs or triggers...

I am running: EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org;

However, as it should take around 90 minutes (if it scales linearly), I thought I would send this now and follow up with the results once it finishes. (It has taken 2 hours so far...)

Thanks very much for your help.

Tom


On 28 May 2010 17:54, "Bill Moran" <wmoran@potentialtech.com> wrote:

In response to Tom Wilcox <hungrytom@googlemail.com>:

> In addition, I have discovered that the update query that runs on each row
> of a 27million row ta...
You're not liable to get shit for answers if you omit the mailing list from
the conversation, especially since I know almost nothing about tuning
PostgreSQL installed on Windows.

Are there multiple queries having this problem?  The original query didn't
have normalise() in it, and I would be highly suspicious that a custom
function may have a memory leak or other memory-intensive side-effects.
What is the code for that function?

For example, does:
UPDATE nlpg.match_data SET org = org WHERE match_data_id;
finish in a reasonable amount of time or exhibit the same out of memory
problem?

It'd be nice to see a \d on that table ... does it have any triggers or
cascading foreign keys?

And stop

--

Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/


Re: Out of Memory and Configuration Problems (Big Computer)

From
Bill Moran
Date:
On 5/28/10 8:43:48 PM, Tom Wilcox wrote:
> I ran this query:
>
> EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org;
>
> And I got this result:
>
> "Seq Scan on match_data  (cost=0.00..9762191.68 rows=32205168 width=206)
> (actual time=76873.592..357450.519 rows=27777961 loops=1)"
> "Total runtime: 8028212.367 ms"

That would seem to indicate that the problem is in your Python
functions.

Some ideas for next steps:
* Perhaps it's just a few rows that have data in them that the
   function has difficulty with.  Add some debugging/logging to
   the function and see if the row it bombs on has anything unusual
   in it (such as a very large text field)
* While large, that function is fairly simplistic.  You may want
   to consider rewriting it as an SQL function, which should be
   more efficient in any event.
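
As a rough illustration of that second suggestion, here is a minimal
sketch of a set-based rewrite, assuming PostgreSQL 8.4+ (for
regexp_split_to_table) and a hypothetical mapping table
nlpg.addr_abbrev(word, abbrev) holding the word-to-abbreviation pairs,
with abbrev = '' for words to drop:

CREATE OR REPLACE FUNCTION normalise_sql(s text) RETURNS text AS $f$
    SELECT coalesce(array_to_string(ARRAY(
        SELECT coalesce(a.abbrev, t.tok)
        FROM (SELECT DISTINCT
                  regexp_split_to_table(lower($1), E'\\s+') AS tok) t
        LEFT JOIN nlpg.addr_abbrev a ON a.word = t.tok
        WHERE t.tok <> ''
          AND coalesce(a.abbrev, t.tok) <> ''  -- drops 'of', 'the', etc.
    ), ' '), '');
$f$ LANGUAGE sql STABLE;

One behavioural difference to note: SELECT DISTINCT does not preserve
token order the way the Python version does.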

>
>
> On 28 May 2010 19:39, Tom Wilcox <hungrytom@googlemail.com
> <mailto:hungrytom@googlemail.com>> wrote:
>
>     Oops. Sorry about that.
>
>     I am having this problem with multiple queries however I am
>     confident that a fair number may involve the custom plpython
>     "normalise" function which I have made myself. I didn't think it
>     would be complicated enough to produce a memory problem.. here it is:
>
>     -- Normalises common address words (i.e. 'Ground' maps to 'grd')
>     CREATE OR REPLACE FUNCTION normalise(s text) RETURNS text AS $$
>     ADDR_FIELD_DELIM = ' '
>
>     # Returns distinct list without null or empty elements
>     def distinct_str(list):
>          seen = set()
>          return [x for x in list if x not in seen and not seen.add(x)
>     and x!=None and len(x)>0]
>
>     # normalise common words in given address string
>     def normalise(match_data):
>          if match_data==None: return ''
>          import re
>          # Tokenise
>          toks = distinct_str(re.split(r'\s', match_data.lower()))
>          out = ''
>          for tok in toks:
>              ## full word replace
>              if tok == 'house' : out += 'hse'+ADDR_FIELD_DELIM
>              elif tok == 'ground' : out += 'grd'+ADDR_FIELD_DELIM
>              elif tok == 'gnd' : out += 'grd'+ADDR_FIELD_DELIM
>              elif tok == 'front' : out += 'fnt'+ADDR_FIELD_DELIM
>              elif tok == 'floor' : out += 'flr'+ADDR_FIELD_DELIM
>              elif tok == 'floors' : out += 'flr'+ADDR_FIELD_DELIM
>              elif tok == 'flrs' : out += 'flr'+ADDR_FIELD_DELIM
>              elif tok == 'fl' : out += 'flr'+ADDR_FIELD_DELIM
>              elif tok == 'basement' : out += 'bst'+ADDR_FIELD_DELIM
>              elif tok == 'subbasement' : out += 'sbst'+ADDR_FIELD_DELIM
>              elif tok == 'bsmt' : out += 'bst'+ADDR_FIELD_DELIM
>              elif tok == 'lbst' : out += 'lower bst'+ADDR_FIELD_DELIM
>              elif tok == 'street' : out += 'st'+ADDR_FIELD_DELIM
>              elif tok == 'road' : out += 'rd'+ADDR_FIELD_DELIM
>              elif tok == 'lane' : out += 'ln'+ADDR_FIELD_DELIM
>              elif tok == 'rooms' : out += 'rm'+ADDR_FIELD_DELIM
>              elif tok == 'room' : out += 'rm'+ADDR_FIELD_DELIM
>              elif tok == 'no' : pass
>              elif tok == 'number' : pass
>              elif tok == 'and' : out += '&'+ADDR_FIELD_DELIM
>              elif tok == 'rear' : out += 'rr'+ADDR_FIELD_DELIM
>              elif tok == 'part' : out += 'pt'+ADDR_FIELD_DELIM
>              elif tok == 'south' : out += 's'+ADDR_FIELD_DELIM
>              elif tok == 'sth' : out += 's'+ADDR_FIELD_DELIM
>              elif tok == 'north' : out += 'n'+ADDR_FIELD_DELIM
>              elif tok == 'nth' : out += 'n'+ADDR_FIELD_DELIM
>              elif tok == 'west' : out += 'w'+ADDR_FIELD_DELIM
>              elif tok == 'wst' : out += 'w'+ADDR_FIELD_DELIM
>              elif tok == 'east' : out += 'e'+ADDR_FIELD_DELIM
>              elif tok == 'est' : out += 'e'+ADDR_FIELD_DELIM
>              elif tok == 'first' : out += '1st'+ADDR_FIELD_DELIM
>              elif tok == 'second' : out += '2nd'+ADDR_FIELD_DELIM
>              elif tok == 'third' : out += '3rd'+ADDR_FIELD_DELIM
>              elif tok == 'fourth' : out += '4th'+ADDR_FIELD_DELIM
>              elif tok == 'fifth' : out += '5th'+ADDR_FIELD_DELIM
>              elif tok == 'sixth' : out += '6th'+ADDR_FIELD_DELIM
>              elif tok == 'seventh' : out += '7th'+ADDR_FIELD_DELIM
>              elif tok == 'eighth' : out += '8th'+ADDR_FIELD_DELIM
>              elif tok == 'ninth' : out += '9th'+ADDR_FIELD_DELIM
>              elif tok == 'tenth' : out += '10th'+ADDR_FIELD_DELIM
>              elif tok == 'eleventh' : out += '11th'+ADDR_FIELD_DELIM
>              elif tok == 'twelfth' : out += '12th'+ADDR_FIELD_DELIM
>              elif tok == 'thirteenth' : out += '13th'+ADDR_FIELD_DELIM
>              elif tok == 'fourteenth' : out += '14th'+ADDR_FIELD_DELIM
>              elif tok == 'fifteenth' : out += '15th'+ADDR_FIELD_DELIM
>              elif tok == 'sixteenth' : out += '16th'+ADDR_FIELD_DELIM
>              elif tok == 'seventeenth' : out += '17th'+ADDR_FIELD_DELIM
>              elif tok == 'eighteenth' : out += '18th'+ADDR_FIELD_DELIM
>              elif tok == 'nineteenth' : out += '19th'+ADDR_FIELD_DELIM
>              elif tok == 'twentieth' : out += '20th'+ADDR_FIELD_DELIM
>              # numbers 0 - 20
>              elif tok == 'one' : out += '1'+ADDR_FIELD_DELIM
>              elif tok == 'two' : out += '2'+ADDR_FIELD_DELIM
>              elif tok == 'three' : out += '3'+ADDR_FIELD_DELIM
>              elif tok == 'four' : out += '4'+ADDR_FIELD_DELIM
>              elif tok == 'five' : out += '5'+ADDR_FIELD_DELIM
>              elif tok == 'six' : out += '6'+ADDR_FIELD_DELIM
>              elif tok == 'seven' : out += '7'+ADDR_FIELD_DELIM
>              elif tok == 'eight' : out += '8'+ADDR_FIELD_DELIM
>              elif tok == 'nine' : out += '9'+ADDR_FIELD_DELIM
>              elif tok == 'ten' : out += '10'+ADDR_FIELD_DELIM
>              elif tok == 'eleven' : out += '11'+ADDR_FIELD_DELIM
>              elif tok == 'twelve' : out += '12'+ADDR_FIELD_DELIM
>              elif tok == 'thirteen' : out += '13'+ADDR_FIELD_DELIM
>              elif tok == 'fourteen' : out += '14'+ADDR_FIELD_DELIM
>              elif tok == 'fifteen' : out += '15'+ADDR_FIELD_DELIM
>              elif tok == 'sixteen' : out += '16'+ADDR_FIELD_DELIM
>              elif tok == 'seventeen' : out += '17'+ADDR_FIELD_DELIM
>              elif tok == 'eighteen' : out += '18'+ADDR_FIELD_DELIM
>              elif tok == 'nineteen' : out += '19'+ADDR_FIELD_DELIM
>              elif tok == 'twenty' : out += '20'+ADDR_FIELD_DELIM
>              # town dictionary items
>              elif tok == 'borough' : pass
>              elif tok == 'city' : pass
>              elif tok == 'of' : pass
>              elif tok == 'the' : pass
>              # a few extras (from looking at voa)
>              elif tok == 'at' : pass
>              elif tok == 'incl' : pass
>              elif tok == 'inc' : pass
>              else: out += tok+ADDR_FIELD_DELIM
>          return out
>
>     return normalise(s)
>     $$ LANGUAGE plpythonu;
>
>
>     Here's the create script for the table from pgAdmin (I hope that
>     will be good enough instead of \d as I can't do that right now)..
>
>     -- Table: nlpg.match_data
>
>     -- DROP TABLE nlpg.match_data;
>
>     CREATE TABLE nlpg.match_data
>     (
>        premise_id integer,
>        usrn bigint,
>        org text,
>        sao text,
>     "level" text,
>        pao text,
>     "name" text,
>        street text,
>        town text,
>        pc postcode,
>        postcode text,
>        match_data_id integer NOT NULL DEFAULT
>     nextval('nlpg.match_data_match_data_id_seq1'::regclass),
>        addr_str text,
>        tssearch_name tsvector,
>
>        CONSTRAINT match_data_pkey1 PRIMARY KEY (match_data_id)
>     )
>     WITH (
>        OIDS=FALSE
>     );
>     ALTER TABLE nlpg.match_data OWNER TO postgres;
>     ALTER TABLE nlpg.match_data ALTER COLUMN "name" SET STATISTICS 10000;
>
>
>     -- Index: nlpg.index_match_data_mid
>
>     -- DROP INDEX nlpg.index_match_data_mid;
>
>     CREATE INDEX index_match_data_mid
>        ON nlpg.match_data
>        USING btree
>        (match_data_id);
>
>     -- Index: nlpg.index_match_data_pc
>
>     -- DROP INDEX nlpg.index_match_data_pc;
>
>     CREATE INDEX index_match_data_pc
>        ON nlpg.match_data
>        USING btree
>        (pc);
>
>     -- Index: nlpg.index_match_data_pid
>
>     -- DROP INDEX nlpg.index_match_data_pid;
>
>     CREATE INDEX index_match_data_pid
>        ON nlpg.match_data
>        USING btree
>        (premise_id);
>
>     -- Index: nlpg.index_match_data_tssearch_name
>
>     -- DROP INDEX nlpg.index_match_data_tssearch_name;
>
>     CREATE INDEX index_match_data_tssearch_name
>        ON nlpg.match_data
>        USING gin
>        (tssearch_name);
>
>     -- Index: nlpg.index_match_data_usrn
>
>     -- DROP INDEX nlpg.index_match_data_usrn;
>
>     CREATE INDEX index_match_data_usrn
>        ON nlpg.match_data
>        USING btree
>        (usrn);
>
>     As you can see, no FKs or triggers..
>
>     I am running: EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org;
>
>     However, as it should take around 90mins (if it is linear) then I
>     thought I would send this now and follow up with the results once it
>     finishes. (Has taken 2hours so far..)
>
>     Thanks very much for your help.
>
>     Tom
>
>
>     On 28 May 2010 17:54, "Bill Moran" <wmoran@potentialtech.com
>     <mailto:wmoran@potentialtech.com>> wrote:
>>
>>     In response to Tom Wilcox <hungrytom@googlemail.com
>>     <mailto:hungrytom@googlemail.com>>:
>>
>>     > In addition, I have discovered that the update query that runs
>>     on each row
>>     > of a 27million row ta...
>>
>>     You're not liable to get shit for answers if you omit the mailing
>>     list from
>>     the conversation, especially since I know almost nothing about tuning
>>     PostgreSQL installed on Windows.
>>
>>     Are there multiple queries having this problem?  The original
>>     query didn't
>>     have normalise() in it, and I would be highly suspicious that a custom
>>     function may have a memory leak or other memory-intensive
>>     side-effects.
>>     What is the code for that function?
>>
>>     For example, does:
>>     UPDATE nlpg.match_data SET org = org WHERE match_data_id;
>>     finish in a reasonable amount of time or exhibit the same out of
>>     memory
>>     problem?
>>
>>     It'd be nice to see a \d on that table ... does it have any
>>     triggers or
>>     cascading foreign keys?
>>
>>     And stop
>>
>>     --
>>
>>     Bill Moran
>>     http://www.potentialtech.com
>>     http://people.collaborativefusion.com/~wmoran/
>>     <http://people.collaborativefusion.com/%7Ewmoran/>
>>
>


--
Bill Moran

Re: Out of Memory and Configuration Problems (Big Computer)

From
Tom Wilcox
Date:
Thanks Bill,

That sounds like good advice. I am rerunning my query with the Python
function peppered with plpy.notice("msg") calls.

Hopefully that'll shed some light on which inputs it's crashing on. Does
anyone know of a way to measure the memory being consumed by the
function/query, so that I can spot any memory leak? I'm not very good at
debugging memory leaks...
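
(One low-tech thing I can try, I suppose: watch the backend process from
outside with Task Manager or Process Explorer while the query runs.)

SELECT pg_backend_pid();  -- note this session's backend PID, then run the
                          -- UPDATE in the same session and watch that
                          -- postgres.exe process's memory use.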

Failing that, perhaps it is time to rewrite the function in SQL or
embrace the TSearch2 dictionaries. I was originally intending to figure
out a way to extend the ISpell dictionary to match and replace those
keywords...

Thanks,
Tom

Bill Moran wrote:
> On 5/28/10 8:43:48 PM, Tom Wilcox wrote:
>> I ran this query:
>>
>> EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org;
>>
>> And I got this result:
>>
>> "Seq Scan on match_data  (cost=0.00..9762191.68 rows=32205168 width=206)
>> (actual time=76873.592..357450.519 rows=27777961 loops=1)"
>> "Total runtime: 8028212.367 ms"
>
> That would seem to indicate that the problem is in your Python
> functions.
>
> Some ideas for next steps:
> * Perhaps it's just a few rows that have data in them that the
>   function has difficulty with.  Add some debugging/logging to
>   the function and see if the row it bombs on has anything unusual
>   in it (such as a very large text field)
> * While large, that function is fairly simplistic.  You may want
>   to consider rewriting it as an SQL function, which should be
>   more efficient in any event.
>
>>
>>
>> On 28 May 2010 19:39, Tom Wilcox <hungrytom@googlemail.com
>> <mailto:hungrytom@googlemail.com>> wrote:
>>
>>     Oops. Sorry about that.
>>
>>     I am having this problem with multiple queries however I am
>>     confident that a fair number may involve the custom plpython
>>     "normalise" function which I have made myself. I didn't think it
>>     would be complicated enough to produce a memory problem.. here it
>> is:
>>
>>     -- Normalises common address words (i.e. 'Ground' maps to 'grd')
>>     CREATE OR REPLACE FUNCTION normalise(s text) RETURNS text AS $$
>>     ADDR_FIELD_DELIM = ' '
>>
>>     # Returns distinct list without null or empty elements
>>     def distinct_str(list):
>>          seen = set()
>>          return [x for x in list if x not in seen and not seen.add(x)
>>     and x!=None and len(x)>0]
>>
>>     # normalise common words in given address string
>>     def normalise(match_data):
>>          if match_data==None: return ''
>>          import re
>>          # Tokenise
>>          toks = distinct_str(re.split(r'\s', match_data.lower()))
>>          out = ''
>>          for tok in toks:
>>              ## full word replace
>>              if tok == 'house' : out += 'hse'+ADDR_FIELD_DELIM
>>              elif tok == 'ground' : out += 'grd'+ADDR_FIELD_DELIM
>>              elif tok == 'gnd' : out += 'grd'+ADDR_FIELD_DELIM
>>              elif tok == 'front' : out += 'fnt'+ADDR_FIELD_DELIM
>>              elif tok == 'floor' : out += 'flr'+ADDR_FIELD_DELIM
>>              elif tok == 'floors' : out += 'flr'+ADDR_FIELD_DELIM
>>              elif tok == 'flrs' : out += 'flr'+ADDR_FIELD_DELIM
>>              elif tok == 'fl' : out += 'flr'+ADDR_FIELD_DELIM
>>              elif tok == 'basement' : out += 'bst'+ADDR_FIELD_DELIM
>>              elif tok == 'subbasement' : out += 'sbst'+ADDR_FIELD_DELIM
>>              elif tok == 'bsmt' : out += 'bst'+ADDR_FIELD_DELIM
>>              elif tok == 'lbst' : out += 'lower bst'+ADDR_FIELD_DELIM
>>              elif tok == 'street' : out += 'st'+ADDR_FIELD_DELIM
>>              elif tok == 'road' : out += 'rd'+ADDR_FIELD_DELIM
>>              elif tok == 'lane' : out += 'ln'+ADDR_FIELD_DELIM
>>              elif tok == 'rooms' : out += 'rm'+ADDR_FIELD_DELIM
>>              elif tok == 'room' : out += 'rm'+ADDR_FIELD_DELIM
>>              elif tok == 'no' : pass
>>              elif tok == 'number' : pass
>>              elif tok == 'and' : out += '&'+ADDR_FIELD_DELIM
>>              elif tok == 'rear' : out += 'rr'+ADDR_FIELD_DELIM
>>              elif tok == 'part' : out += 'pt'+ADDR_FIELD_DELIM
>>              elif tok == 'south' : out += 's'+ADDR_FIELD_DELIM
>>              elif tok == 'sth' : out += 's'+ADDR_FIELD_DELIM
>>              elif tok == 'north' : out += 'n'+ADDR_FIELD_DELIM
>>              elif tok == 'nth' : out += 'n'+ADDR_FIELD_DELIM
>>              elif tok == 'west' : out += 'w'+ADDR_FIELD_DELIM
>>              elif tok == 'wst' : out += 'w'+ADDR_FIELD_DELIM
>>              elif tok == 'east' : out += 'e'+ADDR_FIELD_DELIM
>>              elif tok == 'est' : out += 'e'+ADDR_FIELD_DELIM
>>              elif tok == 'first' : out += '1st'+ADDR_FIELD_DELIM
>>              elif tok == 'second' : out += '2nd'+ADDR_FIELD_DELIM
>>              elif tok == 'third' : out += '3rd'+ADDR_FIELD_DELIM
>>              elif tok == 'fourth' : out += '4th'+ADDR_FIELD_DELIM
>>              elif tok == 'fifth' : out += '5th'+ADDR_FIELD_DELIM
>>              elif tok == 'sixth' : out += '6th'+ADDR_FIELD_DELIM
>>              elif tok == 'seventh' : out += '7th'+ADDR_FIELD_DELIM
>>              elif tok == 'eighth' : out += '8th'+ADDR_FIELD_DELIM
>>              elif tok == 'ninth' : out += '9th'+ADDR_FIELD_DELIM
>>              elif tok == 'tenth' : out += '10th'+ADDR_FIELD_DELIM
>>              elif tok == 'eleventh' : out += '11th'+ADDR_FIELD_DELIM
>>              elif tok == 'twelfth' : out += '12th'+ADDR_FIELD_DELIM
>>              elif tok == 'thirteenth' : out += '13th'+ADDR_FIELD_DELIM
>>              elif tok == 'fourteenth' : out += '14th'+ADDR_FIELD_DELIM
>>              elif tok == 'fifteenth' : out += '15th'+ADDR_FIELD_DELIM
>>              elif tok == 'sixteenth' : out += '16th'+ADDR_FIELD_DELIM
>>              elif tok == 'seventeenth' : out += '17th'+ADDR_FIELD_DELIM
>>              elif tok == 'eighteenth' : out += '18th'+ADDR_FIELD_DELIM
>>              elif tok == 'nineteenth' : out += '19th'+ADDR_FIELD_DELIM
>>              elif tok == 'twentieth' : out += '20th'+ADDR_FIELD_DELIM
>>              # numbers 0 - 20
>>              elif tok == 'one' : out += '1'+ADDR_FIELD_DELIM
>>              elif tok == 'two' : out += '2'+ADDR_FIELD_DELIM
>>              elif tok == 'three' : out += '3'+ADDR_FIELD_DELIM
>>              elif tok == 'four' : out += '4'+ADDR_FIELD_DELIM
>>              elif tok == 'five' : out += '5'+ADDR_FIELD_DELIM
>>              elif tok == 'six' : out += '6'+ADDR_FIELD_DELIM
>>              elif tok == 'seven' : out += '7'+ADDR_FIELD_DELIM
>>              elif tok == 'eight' : out += '8'+ADDR_FIELD_DELIM
>>              elif tok == 'nine' : out += '9'+ADDR_FIELD_DELIM
>>              elif tok == 'ten' : out += '10'+ADDR_FIELD_DELIM
>>              elif tok == 'eleven' : out += '11'+ADDR_FIELD_DELIM
>>              elif tok == 'twelve' : out += '12'+ADDR_FIELD_DELIM
>>              elif tok == 'thirteen' : out += '13'+ADDR_FIELD_DELIM
>>              elif tok == 'fourteen' : out += '14'+ADDR_FIELD_DELIM
>>              elif tok == 'fifteen' : out += '15'+ADDR_FIELD_DELIM
>>              elif tok == 'sixteen' : out += '16'+ADDR_FIELD_DELIM
>>              elif tok == 'seventeen' : out += '17'+ADDR_FIELD_DELIM
>>              elif tok == 'eighteen' : out += '18'+ADDR_FIELD_DELIM
>>              elif tok == 'nineteen' : out += '19'+ADDR_FIELD_DELIM
>>              elif tok == 'twenty' : out += '20'+ADDR_FIELD_DELIM
>>              # town dictionary items
>>              elif tok == 'borough' : pass
>>              elif tok == 'city' : pass
>>              elif tok == 'of' : pass
>>              elif tok == 'the' : pass
>>              # a few extras (from looking at voa)
>>              elif tok == 'at' : pass
>>              elif tok == 'incl' : pass
>>              elif tok == 'inc' : pass
>>              else: out += tok+ADDR_FIELD_DELIM
>>          return out
>>
>>     return normalise(s)
>>     $$ LANGUAGE plpythonu;
>>
>>
>>     Here's the create script for the table from pgAdmin (I hope that
>>     will be good enough instead of \d as I can't do that right now)..
>>
>>     -- Table: nlpg.match_data
>>
>>     -- DROP TABLE nlpg.match_data;
>>
>>     CREATE TABLE nlpg.match_data
>>     (
>>        premise_id integer,
>>        usrn bigint,
>>        org text,
>>        sao text,
>>     "level" text,
>>        pao text,
>>     "name" text,
>>        street text,
>>        town text,
>>        pc postcode,
>>        postcode text,
>>        match_data_id integer NOT NULL DEFAULT
>>     nextval('nlpg.match_data_match_data_id_seq1'::regclass),
>>        addr_str text,
>>        tssearch_name tsvector,
>>
>>        CONSTRAINT match_data_pkey1 PRIMARY KEY (match_data_id)
>>     )
>>     WITH (
>>        OIDS=FALSE
>>     );
>>     ALTER TABLE nlpg.match_data OWNER TO postgres;
>>     ALTER TABLE nlpg.match_data ALTER COLUMN "name" SET STATISTICS
>> 10000;
>>
>>
>>     -- Index: nlpg.index_match_data_mid
>>
>>     -- DROP INDEX nlpg.index_match_data_mid;
>>
>>     CREATE INDEX index_match_data_mid
>>        ON nlpg.match_data
>>        USING btree
>>        (match_data_id);
>>
>>     -- Index: nlpg.index_match_data_pc
>>
>>     -- DROP INDEX nlpg.index_match_data_pc;
>>
>>     CREATE INDEX index_match_data_pc
>>        ON nlpg.match_data
>>        USING btree
>>        (pc);
>>
>>     -- Index: nlpg.index_match_data_pid
>>
>>     -- DROP INDEX nlpg.index_match_data_pid;
>>
>>     CREATE INDEX index_match_data_pid
>>        ON nlpg.match_data
>>        USING btree
>>        (premise_id);
>>
>>     -- Index: nlpg.index_match_data_tssearch_name
>>
>>     -- DROP INDEX nlpg.index_match_data_tssearch_name;
>>
>>     CREATE INDEX index_match_data_tssearch_name
>>        ON nlpg.match_data
>>        USING gin
>>        (tssearch_name);
>>
>>     -- Index: nlpg.index_match_data_usrn
>>
>>     -- DROP INDEX nlpg.index_match_data_usrn;
>>
>>     CREATE INDEX index_match_data_usrn
>>        ON nlpg.match_data
>>        USING btree
>>        (usrn);
>>
>>     As you can see, no FKs or triggers..
>>
>>     I am running: EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org;
>>
>>     However, as it should take around 90mins (if it is linear) then I
>>     thought I would send this now and follow up with the results once it
>>     finishes. (Has taken 2hours so far..)
>>
>>     Thanks very much for your help.
>>
>>     Tom
>>
>>
>>     On 28 May 2010 17:54, "Bill Moran" <wmoran@potentialtech.com
>>     <mailto:wmoran@potentialtech.com>> wrote:
>>>
>>>     In response to Tom Wilcox <hungrytom@googlemail.com
>>>     <mailto:hungrytom@googlemail.com>>:
>>>
>>>     > In addition, I have discovered that the update query that runs
>>>     on each row
>>>     > of a 27million row ta...
>>>
>>>     You're not liable to get shit for answers if you omit the mailing
>>>     list from
>>>     the conversation, especially since I know almost nothing about
>>> tuning
>>>     PostgreSQL installed on Windows.
>>>
>>>     Are there multiple queries having this problem?  The original
>>>     query didn't
>>>     have normalise() in it, and I would be highly suspicious that a
>>> custom
>>>     function may have a memory leak or other memory-intensive
>>>     side-effects.
>>>     What is the code for that function?
>>>
>>>     For example, does:
>>>     UPDATE nlpg.match_data SET org = org WHERE match_data_id;
>>>     finish in a reasonable amount of time or exhibit the same out of
>>>     memory
>>>     problem?
>>>
>>>     It'd be nice to see a \d on that table ... does it have any
>>>     triggers or
>>>     cascading foreign keys?
>>>
>>>     And stop
>>>
>>>     --
>>>
>>>     Bill Moran
>>>     http://www.potentialtech.com
>>>     http://people.collaborativefusion.com/~wmoran/
>>>     <http://people.collaborativefusion.com/%7Ewmoran/>
>>>
>>
>
>


Re: Out of Memory and Configuration Problems (Big Computer)

From
Tom Wilcox
Date:
I am having difficulties. I have rerun my update that uses the Python
functions:

(1) UPDATE nlpg.match_data SET org = normalise(org);

And some other similar queries on neighbouring fields in the table. They
have all now worked, without any changes to the configuration. I have
done one thing in an attempt to minimise the risk of a memory leak in
normalise(): I added "toks = None" to the end of the function. However,
this was done after query (1) succeeded on the rerun.

Why would I get inconsistent behaviour? Would it have anything to do
with SQL Server running on the same machine (although it is not actually
doing anything at the moment, just an idle server in the background)?

Tangent: Is there any way to increase the memory allocated to postgres
by Windows using Job Objects?

Cheers,
Tom

On 29/05/2010 18:55, Bill Moran wrote:
> On 5/28/10 8:43:48 PM, Tom Wilcox wrote:
>> I ran this query:
>>
>> EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org;
>>
>> And I got this result:
>>
>> "Seq Scan on match_data  (cost=0.00..9762191.68 rows=32205168 width=206)
>> (actual time=76873.592..357450.519 rows=27777961 loops=1)"
>> "Total runtime: 8028212.367 ms"
>
> That would seem to indicate that the problem is in your Python
> functions.
>
> Some ideas for next steps:
> * Perhaps it's just a few rows that have data in them that the
>   function has difficulty with.  Add some debugging/logging to
>   the function and see if the row it bombs on has anything unusual
>   in it (such as a very large text field)
> * While large, that function is fairly simplistic.  You may want
>   to consider rewriting it as an SQL function, which should be
>   more efficient in any event.
>
>>
>>
>> On 28 May 2010 19:39, Tom Wilcox <hungrytom@googlemail.com
>> <mailto:hungrytom@googlemail.com>> wrote:
>>
>>     Oops. Sorry about that.
>>
>>     I am having this problem with multiple queries however I am
>>     confident that a fair number may involve the custom plpython
>>     "normalise" function which I have made myself. I didn't think it
>>     would be complicated enough to produce a memory problem.. here it
>> is:
>>
>>     -- Normalises common address words (i.e. 'Ground' maps to 'grd')
>>     CREATE OR REPLACE FUNCTION normalise(s text) RETURNS text AS $$
>>     ADDR_FIELD_DELIM = ' '
>>
>>     # Returns distinct list without null or empty elements
>>     def distinct_str(list):
>>          seen = set()
>>          return [x for x in list if x not in seen and not seen.add(x)
>>     and x!=None and len(x)>0]
>>
>>     # normalise common words in given address string
>>     def normalise(match_data):
>>          if match_data==None: return ''
>>          import re
>>          # Tokenise
>>          toks = distinct_str(re.split(r'\s', match_data.lower()))
>>          out = ''
>>          for tok in toks:
>>              ## full word replace
>>              if tok == 'house' : out += 'hse'+ADDR_FIELD_DELIM
>>              elif tok == 'ground' : out += 'grd'+ADDR_FIELD_DELIM
>>              elif tok == 'gnd' : out += 'grd'+ADDR_FIELD_DELIM
>>              elif tok == 'front' : out += 'fnt'+ADDR_FIELD_DELIM
>>              elif tok == 'floor' : out += 'flr'+ADDR_FIELD_DELIM
>>              elif tok == 'floors' : out += 'flr'+ADDR_FIELD_DELIM
>>              elif tok == 'flrs' : out += 'flr'+ADDR_FIELD_DELIM
>>              elif tok == 'fl' : out += 'flr'+ADDR_FIELD_DELIM
>>              elif tok == 'basement' : out += 'bst'+ADDR_FIELD_DELIM
>>              elif tok == 'subbasement' : out += 'sbst'+ADDR_FIELD_DELIM
>>              elif tok == 'bsmt' : out += 'bst'+ADDR_FIELD_DELIM
>>              elif tok == 'lbst' : out += 'lower bst'+ADDR_FIELD_DELIM
>>              elif tok == 'street' : out += 'st'+ADDR_FIELD_DELIM
>>              elif tok == 'road' : out += 'rd'+ADDR_FIELD_DELIM
>>              elif tok == 'lane' : out += 'ln'+ADDR_FIELD_DELIM
>>              elif tok == 'rooms' : out += 'rm'+ADDR_FIELD_DELIM
>>              elif tok == 'room' : out += 'rm'+ADDR_FIELD_DELIM
>>              elif tok == 'no' : pass
>>              elif tok == 'number' : pass
>>              elif tok == 'and' : out += '&'+ADDR_FIELD_DELIM
>>              elif tok == 'rear' : out += 'rr'+ADDR_FIELD_DELIM
>>              elif tok == 'part' : out += 'pt'+ADDR_FIELD_DELIM
>>              elif tok == 'south' : out += 's'+ADDR_FIELD_DELIM
>>              elif tok == 'sth' : out += 's'+ADDR_FIELD_DELIM
>>              elif tok == 'north' : out += 'n'+ADDR_FIELD_DELIM
>>              elif tok == 'nth' : out += 'n'+ADDR_FIELD_DELIM
>>              elif tok == 'west' : out += 'w'+ADDR_FIELD_DELIM
>>              elif tok == 'wst' : out += 'w'+ADDR_FIELD_DELIM
>>              elif tok == 'east' : out += 'e'+ADDR_FIELD_DELIM
>>              elif tok == 'est' : out += 'e'+ADDR_FIELD_DELIM
>>              elif tok == 'first' : out += '1st'+ADDR_FIELD_DELIM
>>              elif tok == 'second' : out += '2nd'+ADDR_FIELD_DELIM
>>              elif tok == 'third' : out += '3rd'+ADDR_FIELD_DELIM
>>              elif tok == 'fourth' : out += '4th'+ADDR_FIELD_DELIM
>>              elif tok == 'fifth' : out += '5th'+ADDR_FIELD_DELIM
>>              elif tok == 'sixth' : out += '6th'+ADDR_FIELD_DELIM
>>              elif tok == 'seventh' : out += '7th'+ADDR_FIELD_DELIM
>>              elif tok == 'eighth' : out += '8th'+ADDR_FIELD_DELIM
>>              elif tok == 'ninth' : out += '9th'+ADDR_FIELD_DELIM
>>              elif tok == 'tenth' : out += '10th'+ADDR_FIELD_DELIM
>>              elif tok == 'eleventh' : out += '11th'+ADDR_FIELD_DELIM
>>              elif tok == 'twelfth' : out += '12th'+ADDR_FIELD_DELIM
>>              elif tok == 'thirteenth' : out += '13th'+ADDR_FIELD_DELIM
>>              elif tok == 'fourteenth' : out += '14th'+ADDR_FIELD_DELIM
>>              elif tok == 'fifteenth' : out += '15th'+ADDR_FIELD_DELIM
>>              elif tok == 'sixteenth' : out += '16th'+ADDR_FIELD_DELIM
>>              elif tok == 'seventeenth' : out += '17th'+ADDR_FIELD_DELIM
>>              elif tok == 'eighteenth' : out += '18th'+ADDR_FIELD_DELIM
>>              elif tok == 'nineteenth' : out += '19th'+ADDR_FIELD_DELIM
>>              elif tok == 'twentieth' : out += '20th'+ADDR_FIELD_DELIM
>>              # numbers 0 - 20
>>              elif tok == 'one' : out += '1'+ADDR_FIELD_DELIM
>>              elif tok == 'two' : out += '2'+ADDR_FIELD_DELIM
>>              elif tok == 'three' : out += '3'+ADDR_FIELD_DELIM
>>              elif tok == 'four' : out += '4'+ADDR_FIELD_DELIM
>>              elif tok == 'five' : out += '5'+ADDR_FIELD_DELIM
>>              elif tok == 'six' : out += '6'+ADDR_FIELD_DELIM
>>              elif tok == 'seven' : out += '7'+ADDR_FIELD_DELIM
>>              elif tok == 'eight' : out += '8'+ADDR_FIELD_DELIM
>>              elif tok == 'nine' : out += '9'+ADDR_FIELD_DELIM
>>              elif tok == 'ten' : out += '10'+ADDR_FIELD_DELIM
>>              elif tok == 'eleven' : out += '11'+ADDR_FIELD_DELIM
>>              elif tok == 'twelve' : out += '12'+ADDR_FIELD_DELIM
>>              elif tok == 'thirteen' : out += '13'+ADDR_FIELD_DELIM
>>              elif tok == 'fourteen' : out += '14'+ADDR_FIELD_DELIM
>>              elif tok == 'fifteen' : out += '15'+ADDR_FIELD_DELIM
>>              elif tok == 'sixteen' : out += '16'+ADDR_FIELD_DELIM
>>              elif tok == 'seventeen' : out += '17'+ADDR_FIELD_DELIM
>>              elif tok == 'eighteen' : out += '18'+ADDR_FIELD_DELIM
>>              elif tok == 'nineteen' : out += '19'+ADDR_FIELD_DELIM
>>              elif tok == 'twenty' : out += '20'+ADDR_FIELD_DELIM
>>              # town dictionary items
>>              elif tok == 'borough' : pass
>>              elif tok == 'city' : pass
>>              elif tok == 'of' : pass
>>              elif tok == 'the' : pass
>>              # a few extras (from looking at voa)
>>              elif tok == 'at' : pass
>>              elif tok == 'incl' : pass
>>              elif tok == 'inc' : pass
>>              else: out += tok+ADDR_FIELD_DELIM
>>          return out
>>
>>     return normalise(s)
>>     $$ LANGUAGE plpythonu;
>>
>>
>>     Here's the create script for the table from pgAdmin (I hope that
>>     will be good enough instead of \d as I can't do that right now)..
>>
>>     -- Table: nlpg.match_data
>>
>>     -- DROP TABLE nlpg.match_data;
>>
>>     CREATE TABLE nlpg.match_data
>>     (
>>        premise_id integer,
>>        usrn bigint,
>>        org text,
>>        sao text,
>>     "level" text,
>>        pao text,
>>     "name" text,
>>        street text,
>>        town text,
>>        pc postcode,
>>        postcode text,
>>        match_data_id integer NOT NULL DEFAULT
>>     nextval('nlpg.match_data_match_data_id_seq1'::regclass),
>>        addr_str text,
>>        tssearch_name tsvector,
>>
>>        CONSTRAINT match_data_pkey1 PRIMARY KEY (match_data_id)
>>     )
>>     WITH (
>>        OIDS=FALSE
>>     );
>>     ALTER TABLE nlpg.match_data OWNER TO postgres;
>>     ALTER TABLE nlpg.match_data ALTER COLUMN "name" SET STATISTICS
>> 10000;
>>
>>
>>     -- Index: nlpg.index_match_data_mid
>>
>>     -- DROP INDEX nlpg.index_match_data_mid;
>>
>>     CREATE INDEX index_match_data_mid
>>        ON nlpg.match_data
>>        USING btree
>>        (match_data_id);
>>
>>     -- Index: nlpg.index_match_data_pc
>>
>>     -- DROP INDEX nlpg.index_match_data_pc;
>>
>>     CREATE INDEX index_match_data_pc
>>        ON nlpg.match_data
>>        USING btree
>>        (pc);
>>
>>     -- Index: nlpg.index_match_data_pid
>>
>>     -- DROP INDEX nlpg.index_match_data_pid;
>>
>>     CREATE INDEX index_match_data_pid
>>        ON nlpg.match_data
>>        USING btree
>>        (premise_id);
>>
>>     -- Index: nlpg.index_match_data_tssearch_name
>>
>>     -- DROP INDEX nlpg.index_match_data_tssearch_name;
>>
>>     CREATE INDEX index_match_data_tssearch_name
>>        ON nlpg.match_data
>>        USING gin
>>        (tssearch_name);
>>
>>     -- Index: nlpg.index_match_data_usrn
>>
>>     -- DROP INDEX nlpg.index_match_data_usrn;
>>
>>     CREATE INDEX index_match_data_usrn
>>        ON nlpg.match_data
>>        USING btree
>>        (usrn);
>>
>>     As you can see, no FKs or triggers..
>>
>>     I am running: EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org;
>>
>>     However, as it should take around 90mins (if it is linear) then I
>>     thought I would send this now and follow up with the results once it
>>     finishes. (Has taken 2hours so far..)
>>
>>     Thanks very much for your help.
>>
>>     Tom
>>
>>
>>     On 28 May 2010 17:54, "Bill Moran" <wmoran@potentialtech.com> wrote:
>>>
>>>     In response to Tom Wilcox <hungrytom@googlemail.com>:
>>>
>>> > In addition, I have discovered that the update query that runs
>>>     on each row
>>> > of a 27million row ta...
>>>
>>>     You're not liable to get shit for answers if you omit the mailing
>>>     list from
>>>     the conversation, especially since I know almost nothing about
>>> tuning
>>>     PostgreSQL installed on Windows.
>>>
>>>     Are there multiple queries having this problem?  The original
>>>     query didn't
>>>     have normalise() in it, and I would be highly suspicious that a
>>> custom
>>>     function may have a memory leak or other memory-intensive
>>>     side-effects.
>>>     What is the code for that function?
>>>
>>>     For example, does:
>>>     UPDATE nlpg.match_data SET org = org WHERE match_data_id;
>>>     finish in a reasonable amount of time or exhibit the same out of
>>>     memory
>>>     problem?
>>>
>>>     It'd be nice to see a \d on that table ... does it have any
>>>     triggers or
>>>     cascading foreign keys?
>>>
>>>     And stop
>>>
>>>     --
>>>
>>>     Bill Moran
>>>     http://www.potentialtech.com
>>>     http://people.collaborativefusion.com/~wmoran/
>>>
>>
>
>


Re: Out of Memory and Configuration Problems (Big Computer)

From
Tom Wilcox
Date:
I have now hit a new query that produces out-of-memory errors in a
similar way to the last ones. Can anyone suggest why I might be getting
this error, and how I can go about diagnosing or fixing it?

The error I get is:

ERROR: out of memory
SQL state: 53200
Detail: Failed on request of size 114.
Context: SQL statement "ANALYZE VERBOSE nlpg.match_data", PL/pgSQL
function "mk_tbls_4e" line 8 at SQL statement (see log below)

The offending function is called using:

-- Modify NLPG matchdata (addr_str tssearch lookup field)
BEGIN;
SELECT nlpg.mk_tbls_4e();
COMMIT;

The function is:

CREATE OR REPLACE FUNCTION nlpg.mk_tbls_4e() RETURNS BOOLEAN AS $$
BEGIN
     ALTER TABLE nlpg.match_data ADD COLUMN tssearch_addr_str tsvector;
     UPDATE nlpg.match_data SET tssearch_addr_str =
to_tsvector(meta_addr_str(addr_str));
     DROP INDEX IF EXISTS nlpg.index_match_data_tssearch_addr_str;
     CREATE INDEX index_match_data_tssearch_addr_str ON nlpg.match_data
USING gin(tssearch_addr_str);
     ANALYZE VERBOSE nlpg.match_data;
     RETURN true;
END;
$$ LANGUAGE plpgsql;
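
For reference, when called through SELECT nlpg.mk_tbls_4e() all five
steps above run inside a single transaction. Unrolled as plain top-level
statements (a sketch of the same work, taken directly from the function
body; in autocommit mode each statement then commits on its own):

-- Same steps as nlpg.mk_tbls_4e(), run one statement at a time:
ALTER TABLE nlpg.match_data ADD COLUMN tssearch_addr_str tsvector;
UPDATE nlpg.match_data
    SET tssearch_addr_str = to_tsvector(meta_addr_str(addr_str));
DROP INDEX IF EXISTS nlpg.index_match_data_tssearch_addr_str;
CREATE INDEX index_match_data_tssearch_addr_str
    ON nlpg.match_data USING gin (tssearch_addr_str);
ANALYZE VERBOSE nlpg.match_data;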

Since the query failed on line 8 ("ANALYZE VERBOSE nlpg.match_data"), I
hope you won't need to know much more about the inner workings of
meta_addr_str. However, here they are (featuring the normalise function
from the earlier conversation):

CREATE OR REPLACE FUNCTION metaphoneExt(word text) RETURNS text AS $$
BEGIN
     IF is_alnum(word) THEN
         RETURN word;
     ELSE
         RETURN metaphone(word,100);
     END IF;
END;
$$ LANGUAGE plpgsql;

-- Return a normalised, metaphone-encoded string containing all the
-- valid words for text searching
DROP FUNCTION IF EXISTS meta_addr_str(addr_str text);
CREATE OR REPLACE FUNCTION meta_addr_str(addr_str text) RETURNS text AS $$
DECLARE
     meta_addr_str text;
     meta_word text;
BEGIN
     meta_addr_str = '';
     FOR meta_word IN
         SELECT * FROM
         (
             SELECT
             metaphoneExt(
             regexp_split_to_table(
             regexp_replace(
             normalise(
                 $1
             )
             ,'[^\\w]', ' ', 'g')
             , E'\\\s+')
             ) AS meta
         ) AS x
         WHERE meta IS NOT NULL and length(trim(meta))>0
     LOOP
         meta_addr_str = meta_addr_str || ' ' || COALESCE(meta_word,'');
     END LOOP;

     RETURN meta_addr_str;
END;
$$ LANGUAGE plpgsql;
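
A quick way to sanity-check meta_addr_str before committing to a
full-table run (the literal address below is made up purely for
illustration):

-- Spot-check on one illustrative value:
SELECT meta_addr_str('Ground Floor, 1 High Street');

-- Then on a handful of real rows:
SELECT addr_str, meta_addr_str(addr_str)
FROM nlpg.match_data
LIMIT 10;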

Finally, here is the end of the log file where the error occurs:

...(more of the same above)...
2010-06-02 03:09:32 BST HINT:  Consider increasing the configuration
parameter "checkpoint_segments".
2010-06-02 03:09:41 BST LOG:  checkpoints are occurring too frequently (9
seconds apart)
2010-06-02 03:09:41 BST HINT:  Consider increasing the configuration
parameter "checkpoint_segments".
2010-06-02 03:09:49 BST LOG:  checkpoints are occurring too frequently (8
seconds apart)
2010-06-02 03:09:49 BST HINT:  Consider increasing the configuration
parameter "checkpoint_segments".
2010-06-02 03:09:56 BST LOG:  checkpoints are occurring too frequently (7
seconds apart)
2010-06-02 03:09:56 BST HINT:  Consider increasing the configuration
parameter "checkpoint_segments".
2010-06-02 03:10:03 BST LOG:  checkpoints are occurring too frequently (7
seconds apart)
2010-06-02 03:10:03 BST HINT:  Consider increasing the configuration
parameter "checkpoint_segments".
2010-06-02 03:10:09 BST LOG:  checkpoints are occurring too frequently (6
seconds apart)
2010-06-02 03:10:09 BST HINT:  Consider increasing the configuration
parameter "checkpoint_segments".
TopMemoryContext: 66200 total in 8 blocks; 4144 free (13 chunks); 62056 used
   PL/PgSQL function context: 8192 total in 1 blocks; 6480 free (4
chunks); 1712 used
   TopTransactionContext: 516096 total in 6 blocks; 183384 free (26
chunks); 332712 used
     Combo CIDs: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used
     ExecutorState: 8192 total in 1 blocks; 2424 free (4 chunks); 5768 used
       ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
     SPI Exec: 24576 total in 2 blocks; 24544 free (12 chunks); 32 used
     SPI Proc: 8192 total in 1 blocks; 7264 free (2 chunks); 928 used
   Type information cache: 8192 total in 1 blocks; 1800 free (0 chunks);
6392 used
   Tsearch dictionary cache: 8192 total in 1 blocks; 5384 free (0
chunks); 2808 used
   Tsearch parser cache: 8192 total in 1 blocks; 4872 free (0 chunks);
3320 used
   Tsearch configuration cache: 8192 total in 1 blocks; 5384 free (0
chunks); 2808 used
   PL/PgSQL function context: 8192 total in 1 blocks; 7128 free (4
chunks); 1064 used
   PL/PgSQL function context: 24576 total in 2 blocks; 19616 free (10
chunks); 4960 used
   PL/PgSQL function context: 8192 total in 1 blocks; 6544 free (4
chunks); 1648 used
   CFuncHash: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used
   Rendezvous variable hash: 8192 total in 1 blocks; 3848 free (0
chunks); 4344 used
   PLpgSQL function cache: 24328 total in 2 blocks; 5904 free (0
chunks); 18424 used
   Operator class cache: 8192 total in 1 blocks; 3848 free (0 chunks);
4344 used
   Operator lookup cache: 24576 total in 2 blocks; 14072 free (6
chunks); 10504 used
   MessageContext: 24576 total in 2 blocks; 20408 free (6 chunks); 4168 used
   smgr relation table: 8192 total in 1 blocks; 760 free (0 chunks);
7432 used
   TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0
chunks); 16 used
   Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
   PortalMemory: 8192 total in 1 blocks; 8040 free (1 chunks); 152 used
     PortalHeapMemory: 1024 total in 1 blocks; 920 free (0 chunks); 104 used
       Analyze: 1689879704 total in 217 blocks; 20451272 free (34962
chunks); 1669428432 used
         Analyze Column: 310378496 total in 63 blocks; 2168 free (61
chunks); 310376328 used
           Analyzed lexemes table: 516096 total in 6 blocks; 110216 free
(21 chunks); 405880 used
       Vacuum: 8192 total in 1 blocks; 7872 free (0 chunks); 320 used
       ExecutorState: 8192 total in 1 blocks; 5984 free (1 chunks); 2208
used
         ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
   Relcache by OID: 8192 total in 1 blocks; 2856 free (0 chunks); 5336 used
   CacheMemoryContext: 667696 total in 20 blocks; 120608 free (43
chunks); 547088 used
     CachedPlan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
     CachedPlanSource: 1024 total in 1 blocks; 720 free (0 chunks); 304 used
     SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
     index_match_data_tssearch_addr_str: 1024 total in 1 blocks; 136
free (0 chunks); 888 used
     CachedPlan: 1024 total in 1 blocks; 568 free (0 chunks); 456 used
     CachedPlanSource: 1024 total in 1 blocks; 384 free (0 chunks); 640 used
     SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
     CachedPlan: 1024 total in 1 blocks; 672 free (0 chunks); 352 used
     CachedPlanSource: 1024 total in 1 blocks; 552 free (0 chunks); 472 used
     SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
     CachedPlan: 7168 total in 3 blocks; 2864 free (0 chunks); 4304 used
     CachedPlanSource: 1024 total in 1 blocks; 80 free (0 chunks); 944 used
     SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
     index_match_data_tssearch_street: 1024 total in 1 blocks; 136 free
(0 chunks); 888 used
     index_match_data_tssearch_name: 1024 total in 1 blocks; 136 free (0
chunks); 888 used
     index_match_data_usrn: 1024 total in 1 blocks; 304 free (0 chunks);
720 used
     index_match_data_pid: 1024 total in 1 blocks; 304 free (0 chunks);
720 used
     index_match_data_pc: 1024 total in 1 blocks; 304 free (0 chunks);
720 used
     index_match_data_mid: 1024 total in 1 blocks; 304 free (0 chunks);
720 used
     match_data_pkey1: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
     CachedPlan: 1024 total in 1 blocks; 544 free (0 chunks); 480 used
     CachedPlanSource: 1024 total in 1 blocks; 360 free (0 chunks); 664 used
     SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
     CachedPlan: 1024 total in 1 blocks; 584 free (0 chunks); 440 used
     CachedPlanSource: 1024 total in 1 blocks; 144 free (0 chunks); 880 used
     SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
     pg_toast_2619_index: 1024 total in 1 blocks; 240 free (0 chunks);
784 used
     CachedPlan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
     CachedPlanSource: 1024 total in 1 blocks; 720 free (0 chunks); 304 used
     SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
     pg_depend_depender_index: 1024 total in 1 blocks; 152 free (0
chunks); 872 used
     CachedPlan: 1024 total in 1 blocks; 584 free (0 chunks); 440 used
     CachedPlanSource: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
     SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
     CachedPlan: 1024 total in 1 blocks; 672 free (0 chunks); 352 used
     CachedPlanSource: 1024 total in 1 blocks; 552 free (0 chunks); 472 used
     SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
     CachedPlan: 1024 total in 1 blocks; 576 free (0 chunks); 448 used
     CachedPlanSource: 1024 total in 1 blocks; 336 free (0 chunks); 688 used
     SPI Plan: 1024 total in 1 blocks; 912 free (0 chunks); 112 used
     simple: 1024 total in 1 blocks; 984 free (0 chunks); 40 used
     CachedPlan: 1024 total in 1 blocks; 576 free (0 chunks); 448 used
     CachedPlanSource: 1024 total in 1 blocks; 336 free (0 chunks); 688 used
     SPI Plan: 1024 total in 1 blocks; 912 free (0 chunks); 112 used
     CachedPlan: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
     CachedPlanSource: 3072 total in 2 blocks; 1912 free (1 chunks);
1160 used
     SPI Plan: 1024 total in 1 blocks; 912 free (0 chunks); 112 used
     CachedPlan: 1024 total in 1 blocks; 376 free (0 chunks); 648 used
     CachedPlanSource: 1024 total in 1 blocks; 120 free (0 chunks); 904 used
     SPI Plan: 1024 total in 1 blocks; 912 free (0 chunks); 112 used
     CachedPlan: 1024 total in 1 blocks; 432 free (0 chunks); 592 used
     CachedPlanSource: 1024 total in 1 blocks; 160 free (0 chunks); 864 used
     SPI Plan: 1024 total in 1 blocks; 912 free (0 chunks); 112 used
     CachedPlan: 3072 total in 2 blocks; 744 free (1 chunks); 2328 used
     CachedPlanSource: 3072 total in 2 blocks; 80 free (0 chunks); 2992 used
     SPI Plan: 1024 total in 1 blocks; 912 free (0 chunks); 112 used
     CachedPlan: 1024 total in 1 blocks; 560 free (0 chunks); 464 used
     CachedPlanSource: 1024 total in 1 blocks; 336 free (0 chunks); 688 used
     SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
     CachedPlan: 7168 total in 3 blocks; 3104 free (0 chunks); 4064 used
     CachedPlanSource: 1024 total in 1 blocks; 104 free (0 chunks); 920 used
     SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
     pg_depend_reference_index: 1024 total in 1 blocks; 152 free (0
chunks); 872 used
     pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 240 free (0
chunks); 784 used
     CachedPlan: 1024 total in 1 blocks; 560 free (0 chunks); 464 used
     CachedPlanSource: 1024 total in 1 blocks; 440 free (0 chunks); 584 used
     SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
     pg_database_datname_index: 1024 total in 1 blocks; 344 free (0
chunks); 680 used
     pg_index_indrelid_index: 1024 total in 1 blocks; 304 free (0
chunks); 720 used
     pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 152 free (0
chunks); 872 used
     pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 344
free (0 chunks); 680 used
     pg_enum_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680
used
     pg_class_relname_nsp_index: 1024 total in 1 blocks; 240 free (0
chunks); 784 used
     pg_foreign_server_oid_index: 1024 total in 1 blocks; 344 free (0
chunks); 680 used
     pg_statistic_relid_att_index: 1024 total in 1 blocks; 240 free (0
chunks); 784 used
     pg_cast_source_target_index: 1024 total in 1 blocks; 240 free (0
chunks); 784 used
     pg_language_name_index: 1024 total in 1 blocks; 344 free (0
chunks); 680 used
     pg_authid_oid_index: 1024 total in 1 blocks; 304 free (0 chunks);
720 used
     pg_amop_fam_strat_index: 1024 total in 1 blocks; 88 free (0
chunks); 936 used
     pg_index_indexrelid_index: 1024 total in 1 blocks; 304 free (0
chunks); 720 used
     pg_ts_template_tmplname_index: 1024 total in 1 blocks; 280 free (0
chunks); 744 used
     pg_ts_config_map_index: 1024 total in 1 blocks; 152 free (0
chunks); 872 used
     pg_opclass_oid_index: 1024 total in 1 blocks; 304 free (0 chunks);
720 used
     pg_foreign_data_wrapper_oid_index: 1024 total in 1 blocks; 344 free
(0 chunks); 680 used
     pg_auth_members_member_role_index: 1024 total in 1 blocks; 280 free
(0 chunks); 744 used
     pg_ts_dict_oid_index: 1024 total in 1 blocks; 304 free (0 chunks);
720 used
     pg_conversion_default_index: 1024 total in 1 blocks; 128 free (0
chunks); 896 used
     pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 88 free (0
chunks); 936 used
     pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 240 free
(0 chunks); 784 used
     pg_enum_typid_label_index: 1024 total in 1 blocks; 280 free (0
chunks); 744 used
     pg_ts_config_oid_index: 1024 total in 1 blocks; 304 free (0
chunks); 720 used
     pg_user_mapping_oid_index: 1024 total in 1 blocks; 344 free (0
chunks); 680 used
     pg_opfamily_am_name_nsp_index: 1024 total in 1 blocks; 192 free (0
chunks); 832 used
     pg_type_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720
used
     pg_aggregate_fnoid_index: 1024 total in 1 blocks; 344 free (0
chunks); 680 used
     pg_constraint_oid_index: 1024 total in 1 blocks; 344 free (0
chunks); 680 used
     pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 280 free (0
chunks); 744 used
     pg_ts_parser_prsname_index: 1024 total in 1 blocks; 280 free (0
chunks); 744 used
     pg_ts_config_cfgname_index: 1024 total in 1 blocks; 240 free (0
chunks); 784 used
     pg_ts_parser_oid_index: 1024 total in 1 blocks; 304 free (0
chunks); 720 used
     pg_operator_oid_index: 1024 total in 1 blocks; 304 free (0 chunks);
720 used
     pg_namespace_nspname_index: 1024 total in 1 blocks; 304 free (0
chunks); 720 used
     pg_ts_template_oid_index: 1024 total in 1 blocks; 304 free (0
chunks); 720 used
     pg_amop_opr_fam_index: 1024 total in 1 blocks; 240 free (0 chunks);
784 used
     pg_ts_dict_dictname_index: 1024 total in 1 blocks; 280 free (0
chunks); 744 used
     pg_auth_members_role_member_index: 1024 total in 1 blocks; 280 free
(0 chunks); 744 used
     pg_type_typname_nsp_index: 1024 total in 1 blocks; 240 free (0
chunks); 784 used
     pg_opfamily_oid_index: 1024 total in 1 blocks; 344 free (0 chunks);
680 used
     pg_class_oid_index: 1024 total in 1 blocks; 304 free (0 chunks);
720 used
     pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 152 free (0
chunks); 872 used
     pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 240 free
(0 chunks); 784 used
     pg_proc_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720
used
     pg_language_oid_index: 1024 total in 1 blocks; 304 free (0 chunks);
720 used
     pg_namespace_oid_index: 1024 total in 1 blocks; 304 free (0
chunks); 720 used
     pg_database_oid_index: 1024 total in 1 blocks; 304 free (0 chunks);
720 used
     pg_amproc_fam_proc_index: 1024 total in 1 blocks; 88 free (0
chunks); 936 used
     pg_authid_rolname_index: 1024 total in 1 blocks; 304 free (0
chunks); 720 used
     pg_foreign_server_name_index: 1024 total in 1 blocks; 344 free (0
chunks); 680 used
     pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 240 free
(0 chunks); 784 used
     pg_conversion_oid_index: 1024 total in 1 blocks; 344 free (0
chunks); 680 used
     pg_user_mapping_user_server_index: 1024 total in 1 blocks; 280 free
(0 chunks); 744 used
     pg_conversion_name_nsp_index: 1024 total in 1 blocks; 280 free (0
chunks); 744 used
   MdSmgr: 8192 total in 1 blocks; 2128 free (3 chunks); 6064 used
   LOCALLOCK hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
   Timezones: 79320 total in 2 blocks; 5968 free (0 chunks); 73352 used
   ErrorContext: 8192 total in 1 blocks; 8176 free (10 chunks); 16 used
2010-06-02 03:18:42 BST ERROR:  out of memory
2010-06-02 03:18:42 BST DETAIL:  Failed on request of size 114.
2010-06-02 03:18:42 BST CONTEXT:  SQL statement "ANALYZE VERBOSE
nlpg.match_data"
     PL/pgSQL function "mk_tbls_4e" line 8 at SQL statement
2010-06-02 03:18:42 BST STATEMENT:  -- Modify NLPG matchdata (addr_str
tssearch lookup field)
     BEGIN;
     SELECT nlpg.mk_tbls_4e();
     COMMIT;
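
As an aside, the "checkpoints are occurring too frequently" hints in the
log have a standard remedy in postgresql.conf; a minimal sketch, with
illustrative values rather than recommendations tuned for this machine:

# postgresql.conf (illustrative values only)
checkpoint_segments = 32            # default is 3; allows more WAL between checkpoints
checkpoint_completion_target = 0.7  # spreads checkpoint I/O over the interval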

My plan now is to try increasing shared_buffers, work_mem,
maintenance_work_mem and, apparently, checkpoint_segments, and see if
that fixes it. However, I am still stuck in a situation where the
postgres service isn't visible in Windows services, and I fear that now
I really am just pulling levers blindly, hoping to hit the right button.
Also, these queries take days to run. Therefore, please help; any and
all suggestions are welcome.
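
Before pulling any levers, it is worth capturing what the server is
currently running with; these are standard commands, shown here as a
sketch:

SHOW shared_buffers;
SHOW work_mem;
SHOW maintenance_work_mem;
SHOW checkpoint_segments;

-- or all four at once:
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem',
               'maintenance_work_mem', 'checkpoint_segments');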

Thanks,
Tom

On 31/05/2010 20:14, Tom Wilcox wrote:
> I am having difficulties. I have rerun my update that uses the python
> functions..
>
> (1) UPDATE nlpg.match_data SET org = normalise(org);
>
> And some other similar queries on neighbouring fields in the table.
> They have all now worked, without any changes to the configuration. I
> have done one thing in an attempt to minimise the risk of a memory leak
> in normalise(): I added "toks = None" to the end of the normalise()
> function. However, this was done after query (1) succeeded on the rerun.
>
> Why would I get inconsistent behaviour? Would it have anything to do
> with SQL Server running on the same machine (although it is not actually
> doing anything at the moment, just an idle server running in the
> background)?
>
> Tangent: Is there any way to increase the memory allocated to postgres
> by Windows using Job Objects?
>
> Cheers,
> Tom
>
> On 29/05/2010 18:55, Bill Moran wrote:
>> On 5/28/10 8:43:48 PM, Tom Wilcox wrote:
>>> I ran this query:
>>>
>>> EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org;
>>>
>>> And I got this result:
>>>
>>> "Seq Scan on match_data  (cost=0.00..9762191.68 rows=32205168
>>> width=206)
>>> (actual time=76873.592..357450.519 rows=27777961 loops=1)"
>>> "Total runtime: 8028212.367 ms"
>>
>> That would seem to indicate that the problem is in your Python
>> functions.
>>
>> Some ideas for next steps:
>> * Perhaps it's just a few rows that have data in them that the
>>   function has difficulty with.  Add some debugging/logging to
>>   the function and see if the row it bombs on has anything unusual
>>   in it (such as a very large text field)
>> * While large, that function is fairly simplistic.  You may want
>>   to consider rewriting it as an SQL function, which should be
>>   more efficient in any event.
>>
>>>
>>> <snip: quoted copy of the earlier "Oops. Sorry about that." message
>>> (the normalise() function, the nlpg.match_data schema, and Bill
>>> Moran's 28 May reply), all of which appear in full earlier in this
>>> thread>
>>
>


Re: Out of Memory and Configuration Problems (Big Computer)

From
Stephen Frost
Date:
* Tom Wilcox (hungrytom@googlemail.com) wrote:
> My plan now is to try increasing the shared_buffers, work_mem,
> maintenance_work_mem and apparently checkpoint_segments and see if that
> fixes it.

errrrr.  work_mem and maintenance_work_mem aren't *limits*, they're
more like *targets*.  The out of memory error you're getting isn't
because PG is hitting a limit you've set in postgresql.conf- it's
happening because PG is asking the OS for more memory (eg: malloc) and
getting told "sorry, no more available".  To that end, you probably want
to consider *lowering* the above parameters (in particular,
maintenance_work_mem, since that's what ANALYZE uses, iirc).  That will
cause PG to use less memory and/or spill things to disk instead of
trying to ask the OS for more memory than it has available.
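
Concretely, that lowering can be tried per-session before re-running the
failing step; a minimal sketch with illustrative values (the right
numbers depend on what the box can actually spare):

SET maintenance_work_mem = '256MB';  -- ANALYZE draws on this
SET work_mem = '32MB';               -- per sort/hash operation
SELECT nlpg.mk_tbls_4e();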

What are those values currently set to?  How much memory is in the box?
Have you looked at PG's memory usage while these queries are running?
Do you have any swap?

    Thanks,

        Stephen

Attachments

Re: Out of Memory and Configuration Problems (Big Computer)

From
Tom Wilcox
Date:
Hi Stephen,

The impression I was getting from Magnus Hagander's blog was that a 32-bit version of Postgres could make use of >4GB RAM when running on 64-bit Windows, due to the way PG passes on the responsibility for caching to the OS. Is this definitely not the case then?

Here's where I'm getting this from:
http://blog.hagander.net/archives/73-PostgreSQL-vs-64-bit-windows.html

Thanks,
Tom


On 2 June 2010 15:04, Stephen Frost <sfrost@snowman.net> wrote:
* Tom Wilcox (hungrytom@googlemail.com) wrote:
> My question now becomes.. Since it works now, do those memory usage stats
> from resource monitor show that postgres is using all the available memory
> (am I reading it wrong)? Is there a way to allocate 60GB of memory to the
> postgres process so that it can do all sorting, etc directly in RAM? Is
> there something I need to tell 64-bit Windows to get it to allocate more
> than 4GB of memory to a 32-bit postgres?

uhh, a 32-bit program (Postgres, or any other) can't use more than 4G of
RAM.  That would be the crux of the problem here.  Either get a 64bit
build of PG for Windows (I'm not sure what the status of that is at the
moment..), or get off Windows and on to a 64bit Linux with a 64bit PG.

       Thanks,

               Stephen



Re: Fwd: Out of Memory and Configuration Problems (Big Computer)

From
Tom Wilcox
Date:
Stephen,

You're a legend! That is exactly the answer I needed to hear from someone who actually knows the score. I am now powering on with Plan B: Postgres64 on Linux64.

After relentless searching, I have become fairly convinced that a stable release of 64-bit Postgres for Windows doesn't exist yet (I welcome anyone to show me otherwise). Since I am committed to Postgres (I have already written the code; it works, just not quickly), I will have to change my OS.

However, if these performance limitations on Windows had been apparent to me from the start, I probably would have chosen MS SQL Server over Postgres (less pain to get the gain). Perhaps this is an argument in favour of a 64-bit Windows port being added to this list:

http://wiki.postgresql.org/wiki/64bit_Windows_port

Thanks again for all your help.

Tom

On 2 June 2010 15:27, Stephen Frost <sfrost@snowman.net> wrote:
Tom,

* Tom Wilcox (hungrytom@googlemail.com) wrote:
> The impression I was getting from Magnus Hagander's blog was that a 32-bit
> version of Postgres could make use of >4Gb RAM when running on 64-bit
> Windows due to the way PG passes on the responsibility for caching onto the
> OS.. Is this definitely not the case then?

Eh, sure, the data will be cached by the Windows OS, so more data will
be in memory, but you're never going to be able to use more than 4G for
any actual *processing*, like sorting, doing hash joins, or having data
in shared buffers (to avoid having to go back to the OS and do a system
call to get the data from the OS's cache..).

Not only that, but the shared_buffers are mapped into *every* backend,
so while you'll only use 512MB for shared_buffers total, each backend
will only have 3.5G (or so) of its address space left to do 'other
stuff'.

On a box with 16GB that's doing a lot of relatively small activities
(OLTP type stuff), PG will work "alright".  On a box with 96G with
terabytes of data where you want to do data warehousing kind of work,
running a 32bit version of PG is going to suck.

       Thanks,

               Stephen



Re: Out of Memory and Configuration Problems (Big Computer)

From
Magnus Hagander
Date:
It does when you have many sessions. But each individual session can
only use "32 bits worth" of memory, and shared memory counts in all
processes. The extra memory can be used for the *OS-level cache*, not
the PostgreSQL buffer cache.

//Magnus

On Wed, Jun 2, 2010 at 16:08, Tom Wilcox <hungrytom@googlemail.com> wrote:
> <snip: Tom's question and Stephen's reply, quoted in full above>



--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Out of Memory and Configuration Problems (Big Computer)

From
Tom Wilcox
Date:
So for a system being used to serve many clients (a web service, etc.)
it would be fine. But for my purposes, where I am using a single session
to process large tables of data (such as a mammoth update statement
normalising and encoding 25 million rows of string data), the 32-bit
version is not ideal.

If that is correct, then I think I am finally getting this.
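
A practical stopgap on the 32-bit build, in line with the earlier
observation that the same kind of update completed when limited to a
million rows, is to chunk the statement by primary-key range (the batch
boundaries here are illustrative):

UPDATE nlpg.match_data SET org = normalise(org)
 WHERE match_data_id BETWEEN 1 AND 1000000;
-- let that commit, then move on to the next slice:
UPDATE nlpg.match_data SET org = normalise(org)
 WHERE match_data_id BETWEEN 1000001 AND 2000000;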

Thanks,
Tom

On 02/06/2010 16:08, Magnus Hagander wrote:
> It does when you have many sessions. But each individual session can
> only use "32 bits worth" of memory, and shared memory counts in all
> processes. The extra memory can be used for the *OS-level cache*, not
> the PostgreSQL buffer cache.
>
> //Magnus
>
> <snip: earlier messages quoted in full above>