Обсуждение: Very Poor Insert Performance

Поиск
Список
Период
Сортировка

Very Poor Insert Performance

От
Damien Dougan
Дата:
Hi All,

We've been experiencing extremely poor batch upload performance on our
Postgres 7.3 (and 7.3.4) database, and I've not been able to improve matters
significantly using any suggestions I've gleamed off the mailing list
archives ... so I was wondering if anyone with a bigger brain in this area
could help :)

Our batch upload is performing a number of stored procedures to insert data on
the database. Initially, this results in quite good performance, but rapidly
spirals down to approximately 1 per second after some minutes.

I've got a script that generates stored procedure calls to upload test input
data, and the script is capable of inserting BEGIN and END at different
intervals, together with VACUUM ANALYZE commands as well.

I've tried varying the commit level from every operation, every 5, every 10,
every 25, every 100 operations (again, each operation is 5 SP calls) without
any noticeable improvement. Likewise, I've varied the VACUUM ANALYZE from
every 50 to every 100 operations - again without any measurable improvement.

top reports that CPU usage is pretty constant at 99%, and there is
approximately 1GB of free physical memory available to the OS (with
approximately 1GB of physical memory in use).

I've have been running postmaster with switched fsync off.

I also tried running with backbuffers of default (64), 128, 256, 512 and even
1024. Again, with no measurable change.

The typical metrics are (completed operations - each of these require 5 SP
calls):

1 min: 1036 (1036 operations)
2 min: 1426 (390 operations)
3 min: 1756 (330 operations)
4 min: 2026 (270 operations)
5 min: 2266 (240 operations)

When left running, its not too long before the code snails to 1 operation per
second.


Has anyone any ideas as to what could be causing the spiraling performance?


With approximately 20,000 operations commited in the database, it takes about
1 minute to upload a dump of the database - unfortunately we cannot use the
COPY command to upload brand new data - it really has to go through the
Stored Procedures to ensure relationships and data integrity across the
schema (it would be very difficult to develop and maintain code to generate
COPY commands for inserting new data). And whilst I appreciate INSERTs are
inherently slower than COPY, I was hoping for something significantly faster
than the 1 operation/second that things fairly quickly descend to...


Thanks for any advice!

Damien


Re: Very Poor Insert Performance

От
Greg Stark
Дата:
Damien Dougan <damien.dougan@mobilecohesion.com> writes:

> Our batch upload is performing a number of stored procedures to insert data on
> the database. Initially, this results in quite good performance, but rapidly
> spirals down to approximately 1 per second after some minutes.

It's fairly unlikely anyone will be able to help without you saying what
you're doing. What are these procedures doing? What do the tables look like?
What indexes exist?

At a guess the foreign key relationships you're enforcing don't have indexes
to help them. If they do perhaps postgres isn't using them.

--
greg

Re: Very Poor Insert Performance

От
Tom Lane
Дата:
Damien Dougan <damien.dougan@mobilecohesion.com> writes:
> Has anyone any ideas as to what could be causing the spiraling performance?

You really haven't provided any information that would allow anything
but guesses, but I'll guess anyway: poor plans for foreign key checks?
See nearby threads.

            regards, tom lane

Re: Very Poor Insert Performance

От
Vivek Khera
Дата:
>>>>> "GS" == Greg Stark <gsstark@mit.edu> writes:

GS> At a guess the foreign key relationships you're enforcing don't
GS> have indexes to help them. If they do perhaps postgres isn't using
GS> them.


Or, if you do have indexes, they've bloated to be way too big and are
overwhelming your shared buffers.  Reindex them and see it it helps.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: Very Poor Insert Performance

От
Damien Dougan
Дата:
On Monday 27 October 2003 8:12 pm, Tom Lane wrote:
> Damien Dougan <damien.dougan@mobilecohesion.com> writes:
> > Has anyone any ideas as to what could be causing the spiraling
> > performance?
>
> You really haven't provided any information that would allow anything
> but guesses, but I'll guess anyway: poor plans for foreign key checks?
> See nearby threads.
>
>             regards, tom lane


Apologies for not including more info - I had been hoping that spiralling performance was a known tell-tale sign of
something:) 


Here is some additional information - sorry if its overload, but I figured I should give an intro to the schema before
showingthe EXPLAIN results! 



Firstly, a quick schema overview for the relevant tables:

contact has many contactparts
address has many addressparts
contact has many address

Now, the above table relationships are connected via relationship tables (rather than foreign indexes directly to each
other),so we have: 

contact
rel_contact_has_contactpart
address
rel_address_has_addresspart

(The reasons behind this are for meta-data purposes - our database is intended to be very abstract from the code...)



                                 Table "public.contact"
     Column       |            Type             |                     Modifiers
------------------+-----------------------------+---------------------------------------------------
 id               | integer                     | default nextval('contact_id_seq'::text)
 version          | integer                     | default 1
 contactid        | character varying           |
 enddate          | timestamp without time zone |
 preferredaddress | character varying           |
 startdate        | timestamp without time zone |
Indexes:
    "contact_id_idx" unique, btree (id)
    "contact_key" unique, btree (contactid)


So we have an index on the meta-data related "id" and the externally visible "contactid" values. The "id" is used with
therel_contact_has_XXX tables (see below). 



                          Table "public.contactpart"
   Column    |       Type        |                   Modifiers
-------------+-------------------+------------------------------------------------
 id          | integer           | default nextval('contactpart_id_seq'::text)
 version     | integer           | default 1
 detailname  | character varying | not null
 detailvalue | character varying |
Indexes:
    "contactpart_id_idx" unique, btree (id)


So we have an index on the meta-data related "id".



                             Table "public.address"
 Column  |            Type             |                 Modifiers
---------+-----------------------------+--------------------------------------------
 id      | integer                     | default nextval('mc_address_id_seq'::text)
 version | integer                     | default 1
 enddate | timestamp without time zone |
 format  | character varying           |
 type    | character varying           | not null
 value   | character varying           |
Indexes:
    "address_id_idx" unique, btree (id)
    "address_value_key" btree (value)


So we have an index on the meta-data related "id".


                          Table "public.addresspart"
   Column    |       Type        |                   Modifiers
-------------+-------------------+------------------------------------------------
 id          | integer           | default nextval('addresspart_id_seq'::text)
 version     | integer           | default 1
 detailname  | character varying | not null
 detailvalue | character varying |
Indexes:
    "addresspart_id_idx" unique, btree (id)

So we have an index on the meta-data related "id". This is used with the rel_address_has_addresspart table (see below).




 Table "public.rel_contact_has_contactpart"
        Column        |  Type   | Modifiers
----------------------+---------+-----------
 contact_id           | integer |
 contactpart_id       | integer |
Indexes:
    "rel_contact_has_contactpart_idx2" unique, btree (contactpart_id)
    "rel_contact_has_contactpart_idx1" btree (contact_id)

So we have a unique index on the contactpart and a non-unique index on the contact (to reflect the 1:M relationship
contacthas contactparts) 



Table "public.rel_address_has_addresspart"
      Column       |  Type   | Modifiers
-------------------+---------+-----------
 address_id        | integer |
 addresspart_id    | integer |
Indexes:
    "rel_address_has_addresspart_idx2" unique, btree (addresspart_id)
    "rel_address_has_addresspart_idx1" btree (address_id)

So we have a unique index on the addresspart and a non-unique index on the address (to reflect the 1:M relationship
addresshas addressparts) 



   Table "public.rel_contact_has_address"
        Column        |  Type   | Modifiers
----------------------+---------+-----------
 contact_id           | integer |
 address_id           | integer |
Indexes:
    "rel_contact_has_address_idx2" unique, btree (address_id)
    "rel_contact_has_address_idx1" btree (contactdetails_id)

So we have a unique index on the address and a non-unique index on the contact (to reflect the 1:M relationship contact
hasaddresses) 


However, to add a layer of abstraction to the business logic, the underlying tables are never directly exposed through
anythingother than public views. The public views combine the <table> and <tablepart> into a single table. 

So we have 2 public views: PvContact which ties together the contact and contactparts, and PvAddress which ties
togetherthe address and addresspart. 

                  View "public.pvcontact"
     Column       |            Type             | Modifiers
------------------+-----------------------------+-----------
 version          | integer                     |
 contactid        | character varying           |
 startdate        | timestamp without time zone |
 enddate          | timestamp without time zone |
 preferredaddress | character varying           |
 firstname        | character varying           |
 lastname         | character varying           |


(Note - firstname and lastname are dervied from the detailnames of contactpart table)


              View "public.pvaddress"
  Column   |            Type             | Modifiers
-----------+-----------------------------+-----------
 version   | integer                     |
 contactid | character varying           |
 type      | character varying           |
 format    | character varying           |
 enddate   | timestamp without time zone |
 npi       | character varying           |
 ton       | character varying           |
 number    | character varying           |
 prefix    | character varying           |
 addrvalue | character varying           |
 link      | character varying           |
 house     | character varying           |
 street    | character varying           |
 town      | character varying           |
 city      | character varying           |
 county    | character varying           |
 postcode  | character varying           |
 state     | character varying           |
 zipcode   | character varying           |
 extension | character varying           |

 (Note - number, prefix, link, house, street, town, city, postcode etc are derived from the detailnames of addresspart
table)



For example, suppose we have 2 contactparts for a particular contact (with unique id = y): FirstName and LastName, then
thecontactpart table would have 2 rows like: 

id = x
version = 1
partname = 'FirstName'
partvalue = 'John'

id = x+1
version = 1
partname = 'LastName'
partvalue = 'Doe'


Then the public view, PvContact, would look like:

Version = 1
ContactId = y
StartDate = ...
EndDate = ...
PreferredAddress = ...
FirstName = John
LastName = Doe



All Create, Read, Update, Delete operations on the DB are performed by StoredProcedures (again, to help abstract the
codefrom the DB). The SPs are capable of dealing with externally (public view) advertised schemas, and ensuring data
integrityacross the underlying tables. 

Now, our problem seems to be the delays introduced by reading from the public views. I've taken some measurements of
rawINSERTS that mirror what the SPs are doing (but the data is invalid and its not correctly linked across tables
(whichis what the StoredProcs are responsible for) - but the INSERTS are happening in the same order and frequency for
avalid data upload). We can upload 2000 sets of users+contacts+addresses in about 17 seconds. But when it is done via
theStored Procedures (which do some inserts, some reads, some more inserts etc to ensure tables are properly linked via
therelationships), this drops to 2 minutes for 2000. And the performance spirals down to less than 1
user+contact+addressper second after a short while. 


First of all then, the definition of the PublicView PvAddress

View definition:
 SELECT address.id AS addressid, address."version", contact.id AS contactid, contact.contactid AS contactuuid,
address."type",address.format, address.enddate, address.value, rel_npi.npiid, rel_npi.npi, rel_ton.tonid, rel_ton.ton,
rel_number.numberid,rel_number.number, rel_prefix.prefixid, rel_prefix.prefix, rel_addrvalue.addrvalueid,
rel_addrvalue.addrvalue,rel_link.linkid, rel_link.link, rel_house.houseid, rel_house.house, rel_street.streetid,
rel_street.street,rel_town.townid, rel_town.town, rel_city.cityid, rel_city.city, rel_county.countyid,
rel_county.county,rel_postcode.postcodeid, rel_postcode.postcode, rel_state.stateid, rel_state.state,
rel_zipcode.zipcodeid,rel_zipcode.zipcode, rel_extension.extensionid, rel_extension.extension 
   FROM svcurrentcontactdetails contact, rel_contact_has_address rel_contact, svcurrentaddress address
   LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS npiid, det.detailvalue AS npi
                FROM rel_address_has_addresspart rel, addresspart det
               WHERE rel.addresspart_id = det.id AND det.detailname::text = 'Npi'::text) rel_npi ON address.id =
rel_npi.addressid
   LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS tonid, det.detailvalue AS ton
                FROM rel_address_has_addresspart rel, addresspart det
               WHERE rel.addresspart_id = det.id AND det.detailname::text = 'Ton'::text) rel_ton ON address.id =
rel_ton.addressid
   LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS numberid, det.detailvalue AS number
                FROM rel_address_has_addresspart rel, addresspart det
               WHERE rel.addresspart_id = det.id AND det.detailname::text = 'Number'::text) rel_number ON address.id =
rel_number.addressid
   LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS prefixid, det.detailvalue AS prefix
                FROM rel_address_has_addresspart rel, addresspart det
               WHERE rel.addresspart_id = det.id AND det.detailname::text = 'Prefix'::text) rel_prefix ON address.id =
rel_prefix.addressid
   LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS addrvalueid, det.detailvalue AS addrvalue
                FROM rel_address_has_addresspart rel, addresspart det
               WHERE rel.addresspart_id = det.id AND det.detailname::text = 'AddrValue'::text) rel_addrvalue ON
address.id= rel_addrvalue.addressid 
   LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS linkid, det.detailvalue AS link
                FROM rel_address_has_addresspart rel, addresspart det
               WHERE rel.addresspart_id = det.id AND det.detailname::text = 'Link'::text) rel_link ON address.id =
rel_link.addressid
   LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS houseid, det.detailvalue AS house
                FROM rel_address_has_addresspart rel, addresspart det
               WHERE rel.addresspart_id = det.id AND det.detailname::text = 'House'::text) rel_house ON address.id =
rel_house.addressid
   LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS streetid, det.detailvalue AS street
                FROM rel_address_has_addresspart rel, addresspart det
               WHERE rel.addresspart_id = det.id AND det.detailname::text = 'Street'::text) rel_street ON address.id =
rel_street.addressid
   LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS townid, det.detailvalue AS town
                FROM rel_address_has_addresspart rel, addresspart det
               WHERE rel.addresspart_id = det.id AND det.detailname::text = 'Town'::text) rel_town ON address.id =
rel_town.addressid
   LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS cityid, det.detailvalue AS city
                FROM rel_address_has_addresspart rel, addresspart det
               WHERE rel.addresspart_id = det.id AND det.detailname::text = 'City'::text) rel_city ON address.id =
rel_city.addressid
   LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS countyid, det.detailvalue AS county
                FROM rel_address_has_addresspart rel, addresspart det
               WHERE rel.addresspart_id = det.id AND det.detailname::text = 'County'::text) rel_county ON address.id =
rel_county.addressid
   LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS postcodeid, det.detailvalue AS postcode
                FROM rel_address_has_addresspart rel, addresspart det
               WHERE rel.addresspart_id = det.id AND det.detailname::text = 'Postcode'::text) rel_postcode ON
address.id= rel_postcode.addressid 
   LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS stateid, det.detailvalue AS state
                FROM rel_address_has_addresspart rel, addresspart det
               WHERE rel.addresspart_id = det.id AND det.detailname::text = 'State'::text) rel_state ON address.id =
rel_state.addressid
   LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS zipcodeid, det.detailvalue AS zipcode
                FROM rel_address_has_addresspart rel, addresspart det
               WHERE rel.addresspart_id = det.id AND det.detailname::text = 'Zipcode'::text) rel_zipcode ON address.id
=rel_zipcode.addressid 
   LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS extensionid, det.detailvalue AS extension
                FROM rel_address_has_addresspart rel, addresspart det
               WHERE rel.addresspart_id = det.id AND det.detailname::text = 'Extension'::text) rel_extension ON
address.id= rel_extension.addressid 
  WHERE contact.id = rel_contact.contact_id AND address.id = rel_contact.address_id;

(The JOINs are where our problems are below ...)




hydradb=# explain select * from pvaddress where contactuuid = 'test' and type = 'sms' and format is null ;

         QUERY PLAN 


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------
 Merge Join  (cost=42499.93..44975.38 rows=1 width=358)
   Merge Cond: ("outer".id = "inner".address_id)
   ->  Merge Left Join  (cost=42491.11..44957.05 rows=3795 width=323)
         Merge Cond: ("outer".id = "inner".address_id)
         ->  Merge Left Join  (cost=41822.20..44278.07 rows=3795 width=305)
               Merge Cond: ("outer".id = "inner".address_id)
               ->  Merge Left Join  (cost=41153.29..43599.08 rows=3795 width=287)
                     Merge Cond: ("outer".id = "inner".address_id)
                     ->  Merge Left Join  (cost=40484.39..42920.10 rows=3795 width=269)
                           Merge Cond: ("outer".id = "inner".address_id)
                           ->  Merge Left Join  (cost=39815.48..42241.12 rows=3795 width=251)
                                 Merge Cond: ("outer".id = "inner".address_id)
                                 ->  Merge Left Join  (cost=39146.58..41562.13 rows=3795 width=233)
                                       Merge Cond: ("outer".id = "inner".address_id)
                                       ->  Merge Left Join  (cost=38477.67..40883.15 rows=3795 width=215)
                                             Merge Cond: ("outer".id = "inner".address_id)
                                             ->  Merge Left Join  (cost=37808.76..40204.16 rows=3795 width=197)
                                                   Merge Cond: ("outer".id = "inner".address_id)
                                                   ->  Merge Left Join  (cost=37139.86..39525.18 rows=3795 width=179)
                                                         Merge Cond: ("outer".id = "inner".address_id)
                                                         ->  Merge Left Join  (cost=36470.95..38846.20 rows=3795
width=161)
                                                               Merge Cond: ("outer".id = "inner".address_id)
                                                               ->  Merge Left Join  (cost=35802.04..38167.21 rows=3795
width=143)
                                                                     Merge Cond: ("outer".id = "inner".address_id)
                                                                     ->  Merge Left Join  (cost=28634.40..30852.70
rows=3795width=125) 
                                                                           Merge Cond: ("outer".id =
"inner".address_id)
                                                                           ->  Merge Left Join
(cost=21495.85..23569.10rows=3795 width=107) 
                                                                                 Merge Cond: ("outer".id =
"inner".address_id)
                                                                                 ->  Merge Left Join
(cost=14328.21..16254.59rows=3795 width=89) 
                                                                                       Merge Cond: ("outer".id =
"inner".address_id)
                                                                                       ->  Merge Left Join
(cost=7102.23..8878.06rows=3795 width=71) 
                                                                                             Merge Cond: ("outer".id =
"inner".address_id)
                                                                                             ->  Index Scan using
address_id_idxon address  (cost=0.00..1633.07 rows=3795 width=53) 
                                                                                                   Filter: (((enddate
ISNULL) OR (('now'::text)::timestamp(6) with time zone < (enddate)::timestamp with time zone)) AND (("typ 
e")::text = 'sms'::text) AND (format IS NULL))
                                                                                             ->  Sort
(cost=7102.23..7159.65rows=22970 width=22) 
                                                                                                   Sort Key:
rel.address_id
                                                                                                   ->  Merge Join
(cost=0.00..5438.34rows=22970 width=22) 
                                                                                                         Merge Cond:
("outer".id= "inner".addressline_id) 
                                                                                                         ->  Index Scan
usingaddressline_id_idx on addressline det  (cost=0.00..2773.61 rows=22969 width=18) 
                                                                                                               Filter:
((detailname)::text= 'Npi'::text) 
                                                                                                         ->  Index Scan
usingrel_address_has_addressline_idx2 on rel_address_has_addressline rel  (cost=0.00..2082.13 rows=1181 
93 width=8)
                                                                                       ->  Sort  (cost=7225.98..7286.76
rows=24310width=22) 
                                                                                             Sort Key: rel.address_id
                                                                                             ->  Merge Join
(cost=0.00..5455.09rows=24310 width=22) 
                                                                                                   Merge Cond:
("outer".id= "inner".addressline_id) 
                                                                                                   ->  Index Scan using
addressline_id_idxon addressline det  (cost=0.00..2773.61 rows=24309 width=18) 
                                                                                                         Filter:
((detailname)::text= 'Ton'::text) 
                                                                                                   ->  Index Scan using
rel_address_has_addressline_idx2on rel_address_has_addressline rel  (cost=0.00..2082.13 rows=118193 wid 
th=8)
                                                                                 ->  Sort  (cost=7167.64..7226.84
rows=23679width=22) 
                                                                                       Sort Key: rel.address_id
                                                                                       ->  Merge Join
(cost=0.00..5447.20rows=23679 width=22) 
                                                                                             Merge Cond: ("outer".id =
"inner".addressline_id)
                                                                                             ->  Index Scan using
addressline_id_idxon addressline det  (cost=0.00..2773.61 rows=23678 width=18) 
                                                                                                   Filter:
((detailname)::text= 'Number'::text) 
                                                                                             ->  Index Scan using
rel_address_has_addressline_idx2on rel_address_has_addressline rel  (cost=0.00..2082.13 rows=118193 width=8) 
                                                                           ->  Sort  (cost=7138.56..7196.97 rows=23364
width=22)
                                                                                 Sort Key: rel.address_id
                                                                                 ->  Merge Join  (cost=0.00..5443.27
rows=23364width=22) 
                                                                                       Merge Cond: ("outer".id =
"inner".addressline_id)
                                                                                       ->  Index Scan using
addressline_id_idxon addressline det  (cost=0.00..2773.61 rows=23363 width=18) 
                                                                                             Filter:
((detailname)::text= 'Prefix'::text) 
                                                                                       ->  Index Scan using
rel_address_has_addressline_idx2on rel_address_has_addressline rel  (cost=0.00..2082.13 rows=118193 width=8) 
                                                                     ->  Sort  (cost=7167.64..7226.84 rows=23679
width=22)
                                                                           Sort Key: rel.address_id
                                                                           ->  Merge Join  (cost=0.00..5447.20
rows=23679width=22) 
                                                                                 Merge Cond: ("outer".id =
"inner".addressline_id)
                                                                                 ->  Index Scan using
addressline_id_idxon addressline det  (cost=0.00..2773.61 rows=23678 width=18) 
                                                                                       Filter: ((detailname)::text =
'AddrValue'::text)
                                                                                 ->  Index Scan using
rel_address_has_addressline_idx2on rel_address_has_addressline rel  (cost=0.00..2082.13 rows=118193 width=8) 
                                                               ->  Sort  (cost=668.91..669.16 rows=100 width=22)
                                                                     Sort Key: rel.address_id
                                                                     ->  Nested Loop  (cost=0.00..665.58 rows=100
width=22)
                                                                           ->  Index Scan using addressline_detail_idx
onaddressline det  (cost=0.00..366.01 rows=99 width=18) 
                                                                                 Index Cond: ((detailname)::text =
'Link'::text)
                                                                           ->  Index Scan using
rel_address_has_addressline_idx2on rel_address_has_addressline rel  (cost=0.00..3.01 rows=1 width=8) 
                                                                                 Index Cond: (rel.addressline_id =
"outer".id)
                                                         ->  Sort  (cost=668.91..669.16 rows=100 width=22)
                                                               Sort Key: rel.address_id
                                                               ->  Nested Loop  (cost=0.00..665.58 rows=100 width=22)
                                                                     ->  Index Scan using addressline_detail_idx on
addresslinedet  (cost=0.00..366.01 rows=99 width=18) 
                                                                           Index Cond: ((detailname)::text =
'House'::text)
                                                                     ->  Index Scan using
rel_address_has_addressline_idx2on rel_address_has_addressline rel  (cost=0.00..3.01 rows=1 width=8) 
                                                                           Index Cond: (rel.addressline_id =
"outer".id)
                                                   ->  Sort  (cost=668.91..669.16 rows=100 width=22)
                                                         Sort Key: rel.address_id
                                                         ->  Nested Loop  (cost=0.00..665.58 rows=100 width=22)
                                                               ->  Index Scan using addressline_detail_idx on
addresslinedet  (cost=0.00..366.01 rows=99 width=18) 
                                                                     Index Cond: ((detailname)::text = 'Street'::text)
                                                               ->  Index Scan using rel_address_has_addressline_idx2 on
rel_address_has_addresslinerel  (cost=0.00..3.01 rows=1 width=8) 
                                                                     Index Cond: (rel.addressline_id = "outer".id)
                                             ->  Sort  (cost=668.91..669.16 rows=100 width=22)
                                                   Sort Key: rel.address_id
                                                   ->  Nested Loop  (cost=0.00..665.58 rows=100 width=22)
                                                         ->  Index Scan using addressline_detail_idx on addressline det
(cost=0.00..366.01 rows=99 width=18) 
                                                               Index Cond: ((detailname)::text = 'Town'::text)
                                                         ->  Index Scan using rel_address_has_addressline_idx2 on
rel_address_has_addresslinerel  (cost=0.00..3.01 rows=1 width=8) 
                                                               Index Cond: (rel.addressline_id = "outer".id)
                                       ->  Sort  (cost=668.91..669.16 rows=100 width=22)
                                             Sort Key: rel.address_id
                                             ->  Nested Loop  (cost=0.00..665.58 rows=100 width=22)
                                                   ->  Index Scan using addressline_detail_idx on addressline det
(cost=0.00..366.01rows=99 width=18) 
                                                         Index Cond: ((detailname)::text = 'City'::text)
                                                   ->  Index Scan using rel_address_has_addressline_idx2 on
rel_address_has_addresslinerel  (cost=0.00..3.01 rows=1 width=8) 
                                                         Index Cond: (rel.addressline_id = "outer".id)
                                 ->  Sort  (cost=668.91..669.16 rows=100 width=22)
                                       Sort Key: rel.address_id
                                       ->  Nested Loop  (cost=0.00..665.58 rows=100 width=22)
                                             ->  Index Scan using addressline_detail_idx on addressline det
(cost=0.00..366.01rows=99 width=18) 
                                                   Index Cond: ((detailname)::text = 'County'::text)
                                             ->  Index Scan using rel_address_has_addressline_idx2 on
rel_address_has_addresslinerel  (cost=0.00..3.01 rows=1 width=8) 
                                                   Index Cond: (rel.addressline_id = "outer".id)
                           ->  Sort  (cost=668.91..669.16 rows=100 width=22)
                                 Sort Key: rel.address_id
                                 ->  Nested Loop  (cost=0.00..665.58 rows=100 width=22)
                                       ->  Index Scan using addressline_detail_idx on addressline det
(cost=0.00..366.01rows=99 width=18) 
                                             Index Cond: ((detailname)::text = 'Postcode'::text)
                                       ->  Index Scan using rel_address_has_addressline_idx2 on
rel_address_has_addresslinerel  (cost=0.00..3.01 rows=1 width=8) 
                                             Index Cond: (rel.addressline_id = "outer".id)
                     ->  Sort  (cost=668.91..669.16 rows=100 width=22)
                           Sort Key: rel.address_id
                           ->  Nested Loop  (cost=0.00..665.58 rows=100 width=22)
                                 ->  Index Scan using addressline_detail_idx on addressline det  (cost=0.00..366.01
rows=99width=18) 
                                       Index Cond: ((detailname)::text = 'State'::text)
                                 ->  Index Scan using rel_address_has_addressline_idx2 on rel_address_has_addressline
rel (cost=0.00..3.01 rows=1 width=8) 
                                       Index Cond: (rel.addressline_id = "outer".id)
               ->  Sort  (cost=668.91..669.16 rows=100 width=22)
                     Sort Key: rel.address_id
                     ->  Nested Loop  (cost=0.00..665.58 rows=100 width=22)
                           ->  Index Scan using addressline_detail_idx on addressline det  (cost=0.00..366.01 rows=99
width=18)
                                 Index Cond: ((detailname)::text = 'Zipcode'::text)
                           ->  Index Scan using rel_address_has_addressline_idx2 on rel_address_has_addressline rel
(cost=0.00..3.01rows=1 width=8) 
                                 Index Cond: (rel.addressline_id = "outer".id)
         ->  Sort  (cost=668.91..669.16 rows=100 width=22)
               Sort Key: rel.address_id
               ->  Nested Loop  (cost=0.00..665.58 rows=100 width=22)
                     ->  Index Scan using addressline_detail_idx on addressline det  (cost=0.00..366.01 rows=99
width=18)
                           Index Cond: ((detailname)::text = 'Extension'::text)
                     ->  Index Scan using rel_address_has_addressline_idx2 on rel_address_has_addressline rel
(cost=0.00..3.01rows=1 width=8) 
                           Index Cond: (rel.addressline_id = "outer".id)
   ->  Sort  (cost=8.83..8.83 rows=2 width=39)
         Sort Key: rel_contact.address_id
         ->  Nested Loop  (cost=0.00..8.82 rows=2 width=39)
               ->  Index Scan using contact_key on contact  (cost=0.00..5.77 rows=1 width=35)
                     Index Cond: ((contactid)::text = 'test'::text)
                     Filter: (((startdate IS NULL) OR (('now'::text)::timestamp(6) with time zone >=
(startdate)::timestampwith time zone)) AND ((enddate IS NULL) OR (('now'::text)::timestamp(6) with time zone <
(enddate)::
timestamp with time zone)))
               ->  Index Scan using rel_contact_has_address_idx1 on rel_contact_has_address rel_contact
(cost=0.00..3.02rows=2 width=8) 
                     Index Cond: ("outer".id = rel_contact.contact_id)
(147 rows)





As you can see, the PublicView is resulting in a huge nested loop, with an index scan of the contact only occurring at
theend. I would have expected something more like: 

(1) An index scan of the contact table to determine the correct contact
(2) An index scan of the address table using the rel_contact_has_address.address_id to obtain the (relatively small -
max16, and typically 2) addresses 
(3) A number of joins - at the same level rather than looping - to obtain the detailnames for the new column names of
thepublic view 



As I said in my original email, these delays are after applying all the performance related enhancements (fsync off,
increasedbackbuffers, sort memory etc) I have picked up from the archives and FAQ. The upload script was also modified
tocommit and vacuum analyze at different intervals without providing any significant improvement. top reports the CPU
usageat 99% - so I believe its all looping of the above intermediate SELECTs that is causing the spiralling delays as
thenumber of rows increases. 


Again, any help would be very much appreciated!

Damien


Re: Very Poor Insert Performance

От
Tom Lane
Дата:
Damien Dougan <damien.dougan@mobilecohesion.com> writes:
> Now, our problem seems to be the delays introduced by reading from the
> public views.

Your initial message stated plainly that the problem was in INSERTs;
it's not surprising that you got unhelpful advice.

> View definition:
> [ huge view full of LEFT JOINs ]

> As you can see, the PublicView is resulting in a huge nested loop,
> with an index scan of the contact only occurring at the end. I would
> have expected something more like:

> (1) An index scan of the contact table to determine the correct contact
> (2) An index scan of the address table using the rel_contact_has_address.address_id to obtain the (relatively small -
max16, and typically 2) addresses 
> (3) A number of joins - at the same level rather than looping - to obtain the detailnames for the new column names of
thepublic view 

Your LEFT JOINs are constraining the join order --- see
http://www.postgresql.org/docs/7.3/static/explicit-joins.html
You'll need to reorder the joins into something that does what you want.

            regards, tom lane

Re: Very Poor Insert Performance

От
Damien Dougan
Дата:
On Wednesday 29 October 2003 2:23 pm, Tom Lane wrote:

> Your initial message stated plainly that the problem was in INSERTs;
> it's not surprising that you got unhelpful advice.

But perhaps my use of the term "insert" to describe upload was a very bad call
given the domain of the list...

I assure you I wasn't setting out to deceive anyone! The only location i used
INSERT (ie as a Postgres keyword) was towards the end of my mail when I tried
to highlight the fact we couldn't use COPY to upload our data because of the
difficulty in maintaining the code to generate inter-table relations ahead of
time.

The problem was showing itself during database upload - so I assumed (ASS out
of U and ME and all that!) that the write delay was very large (hence the
disappointing improvements by switching off fsync etc). It was only after
further investigation that we discovered that simulated INSERTs were going
fine, but the Read delays between INSERTs where holding us up.


> Your LEFT JOINs are constraining the join order --- see
> http://www.postgresql.org/docs/7.3/static/explicit-joins.html
> You'll need to reorder the joins into something that does what you want.

Thanks very much for the heads-up, we'll reorder the joins into something more
effecient!

Damien