Thread: slow query on tables with new columns added.


slow query on tables with new columns added.

From:
"M. D."
Date:
Hi everyone,

I did a software upgrade, and with it came a new feature: when
selecting a customer, it queries for the sum of a few columns. This
takes 7 seconds for the 'Cash Sale' customer - by far the most active
customer. I'd like to see if it's possible to bring that down a bit by
changing settings.

Query:
explain analyse select sum(item_points),sum(disc_points) from invoice
left join gltx on invoice.invoice_id = gltx.gltx_id
where gltx.inactive_on is null and gltx.posted = 'Y' and
gltx.customer_id = 'A0ZQ2gsACIsEKLI638ikyg'

item_points and disc_points are the 2 columns added, so they are mostly 0.

table info:
CREATE TABLE gltx   -- rows: 894,712
(
   gltx_id character(22) NOT NULL,
   "version" integer NOT NULL,
   created_by character varying(16) NOT NULL,
   updated_by character varying(16),
   inactive_by character varying(16),
   created_on date NOT NULL,
   updated_on date,
   inactive_on date,
   external_id numeric(14,0),
   data_type integer NOT NULL,
   "number" character varying(14) NOT NULL,
   reference_str character varying(14),
   post_date date NOT NULL,
   post_time time without time zone NOT NULL,
   work_date date NOT NULL,
   memo text,
   customer_id character(22),
   vendor_id character(22),
   station_id character(22),
   employee_id character(22),
   store_id character(22) NOT NULL,
   shift_id character(22),
   link_id character(22),
   link_num integer NOT NULL,
   printed character(1) NOT NULL,
   paid character(1) NOT NULL,
   posted character(1) NOT NULL,
   amount numeric(18,4) NOT NULL,
   card_amt numeric(18,4) NOT NULL,
   paid_amt numeric(18,4) NOT NULL,
   paid_date date,
   due_date date,
   CONSTRAINT gltx_pkey PRIMARY KEY (gltx_id),
   CONSTRAINT gltx_c0 FOREIGN KEY (customer_id)
       REFERENCES customer (customer_id) MATCH SIMPLE
       ON UPDATE NO ACTION ON DELETE NO ACTION,
   CONSTRAINT gltx_c1 FOREIGN KEY (vendor_id)
       REFERENCES vendor (vendor_id) MATCH SIMPLE
       ON UPDATE NO ACTION ON DELETE NO ACTION,
   CONSTRAINT gltx_c2 FOREIGN KEY (station_id)
       REFERENCES station (station_id) MATCH SIMPLE
       ON UPDATE NO ACTION ON DELETE NO ACTION,
   CONSTRAINT gltx_c3 FOREIGN KEY (employee_id)
       REFERENCES employee (employee_id) MATCH SIMPLE
       ON UPDATE NO ACTION ON DELETE NO ACTION,
   CONSTRAINT gltx_c4 FOREIGN KEY (store_id)
       REFERENCES store (store_id) MATCH SIMPLE
       ON UPDATE NO ACTION ON DELETE NO ACTION,
   CONSTRAINT gltx_c5 FOREIGN KEY (shift_id)
       REFERENCES shift (shift_id) MATCH SIMPLE
       ON UPDATE NO ACTION ON DELETE NO ACTION,
   CONSTRAINT gltx_c6 FOREIGN KEY (link_id)
       REFERENCES gltx (gltx_id) MATCH SIMPLE
       ON UPDATE NO ACTION ON DELETE SET NULL
)
WITH (
   OIDS=FALSE
);
ALTER TABLE gltx OWNER TO quasar;
GRANT ALL ON TABLE gltx TO quasar;

CREATE INDEX gltx_i0
   ON gltx
   USING btree
   (data_type);

CREATE INDEX gltx_i1
   ON gltx
   USING btree
   (post_date);

CREATE INDEX gltx_i2
   ON gltx
   USING btree
   (number);

CREATE INDEX gltx_i3
   ON gltx
   USING btree
   (data_type, number);

CREATE INDEX gltx_i4
   ON gltx
   USING btree
   (customer_id, paid);

CREATE INDEX gltx_i5
   ON gltx
   USING btree
   (vendor_id, paid);

CREATE INDEX gltx_i6
   ON gltx
   USING btree
   (work_date);

CREATE INDEX gltx_i7
   ON gltx
   USING btree
   (link_id);


CREATE TABLE invoice  -- 623,270 rows
(
   invoice_id character(22) NOT NULL,
   ship_id character(22),
   ship_via character varying(20),
   term_id character(22),
   promised_date date,
   tax_exempt_id character(22),
   customer_addr text,
   ship_addr text,
   comments text,
   item_points numeric(14,0) NOT NULL,
   disc_points numeric(14,0) NOT NULL,
   CONSTRAINT invoice_pkey PRIMARY KEY (invoice_id),
   CONSTRAINT invoice_c0 FOREIGN KEY (invoice_id)
       REFERENCES gltx (gltx_id) MATCH SIMPLE
       ON UPDATE NO ACTION ON DELETE CASCADE,
   CONSTRAINT invoice_c1 FOREIGN KEY (ship_id)
       REFERENCES customer (customer_id) MATCH SIMPLE
       ON UPDATE NO ACTION ON DELETE NO ACTION,
   CONSTRAINT invoice_c2 FOREIGN KEY (term_id)
       REFERENCES term (term_id) MATCH SIMPLE
       ON UPDATE NO ACTION ON DELETE NO ACTION,
   CONSTRAINT invoice_c3 FOREIGN KEY (tax_exempt_id)
       REFERENCES tax (tax_id) MATCH SIMPLE
       ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
   OIDS=FALSE
);

Both tables have mostly writes, some updates, very few deletes.

Explain analyse: (http://explain.depesz.com/s/SYW)

Aggregate  (cost=179199.52..179199.53 rows=1 width=10) (actual time=7520.922..7520.924 rows=1 loops=1)
   ->  Merge Join  (cost=9878.78..177265.66 rows=386771 width=10) (actual time=104.651..6690.194 rows=361463 loops=1)
         Merge Cond: (invoice.invoice_id = gltx.gltx_id)
         ->  Index Scan using invoice_pkey on invoice  (cost=0.00..86222.54 rows=623273 width=33) (actual time=0.010..1316.507 rows=623273 loops=1)
         ->  Index Scan using gltx_pkey on gltx  (cost=0.00..108798.53 rows=386771 width=23) (actual time=104.588..1822.886 rows=361464 loops=1)
               Filter: ((gltx.inactive_on IS NULL) AND (gltx.posted = 'Y'::bpchar) AND (gltx.customer_id = 'A0ZQ2gsACIsEKLI638ikyg'::bpchar))
Total runtime: 7521.026 ms


PostgreSQL: 9.0.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-50), 64-bit - self-compiled
Linux: Linux server.domain.lan 2.6.18-238.12.1.el5xen #1 SMP Tue May 31 13:35:45 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

Hardware: single CPU: Intel(R) Xeon(R) CPU E5335 @ 2.00GHz
RAM: 8GB
DB Size: 5876MB
HDs: Raid 1 Sata drives - dell PowerEdge 1900 - lower middle class server


Postgres config:
max_connections = 200                   #it's a bit high I know, but most connections are idle
shared_buffers = 2048MB                 #
work_mem = 8MB                          # tried up to 32MB, but no diff
maintenance_work_mem = 16MB             #
bgwriter_delay = 2000ms                 #
checkpoint_segments = 15                #
checkpoint_completion_target = 0.8      #
seq_page_cost = 5.0                     #
random_page_cost = 2.5                  #
effective_cache_size = 2048MB           # just upgraded to 2GB; had another aggressive memory-using program before, so did not want to set this higher
log_destination = 'stderr'              #
logging_collector = off         #
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' #
log_rotation_age = 1d                   #
log_min_duration_statement = 10000      #
log_line_prefix='%t:%r:%u@%d:[%p]: '                    #
track_activities = on
track_counts = on
track_activity_query_size = 1024        #
autovacuum = on                     #
autovacuum_max_workers = 5              #
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'                     #
lc_monetary = 'en_US.UTF-8'                     #
lc_numeric = 'en_US.UTF-8'                      #
lc_time = 'en_US.UTF-8'                         #
default_text_search_config = 'pg_catalog.english'




Re: slow query on tables with new columns added.

From:
Filip Rembiałkowski
Date:

2011/9/23 M. D. <lists@turnkey.bz>

I did a software upgrade, and with it came a new feature where when selecting a customer it queries for the sum of a few columns.  This takes 7 seconds for the 'Cash Sale' customer - by far the most active customer. I'd like to see if it's possible to get it down a bit by changing settings.


To make things clear before we search for a solution: you wrote "by changing settings". Is that the only option? Can't you change the query in the software? Can't you change the database schema (add indexes, etc.)?
 

Query:
explain analyse select sum(item_points),sum(disc_points) from invoice left join gltx on invoice.invoice_id = gltx.gltx_id
where gltx.inactive_on is null and gltx.posted = 'Y' and gltx.customer_id = 'A0ZQ2gsACIsEKLI638ikyg'

Aside from other things, you know that the LEFT join here is useless? The planner should collapse it to a normal join, but I'd check.
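To illustrate: because the WHERE clause rejects any NULL-extended gltx rows, the LEFT JOIN behaves exactly like an inner join. A sketch of the equivalent rewritten query (same tables and filters as the original):

```sql
-- Equivalent inner-join form: the WHERE conditions on gltx already
-- discard every row the LEFT JOIN would have NULL-extended.
EXPLAIN ANALYSE
SELECT sum(item_points), sum(disc_points)
FROM invoice
JOIN gltx ON invoice.invoice_id = gltx.gltx_id
WHERE gltx.inactive_on IS NULL
  AND gltx.posted = 'Y'
  AND gltx.customer_id = 'A0ZQ2gsACIsEKLI638ikyg';
```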


Filip

Re: slow query on tables with new columns added.

From:
"M. D."
Date:
I have full access to the database, but no access to the application source code.  If creating an index will help, I can do that, but with the columns I don't see it helping, as I don't have access to the application source to change that.

So yes, by changing settings, I would like to know if there's any memory setting I can change to help, or an index I can create.  There is an index on the customer_id column in the gltx table, so I'm not sure what else could be done.

If there was a way to create a select trigger, I would do it and return 0 for both columns on that customer_id, as it should always be 0.

On 09/24/2011 12:10 AM, Filip Rembiałkowski wrote:

2011/9/23 M. D. <lists@turnkey.bz>

I did a software upgrade, and with it came a new feature where when selecting a customer it queries for the sum of a few columns.  This takes 7 seconds for the 'Cash Sale' customer - by far the most active customer. I'd like to see if it's possible to get it down a bit by changing settings.

To make things clear before we search for a solution. You wrote "by changing settings". Is it the only option? Can't you change the query in software? Can't you change database schema (add indexes etc)?

Query:
explain analyse select sum(item_points),sum(disc_points) from invoice left join gltx on invoice.invoice_id = gltx.gltx_id
where gltx.inactive_on is null and gltx.posted = 'Y' and gltx.customer_id = 'A0ZQ2gsACIsEKLI638ikyg'

Aside from other things, you know that LEFT join here is useless? - planner should collapse it to normal join but I'd check.

Filip

Re: slow query on tables with new columns added.

From:
Filip Rembiałkowski
Date:


2011/9/26 M. D. <lists@turnkey.bz>
I have full access to the database, but no access to the application source code.  If creating an index will help, I can do that, but with the columns I don't see it helping as I don't have access to the application source to change that.

So yes, by changing settings, I would like to know if there's any memory settings I can change to help or create an index.  There is an index on the customer_id column in the gltx table, so I'm not sure what else could be done.

If there was a way to create a select trigger, I would do it and return 0 for both columns on that customer_id as it should always be 0.



Hi

I didn't respond earlier, because I actually don't see any easy way of speeding up the query.

The memory settings seem fine for this size of data.

It does not look like you can change things by simply adding indexes. I mean, you can certainly add a specially crafted partial index on gltx.customer_id WHERE (gltx.inactive_on IS NULL) AND (gltx.posted = 'Y') - but this can earn you a few percent at most.
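A sketch of such a partial index (the name gltx_i8 is hypothetical, just continuing the table's existing naming scheme):

```sql
-- Partial index matching the query's filter conditions: it only
-- contains rows where the WHERE predicates already hold, so the
-- planner can use it directly for this query's customer lookup.
CREATE INDEX gltx_i8
   ON gltx
   USING btree
   (customer_id)
   WHERE inactive_on IS NULL AND posted = 'Y';
```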

The problem here might be the type of the join columns - they are character(22), which is not an ideal choice (that's one of the reasons why I'm a fan of artificial integer pkeys).

You _could_ try running the query with enable_mergejoin = off and see what happens.
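For example, session-locally (nothing persists after the RESET):

```sql
-- Disable merge joins for this session only, forcing the planner to
-- pick another join method (likely a hash join), then re-test.
SET enable_mergejoin = off;

EXPLAIN ANALYSE
SELECT sum(item_points), sum(disc_points)
FROM invoice
LEFT JOIN gltx ON invoice.invoice_id = gltx.gltx_id
WHERE gltx.inactive_on IS NULL
  AND gltx.posted = 'Y'
  AND gltx.customer_id = 'A0ZQ2gsACIsEKLI638ikyg';

RESET enable_mergejoin;
```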

You can check if the problem persists after dumping and reloading to another db.

If app modification were possible, you could materialize the data _before_ it must be queried - using a summary table and appropriate triggers to keep it up to date.
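A minimal sketch of such a summary table and trigger. All names here are hypothetical, and it handles only INSERTs; a real version would also cover UPDATE/DELETE and honor the posted/inactive_on conditions:

```sql
-- Hypothetical per-customer running totals, maintained by trigger.
CREATE TABLE customer_points (
    customer_id character(22) PRIMARY KEY,
    item_points numeric(14,0) NOT NULL DEFAULT 0,
    disc_points numeric(14,0) NOT NULL DEFAULT 0
);

-- On each new invoice, add its points to the owning customer's totals
-- (the customer is found through the matching gltx row).
CREATE OR REPLACE FUNCTION add_invoice_points() RETURNS trigger AS $$
BEGIN
    UPDATE customer_points cp
       SET item_points = cp.item_points + NEW.item_points,
           disc_points = cp.disc_points + NEW.disc_points
      FROM gltx g
     WHERE g.gltx_id = NEW.invoice_id
       AND cp.customer_id = g.customer_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER invoice_points_sum
    AFTER INSERT ON invoice
    FOR EACH ROW EXECUTE PROCEDURE add_invoice_points();
```

The app would then read the two sums from customer_points with a single-row primary-key lookup instead of aggregating hundreds of thousands of rows.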

Regarding your last comment - that the values for that customer_id should always be 0: if it's a persistent business rule, I would try to create a CHECK constraint to reflect it. With some luck and fiddling, constraint_exclusion might come to help with speeding up your query.

Also, if there is something special about customer_id distribution - table partitioning might be an option.
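On 9.0, partitioning means table inheritance. A sketch splitting off the hot 'Cash Sale' customer (child table names are hypothetical; the row-routing trigger and data migration are omitted):

```sql
-- Inheritance-based partitioning sketch (PostgreSQL 9.0 has no
-- native partitioning): one child for the high-volume customer,
-- one for everybody else.
CREATE TABLE gltx_cash_sale (
    CHECK (customer_id = 'A0ZQ2gsACIsEKLI638ikyg')
) INHERITS (gltx);

CREATE TABLE gltx_other (
    CHECK (customer_id <> 'A0ZQ2gsACIsEKLI638ikyg')
) INHERITS (gltx);

-- With constraint_exclusion = partition (the 9.0 default), a scan
-- filtered on customer_id can skip the irrelevant child table.
```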

Ok, that's a long list - hope this helps, and good luck.

After all, you can throw more hardware at the problem - or hire some Pg magician :-)