Discussion: slow query on tables with new columns added.
Hi everyone,

I did a software upgrade, and with it came a new feature: when selecting a customer, it queries for the sum of a few columns. This takes 7 seconds for the 'Cash Sale' customer - by far the most active customer. I'd like to see if it's possible to get it down a bit by changing settings.

Query:

explain analyse select sum(item_points), sum(disc_points)
from invoice left join gltx on invoice.invoice_id = gltx.gltx_id
where gltx.inactive_on is null
  and gltx.posted = 'Y'
  and gltx.customer_id = 'A0ZQ2gsACIsEKLI638ikyg'

item_points and disc_points are the 2 columns added, so they are mostly 0.

Table info:

CREATE TABLE gltx  -- rows: 894,712
(
  gltx_id character(22) NOT NULL,
  "version" integer NOT NULL,
  created_by character varying(16) NOT NULL,
  updated_by character varying(16),
  inactive_by character varying(16),
  created_on date NOT NULL,
  updated_on date,
  inactive_on date,
  external_id numeric(14,0),
  data_type integer NOT NULL,
  "number" character varying(14) NOT NULL,
  reference_str character varying(14),
  post_date date NOT NULL,
  post_time time without time zone NOT NULL,
  work_date date NOT NULL,
  memo text,
  customer_id character(22),
  vendor_id character(22),
  station_id character(22),
  employee_id character(22),
  store_id character(22) NOT NULL,
  shift_id character(22),
  link_id character(22),
  link_num integer NOT NULL,
  printed character(1) NOT NULL,
  paid character(1) NOT NULL,
  posted character(1) NOT NULL,
  amount numeric(18,4) NOT NULL,
  card_amt numeric(18,4) NOT NULL,
  paid_amt numeric(18,4) NOT NULL,
  paid_date date,
  due_date date,
  CONSTRAINT gltx_pkey PRIMARY KEY (gltx_id),
  CONSTRAINT gltx_c0 FOREIGN KEY (customer_id) REFERENCES customer (customer_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT gltx_c1 FOREIGN KEY (vendor_id) REFERENCES vendor (vendor_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT gltx_c2 FOREIGN KEY (station_id) REFERENCES station (station_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT gltx_c3 FOREIGN KEY (employee_id) REFERENCES employee (employee_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT gltx_c4 FOREIGN KEY (store_id) REFERENCES store (store_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT gltx_c5 FOREIGN KEY (shift_id) REFERENCES shift (shift_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT gltx_c6 FOREIGN KEY (link_id) REFERENCES gltx (gltx_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE SET NULL
)
WITH ( OIDS=FALSE );
ALTER TABLE gltx OWNER TO quasar;
GRANT ALL ON TABLE gltx TO quasar;

CREATE INDEX gltx_i0 ON gltx USING btree (data_type);
CREATE INDEX gltx_i1 ON gltx USING btree (post_date);
CREATE INDEX gltx_i2 ON gltx USING btree (number);
CREATE INDEX gltx_i3 ON gltx USING btree (data_type, number);
CREATE INDEX gltx_i4 ON gltx USING btree (customer_id, paid);
CREATE INDEX gltx_i5 ON gltx USING btree (vendor_id, paid);
CREATE INDEX gltx_i6 ON gltx USING btree (work_date);
CREATE INDEX gltx_i7 ON gltx USING btree (link_id);

CREATE TABLE invoice  -- rows: 623,270
(
  invoice_id character(22) NOT NULL,
  ship_id character(22),
  ship_via character varying(20),
  term_id character(22),
  promised_date date,
  tax_exempt_id character(22),
  customer_addr text,
  ship_addr text,
  comments text,
  item_points numeric(14,0) NOT NULL,
  disc_points numeric(14,0) NOT NULL,
  CONSTRAINT invoice_pkey PRIMARY KEY (invoice_id),
  CONSTRAINT invoice_c0 FOREIGN KEY (invoice_id) REFERENCES gltx (gltx_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT invoice_c1 FOREIGN KEY (ship_id) REFERENCES customer (customer_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT invoice_c2 FOREIGN KEY (term_id) REFERENCES term (term_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT invoice_c3 FOREIGN KEY (tax_exempt_id) REFERENCES tax (tax_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH ( OIDS=FALSE );

Both tables have mostly writes, some updates, very few deletes.

Explain analyse (http://explain.depesz.com/s/SYW):

Aggregate  (cost=179199.52..179199.53 rows=1 width=10) (actual time=7520.922..7520.924 rows=1 loops=1)
  ->  Merge Join  (cost=9878.78..177265.66 rows=386771 width=10) (actual time=104.651..6690.194 rows=361463 loops=1)
        Merge Cond: (invoice.invoice_id = gltx.gltx_id)
        ->  Index Scan using invoice_pkey on invoice  (cost=0.00..86222.54 rows=623273 width=33) (actual time=0.010..1316.507 rows=623273 loops=1)
        ->  Index Scan using gltx_pkey on gltx  (cost=0.00..108798.53 rows=386771 width=23) (actual time=104.588..1822.886 rows=361464 loops=1)
              Filter: ((gltx.inactive_on IS NULL) AND (gltx.posted = 'Y'::bpchar) AND (gltx.customer_id = 'A0ZQ2gsACIsEKLI638ikyg'::bpchar))
Total runtime: 7521.026 ms

PostgreSQL: 9.0.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-50), 64-bit - self-compiled
Linux: Linux server.domain.lan 2.6.18-238.12.1.el5xen #1 SMP Tue May 31 13:35:45 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

Hardware:
  CPU: single Intel(R) Xeon(R) CPU E5335 @ 2.00GHz
  RAM: 8GB
  DB size: 5876MB
  HDs: RAID 1 SATA drives - Dell PowerEdge 1900 - a lower-middle-class server

Postgres config:

max_connections = 200              # it's a bit high I know, but most connections are idle
shared_buffers = 2048MB
work_mem = 8MB                     # tried up to 32MB, but no diff
maintenance_work_mem = 16MB
bgwriter_delay = 2000ms
checkpoint_segments = 15
checkpoint_completion_target = 0.8
seq_page_cost = 5.0
random_page_cost = 2.5
effective_cache_size = 2048MB      # just upgraded to 2GB; had another aggressive memory-using program before, so did not want to have this high
log_destination = 'stderr'
logging_collector = off
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d
log_min_duration_statement = 10000
log_line_prefix = '%t:%r:%u@%d:[%p]: '
track_activities = on
track_counts = on
track_activity_query_size = 1024
autovacuum = on
autovacuum_max_workers = 5
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
2011/9/23 M. D. <lists@turnkey.bz>
To make things clear before we search for a solution: you wrote "by changing settings". Is that the only option? Can't you change the query in the software? Can't you change the database schema (add indexes, etc.)?
Aside from other things, you know that the LEFT join here is useless? The planner should collapse it to a normal join, but I'd check.
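(Since the WHERE clause tests gltx columns with IS NULL and equality conditions, any row with a NULL gltx side is rejected anyway, so the outer join is equivalent to an inner join. A sketch of the rewritten query, using the same identifiers and literal as the original post:)

-- The WHERE conditions reject every row where the gltx side is NULL,
-- so LEFT JOIN and a plain JOIN return identical results here.
explain analyse select sum(item_points), sum(disc_points)
from invoice
join gltx on invoice.invoice_id = gltx.gltx_id
where gltx.inactive_on is null
  and gltx.posted = 'Y'
  and gltx.customer_id = 'A0ZQ2gsACIsEKLI638ikyg';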
Filip
I have full access to the database, but no access to the application source code. If creating an index will help, I can do that, but with the columns I don't see it helping, as I don't have access to the application source to change that.

So yes, by changing settings: I would like to know if there are any memory settings I can change to help, or whether I should create an index. There is an index on the customer_id column in the gltx table, so I'm not sure what else could be done.

If there was a way to create a select trigger, I would do it and return 0 for both columns for that customer_id, as it should always be 0.
2011/9/26 M. D. <lists@turnkey.bz>
Hi
I didn't respond earlier because I don't actually see any easy way of speeding up this query.
The memory settings seem fine for this size of data.
It does not look like you can change things by simply adding indexes. You can certainly add a specially crafted partial index on gltx.customer_id WHERE (gltx.inactive_on IS NULL) AND (gltx.posted = 'Y'), but that is likely to gain you a few percent at most.
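(A sketch of such a partial index; the index name is made up for illustration:)

-- Partial index covering only the rows the query can match.
-- Smaller than gltx_i4 (customer_id, paid), but since the merge join
-- dominates the runtime, the win is likely marginal.
CREATE INDEX gltx_cust_posted_active
    ON gltx (customer_id)
    WHERE inactive_on IS NULL AND posted = 'Y';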
The problem here might be the type of the join columns - they are character(22), which is not an ideal choice (that's one of the reasons why I'm a fan of artificial integer pkeys).
You _could_ try running the query with enable_mergejoin = off and see what happens.
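(For example, disabling merge joins for just the current session to see whether the planner's alternative plan, likely a hash join, is any faster:)

-- Session-local planner experiment; reset afterwards.
SET enable_mergejoin = off;

explain analyse select sum(item_points), sum(disc_points)
from invoice left join gltx on invoice.invoice_id = gltx.gltx_id
where gltx.inactive_on is null
  and gltx.posted = 'Y'
  and gltx.customer_id = 'A0ZQ2gsACIsEKLI638ikyg';

RESET enable_mergejoin;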
You can check whether the problem persists after dumping and reloading into another database.
If app modification were possible, you could materialize the data _before_ it must be queried, using a summary table and appropriate triggers to keep it up to date.
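(A minimal sketch of that approach. The table, function, and trigger names are hypothetical, and the trigger only handles INSERT on invoice; a complete version would also have to cover UPDATE/DELETE on invoice and status changes on gltx, and the summary table would need to be seeded from the existing rows:)

-- Hypothetical running-totals table, one row per customer.
CREATE TABLE customer_points_summary (
    customer_id  character(22) PRIMARY KEY,
    item_points  numeric(14,0) NOT NULL DEFAULT 0,
    disc_points  numeric(14,0) NOT NULL DEFAULT 0
);

-- Simplified INSERT-only maintenance trigger (PL/pgSQL).
CREATE OR REPLACE FUNCTION add_invoice_points() RETURNS trigger AS $$
BEGIN
    UPDATE customer_points_summary s
       SET item_points = s.item_points + NEW.item_points,
           disc_points = s.disc_points + NEW.disc_points
      FROM gltx g
     WHERE g.gltx_id = NEW.invoice_id
       AND s.customer_id = g.customer_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER invoice_points_summary
AFTER INSERT ON invoice
FOR EACH ROW EXECUTE PROCEDURE add_invoice_points();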
Regarding your last comment (that the values for that customer_id should always be 0): if it's a persistent business rule, I would try to create a CHECK constraint to reflect it. With some luck and fiddling, constraint_exclusion might help speed up your query.
Also, if there is something special about the customer_id distribution, table partitioning might be an option.
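(A sketch of the idea using 9.0-style inheritance partitioning, not a migration plan: existing rows would have to be moved into the children, insert routing would need a trigger or application changes, and rows with NULL customer_id satisfy neither CHECK. The child table names are made up:)

-- Split gltx by the one very heavy customer vs. everyone else.
CREATE TABLE gltx_cash_sale (
    CHECK (customer_id = 'A0ZQ2gsACIsEKLI638ikyg')
) INHERITS (gltx);

CREATE TABLE gltx_other (
    CHECK (customer_id <> 'A0ZQ2gsACIsEKLI638ikyg')
) INHERITS (gltx);

-- With constraint_exclusion = partition (the default in 9.0),
-- a query filtering on customer_id scans only the matching child.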
Ok, that's a long list - hope this helps, and good luck.
And finally, you can always throw more hardware at the problem - or hire some Pg magician :-)