Discussion: 500x speed-down: Wrong query plan?
Hello gentlemen,

Although this is my first post on the list, I am a fairly experienced PostgreSQL programmer. I am writing an ERP application suite using PostgreSQL as the preferred DBMS. Let me state that the SQL DDL is automatically generated by a CASE tool from an ER model. The generated schema contains the appropriate primary key and foreign key constraints, as defined by the original ER model, as well as "reverse indexes" on foreign keys, allowing (in theory) rapid backward navigation of foreign keys in joins.

Let me show a sample join of two tables in the database schema. The information provided is quite extensive. I'm sorry for that, but I think it is better to provide the list with all the relevant information.

Package: postgresql-7.4
Priority: optional
Section: misc
Installed-Size: 7860
Maintainer: Martin Pitt <mpitt@debian.org>
Architecture: i386
Version: 1:7.4.9-2

                            Table "public.articolo"
         Column          |            Type             |                      Modifiers
-------------------------+-----------------------------+-----------------------------------------------------
 bigoid                  | bigint                      | not null default nextval('object_bigoid_seq'::text)
 metadata                | text                        |
 finalized               | timestamp without time zone |
 xdbs_created            | timestamp without time zone | default now()
 xdbs_modified           | timestamp without time zone |
 id_ente                 | text                        | not null
 barcode                 | text                        |
 tipo                    | text                        |
 id_produttore           | text                        | not null
 id_articolo             | text                        | not null
 venditore_id_ente       | text                        |
 id_prodotto             | text                        |
 aggregato_id_ente       | text                        |
 aggregato_id_produttore | text                        |
 aggregato_id_articolo   | text                        |
 descr                   | text                        |
 url                     | text                        |
 datasheet               | text                        |
 scheda_sicurezza        | text                        |
 peso                    | numeric                     |
 lunghezza               | numeric                     |
 larghezza               | numeric                     |
 altezza                 | numeric                     |
 volume                  | numeric                     |
 max_strati              | numeric                     |
 um                      | text                        |
Indexes:
    "articolo_pkey" primary key, btree (id_ente, id_produttore, id_articolo)
    "articolo_unique_barcode_index" unique, btree (barcode)
    "articolo_modified_index" btree (xdbs_modified)
Foreign-key constraints:
    "$4" FOREIGN KEY (um) REFERENCES um(um) DEFERRABLE INITIALLY DEFERRED
    "$3" FOREIGN KEY (aggregato_id_ente, aggregato_id_produttore, aggregato_id_articolo)
         REFERENCES articolo(id_ente, id_produttore, id_articolo) DEFERRABLE INITIALLY DEFERRED
    "$2" FOREIGN KEY (venditore_id_ente, id_prodotto)
         REFERENCES prodotto(venditore_id_ente, id_prodotto) DEFERRABLE INITIALLY DEFERRED
    "$1" FOREIGN KEY (id_ente) REFERENCES ente(id_ente) DEFERRABLE INITIALLY DEFERRED
Rules:
    articolo_delete_rule AS ON DELETE TO articolo
        DO INSERT INTO articolo_trash (id_ente, id_produttore, id_articolo,
            venditore_id_ente, id_prodotto, aggregato_id_ente, aggregato_id_produttore,
            aggregato_id_articolo, descr, url, datasheet, scheda_sicurezza, peso,
            lunghezza, larghezza, altezza, volume, max_strati, um, barcode, tipo,
            bigoid, metadata, finalized, xdbs_created, xdbs_modified)
        VALUES (old.id_ente, old.id_produttore, old.id_articolo,
            old.venditore_id_ente, old.id_prodotto, old.aggregato_id_ente,
            old.aggregato_id_produttore, old.aggregato_id_articolo, old.descr,
            old.url, old.datasheet, old.scheda_sicurezza, old.peso, old.lunghezza,
            old.larghezza, old.altezza, old.volume, old.max_strati, old.um,
            old.barcode, old.tipo, old.bigoid, old.metadata, old.finalized,
            old.xdbs_created, old.xdbs_modified)
    articolo_update_rule AS ON UPDATE TO articolo
        WHERE ((new.xdbs_modified)::timestamp with time zone <> now())
        DO INSERT INTO articolo_trash (same column list and old.* values as in
            articolo_delete_rule above)
Triggers:
    articolo_update_trigger BEFORE UPDATE ON articolo FOR EACH ROW EXECUTE PROCEDURE xdbs_update_trigger()
Inherits: object, barcode

                           Table "public.ubicazione"
    Column     |            Type             |                      Modifiers
---------------+-----------------------------+-----------------------------------------------------
 bigoid        | bigint                      | not null default nextval('object_bigoid_seq'::text)
 metadata      | text                        |
 finalized     | timestamp without time zone |
 xdbs_created  | timestamp without time zone | default now()
 xdbs_modified | timestamp without time zone |
 id_ente       | text                        | not null
 barcode       | text                        |
 tipo          | text                        |
 id_magazzino  | text                        | not null
 id_settore    | text                        | not null
 id_area       | text                        | not null
 id_ubicazione | text                        | not null
 flavor        | text                        |
 peso_max      | numeric                     |
 lunghezza     | numeric                     |
 larghezza     | numeric                     |
 altezza       | numeric                     |
 volume_max    | numeric                     |
 inventario    | integer                     | default 0
 allarme       | text                        |
 manutenzione  | text                        |
 id_produttore | text                        |
 id_articolo   | text                        |
 quantita      | numeric                     |
 in_prelievo   | numeric                     |
 in_deposito   | numeric                     |
 lotto         | text                        |
 scadenza      | date                        |
Indexes:
    "ubicazione_pkey" primary key, btree (id_ente, id_magazzino, id_settore, id_area, id_ubicazione)
    "ubicazione_id_ubicazione_key" unique, btree (id_ubicazione)
    "ubicazione_fkey_articolo" btree (id_ente, id_produttore, id_articolo)
    "ubicazione_modified_index" btree (xdbs_modified)
Foreign-key constraints:
    "$5" FOREIGN KEY (id_ente, id_produttore, id_articolo)
         REFERENCES articolo(id_ente, id_produttore, id_articolo) DEFERRABLE INITIALLY DEFERRED
    "$4" FOREIGN KEY (manutenzione) REFERENCES manutenzione(manutenzione) DEFERRABLE INITIALLY DEFERRED
    "$3" FOREIGN KEY (allarme) REFERENCES allarme(allarme) DEFERRABLE INITIALLY DEFERRED
    "$2" FOREIGN KEY (flavor) REFERENCES flavor(flavor) DEFERRABLE INITIALLY DEFERRED
    "$1" FOREIGN KEY (id_ente, id_magazzino, id_settore, id_area)
         REFERENCES area(id_ente, id_magazzino, id_settore, id_area) DEFERRABLE INITIALLY DEFERRED
Rules:
    ubicazione_delete_rule AS ON DELETE TO ubicazione
        DO INSERT INTO ubicazione_trash (id_ente, id_magazzino, id_settore, id_area,
            id_ubicazione, flavor, peso_max, lunghezza, larghezza, altezza, volume_max,
            inventario, allarme, manutenzione, id_produttore, id_articolo, quantita,
            in_prelievo, in_deposito, lotto, scadenza, barcode, tipo, bigoid, metadata,
            finalized, xdbs_created, xdbs_modified)
        VALUES (old.id_ente, old.id_magazzino, old.id_settore, old.id_area,
            old.id_ubicazione, old.flavor, old.peso_max, old.lunghezza, old.larghezza,
            old.altezza, old.volume_max, old.inventario, old.allarme, old.manutenzione,
            old.id_produttore, old.id_articolo, old.quantita, old.in_prelievo,
            old.in_deposito, old.lotto, old.scadenza, old.barcode, old.tipo, old.bigoid,
            old.metadata, old.finalized, old.xdbs_created, old.xdbs_modified)
    ubicazione_update_rule AS ON UPDATE TO ubicazione
        WHERE ((new.xdbs_modified)::timestamp with time zone <> now())
        DO INSERT INTO ubicazione_trash (same column list and old.* values as in
            ubicazione_delete_rule above)
Triggers:
    ubicazione_update_trigger BEFORE UPDATE ON ubicazione FOR EACH ROW EXECUTE PROCEDURE xdbs_update_trigger()
Inherits: object, barcode

******************************************************************************

Here is the
first join. This is planned correctly. Execution times are irrelevant.

dmd-freerp-1-alex=# explain analyze SELECT * FROM articolo
    JOIN ubicazione USING (id_ente, id_produttore, id_articolo)
    WHERE ubicazione.id_ente = 'dmd' AND allarme IS NULL AND manutenzione IS NULL
      AND ubicazione.xdbs_modified > '2006-01-08 18:25:00+01';

 Nested Loop  (cost=0.00..8.73 rows=1 width=1146) (actual time=0.247..0.247 rows=0 loops=1)
   ->  Index Scan using ubicazione_modified_index on ubicazione  (cost=0.00..3.03 rows=1 width=536) (actual time=0.239..0.239 rows=0 loops=1)
         Index Cond: (xdbs_modified > '2006-01-08 18:25:00'::timestamp without time zone)
         Filter: ((id_ente = 'dmd'::text) AND (allarme IS NULL) AND (manutenzione IS NULL))
   ->  Index Scan using articolo_pkey on articolo  (cost=0.00..5.69 rows=1 width=653) (never executed)
         Index Cond: (('dmd'::text = articolo.id_ente) AND (articolo.id_produttore = "outer".id_produttore) AND (articolo.id_articolo = "outer".id_articolo))
 Total runtime: 0.556 ms
(7 rows)

*********************************************************************

Here's the second join on the same tables. This time a different set of indexes should be used to perform the join, but even in this case I would expect the planner to generate a nested loop of two index scans. Instead, this is what happens.
dmd-freerp-1-alex=# explain analyze SELECT * FROM articolo
    JOIN ubicazione USING (id_ente, id_produttore, id_articolo)
    WHERE ubicazione.id_ente = 'dmd' AND allarme IS NULL AND manutenzione IS NULL
      AND articolo.xdbs_modified > '2006-01-08 18:25:00+01';

 Nested Loop  (cost=0.00..1017.15 rows=1 width=1146) (actual time=258.648..258.648 rows=0 loops=1)
   ->  Seq Scan on ubicazione  (cost=0.00..1011.45 rows=1 width=536) (actual time=0.065..51.617 rows=12036 loops=1)
         Filter: ((id_ente = 'dmd'::text) AND (allarme IS NULL) AND (manutenzione IS NULL))
   ->  Index Scan using articolo_pkey on articolo  (cost=0.00..5.69 rows=1 width=653) (actual time=0.011..0.011 rows=0 loops=12036)
         Index Cond: (('dmd'::text = articolo.id_ente) AND (articolo.id_produttore = "outer".id_produttore) AND (articolo.id_articolo = "outer".id_articolo))
         Filter: (xdbs_modified > '2006-01-08 18:25:00'::timestamp without time zone)
 Total runtime: 258.975 ms
(7 rows)

This time, a sequential scan on the rightmost table is used to perform the join. This is quite plainly a wrong choice, since the number of tuples in articolo having xdbs_modified > '2006-01-08 18:25:00' is 0. I also tried increasing the amount of collected statistics to 1000 with "ALTER TABLE articolo ALTER COLUMN xdbs_modified SET STATISTICS 1000" and subsequently vacuum-analyzed the db, so as to give the planner as much information as possible to realize that articolo ought to be index-scanned with the articolo_modified_index B-tree index.

The correct query plan is to perform a nested loop join with an index scan on articolo using articolo_modified_index and a corresponding index scan on ubicazione using ubicazione_fkey_articolo.

I am currently resorting to selecting from the single tables and performing the join in the application code rather than in the DB.
This is currently the only viable alternative for me, as a 500x speed-down simply cannot be tolerated.

What I do not understand is why the planner behaves so differently in the two cases. Any ideas? Would upgrading to a more recent version of PostgreSQL make any difference?

Alex

--
*********************************************************************
http://www.barettadeit.com/
Baretta DE&IT
A division of Baretta SRL
tel. +39 02 370 111 55
fax. +39 02 370 111 54

Our technology:
The Application System/Xcaml (AS/Xcaml) <http://www.asxcaml.org/>
The FreerP Project <http://www.freerp.org/>
Hi Alessandro,

>  Nested Loop  (cost=0.00..1017.15 rows=1 width=1146) (actual time=258.648..258.648 rows=0 loops=1)
>    ->  Seq Scan on ubicazione  (cost=0.00..1011.45 rows=1 width=536) (actual time=0.065..51.617 rows=12036 loops=1)
>          Filter: ((id_ente = 'dmd'::text) AND (allarme IS NULL) AND (manutenzione IS NULL))

The problem seems to be here. The planner expects one matching row (and that's why it chooses a nested loop), but 12036 rows match this condition.

Are you sure that you recently ANALYZED the table "ubicazione"? If so, try to increase statistics for the id_ente column.

P.S. There is also an Italian mailing list, if you are interested :)

Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com
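Matteo's suggestion can be sketched in SQL as follows (a sketch only; the column and target value come from this thread, and note that the raised target takes effect only at the next ANALYZE):

```sql
-- Raise the per-column statistics target so ANALYZE collects a finer
-- MCV list / histogram for the skewed column (the 7.4 default target is 10).
ALTER TABLE ubicazione ALTER COLUMN id_ente SET STATISTICS 100;

-- SET STATISTICS alone refreshes nothing: re-analyze the table so the
-- planner actually sees the new numbers.
ANALYZE ubicazione;

-- Check what the planner now believes about the column.
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
 WHERE tablename = 'ubicazione' AND attname = 'id_ente';
```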
On 1/9/06, Alessandro Baretta <a.baretta@barettadeit.com> wrote:
> Hello gentlemen,
>
> Although this is my first post on the list, I am a fairly experienced PostgreSQL
> programmer. I am writing an ERP application suite using PostgreSQL as the
> preferred DBMS. Let me state that the SQL DDL is automatically generated by a
> CASE tool from an ER model. The generated schema contains the appropriate
> primary key and foreign key constraints, as defined by the original ER model, as
> well as "reverse indexes" on foreign keys, allowing (in theory) rapid backward
> navigation of foreign keys in joins.
>
> Let me show a sample join of two tables in the database schema. The information
> provided is quite extensive. I'm sorry for that, but I think it is better to
> provide the list with all the relevant information.
>
> Package: postgresql-7.4

Since you are still in the development stage, maybe you can start thinking about upgrading to 8.1.

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)
Matteo Beccati wrote:
> Hi Alessandro,
>
>>  Nested Loop  (cost=0.00..1017.15 rows=1 width=1146) (actual time=258.648..258.648 rows=0 loops=1)
>>    ->  Seq Scan on ubicazione  (cost=0.00..1011.45 rows=1 width=536) (actual time=0.065..51.617 rows=12036 loops=1)
>>          Filter: ((id_ente = 'dmd'::text) AND (allarme IS NULL) AND (manutenzione IS NULL))
>
> The problem seems to be here. The planner expects one matching row (and that's
> why it chooses a nested loop), but 12036 rows match this condition.
>
> Are you sure that you recently ANALYZED the table "ubicazione"? If so,
> try to increase statistics for the id_ente column.

No, this is not the problem. I increased the amount of statistics with ALTER TABLE ... SET STATISTICS 1000, which is as much as I can have. The problem is that the planner simply ignores the right query plan, which is orders of magnitude less costly.

Keep in mind that the XDBS--the CASE tool I use--makes heavy use of indexes, and generates all relevant indexes in relation to the join paths which are implicit in the ER model "relations". In this case, both ubicazione and articolo have indexes on the join fields:

Indexes:
    "articolo_pkey" primary key, btree (id_ente, id_produttore, id_articolo)
    "ubicazione_fkey_articolo" btree (id_ente, id_produttore, id_articolo)

Notice that only "articolo_pkey" is a unique index, while "ubicazione_fkey_articolo" allows duplicates. This second index is not used by the planner.

Both tables also have a "bookkeeping" index on xdbs_modified. I am selecting "recently inserted or updated" tuples, which are usually a very small fraction of the table--if there are any. The index on xdbs_modified is a B-tree allowing a very quick index scan to find the few tuples having xdbs_modified > '[some recent timestamp]'.

Hence, the optimal plan for both queries is to perform an index scan using the <table_name>_modified_index on the table upon which I specify the xdbs_modified > '...'
condition, and the join-fields index on the other table.

Alex
Alessandro Baretta <a.baretta@barettadeit.com> writes:
> Matteo Beccati wrote:
>> Are you sure that you recently ANALYZED the table "ubicazione"? If so,
>> try to increase statistics for the id_ente column.

> No, this is not the problem. I increased the amount of statistics with ALTER
> TABLE ... SET STATISTICS 1000, which is as much as I can have.

What Matteo wanted to know is whether you'd done an ANALYZE afterward. ALTER TABLE SET STATISTICS doesn't in itself update the statistics.

What do you get from

EXPLAIN SELECT * FROM articolo WHERE articolo.xdbs_modified > '2006-01-08 18:25:00+01';

I'm curious to see how many rows the planner thinks this will produce, and whether it will use the index or not.

Also, I gather from the plan choices that the condition id_ente = 'dmd' isn't very selective ... what fraction of the rows in each table satisfy that?

regards, tom lane
Tom Lane wrote:
> Alessandro Baretta <a.baretta@barettadeit.com> writes:
>> Matteo Beccati wrote:
>>> Are you sure that you recently ANALYZED the table "ubicazione"? If so,
>>> try to increase statistics for the id_ente column.
>
>> No, this is not the problem. I increased the amount of statistics with ALTER
>> TABLE ... SET STATISTICS 1000, which is as much as I can have.
>
> What Matteo wanted to know is if you'd done an ANALYZE afterward. ALTER
> TABLE SET STATISTICS doesn't in itself update the statistics.

I probably forgot to mention that I vacuum-analyzed the table after this operation, and, since I did not manage to get the index to work, I vacuum-analyzed several more times, just to be on the safe side.

> What do you get from
>
> EXPLAIN SELECT * FROM articolo WHERE articolo.xdbs_modified > '2006-01-08 18:25:00+01';
>
> I'm curious to see how many rows the planner thinks this will produce,
> and whether it will use the index or not.

dmd-freerp-1-alex=# EXPLAIN ANALYZE SELECT * FROM articolo
    WHERE articolo.xdbs_modified > '2006-01-08 18:25:00+01';

 Index Scan using articolo_modified_index on articolo  (cost=0.00..3914.91 rows=17697 width=653) (actual time=0.032..0.032 rows=0 loops=1)
   Index Cond: (xdbs_modified > '2006-01-08 18:25:00'::timestamp without time zone)
 Total runtime: 0.150 ms
(3 rows)

The planner gets tricked only by *SOME* join queries.

> Also, I gather from the plan choices that the condition id_ente = 'dmd'
> isn't very selective ... what fraction of the rows in each table
> satisfy that?

In most situations, this condition selects all the tuples. "id_ente" identifies the "owner of the data". Since, in most situations, companies do not share a database between them--although the application allows it--filtering on 'id_ente' is like not filtering at all.
Yet, this field is used in the indexes, so the condition ought to be specified in the queries anyhow.
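Tom's selectivity question could also be answered empirically with a couple of counts (a sketch, using only the table names from this thread; the scalar-subquery form works on 7.4):

```sql
-- Fraction of rows in each table satisfying id_ente = 'dmd'.
SELECT (SELECT count(*) FROM ubicazione WHERE id_ente = 'dmd')::numeric
     / (SELECT count(*) FROM ubicazione) AS ubicazione_fraction,
       (SELECT count(*) FROM articolo  WHERE id_ente = 'dmd')::numeric
     / (SELECT count(*) FROM articolo)  AS articolo_fraction;
```

A fraction close to 1.0 confirms that the id_ente condition filters out almost nothing, so statistics on that column alone cannot help the planner much.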
Alessandro Baretta <a.baretta@barettadeit.com> writes:
> Tom Lane wrote:
>> I'm curious to see how many rows the planner thinks this will produce,
>> and whether it will use the index or not.

> dmd-freerp-1-alex=# EXPLAIN ANALYZE SELECT * FROM articolo WHERE
> articolo.xdbs_modified > '2006-01-08 18:25:00+01';
>
>  Index Scan using articolo_modified_index on articolo  (cost=0.00..3914.91 rows=17697 width=653) (actual time=0.032..0.032 rows=0 loops=1)
>    Index Cond: (xdbs_modified > '2006-01-08 18:25:00'::timestamp without time zone)
>  Total runtime: 0.150 ms
> (3 rows)

Well, there's your problem: 17697 rows estimated vs 0 actual. With a closer row estimate it would've probably done the right thing for the join problem.

How many rows are really in the table, anyway? Could we see the pg_stats row for articolo.xdbs_modified?

regards, tom lane
Tom Lane wrote:
> Well, there's your problem: 17697 rows estimated vs 0 actual. With a
> closer row estimate it would've probably done the right thing for the
> join problem.

Hmmm, what you are telling me is very interesting, Tom. So, let me see if I got this straight: the first "rows=..." in the EXPLAIN ANALYZE output gives the planner's estimate, while the second gives the actual cardinality of the selected record set. Correct? If this is true, two questions arise: why is the estimated number of rows completely wrong, and why, given such a large estimated record set, does PostgreSQL schedule an Index Scan as opposed to a Seq Scan?

> How many rows are really in the table, anyway? Could we see the
> pg_stats row for articolo.xdbs_modified?

dmd-freerp-1-alex=# select count(*) from articolo;
 count
-------
 53091
(1 row)

dmd-freerp-1-alex=# explain analyze select * from articolo;

 Seq Scan on articolo  (cost=0.00..1439.91 rows=53091 width=653) (actual time=0.013..151.189 rows=53091 loops=1)
 Total runtime: 295.152 ms
(2 rows)

Now let me get the pg_stats for xdbs_modified.
dmd-freerp-1-alex=# select * from pg_stats
    where tablename = 'articolo' and attname = 'xdbs_modified';

 schemaname | tablename |    attname    | null_frac | avg_width | n_distinct |        most_common_vals        | most_common_freqs | histogram_bounds | correlation
------------+-----------+---------------+-----------+-----------+------------+--------------------------------+-------------------+------------------+-------------
 public     | articolo  | xdbs_modified |         0 |         8 |          1 | {"2006-01-10 08:12:58.605327"} | {1}               |                  |           1
(1 row)

For the sake of simplicity I have re-timestamped all tuples in the table with the current timestamp, as you can see above. Now, obviously, the planner must estimate ~0 rows for queries posing a selection condition on xdbs_modified, for any value other than "2006-01-10 08:12:58.605327". Let me try selecting from articolo first.

dmd-freerp-1-alex=# EXPLAIN ANALYZE SELECT * FROM articolo
    WHERE articolo.xdbs_modified > '2006-01-10 18:25:00+01';

 Index Scan using articolo_modified_index on articolo  (cost=0.00..2.01 rows=1 width=653) (actual time=0.139..0.139 rows=0 loops=1)
   Index Cond: (xdbs_modified > '2006-01-10 18:25:00'::timestamp without time zone)
 Total runtime: 0.257 ms
(3 rows)

The planner produces a sensible estimate of the number of rows and consequently chooses the appropriate query plan. Now, the join.
dmd-freerp-1-alex=# explain analyze SELECT * FROM articolo
    JOIN ubicazione USING (id_ente, id_produttore, id_articolo)
    WHERE articolo.id_ente = 'dmd' AND allarme IS NULL AND manutenzione IS NULL
      AND articolo.xdbs_modified > '2006-01-10 18:25:00+01';

 Nested Loop  (cost=0.00..5.05 rows=1 width=1146) (actual time=0.043..0.043 rows=0 loops=1)
   ->  Index Scan using articolo_modified_index on articolo  (cost=0.00..2.02 rows=1 width=653) (actual time=0.035..0.035 rows=0 loops=1)
         Index Cond: (xdbs_modified > '2006-01-10 18:25:00'::timestamp without time zone)
         Filter: (id_ente = 'dmd'::text)
   ->  Index Scan using ubicazione_fkey_articolo on ubicazione  (cost=0.00..3.02 rows=1 width=536) (never executed)
         Index Cond: (('dmd'::text = ubicazione.id_ente) AND ("outer".id_produttore = ubicazione.id_produttore) AND ("outer".id_articolo = ubicazione.id_articolo))
         Filter: ((allarme IS NULL) AND (manutenzione IS NULL))
 Total runtime: 0.382 ms
(8 rows)

Dear Tom, you're my hero! I have no clue as to how or why the statistics were wrong yesterday--as I vacuum-analyzed continuously out of lack of any better idea--and I was stupid enough to re-timestamp everything before selecting from pg_stats. Supposedly, the timestamps in the table were a random sampling taken from the month of December 2005, so that any date in January would be greater than all the timestamps in xdbs_modified. There must be a bug in my rule/trigger system, which is responsible for maintaining these timestamps.

Thank you very much, Tom and Matteo. Your help has been very precious to me. Thanks to your wisdom, my application will now have a 500x speed boost on a very common class of queries.
The end result is the following query plan, allowing me to rapidly select only the tuples in a join which have changed since the application last updated its notion of this dataset.

dmd-freerp-1-alex=# explain analyze
    SELECT * FROM articolo JOIN ubicazione USING (id_ente, id_produttore, id_articolo)
     WHERE articolo.id_ente = 'dmd' AND allarme IS NULL AND manutenzione IS NULL
       AND articolo.xdbs_modified > '2006-01-10 18:25:00+01'
    UNION
    SELECT * FROM articolo JOIN ubicazione USING (id_ente, id_produttore, id_articolo)
     WHERE articolo.id_ente = 'dmd' AND allarme IS NULL AND manutenzione IS NULL
       AND ubicazione.xdbs_modified > '2006-01-10 18:25:00+01';

 Unique  (cost=11.13..11.39 rows=2 width=1146) (actual time=0.519..0.519 rows=0 loops=1)
   ->  Sort  (cost=11.13..11.14 rows=2 width=1146) (actual time=0.512..0.512 rows=0 loops=1)
         Sort Key: id_ente, id_produttore, id_articolo, bigoid, metadata, finalized, xdbs_created, xdbs_modified, barcode, tipo, venditore_id_ente, id_prodotto, aggregato_id_ente, aggregato_id_produttore, aggregato_id_articolo, descr, url, datasheet, scheda_sicurezza, peso, lunghezza, larghezza, altezza, volume, max_strati, um, bigoid, metadata, finalized, xdbs_created, xdbs_modified, barcode, tipo, id_magazzino, id_settore, id_area, id_ubicazione, flavor, peso_max, lunghezza, larghezza, altezza, volume_max, inventario, allarme, manutenzione, quantita, in_prelievo, in_deposito, lotto, scadenza
         ->  Append  (cost=0.00..11.12 rows=2 width=1146) (actual time=0.305..0.305 rows=0 loops=1)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..5.06 rows=1 width=1146) (actual time=0.157..0.157 rows=0 loops=1)
                     ->  Nested Loop  (cost=0.00..5.05 rows=1 width=1146) (actual time=0.149..0.149 rows=0 loops=1)
                           ->  Index Scan using articolo_modified_index on articolo  (cost=0.00..2.02 rows=1 width=653) (actual time=0.142..0.142 rows=0 loops=1)
                                 Index Cond: (xdbs_modified > '2006-01-10 18:25:00'::timestamp without time zone)
                                 Filter: (id_ente = 'dmd'::text)
                           ->  Index Scan using ubicazione_fkey_articolo on ubicazione  (cost=0.00..3.02 rows=1 width=536) (never executed)
                                 Index Cond: (('dmd'::text = ubicazione.id_ente) AND ("outer".id_produttore = ubicazione.id_produttore) AND ("outer".id_articolo = ubicazione.id_articolo))
                                 Filter: ((allarme IS NULL) AND (manutenzione IS NULL))
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..6.06 rows=1 width=1146) (actual time=0.137..0.137 rows=0 loops=1)
                     ->  Nested Loop  (cost=0.00..6.05 rows=1 width=1146) (actual time=0.131..0.131 rows=0 loops=1)
                           ->  Index Scan using ubicazione_modified_index on ubicazione  (cost=0.00..3.02 rows=1 width=536) (actual time=0.123..0.123 rows=0 loops=1)
                                 Index Cond: (xdbs_modified > '2006-01-10 18:25:00'::timestamp without time zone)
                                 Filter: ((allarme IS NULL) AND (manutenzione IS NULL) AND ('dmd'::text = id_ente))
                           ->  Index Scan using articolo_pkey on articolo  (cost=0.00..3.02 rows=1 width=653) (never executed)
                                 Index Cond: ((articolo.id_ente = 'dmd'::text) AND (articolo.id_produttore = "outer".id_produttore) AND (articolo.id_articolo = "outer".id_articolo))
 Total runtime: 1.609 ms
(20 rows)

Since the previous approach used in my application was to select the whole relation every time the user needed to access this data, the above result must be compared with the following naive one.
dmd-freerp-1-alex=# explain analyze SELECT * FROM articolo
    JOIN ubicazione USING (id_ente, id_produttore, id_articolo)
    WHERE articolo.id_ente = 'dmd' AND allarme IS NULL AND manutenzione IS NULL;

 Nested Loop  (cost=0.00..1014.49 rows=1 width=1146) (actual time=0.210..283.272 rows=3662 loops=1)
   ->  Seq Scan on ubicazione  (cost=0.00..1011.45 rows=1 width=536) (actual time=0.070..51.223 rows=12036 loops=1)
         Filter: ((allarme IS NULL) AND (manutenzione IS NULL) AND ('dmd'::text = id_ente))
   ->  Index Scan using articolo_pkey on articolo  (cost=0.00..3.02 rows=1 width=653) (actual time=0.008..0.009 rows=0 loops=12036)
         Index Cond: ((articolo.id_ente = 'dmd'::text) AND (articolo.id_produttore = "outer".id_produttore) AND (articolo.id_articolo = "outer".id_articolo))
 Total runtime: 292.544 ms
(6 rows)

This amounts to a ~200x speedup for the end user.

Alex
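A side note on the final UNION query: UNION is what forces the Sort/Unique pair at the top of the plan, since it must remove duplicates between the two branches. If that ever becomes noticeable at larger row counts, one common rewrite (a sketch only, not tested against this schema) keeps both index-driven branches but makes them disjoint, so UNION ALL can be used and the sort disappears:

```sql
SELECT * FROM articolo JOIN ubicazione USING (id_ente, id_produttore, id_articolo)
 WHERE articolo.id_ente = 'dmd' AND allarme IS NULL AND manutenzione IS NULL
   AND articolo.xdbs_modified > '2006-01-10 18:25:00+01'
UNION ALL
SELECT * FROM articolo JOIN ubicazione USING (id_ente, id_produttore, id_articolo)
 WHERE articolo.id_ente = 'dmd' AND allarme IS NULL AND manutenzione IS NULL
   AND ubicazione.xdbs_modified > '2006-01-10 18:25:00+01'
   -- exclude rows the first branch already returned, so no duplicates arise
   AND NOT (articolo.xdbs_modified > '2006-01-10 18:25:00+01');
```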
Alessandro Baretta <a.baretta@barettadeit.com> writes:
> I have no clue as to how or why the statistics were wrong
> yesterday--as I vacuum-analyzed continuously out of lack of any better
> idea--and I was stupid enough to re-timestamp everything before
> selecting from pg_stats.

Too bad. I would be interested to find out how, if the stats were up-to-date, the thing was still getting the row estimate so wrong. If you manage to get the database back into its prior state please do send along the pg_stats info.

regards, tom lane
Tom Lane wrote:
> Too bad. I would be interested to find out how, if the stats were
> up-to-date, the thing was still getting the row estimate so wrong.
> If you manage to get the database back into its prior state please
> do send along the pg_stats info.

I have some more information on this issue, which clears PostgreSQL's planner of all suspects. I am observing severe corruption of the bookkeeping fields managed by the xdbs rule/trigger "complex". I am unable to pinpoint the cause right now, but the effect is that after running a few hours' test on the end-user application (which never interacts directly with the xdbs_* fields, and thus cannot possibly mangle them) most tuples (the older ones, apparently) get their timestamps set to NULL.

Before vacuum-analyzing the table, yesterday's statistics were in effect, and the planner used the appropriate indexes. Now, after vacuum-analyzing the table, the pg_stats row for the xdbs_modified field no longer exists (!), and the planner has reverted to the Nested Loop / Seq Scan join strategy. Hence, all the vacuum-analyzing I was doing when complaining about the planner was actually collecting completely screwed statistics, and this is why the ALTER TABLE ... SET STATISTICS 1000 did not help at all!

Ok. I plead guilty and ask for the clemency of the court. I'll pay my debt to society with a long term of pl/pgsql code debugging...

Alex
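The corruption described above can be quantified with a quick check (a sketch using the thread's table names):

```sql
-- Count tuples whose bookkeeping timestamp has been nulled out.
-- count(column) skips NULLs, so the difference from count(*) gives
-- the number of mangled tuples directly.
SELECT count(*)                        AS total_rows,
       count(xdbs_modified)            AS rows_with_timestamp,
       count(*) - count(xdbs_modified) AS rows_with_null_timestamp
  FROM articolo;
```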