Обсуждение: Speed depending of Join Order.
I will explain my question usin an example. I have two tables as
follows:
Table "public.image_mode" Column | Type | Modifiers
-------------+---------------+-----------mis_id | character(5) | not nullins_id | character(5) | not
nullimg_id | character(25) | not nullmod_mis_id | character(5) | not nullmod_ins_id | character(5) | not
nullmod_id | character(5) | not nullmod_valueid | character(5) | not null
Indexes: pk_imgmode primary key btree (mis_id, ins_id, img_id,
mod_mis_id, mod_ins_id, mod_id, mod_valueid), image_mode_fk_image btree (mis_id, ins_id, img_id),
image_mode_fk_modebtree (mod_mis_id, mod_ins_id, mod_id,
mod_valueid)
Table "public.mode" Column | Type | Modifiers
-----------------+------------------------+-----------mis_id | character(5) | not nullins_id
| character(5) | not nullmod_id | character(5) | not nullmod_valueid | character(5)
| not nullmod_name | character varying(50) | not nullmod_value | character varying(25) | not
nullvmod_valuenr | double precision | vmod_valueunits | character varying(25) | vmod_obs | character
varying(255)|
Indexes: pk_mode primary key btree (mis_id, ins_id, mod_id, mod_valueid)
Ten I perform the same search in two different ways:
SELECT mod.mod_id, mod.mod_value FROM image_mode imod, mode mod WHERE imod.mod_mis_id = mod.mis_id AND
imod.mod_ins_id= mod.ins_id AND imod.mod_id = mod.mod_id AND imod.mod_valueid= mod.mod_valueid
AND imod.mis_id='XXX' AND imod.ins_id='YYY' AND imod.img_id='ZZZ';
SELECT mod.mod_id, mod.mod_value FROM image_mode imod, mode mod WHERE mod.mis_id = imod.mod_mis_id AND
mod.ins_id= imod.mod_ins_id AND mod.mod_id = imod.mod_id AND mod.mod_valueid= imod.mod_valueid AND
imod.mis_id='XXX' AND imod.ins_id='YYY' AND imod.img_id='ZZZ';
Note that the only difference is the order of the join elements. Using
version 7.2.2, which I have been using untill now, the time expended in
both of them was the same, using the right indexes. However, using
version 7.3.1 which I have instaled recently, the results of the explain
are the following:
-------- Case 1: ------------
Merge Join (cost=1.79..1.92 rows=1 width=79) (actual
time=404.29..4109.78 rows=2 loops=1) Merge Cond: (("outer".mod_mis_id = "inner".mis_id) AND
("outer".mod_ins_id = "inner".ins_id) AND ("outer".mod_id =
"inner".mod_id) AND ("outer".mod_valueid = "inner".mod_valueid)) -> Index Scan using image_mode_fk_mode on image_mode
imod
(cost=0.00..606979.14 rows=1 width=36) (actual time=403.42..4108.67
rows=2 loops=1) Filter: ((mis_id = 'IUE'::bpchar) AND (ins_id = 'LWP'::bpchar)
AND (img_id = 'HL28915'::bpchar)) -> Sort (cost=1.79..1.85 rows=24 width=43) (actual time=0.81..0.81
rows=5 loops=1) Sort Key: mod.mis_id, mod.ins_id, mod.mod_id, mod.mod_valueid -> Seq Scan on "mode" mod
(cost=0.00..1.24rows=24 width=43)
(actual time=0.10..0.19 rows=24 loops=1)Total runtime: 4109.96 msec
-------- Case 2: ---------
Merge Join (cost=5.69..5.71 rows=1 width=79) (actual time=1.12..1.30
rows=2 loops=1) Merge Cond: (("outer".mis_id = "inner".mod_mis_id) AND
("outer".ins_id = "inner".mod_ins_id) AND ("outer".mod_id =
"inner".mod_id) AND ("outer".mod_valueid = "inner".mod_valueid)) -> Index Scan using pk_mode on "mode" mod
(cost=0.00..6.08rows=24
width=43) (actual time=0.27..0.30 rows=5 loops=1) -> Sort (cost=5.69..5.70 rows=1 width=36) (actual time=0.81..0.81
rows=2 loops=1) Sort Key: imod.mod_mis_id, imod.mod_ins_id, imod.mod_id,
imod.mod_valueid -> Index Scan using image_mode_fk_image on image_mode imod
(cost=0.00..5.68 rows=1 width=36) (actual time=0.58..0.61 rows=2
loops=1) Index Cond: ((mis_id = 'IUE'::bpchar) AND (ins_id =
'LWP'::bpchar) AND (img_id = 'HL28915'::bpchar))Total runtime: 1.45 msec
As you can see, there is a great differece in the time it takes to
execute each of them since a sequential scan is performed in Case 1
instead an Index scan. I have run vacuum analyze so I am sure this is
not the problem.
Thank you very much in advance,
Raul Gutierrez
Raúl Gutiérrez Sánchez <raul@laeff.esa.es> writes:
> Note that the only difference is the order of the join elements. Using
> version 7.2.2, which I have been using untill now, the time expended in
> both of them was the same, using the right indexes. However, using
> version 7.3.1 which I have instaled recently, the results of the explain
> are the following:
That seems like a bug. Are the tables small enough that you could send
me a pg_dump of them? I doubt I can reproduce this without the specific
test case.
regards, tom lane
Raúl Gutiérrez Sánchez <raul@laeff.esa.es> writes:
> Note that the only difference is the order of the join elements. Using
> version 7.2.2, which I have been using untill now, the time expended in
> both of them was the same, using the right indexes. However, using
> version 7.3.1 which I have instaled recently, the results of the explain
> are the following:
It turns out that the problem was inaccuracy in some recently-added code
that tries to account for the possibility that a mergejoin won't scan
all the way to the end. Your sample data had only one possible value
for the mis_id and mod_mis_id columns; this boundary case exposed the
fact that the code was testing for "x < max" where it should be testing
"x <= max". Coupled with a lack of sanity-checking, the bogus
calculation affected the estimated costs in an asymmetrical way. This
is why the choice of a bad plan only occurred in one case out of two.
I've applied the attached patch for 7.3.2. Thanks for the report!
regards, tom lane
*** src/backend/optimizer/path/costsize.c.orig Wed Sep 4 16:31:20 2002
--- src/backend/optimizer/path/costsize.c Wed Jan 22 15:10:20 2003
***************
*** 645,652 **** innerscansel = firstclause->left_mergescansel; }
! outer_rows = outer_path->parent->rows * outerscansel;
! inner_rows = inner_path->parent->rows * innerscansel; /* cost of source data */
--- 645,666 ---- innerscansel = firstclause->left_mergescansel; }
! /* convert selectivity to row count; must scan at least one row */
!
! outer_rows = ceil(outer_path->parent->rows * outerscansel);
! if (outer_rows < 1)
! outer_rows = 1;
! inner_rows = ceil(inner_path->parent->rows * innerscansel);
! if (inner_rows < 1)
! inner_rows = 1;
!
! /*
! * Readjust scan selectivities to account for above rounding. This is
! * normally an insignificant effect, but when there are only a few rows
! * in the inputs, failing to do this makes for a large percentage error.
! */
! outerscansel = outer_rows / outer_path->parent->rows;
! innerscansel = inner_rows / inner_path->parent->rows; /* cost of source data */
*** src/backend/utils/adt/selfuncs.c.orig Fri Oct 18 22:56:16 2002
--- src/backend/utils/adt/selfuncs.c Wed Jan 22 15:12:05 2003
***************
*** 1740,1746 **** rsortop, ltop, gtop,
! revltop; Datum leftmax, rightmax; double selec;
--- 1740,1748 ---- rsortop, ltop, gtop,
! leop,
! revgtop,
! revleop; Datum leftmax, rightmax; double selec;
***************
*** 1779,1813 **** /* Look up the "left < right" and "left > right" operators */ op_mergejoin_crossops(opno,
<op,>op, NULL, NULL);
! /* Look up the "right < left" operator */
! revltop = get_commutator(gtop);
! if (!OidIsValid(revltop))
! return; /* shouldn't happen */ /* * Now, the fraction of the left variable that
willbe scanned is the * fraction that's <= the right-side maximum value. But only believe * non-default
estimates,else stick with our 1.0. */
! selec = scalarineqsel(root, ltop, false, left, rightmax, right->vartype); if (selec
!=DEFAULT_INEQ_SEL) *leftscan = selec; /* And similarly for the right variable. */
! selec = scalarineqsel(root, revltop, false, right, leftmax, left->vartype); if
(selec!= DEFAULT_INEQ_SEL) *rightscan = selec; /* * Only one of the two fractions can really be less
than1.0; believe
! * the smaller estimate and reset the other one to exactly 1.0. */ if (*leftscan > *rightscan)
*leftscan= 1.0;
! else *rightscan = 1.0; } /*
--- 1781,1829 ---- /* Look up the "left < right" and "left > right" operators */ op_mergejoin_crossops(opno,
<op,>op, NULL, NULL);
! /* Look up the "left <= right" operator */
! leop = get_negator(gtop);
! if (!OidIsValid(leop))
! return; /* insufficient info in catalogs */
!
! /* Look up the "right > left" operator */
! revgtop = get_commutator(ltop);
! if (!OidIsValid(revgtop))
! return; /* insufficient info in catalogs */
!
! /* Look up the "right <= left" operator */
! revleop = get_negator(revgtop);
! if (!OidIsValid(revleop))
! return; /* insufficient info in catalogs */ /* * Now, the fraction of the left
variablethat will be scanned is the * fraction that's <= the right-side maximum value. But only believe *
non-defaultestimates, else stick with our 1.0. */
! selec = scalarineqsel(root, leop, false, left, rightmax, right->vartype); if (selec
!=DEFAULT_INEQ_SEL) *leftscan = selec; /* And similarly for the right variable. */
! selec = scalarineqsel(root, revleop, false, right, leftmax, left->vartype); if
(selec!= DEFAULT_INEQ_SEL) *rightscan = selec; /* * Only one of the two fractions can really be less
than1.0; believe
! * the smaller estimate and reset the other one to exactly 1.0. If we
! * get exactly equal estimates (as can easily happen with self-joins),
! * believe neither. */ if (*leftscan > *rightscan) *leftscan = 1.0;
! else if (*leftscan < *rightscan) *rightscan = 1.0;
+ else
+ *leftscan = *rightscan = 1.0; } /*