Обсуждение: Why this Query Plan is different

Поиск
Список
Период
Сортировка

Why this Query Plan is different

От
"Thilina Gunasekara"
Дата:
Dear All,

I have two PostgreSQL Database Servers running on Gentoo which versions are
PostgreSQL 7.4.5 (Box A) and PostgreSQL 7.3.9 (Box B)

Due to a recent query posted by a developer I was interested to know which
factors are affecting these two servers to behave in a different manner to
the same query. Please shed light on this.

Box A (PostgreSQL 7.4.5) -

EXPLAIN SELECT count(airport_code) FROM properties WHERE
airport_code::character varying IN (SELECT airport_code::character varying
FROM airport_codes);
                                   QUERY PLAN
----------------------------------------------------------------------------
Aggregate  (cost=3605245.81..3605245.81 rows=1 width=4)
   ->  Seq Scan on properties  (cost=0.00..3605195.64 rows=20068 width=4)
         Filter: (subplan)
         SubPlan
           ->  Seq Scan on airport_codes  (cost=0.00..179.54 rows=7654
width=7)

Box B (PostgreSQL 7.3.9) - This take much to process
production=# EXPLAIN SELECT count(airport_code) FROM properties WHERE
airport_code::character varying IN (SELECT airport_code::character varying
FROM airport_codes);
                                   QUERY PLAN
----------------------------------------------------------------------------
Aggregate  (cost=9421.05..9421.05 rows=1 width=13)
   ->  Hash IN Join  (cost=228.67..9320.65 rows=40158 width=13)
         Hash Cond: ((("outer".airport_code)::character varying)::text =
(("inner".airport_code)::character varying)::text)
         ->  Seq Scan on properties  (cost=0.00..2137.58 rows=40158
width=13)
         ->  Hash  (cost=179.54..179.54 rows=7654 width=7)
               ->  Seq Scan on airport_codes  (cost=0.00..179.54 rows=7654
width=7)

Thilina Gunasekara

Database Administrator


Re: Why this Query Plan is different

От
Tom Lane
Дата:
"Thilina Gunasekara" <tgunasekara@openworld.org> writes:
> I have two PostgreSQL Database Servers running on Gentoo which versions are
> PostgreSQL 7.4.5 (Box A) and PostgreSQL 7.3.9 (Box B)
> Due to a recent query posted by a developer I was interested to know which
> factors are affecting these two servers to behave in a different manner to
> the same query. Please shed light on this.

7.4 knows a great deal more about optimizing IN queries than 7.3 does.
I think you have misidentified the two query plans, because "Hash IN
Join" is not a plan type that 7.3 has at all.  What is not clear to me
is which version is really slower.  Could we see EXPLAIN ANALYZE output
(correctly labeled ;-)) not just EXPLAIN?

            regards, tom lane

Re: Why this Query Plan is different

От
"Thilina Gunasekara"
Дата:
Hi Tom,

Begging your pardon for mislabeling the query plan output, please find the
correctly labeled output.

**** This is Box A which runs on 7.4.5 and delivers very quick result.

production=# SELECT version();
                                                             version

----------------------------------------------------------------------------
-----------------------------------------------------
 PostgreSQL 7.4.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.4
20040623 (Gentoo Linux 3.3.4-r1, ssp-3.3.2-2, pie-8.7.6)
(1 row)

production=# EXPLAIN ANALYSE SELECT count(airport_code) FROM properties
WHERE airport_code::character varying IN (SELECT airport_code::character
varying FROM airport_codes);
                                                            QUERY PLAN

----------------------------------------------------------------------------
------------------------------------------------------
 Aggregate  (cost=7494.95..7494.95 rows=1 width=4) (actual
time=285.821..285.823 rows=1 loops=1)
   ->  Hash IN Join  (cost=228.67..7393.46 rows=40593 width=4) (actual
time=24.994..279.818 rows=5833 loops=1)
         Hash Cond: (("outer".airport_code)::text =
(("inner".airport_code)::character varying)::text)
         ->  Seq Scan on properties  (cost=0.00..2156.93 rows=40593 width=4)
(actual time=0.035..83.152 rows=40593 loops=1)
         ->  Hash  (cost=179.54..179.54 rows=7654 width=7) (actual
time=24.491..24.491 rows=0 loops=1)
               ->  Seq Scan on airport_codes  (cost=0.00..179.54 rows=7654
width=7) (actual time=0.046..12.481 rows=7654 loops=1)
 Total runtime: 286.234 ms
(7 rows)


****** And this is the Box B which runs on 7.3.9 and takes hours to deliver
the result. Actually I didn't execute EXPLAIN ANALYSE on Box B, because when
I tried, it went for more than 20-30 Minutes.

production=> SELECT version();
                                                                   version

----------------------------------------------------------------------------
-----------------------------------------------------------------
 PostgreSQL 7.3.9 on i686-pc-linux-gnu, compiled by GCC
i686-pc-linux-gnu-gcc (GCC) 3.3.5  (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3,
pie-8.7.7.1)
(1 row)

production=> EXPLAIN SELECT count(airport_code) FROM properties WHERE
airport_code::character varying IN (SELECT airport_code::character varying
FROM airport_codes);
                                   QUERY PLAN
----------------------------------------------------------------------------
----
 Aggregate  (cost=3605245.81..3605245.81 rows=1 width=4)
   ->  Seq Scan on properties  (cost=0.00..3605195.64 rows=20068 width=4)
         Filter: (subplan)
         SubPlan
           ->  Seq Scan on airport_codes  (cost=0.00..179.54 rows=7654
width=7)
(5 rows)

Regards,

Thilina

-----Original Message-----
From: pgsql-admin-owner@postgresql.org
[mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Tom Lane
Sent: 24 March 2005 11:59
To: tgunasekara@openworld.org
Cc: PGSQL Admin
Subject: Re: [ADMIN] Why this Query Plan is different

"Thilina Gunasekara" <tgunasekara@openworld.org> writes:
> I have two PostgreSQL Database Servers running on Gentoo which versions
are
> PostgreSQL 7.4.5 (Box A) and PostgreSQL 7.3.9 (Box B)
> Due to a recent query posted by a developer I was interested to know which
> factors are affecting these two servers to behave in a different manner to
> the same query. Please shed light on this.

7.4 knows a great deal more about optimizing IN queries than 7.3 does.
I think you have misidentified the two query plans, because "Hash IN
Join" is not a plan type that 7.3 has at all.  What is not clear to me
is which version is really slower.  Could we see EXPLAIN ANALYZE output
(correctly labeled ;-)) not just EXPLAIN?

            regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings


Re: Why this Query Plan is different

От
Tom Lane
Дата:
"Thilina Gunasekara" <tgunasekara@openworld.org> writes:
> **** This is Box A which runs on 7.4.5 and delivers very quick result.
> ...
> ****** And this is the Box B which runs on 7.3.9 and takes hours to deliver
> the result.

Well, that's what I'd expect.  There's a reason why we put a great deal
of effort into fixing IN in 7.4 ;-)

            regards, tom lane