Обсуждение: PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster

Поиск
Список
Период
Сортировка

PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster

От
Christian Brink
Дата:
I previously posted 'forcing index scan on query produces 16x faster'
and it seemed that the consensus was that 8.0.x series had an issue. I
have upgraded to the highest practical version for our distro. But we
seem to have the same issue.

If I force the 'enable_seqscan' off our actual time is 9ms where if
'enable_seqscan' is on the performance is 2200ms ( the good news is the
Seq Scan query on 8.2 is 1/2 the time of the 8.0 query ).


The paste is below - I reloaded the table from scratch after the 8.2
upgrade. Then I ran a 'REINDEX DATABASE' and a 'VACUUM ANALYZE' (then
ran some queries and reran the vac analyze).



postream=> SELECT version();
                                                  version
---------------------------------------------------------------------------------------------------------
  PostgreSQL 8.2.11 on i386-redhat-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20070925 (Red Hat 4.1.2-33)
(1 row)

postream=> SET enable_seqscan = false;
SET
postream=> EXPLAIN ANALYZE
postream-> SELECT si.group1_id as name, sum(si.qty) as count,
sum(si.amt) as amt
postream->    FROM salesitems si, sales s, sysstrings
postream->   WHERE si.id = s.id
postream->    AND si.group1_id != ''
postream->    AND si.group1_id IS NOT NULL
postream->    AND NOT si.void
postream->    AND NOT s.void
postream->    AND NOT s.suspended
postream->    AND s.tranzdate >= (cast('2010-02-15' as date) +
cast(sysstrings.data as time))
postream->    AND s.tranzdate < ((cast('2010-02-15' as date) + 1) +
cast(sysstrings.data as time))
postream->    AND sysstrings.id='net/Console/Employee/Day End Time'
postream-> GROUP BY name;
                                                                                                 QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost=426973.65..426973.86 rows=14 width=35) (actual
time=9.424..9.438 rows=12 loops=1)
    ->  Nested Loop  (cost=0.01..426245.31 rows=97113 width=35) (actual
time=0.653..6.954 rows=894 loops=1)
          ->  Nested Loop  (cost=0.01..2416.59 rows=22477 width=4)
(actual time=0.595..2.150 rows=225 loops=1)
                ->  Index Scan using sysstrings_pkey on sysstrings
(cost=0.00..8.27 rows=1 width=182) (actual time=0.110..0.112 rows=1 loops=1)
                      Index Cond: (id = 'net/Console/Employee/Day End
Time'::text)
                ->  Index Scan using sales_tranzdate_index on sales s
(cost=0.01..1846.40 rows=22477 width=12) (actual time=0.454..1.687
rows=225 loops=1)
                      Index Cond: ((s.tranzdate >= ('2010-02-15'::date +
(sysstrings.data)::time without time zone)) AND (s.tranzdate <
('2010-02-16'::date + (sysstrings.data)::time without time zone)))
                      Filter: ((NOT void) AND (NOT suspended))
          ->  Index Scan using salesitems_pkey on salesitems si
(cost=0.00..18.54 rows=25 width=39) (actual time=0.007..0.013 rows=4
loops=225)
                Index Cond: (si.id = s.id)
                Filter: (((group1_id)::text <> ''::text) AND (group1_id
IS NOT NULL) AND (NOT void))
  Total runtime: 9.585 ms
(12 rows)

postream=> SET enable_seqscan = true;
SET
postream=> EXPLAIN ANALYZE
postream-> SELECT si.group1_id as name, sum(si.qty) as count,
sum(si.amt) as amt
postream->    FROM salesitems si, sales s, sysstrings
postream->   WHERE si.id = s.id
postream->    AND si.group1_id != ''
postream->    AND si.group1_id IS NOT NULL
postream->    AND NOT si.void
postream->    AND NOT s.void
postream->    AND NOT s.suspended
postream->    AND s.tranzdate >= (cast('2010-02-15' as date) +
cast(sysstrings.data as time))
postream->    AND s.tranzdate < ((cast('2010-02-15' as date) + 1) +
cast(sysstrings.data as time))
postream->    AND sysstrings.id='net/Console/Employee/Day End Time'
postream-> GROUP BY name;
                                                                                                    QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost=38315.09..38315.30 rows=14 width=35) (actual
time=2206.531..2206.545 rows=12 loops=1)
    ->  Hash Join  (cost=2697.55..37586.74 rows=97113 width=35) (actual
time=2128.070..2204.048 rows=894 loops=1)
          Hash Cond: (si.id = s.id)
          ->  Seq Scan on salesitems si  (cost=0.00..30578.15
rows=890646 width=39) (actual time=0.047..1487.688 rows=901281 loops=1)
                Filter: (((group1_id)::text <> ''::text) AND (group1_id
IS NOT NULL) AND (NOT void))
          ->  Hash  (cost=2416.59..2416.59 rows=22477 width=4) (actual
time=1.823..1.823 rows=225 loops=1)
                ->  Nested Loop  (cost=0.01..2416.59 rows=22477 width=4)
(actual time=0.477..1.592 rows=225 loops=1)
                      ->  Index Scan using sysstrings_pkey on
sysstrings  (cost=0.00..8.27 rows=1 width=182) (actual time=0.039..0.040
rows=1 loops=1)
                            Index Cond: (id = 'net/Console/Employee/Day
End Time'::text)
                      ->  Index Scan using sales_tranzdate_index on
sales s  (cost=0.01..1846.40 rows=22477 width=12) (actual
time=0.410..1.187 rows=225 loops=1)
                            Index Cond: ((s.tranzdate >=
('2010-02-15'::date + (sysstrings.data)::time without time zone)) AND
(s.tranzdate < ('2010-02-16'::date + (sysstrings.data)::time without
time zone)))
                            Filter: ((NOT void) AND (NOT suspended))
  Total runtime: 2206.706 ms
(13 rows)

postream=> \d salesitems;
                     Table "public.salesitems"
     Column    |           Type           |       Modifiers
--------------+--------------------------+------------------------
  id           | integer                  | not null
  lineno       | smallint                 | not null
  plu          | character varying(35)    |
  qty          | numeric(8,3)             | not null
  amt          | numeric(10,2)            |
  last_updated | timestamp with time zone | default now()
  group1_id    | character varying(64)    |
  group2_id    | text                     |
  group3_id    | text                     |
  void         | boolean                  | not null default false
  hash         | boolean                  | not null default false
  component    | boolean                  | not null default false
  subitem      | boolean                  | not null default false
Indexes:
     "salesitems_pkey" PRIMARY KEY, btree (id, lineno)
     "idx_si_group_id" btree (group1_id)
     "salesitems_last_updated_index" btree (last_updated)

--
Christian Brink



Re: PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster

От
Tom Lane
Дата:
Christian Brink <cbrink@r-stream.com> writes:
>           ->  Nested Loop  (cost=0.01..2416.59 rows=22477 width=4)
> (actual time=0.595..2.150 rows=225 loops=1)
>                 ->  Index Scan using sysstrings_pkey on sysstrings
> (cost=0.00..8.27 rows=1 width=182) (actual time=0.110..0.112 rows=1 loops=1)
>                       Index Cond: (id = 'net/Console/Employee/Day End
> Time'::text)
>                 ->  Index Scan using sales_tranzdate_index on sales s
> (cost=0.01..1846.40 rows=22477 width=12) (actual time=0.454..1.687
> rows=225 loops=1)
>                       Index Cond: ((s.tranzdate >= ('2010-02-15'::date +
> (sysstrings.data)::time without time zone)) AND (s.tranzdate <
> ('2010-02-16'::date + (sysstrings.data)::time without time zone)))
>                       Filter: ((NOT void) AND (NOT suspended))

The fundamental reason why you're getting a bad plan choice is the
factor-of-100 estimation error here.  I'm not sure you can do a whole
lot about that without rethinking the query --- in particular I would
suggest trying to get rid of the non-constant range bounds.  You're
apparently already plugging in an external variable for the date,
so maybe you could handle the time of day similarly instead of joining
to sysstrings for it.

            regards, tom lane

Re: PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster

От
"Eger, Patrick"
Дата:
Not to beat a dead horse excessively, but I think the below is a pretty
good argument for index hints? I know the general optimizer wants to be
highest priority (I very much agree with this), but I think there are
fully legitimate cases like the below. Asking the user to rewrite the
query in an unnatural way (or to change optimizer params that may work
for 99% of queries) is, IMO not a good thing. Given that the postgres
optimizer can never be perfect (as it will never have the perfect
knowledge necessary for a perfect decision), I would request that index
hints be reconsidered (for 9.0?). I know many users (myself included)
are doing this in a very rudimentary way by disabling particular access
types on a per session basis "set enable_seqscan=off; set
enable_hashjoin=off; QUERY set enable_seqscan=on; set
enable_hashjoin=on;"... I'd hack up a patch if I had the time at least
=)

Best regards, Patrick

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Monday, March 22, 2010 12:22 PM
To: Christian Brink
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] PostgreSQL upgraded to 8.2 but forcing index scan
on query produces faster

Christian Brink <cbrink@r-stream.com> writes:
>           ->  Nested Loop  (cost=0.01..2416.59 rows=22477 width=4)
> (actual time=0.595..2.150 rows=225 loops=1)
>                 ->  Index Scan using sysstrings_pkey on sysstrings
> (cost=0.00..8.27 rows=1 width=182) (actual time=0.110..0.112 rows=1
loops=1)
>                       Index Cond: (id = 'net/Console/Employee/Day End
> Time'::text)
>                 ->  Index Scan using sales_tranzdate_index on sales s

> (cost=0.01..1846.40 rows=22477 width=12) (actual time=0.454..1.687
> rows=225 loops=1)
>                       Index Cond: ((s.tranzdate >= ('2010-02-15'::date
+
> (sysstrings.data)::time without time zone)) AND (s.tranzdate <
> ('2010-02-16'::date + (sysstrings.data)::time without time zone)))
>                       Filter: ((NOT void) AND (NOT suspended))

The fundamental reason why you're getting a bad plan choice is the
factor-of-100 estimation error here.  I'm not sure you can do a whole
lot about that without rethinking the query --- in particular I would
suggest trying to get rid of the non-constant range bounds.  You're
apparently already plugging in an external variable for the date,
so maybe you could handle the time of day similarly instead of joining
to sysstrings for it.

            regards, tom lane

--
Sent via pgsql-performance mailing list
(pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster

От
Christian Brink
Дата:
On 03/22/2010 03:21 PM, Tom Lane wrote:
> The fundamental reason why you're getting a bad plan choice is the
> factor-of-100 estimation error here.  I'm not sure you can do a whole
> lot about that without rethinking the query --- in particular I would
> suggest trying to get rid of the non-constant range bounds.  You're
> apparently already plugging in an external variable for the date,
> so maybe you could handle the time of day similarly instead of joining
> to sysstrings for it.
>
>

Tom & Peter,

I thought you might like to know the outcome of this. I was able to get
the 8.0 and the 8.2 planner to correctly run the query. There were 2
issues. As Tom pointed out the the 'systrings' lookup seems to be the
main culprit. Which makes sense. How can the planner know how to run the
query when it doesn't know approximately what it will bracket the until
the query has started?

The other part of the solution is bit concerning. I had to do a 'dump
and load' (and vacuum analyze)  to get the planner to work correctly
even after I rewrote the query. FYI I had run 'VACUUM ANALYZE' (and
sometimes 'REINDEX TABLE x') between each test.


--
Christian Brink