Thread: Different plan for very similar queries


Different plan for very similar queries

From
"Peter J. Holzer"
Date:
wdsah=> select version();
                                            version
-----------------------------------------------------------------------------------------------
 PostgreSQL 9.1.15 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit
(1 row)

I plan to upgrade to Debian 8 (with Postgres 9.4) soon, so the problem
may go away, but I would still like to understand what is happening
here.

IRL the queries are a bit more complicated (they involve two additional
tables), but I can demonstrate it with just two:

wdsah=> \d facttable_stat_fta4
              Table "public.facttable_stat_fta4"
       Column        |            Type             | Modifiers
---------------------+-----------------------------+-----------
 macrobondtimeseries | character varying(255)      | not null
 date                | date                        | not null
 value               | double precision            |
 berechnungsart      | character varying           |
 einheit             | character varying           |
 kurzbezeichnung     | character varying           |
 partnerregion       | character varying           |
 og                  | character varying           |
 sitcr4              | character varying           |
 warenstrom          | character varying           |
 valid_from          | timestamp without time zone |
 from_job_queue_id   | integer                     |
 kommentar           | character varying           |
Indexes:
    "facttable_stat_fta4_pkey" PRIMARY KEY, btree (macrobondtimeseries, date)
    "facttable_stat_fta4_berechnungsart_idx" btree (berechnungsart)
    "facttable_stat_fta4_einheit_idx" btree (einheit)
    "facttable_stat_fta4_og_idx" btree (og)
    "facttable_stat_fta4_partnerregion_idx" btree (partnerregion)
    "facttable_stat_fta4_sitcr4_idx" btree (sitcr4)
    "facttable_stat_fta4_warenstrom_idx" btree (warenstrom)

wdsah=> select count(*) from facttable_stat_fta4;
  count
----------
 43577941
(1 row)

wdsah=> \d term
                              Table "public.term"
         Column         |            Type             |       Modifiers
------------------------+-----------------------------+------------------------
 facttablename          | character varying           |
 columnname             | character varying           |
 term                   | character varying           |
 concept_id             | integer                     | not null
 language               | character varying           |
 register               | character varying           |
 hidden                 | boolean                     |
 cleansing_job_queue_id | integer                     | not null default (-1)
 meta_insert_dt         | timestamp without time zone | not null default now()
 meta_update_dt         | timestamp without time zone |
 valid_from             | timestamp without time zone |
 from_job_queue_id      | integer                     |
Indexes:
    "term_concept_id_idx" btree (concept_id)
    "term_facttablename_columnname_idx" btree (facttablename, columnname)
    "term_facttablename_idx" btree (facttablename)
    "term_facttablename_idx1" btree (facttablename) WHERE facttablename IS NOT NULL AND columnname::text =
'macrobondtimeseries'::text
    "term_language_idx" btree (language)
    "term_register_idx" btree (register)
    "term_term_ftidx" gin (to_tsvector('simple'::regconfig, term::text))
    "term_term_idx" btree (term)
Check constraints:
    "term_facttablename_needs_columnname_chk" CHECK (facttablename IS NULL OR columnname IS NOT NULL)
Foreign-key constraints:
    "term_concept_id_fkey" FOREIGN KEY (concept_id) REFERENCES concept(id) DEFERRABLE

wdsah=> select count(*) from term;
  count
---------
 6109087
(1 row)

The purpose of the query is to find all terms which occur in a given
column of the facttable (again, IRL this is a bit more complicated) -
basically an optimized version of select distinct.
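Spelled out, the optimized form and the plain DISTINCT it replaces look roughly like this (a sketch using the tables above):

```sql
-- Semi-join form: for each candidate term the executor only has to find
-- one matching fact row, then it can stop (EXISTS is a semi join).
SELECT facttablename, columnname, term, concept_id, t.hidden, language, register
FROM term t
WHERE facttablename = 'facttable_stat_fta4'
  AND columnname = 'einheit'
  AND EXISTS (SELECT 1 FROM facttable_stat_fta4 f WHERE f.einheit = t.term);

-- Naive form: has to look at all ~43 million rows.
SELECT DISTINCT einheit FROM facttable_stat_fta4;
```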

Some of my columns have very few distinct members:

wdsah=> select * from pg_stats where tablename='facttable_stat_fta4' and attname in ('einheit', 'berechnungsart',
'warenstrom');
 schemaname |      tablename      |    attname     | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs   | histogram_bounds | correlation
------------+---------------------+----------------+-----------+-----------+-----------+------------+------------------+---------------------+------------------+-------------
 public     | facttable_stat_fta4 | berechnungsart | f         |         0 |         2 |          2 | {n,m}            | {0.515167,0.484833} |                  |    0.509567
 public     | facttable_stat_fta4 | einheit        | f         |         0 |         3 |          2 | {EUR,kg}         | {0.515167,0.484833} |                  |    0.491197
 public     | facttable_stat_fta4 | warenstrom     | f         |         0 |         2 |          2 | {X,M}            | {0.580267,0.419733} |                  |   -0.461344
(3 rows)


And for some of them my query is indeed very fast:

wdsah=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register
        from term t where facttablename='facttable_stat_fta4' and columnname='einheit' and exists (select 1 from
facttable_stat_fta4 f where f.einheit=t.term );
                                                                               QUERY PLAN
                                                 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=0.00..384860.48 rows=1 width=81) (actual time=0.061..0.119 rows=2 loops=1)
   ->  Index Scan using term_facttablename_columnname_idx on term t  (cost=0.00..391.46 rows=636 width=81) (actual
time=0.028..0.030 rows=3 loops=1)
         Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text = 'einheit'::text))
   ->  Index Scan using facttable_stat_fta4_einheit_idx on facttable_stat_fta4 f  (cost=0.00..384457.80 rows=21788970
width=3) (actual time=0.027..0.027 rows=1 loops=3)
         Index Cond: ((einheit)::text = (t.term)::text)
 Total runtime: 0.173 ms
(6 rows)

0.17 ms. Much faster than a plain select distinct over a table with 43
million rows could ever hope to be.

warenstrom is very similar and the columns with more distinct values
aren't that bad either.

But for column berechnungsart the result is bad:

wdsah=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register
        from term t where facttablename='facttable_stat_fta4' and columnname='berechnungsart' and exists (select 1 from
facttable_stat_fta4 f where f.berechnungsart=t.term );
                                                                                         QUERY PLAN
                                                                    

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Semi Join  (cost=316864.57..319975.79 rows=1 width=81) (actual time=7703.917..30948.271 rows=2 loops=1)
   Merge Cond: ((t.term)::text = (f.berechnungsart)::text)
   ->  Index Scan using term_term_idx on term t  (cost=0.00..319880.73 rows=636 width=81) (actual
time=7703.809..7703.938 rows=3 loops=1)
         Filter: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text =
'berechnungsart'::text))
   ->  Index Scan using facttable_stat_fta4_berechnungsart_idx on facttable_stat_fta4 f  (cost=0.00..2545748.85
rows=43577940 width=2) (actual time=0.089..16263.582 rows=21336180 loops=1)
 Total runtime: 30948.648 ms
(6 rows)

Over 30 seconds! That's almost 200'000 times slower.

The weird thing is that for this particular table einheit and
berechnungsart actually have a 1:1 correspondence. Not only is the
frequency the same, every row where einheit='kg' has berechnungsart='m'
and every row where einheit='EUR' has berechnungsart='n'. So I don't see
why two different execution plans are chosen.

    hp

--
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | hjp@hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/


Re: Different plan for very similar queries

From
"Peter J. Holzer"
Date:
On 2015-05-29 10:55:44 +0200, Peter J. Holzer wrote:
> wdsah=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register
>         from term t where facttablename='facttable_stat_fta4' and columnname='einheit' and exists (select 1 from
facttable_stat_fta4 f where f.einheit=t.term );
>                                                                                QUERY PLAN
                                                   
>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Nested Loop Semi Join  (cost=0.00..384860.48 rows=1 width=81) (actual time=0.061..0.119 rows=2 loops=1)
>    ->  Index Scan using term_facttablename_columnname_idx on term t  (cost=0.00..391.46 rows=636 width=81) (actual
time=0.028..0.030 rows=3 loops=1)
>          Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text =
'einheit'::text))
>    ->  Index Scan using facttable_stat_fta4_einheit_idx on facttable_stat_fta4 f  (cost=0.00..384457.80 rows=21788970
width=3) (actual time=0.027..0.027 rows=1 loops=3)
>          Index Cond: ((einheit)::text = (t.term)::text)
>  Total runtime: 0.173 ms
> (6 rows)
>
[...]
> wdsah=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register
>         from term t where facttablename='facttable_stat_fta4' and columnname='berechnungsart' and exists (select 1
from facttable_stat_fta4 f where f.berechnungsart=t.term );
>                                                                                          QUERY PLAN
                                                                      
>
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Merge Semi Join  (cost=316864.57..319975.79 rows=1 width=81) (actual time=7703.917..30948.271 rows=2 loops=1)
>    Merge Cond: ((t.term)::text = (f.berechnungsart)::text)
>    ->  Index Scan using term_term_idx on term t  (cost=0.00..319880.73 rows=636 width=81) (actual
time=7703.809..7703.938 rows=3 loops=1)
>          Filter: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text =
'berechnungsart'::text))
>    ->  Index Scan using facttable_stat_fta4_berechnungsart_idx on facttable_stat_fta4 f  (cost=0.00..2545748.85
rows=43577940 width=2) (actual time=0.089..16263.582 rows=21336180 loops=1)
>  Total runtime: 30948.648 ms
> (6 rows)

A couple of additional observations:

The total cost of both queries is quite similar, so random variations
might push it in one direction or the other. Indeed, after dropping and
recreating indexes (I tried GIN indexes as suggested by Heikki on [1])
and calling analyze after each change, I have now reached a state where
both queries use the fast plan.

In the first case the query planner seems to add the cost of the two
index scans to get the total cost, despite the fact that for a semi join
the second index scan can be aborted after the first hit (so either the
cost of the second scan should be a lot less than 384457.80 or it needs
to be divided by a large factor for the semi join).

In the second case the cost of the second index scan (2545748.85) is
either completely ignored or divided by a large factor: It doesn't seem
to contribute much to the total cost.

    hp


[1] http://hlinnaka.iki.fi/2014/03/28/gin-as-a-substitute-for-bitmap-indexes/

--
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | hjp@hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/


Re: Different plan for very similar queries

From
Tomas Vondra
Date:
Hi,

On 05/29/15 11:51, Peter J. Holzer wrote:
> A couple of additional observations:
>
> The total cost of both queries is quite similar, so random variations
> might push into one direction or the other. Indeed, after dropping
> and recreating indexes (I tried GIN indexes as suggested by Heikki on
> [1]) and calling analyze after each change, I have now reached a
> state where both queries use the fast plan.

I don't think bitmap indexes are a particularly good match for this use
case. The queries only need to check the existence of a few records, and
btree indexes are great for that - the first plan is very fast.

Why exactly the second query uses a much slower plan, I'm not sure. I
believe I've found an issue in planning semi joins (reported to
pgsql-hackers a few minutes ago), but I may be wrong and the code may be OK.

Can you try forcing the same plan for the second query, using "enable"
flags? E.g.

    SET enable_mergejoin = off;

will disable the merge join, and push the optimizer towards a different
join type. You may have to disable a few more node types until you get
the same plan as for the first query, i.e.

    nestloop semi join
      -> index scan
      -> index scan

See this for more info:

    http://www.postgresql.org/docs/9.1/static/runtime-config-query.html
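For example, something like this (untested, and exactly which flags you need to flip may differ; re-check the plan after each SET):

```sql
SET enable_mergejoin = off;  -- rules out the Merge Semi Join
SET enable_hashjoin  = off;  -- in case it switches to a hash semi join instead

EXPLAIN ANALYZE
SELECT facttablename, columnname, term, concept_id, t.hidden, language, register
FROM term t
WHERE facttablename = 'facttable_stat_fta4'
  AND columnname = 'berechnungsart'
  AND EXISTS (SELECT 1 FROM facttable_stat_fta4 f WHERE f.berechnungsart = t.term);

RESET enable_mergejoin;
RESET enable_hashjoin;
```

The flags only affect the current session, so the RESETs afterwards restore the defaults.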

Also, have you tuned the PostgreSQL configuration? How?

Can you provide the dataset? Not necessarily all the columns; it should
be sufficient to provide the columns used in the join/where clauses:

     term -> facttablename, columnname, term
     facttable_stat_fta4 -> einheit, berechnungsart

That'd make reproducing the problem much easier.
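For example (the output file names are just placeholders):

```sql
-- Export only the join/where columns, e.g. with psql's \copy:
\copy (SELECT facttablename, columnname, term FROM term) TO 'term_cols.csv' CSV
\copy (SELECT einheit, berechnungsart FROM facttable_stat_fta4) TO 'fta4_cols.csv' CSV
```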

> In the first case the query planner seems to add the cost of the two
> index scans to get the total cost, despite the fact that for a semi
> join the second index scan can be aborted after the first hit (so
> either the cost of the second scan should be a lot less than
> 384457.80 or it needs to be divided by a large factor for the semi
> join).
>
> In the second case the cost of the second index scan (2545748.85) is
> either completely ignored or divided by a large factor: It doesn't
> seem to contribute much to the total cost.

I believe this is a consequence of the semi join semantics: the explain
plan shows "total" costs and row counts, as if the whole relation were
scanned (in this case all 43M rows), but the optimizer only propagates a
fraction of the cost estimate (depending on how much of the relation it
expects to scan). In this case it expects to scan only a tiny part of
the index, so the impact on the total cost is small.

A bit confusing, yeah.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Different plan for very similar queries

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> Why exactly does the second query use a much slower plan I'm not sure. I
> believe I've found an issue in planning semi joins (reported to
> pgsql-hackers a few minutes ago), but may be wrong and the code is OK.

I think you are probably right that there's a bug there: the planner is
vastly overestimating the cost of the nestloop-with-inner-indexscan
plan.  However, the reason why the mergejoin plan gets chosen in some
cases seems to be that an additional estimation error is needed to make
that happen; otherwise the nestloop still comes out looking cheaper.
The undesirable case looks like:

>>  Merge Semi Join  (cost=316864.57..319975.79 rows=1 width=81) (actual time=7703.917..30948.271 rows=2 loops=1)
>>    Merge Cond: ((t.term)::text = (f.berechnungsart)::text)
>>    ->  Index Scan using term_term_idx on term t  (cost=0.00..319880.73 rows=636 width=81) (actual
time=7703.809..7703.938 rows=3 loops=1)
>>          Filter: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text =
'berechnungsart'::text))
>>    ->  Index Scan using facttable_stat_fta4_berechnungsart_idx on facttable_stat_fta4 f  (cost=0.00..2545748.85
rows=43577940 width=2) (actual time=0.089..16263.582 rows=21336180 loops=1)
>>  Total runtime: 30948.648 ms

Notice that it's estimating the cost of the join as significantly less
than the cost of the inner-side indexscan.  This means it believes that
the inner indexscan will not be run to completion.  That's not because of
semijoin semantics; there's no stop-after-first-match benefit for mergejoins.
It must be that it thinks the range of keys on the outer side of the join
is much less than the range of keys on the inner.  Given that it knows
that facttable_stat_fta4.berechnungsart only contains the values "m"
and "n", this implies that it thinks term.term only contains "m" and
not "n".  So this estimation error presumably comes from "n" not having
been seen in ANALYZE's last sample of term.term, and raising the stats
target for term.term would probably be a way to fix that.
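For example (the target of 1000 is a guess; the default is 100):

```sql
-- Sample more rows for term.term so that rare values like 'n' are more
-- likely to be seen, then rebuild the statistics:
ALTER TABLE term ALTER COLUMN term SET STATISTICS 1000;
ANALYZE term;
```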

However, this would all be moot if the cost estimate for the nestloop
plan were nearer to reality.  Since you started a separate -hackers
thread for that issue, let's go discuss that there.

            regards, tom lane


Re: Different plan for very similar queries

From
"Peter J. Holzer"
Date:
[I've seen in -hackers that you already seem to have a fix]

On 2015-05-30 15:04:34 -0400, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> > Why exactly does the second query use a much slower plan I'm not sure. I
> > believe I've found an issue in planning semi joins (reported to
> > pgsql-hackers a few minutes ago), but may be wrong and the code is OK.
>
> I think you are probably right that there's a bug there: the planner is
> vastly overestimating the cost of the nestloop-with-inner-indexscan
> plan.  However, the reason why the mergejoin plan gets chosen in some
> cases seems to be that an additional estimation error is needed to make
> that happen; otherwise the nestloop still comes out looking cheaper.
> The undesirable case looks like:
>
> >>  Merge Semi Join  (cost=316864.57..319975.79 rows=1 width=81) (actual time=7703.917..30948.271 rows=2 loops=1)
> >>    Merge Cond: ((t.term)::text = (f.berechnungsart)::text)
> >>    ->  Index Scan using term_term_idx on term t  (cost=0.00..319880.73 rows=636 width=81) (actual
time=7703.809..7703.938 rows=3 loops=1)
> >>          Filter: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text =
'berechnungsart'::text))

Just noticed that this is a bit strange, too:

This scans the whole index term_term_idx and for every row found it
checks the table for the filter condition. So it has to read the whole
index and the whole table, right? But the planner estimates that it will
return only 636 rows (out of 6.1E6), so using
term_facttablename_columnname_idx to extract those 636 and then sorting
them should be quite a bit faster (even just a plain full table scan
and then sorting should be faster).

> >>    ->  Index Scan using facttable_stat_fta4_berechnungsart_idx on facttable_stat_fta4 f  (cost=0.00..2545748.85
rows=43577940 width=2) (actual time=0.089..16263.582 rows=21336180 loops=1)
> >>  Total runtime: 30948.648 ms
>
> Notice that it's estimating the cost of the join as significantly less
> than the cost of the inner-side indexscan.  This means it believes that
> the inner indexscan will not be run to completion.  That's not because of
> semijoin semantics; there's no stop-after-first-match benefit for mergejoins.
> It must be that it thinks the range of keys on the outer side of the join
> is much less than the range of keys on the inner.  Given that it knows
> that facttable_stat_fta4.berechnungsart only contains the values "m"
> and "n", this implies that it thinks term.term only contains "m" and
> not "n".  So this estimation error presumably comes from "n" not having
> been seen in ANALYZE's last sample of term.term, and raising the stats
> target for term.term would probably be a way to fix that.

The term column has a relatively flat distribution and about 3.5 million
distinct values (in 6.1 million rows). So it's unlikely for any specific
value to be included in an ANALYZE sample, and it would be an error to
assume that a value doesn't occur at all just because it wasn't seen by
ANALYZE. OTOH, a value which is seen by ANALYZE will have its frequency
overestimated by quite a bit. But that shouldn't influence whether the
whole index needs to be scanned or not.
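A back-of-the-envelope check (assuming ANALYZE samples 300 * statistics_target = 30000 rows at the default target of 100, and treating the draws as independent):

```sql
-- Chance that at least one of the 3 rows carrying a given term lands
-- in a 30000-row sample of the 6.1 million rows:
SELECT 1 - power(1 - 30000.0 / 6109087, 3) AS p_in_sample;
-- comes out at roughly 0.015, i.e. about 1.5%
```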

Another thought: For the merge semi join postgresql doesn't actually
have to scan the whole inner index. It can skip from the first 'm' entry
to the first 'n' entry reading only a few non-leaf blocks, skipping many
leaf blocks in the process. The times (7703.917..30948.271) indicate that
it doesn't actually do this, but maybe the planner assumes it does?

I also suspected that the culprit is the "columnname" column. That one has a very
skewed distribution:

wdsah=> select columnname, count(*) from term group by columnname order by count(*);
     columnname      |  count
---------------------+---------
 warenstrom          |       3
 einheit             |       3
 berechnungsart      |       3
 og                  |      26
 berichtsregion      |     242
 partnerregion       |     246
 sitcr4              |    4719
 kurzbezeichnung     | 1221319
 macrobondtimeseries | 1221320
                     | 3661206
(10 rows)

So random variation in the sample could throw off the estimated
frequencies of the least frequent columnnames by quite a bit.

But given that both plans estimated the number of rows returned by the
outer index scan as 636, that was probably a red herring.

But there does seem to be a connection to this column: In one case
pg_stats contained n_distinct=7 and only the two most common values.
Then the plan looked like this:

wdsah=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register
        from term t where facttablename='facttable_stat_fta4' and columnname='warenstrom' and exists (select 1 from
facttable_stat_fta4 f where f.warenstrom=t.term );
                                                                                 QUERY PLAN
                                                    

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=0.00..386141.13 rows=1 width=81) (actual time=0.202..0.253 rows=2 loops=1)
   ->  Index Scan using term_facttablename_columnname_idx on term t  (cost=0.00..264.03 rows=437 width=81) (actual
time=0.097..0.099 rows=3 loops=1)
         Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text =
'warenstrom'::text))
   ->  Index Scan using facttable_stat_fta4_warenstrom_idx on facttable_stat_fta4 f  (cost=0.00..385869.36
rows=21787688 width=2) (actual time=0.033..0.033 rows=1 loops=3)
         Index Cond: ((warenstrom)::text = (t.term)::text)
 Total runtime: 0.314 ms

But after another analyze, pg_stats contained n_distinct=5 and the 5 most
common values. And now the plan looks like this (after disabling
bitmapscan and hashagg):

wdsah=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register
        from term t where facttablename='facttable_stat_fta4' and columnname='warenstrom' and exists (select 1 from
facttable_stat_fta4 f where f.warenstrom=t.term );
                                                                                 QUERY PLAN
                                                     

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=0.00..2124104.23 rows=1 width=81) (actual time=0.080..0.129 rows=2 loops=1)
   ->  Index Scan using term_facttablename_columnname_idx on term t  (cost=0.00..3.23 rows=1 width=81) (actual
time=0.030..0.031 rows=3 loops=1)
         Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text =
'warenstrom'::text))
   ->  Index Scan using facttable_stat_fta4_warenstrom_idx on facttable_stat_fta4 f  (cost=0.00..2124100.90
rows=21787688 width=2) (actual time=0.029..0.029 rows=1 loops=3)
         Index Cond: ((warenstrom)::text = (t.term)::text)
 Total runtime: 0.180 ms
(6 rows)

The estimated number of rows in the outer scan is way more accurate in
the second plan (1 instead of 437), but for some reason the cost of the
inner scan is higher (2124100.90 instead of 385869.36), although it
should be lower (we only need to search for 1 value, not 437).

(There was no analyze (automatic or manual) on facttable_stat_fta4
between those two tests, so the statistics on facttable_stat_fta4
shouldn't have changed - only those for term.)

    hp

--
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | hjp@hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/


Re: Different plan for very similar queries

From
Tom Lane
Date:
"Peter J. Holzer" <hjp-pgsql@hjp.at> writes:
>>> Merge Semi Join  (cost=316864.57..319975.79 rows=1 width=81) (actual time=7703.917..30948.271 rows=2 loops=1)
>>>   Merge Cond: ((t.term)::text = (f.berechnungsart)::text)
>>>   ->  Index Scan using term_term_idx on term t  (cost=0.00..319880.73 rows=636 width=81) (actual
time=7703.809..7703.938 rows=3 loops=1)
>>>         Filter: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text =
'berechnungsart'::text))

> Just noticed that this is a bit strange, too:

> This scans the whole index term_term_idx and for every row found it
> checks the table for the filter condition. So it has to read the whole
> index and the whole table, right? But the planner estimates that it will
> return only 636 rows (out of 6.1E6), so using
> term_facttablename_columnname_idx to extract those 636 and then sorting
> them should be quite a bit faster (even just a plain full table scan
> and then sorting should be faster).

Hm.  I do not see that here with Tomas' sample data, neither on HEAD nor
9.1: I always get a scan using term_facttablename_columnname_idx.  I agree
your plan looks strange.  Can you create some sample data that reproduces
that particular misbehavior?

            regards, tom lane


Re: Different plan for very similar queries

From
Tomas Vondra
Date:
On 05/31/15 13:00, Peter J. Holzer wrote:
> [I've seen in -hackers that you already seem to have a fix]
>
> On 2015-05-30 15:04:34 -0400, Tom Lane wrote:
>> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>> Why exactly does the second query use a much slower plan I'm not sure. I
>>> believe I've found an issue in planning semi joins (reported to
>>> pgsql-hackers a few minutes ago), but may be wrong and the code is OK.
>>
>> I think you are probably right that there's a bug there: the planner is
>> vastly overestimating the cost of the nestloop-with-inner-indexscan
>> plan.  However, the reason why the mergejoin plan gets chosen in some
>> cases seems to be that an additional estimation error is needed to make
>> that happen; otherwise the nestloop still comes out looking cheaper.
>> The undesirable case looks like:
>>
>>>>   Merge Semi Join  (cost=316864.57..319975.79 rows=1 width=81) (actual time=7703.917..30948.271 rows=2 loops=1)
>>>>     Merge Cond: ((t.term)::text = (f.berechnungsart)::text)
>>>>     ->  Index Scan using term_term_idx on term t  (cost=0.00..319880.73 rows=636 width=81) (actual
time=7703.809..7703.938 rows=3 loops=1)
>>>>           Filter: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text =
'berechnungsart'::text))
>
> Just noticed that this is a bit strange, too:
>
> This scans the whole index term_term_idx and for every row found it
> checks the table for the filter condition. So it has to read the whole
> index and the whole table, right? But the planner estimates that it will
> return only 636 rows (out of 6.1E6), so using
> term_facttablename_columnname_idx to extract those 636 and then sorting
> them should be quite a bit faster (even just a plain full table scan
> and then sorting should be faster).

That seems a bit strange, yes. I don't see why a simple index scan (with
an Index Cond), expected to produce 636 rows, should be more expensive
than scanning the whole index (with a Filter) - even with an additional
Sort node sorting those 636 rows.

But I've been unable to reproduce that (both on 9.1 and HEAD) without
significant 'SET enable_*' gymnastics, so I'm not sure why that happens.
Don't you have some 'enable_sort=off' or something like that?

A test case with a data set would help a lot, in this case.


> Another thought: For the merge semi join postgresql doesn't actually
> have to scan the whole inner index. It can skip from the first 'm' entry
> to the first 'n' entry reading only a few non-leaf blocks, skipping many
> leaf blocks in the process. The times (7703.917..30948.271) indicate that
> it doesn't actually do this, but maybe the planner assumes it does?

How would it know how far to skip? I mean, assume you're on the first
'n' entry - how do you know where the first 'm' entry is?

If you only really need to check existence, a nested loop with an inner
index scan is probably the right thing anyway, especially if the number
of outer rows (and thus loops performed) is quite low. This is clearly
demonstrated by the first plan in this thread:

                   QUERY PLAN
------------------------------------------------------------- ...
  Nested Loop Semi Join  (cost=0.00..384860.48 rows=1 width=81 ...
    ->  Index Scan using term_facttablename_columnname_idx on  ...
          Index Cond: (((facttablename)::text = 'facttable_sta ...
    ->  Index Scan using facttable_stat_fta4_einheit_idx on fa ...
          Index Cond: ((einheit)::text = (t.term)::text)
  Total runtime: 0.173 ms
(6 rows)

This is probably the best plan you can get in cases like this ...

>
> I also suspected that the culprit is the "columnname" column. That one has a very
> skewed distribution:
>
> wdsah=> select columnname, count(*) from term group by columnname order by count(*);
>       columnname      |  count
> ---------------------+---------
>   warenstrom          |       3
>   einheit             |       3
>   berechnungsart      |       3
>   og                  |      26
>   berichtsregion      |     242
>   partnerregion       |     246
>   sitcr4              |    4719
>   kurzbezeichnung     | 1221319
>   macrobondtimeseries | 1221320
>                       | 3661206
> (10 rows)
>
> So random variation in the sample could throw off the estimated
> frequencies of the the least frequent columnnames by quite a bit.
>
> But given that both plans estimated the number of rows returned by the
> outer index scan as 636, that was probably a red herring.
>
> But there does seem to be a connection to this column: In one case
> pg_stats contained n_distinct=7 and only the two most common values.
> Then the plan looked like this:
>
> wdsah=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register
>          from term t where facttablename='facttable_stat_fta4' and columnname='warenstrom' and exists (select 1 from
facttable_stat_fta4 f where f.warenstrom=t.term );
>                                                                                   QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Nested Loop Semi Join  (cost=0.00..386141.13 rows=1 width=81) (actual time=0.202..0.253 rows=2 loops=1)
>     ->  Index Scan using term_facttablename_columnname_idx on term t  (cost=0.00..264.03 rows=437 width=81) (actual
time=0.097..0.099rows=3 loops=1) 
>           Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text =
'warenstrom'::text))
>     ->  Index Scan using facttable_stat_fta4_warenstrom_idx on facttable_stat_fta4 f  (cost=0.00..385869.36
rows=21787688width=2) (actual time=0.033..0.033 rows=1 loops=3) 
>           Index Cond: ((warenstrom)::text = (t.term)::text)
>   Total runtime: 0.314 ms
>
> But after another analyze, pg_stats contained n_distinct=5 and the 5 most
> common values. And now the plan looks like this (after disabling
> bitmapscan and hashagg):
>
> wdsah=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register
>         from term t where facttablename='facttable_stat_fta4' and columnname='warenstrom' and exists
>         (select 1 from facttable_stat_fta4 f where f.warenstrom=t.term );
>                                           QUERY PLAN
> ----------------------------------------------------------------------------------------------------
>  Nested Loop Semi Join  (cost=0.00..2124104.23 rows=1 width=81) (actual time=0.080..0.129 rows=2 loops=1)
>    ->  Index Scan using term_facttablename_columnname_idx on term t  (cost=0.00..3.23 rows=1 width=81) (actual time=0.030..0.031 rows=3 loops=1)
>          Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text = 'warenstrom'::text))
>    ->  Index Scan using facttable_stat_fta4_warenstrom_idx on facttable_stat_fta4 f  (cost=0.00..2124100.90 rows=21787688 width=2) (actual time=0.029..0.029 rows=1 loops=3)
>          Index Cond: ((warenstrom)::text = (t.term)::text)
>  Total runtime: 0.180 ms
> (6 rows)
>
> The estimated number of rows in the outer scan is way more accurate in
> the second plan (1 instead of 437), but for some reason the cost for the
> inner scan is higher (2124100.90 instead of 385869.36) although it
> should be lower (we only need to search for 1 value, not 437).

As I explained in the pgsql-hackers thread (sorry for the confusion; it
seemed like a more appropriate place for a discussion of planner
internals), I believe this happens because only the total costs of the
inner paths are compared. That is a problem, because in this case we
only really care about the first tuple, not about all the tuples, since
that is all the semijoin needs.

Could you post the plan with bitmapscan enabled? I'd bet its cost will
be somewhere between 385869.36 and 2124100.90, so that small variations
in the statistics (and thus in the costs) cause such plan changes. When
the index scan cost drops below the bitmap scan's, you get the first
(good) plan; otherwise you get the other one.
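For reference, a minimal sketch of how to capture that plan (same query as
in your mail; I'm assuming you disabled the paths with SET in the same
session):

```sql
-- Re-enable the previously disabled paths and capture the plan the
-- planner then prefers; its total cost should land between the two
-- index-scan estimates (385869.36 and 2124100.90).
SET enable_bitmapscan = on;
SET enable_hashagg = on;
EXPLAIN ANALYZE
SELECT facttablename, columnname, term, concept_id, t.hidden, language, register
FROM term t
WHERE facttablename = 'facttable_stat_fta4'
  AND columnname = 'warenstrom'
  AND EXISTS (SELECT 1 FROM facttable_stat_fta4 f
              WHERE f.warenstrom = t.term);
```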

Also, this may easily be caused by variations within the same
statistics, e.g. between columns or between values within the same
column (so an MCV item with 51% gets one plan, while an item with 49%
gets a different plan).

This might be improved by using a larger sample - 9.1 uses the default
statistics target of 100, i.e. samples of 30,000 rows. Try increasing
that to 1000 (SET default_statistics_target=1000) - that should give
more consistent statistics and hopefully stable plans (though maybe
consistently in the wrong direction).
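Concretely, something like this (a sketch; ANALYZE must be re-run for the
new target to take effect, and the ALTER TABLE variant makes it permanent
for just the skewed column):

```sql
-- Session-wide: larger sample for all subsequent ANALYZE runs.
SET default_statistics_target = 1000;
ANALYZE term;

-- Or persistently, for the one column with the skewed distribution:
ALTER TABLE term ALTER COLUMN columnname SET STATISTICS 1000;
ANALYZE term;
```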

Also, you might tweak the cost variables a bit, to make the cost
differences more significant. But that's secondary I guess, as the costs
(385869 vs. 2124100) are quite far away.

> (There was no analyze on facttable_stat_fta4 (automatic or manual) on
> facttable_stat_fta4 between those two tests, so the statistics on
> facttable_stat_fta4 shouldn't have changed - only those for term.)

So maybe there was autoanalyze, because otherwise it really should be
the same in both plans ...


--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Different plan for very similar queries

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 05/31/15 13:00, Peter J. Holzer wrote:
>> (There was no analyze on facttable_stat_fta4 (automatic or manual) on
>> facttable_stat_fta4 between those two tests, so the statistics on
>> facttable_stat_fta4 shouldn't have changed - only those for term.)

> So maybe there was autoanalyze, because otherwise it really should be
> the same in both plans ...

No, because that's the inside of a nestloop with significantly different
outer-side rowcount estimates.  The first case gets a benefit from the
expectation that it will be re-executed many times (see the impact of
loop_count on cost_index).
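A rough sketch of that effect (the idea behind the averaging, not the
exact 9.1 code): for an inner index scan expected to be repeated
loop_count = N times, cost_index() charges the page fetches for all N
repetitions under the Mackert-Lohman caching model and then divides by N:

```latex
% Per-execution page cost of a repeated inner index scan (sketch):
\text{pages\_per\_scan} \approx
  \frac{\text{index\_pages\_fetched}(\text{tuples\_fetched} \cdot N)}{N}
```

Because index_pages_fetched() grows sublinearly (repeated fetches are
assumed to hit cache), the per-execution charge shrinks as N grows: with
N = 437 the inner scan is billed at a large cache discount, with N = 1
there is none, which is consistent with the 385869 vs. 2124100 estimates.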

            regards, tom lane


Re: Different plan for very similar queries

From
Tomas Vondra
Date:

On 05/31/15 18:22, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On 05/31/15 13:00, Peter J. Holzer wrote:
>>> (There was no analyze on facttable_stat_fta4 (automatic or manual) on
>>> facttable_stat_fta4 between those two tests, so the statistics on
>>> facttable_stat_fta4 shouldn't have changed - only those for term.)
>
>> So maybe there was autoanalyze, because otherwise it really should be
>> the same in both plans ...
>
> No, because that's the inside of a nestloop with significantly different
> outer-side rowcount estimates.  The first case gets a benefit from the
> expectation that it will be re-executed many times (see the impact of
> loop_count on cost_index).

Meh, I got confused by the plan a bit - I thought there was a problem in
the outer path (e.g. a change of row count). But actually this is the
path scanning the 'term' table, so the change is expected there.

The fact that the index scan cost 'suddenly' grows from 386k to 2M is
confusing at first, but yeah - it's caused by the 'averaging' in
cost_index() depending on loop_count.

But I think this does not really change the problem of eliminating
inner paths solely on the basis of total cost - in fact it probably
makes it slightly worse, because the index scan cost now also depends on
estimates in the outer path (while the bitmapscan cost does not).


--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Different plan for very similar queries

From
"Peter J. Holzer"
Date:
On 2015-05-29 10:55:44 +0200, Peter J. Holzer wrote:
> wdsah=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register
>         from term t where facttablename='facttable_stat_fta4' and columnname='einheit' and exists
>         (select 1 from facttable_stat_fta4 f where f.einheit=t.term );
>                                           QUERY PLAN
> ----------------------------------------------------------------------------------------------------
>  Nested Loop Semi Join  (cost=0.00..384860.48 rows=1 width=81) (actual time=0.061..0.119 rows=2 loops=1)
>    ->  Index Scan using term_facttablename_columnname_idx on term t  (cost=0.00..391.46 rows=636 width=81) (actual time=0.028..0.030 rows=3 loops=1)
>          Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text = 'einheit'::text))
>    ->  Index Scan using facttable_stat_fta4_einheit_idx on facttable_stat_fta4 f  (cost=0.00..384457.80 rows=21788970 width=3) (actual time=0.027..0.027 rows=1 loops=3)
>          Index Cond: ((einheit)::text = (t.term)::text)
>  Total runtime: 0.173 ms
> (6 rows)
>
> 0.17 ms. Much faster than a plain select distinct over a table with 43
> million rows could ever hope to be.
>
> warenstrom is very similar and the columns with more distinct values
> aren't that bad either.
>
> But for column berechnungsart the result is bad:
>
> wdsah=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register
>         from term t where facttablename='facttable_stat_fta4' and columnname='berechnungsart' and exists
>         (select 1 from facttable_stat_fta4 f where f.berechnungsart=t.term );
>                                           QUERY PLAN
> ----------------------------------------------------------------------------------------------------
>  Merge Semi Join  (cost=316864.57..319975.79 rows=1 width=81) (actual time=7703.917..30948.271 rows=2 loops=1)
>    Merge Cond: ((t.term)::text = (f.berechnungsart)::text)
>    ->  Index Scan using term_term_idx on term t  (cost=0.00..319880.73 rows=636 width=81) (actual time=7703.809..7703.938 rows=3 loops=1)
>          Filter: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text = 'berechnungsart'::text))
>    ->  Index Scan using facttable_stat_fta4_berechnungsart_idx on facttable_stat_fta4 f  (cost=0.00..2545748.85 rows=43577940 width=2) (actual time=0.089..16263.582 rows=21336180 loops=1)
>  Total runtime: 30948.648 ms
> (6 rows)
>
> Over 30 seconds! That's almost 200'000 times slower.

First I'd like to apologize for dropping out of the thread without
providing a test data set. I actually had one prepared (without
confidential data), but I wanted to make sure that I could reproduce the
problem with the test data, and I didn't get around to it for a week or
two and then I went on vacation ...

Anyway, in the meantime you released 9.5alpha (thanks for that, I
probably would have compiled a snapshot sooner or later, but installing
debian packages is just a lot more convenient - I hope you get a lot of
useful feedback) and I installed that this weekend.

I am happy to report that the problem appears to be solved. All the
queries of this type I threw at the database finish in a few
milliseconds now.

    hp


--
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | hjp@hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/

Attachments