Discussion: BUG #13817: Query planner strange choose while select/count small part of big table - complete


BUG #13817: Query planner strange choose while select/count small part of big table - complete

From: sienkomarcin@gmail.com
Date: 2015-12-14 11:45:27 +0000
The following bug has been logged on the website:

Bug reference:      13817
Logged by:          Marcin_S
Email address:      sienkomarcin@gmail.com
PostgreSQL version: 9.4.5
Operating system:   Windows 7 x64
Description:

Hi,

First, sorry for the incomplete previous bug report (the browser submitted it
too quickly :). Here is the complete version:


I've checked the TODO list but couldn't find the exact problem I'm reporting.
It seems the query planner runs an unnecessary sequential scan over all rows
in a table when only a few rows are actually picked. Let me explain with an
example. I have 2 tables with a 1-to-n relation, and an index on the foreign
key exists. On the n side there are a lot of rows (about 4 000 000 in my
database). When I select/count rows from the n table joined/subselected from
the 1 side, the planner traverses every one of the 4 000 000 rows, although
only 3 of them satisfy the WHERE conditions. Execution takes 2191 ms on my PC;
when I turn off the sequential scan it takes 12 ms.

Below I include the query and the EXPLAIN ANALYZE output with the sequential
scan on and off (disabled only for testing, as shown below). The situation is
similar without the subquery (using joins instead).
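
("Turning off the sequence scan" above means disabling it for the current
session only; roughly what I run for the comparison, using the standard
planner switch:)

    SET enable_seqscan = off;   -- the planner avoids seq scans where any alternative exists
    -- run the query / EXPLAIN ANALYZE here
    SET enable_seqscan = on;    -- back to the default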

Query:

select
        *
        --count(this_.id) as y0_
    from
--
        shipment_order_sub_item this_
    left outer join
        shipment_order_item orderitem1_
            on this_.shipment_order_item_id=orderitem1_.id
    where
        orderitem1_.id in (
            select
                oi_.id as y0_
            from
                shipment_order_item oi_
            inner join
                shipment_order order1_
                    on oi_.order_id=order1_.id
            inner join
                court_department courtdepar3_
                    on order1_.court_department_id=courtdepar3_.department_id
            inner join
                application_user user2_
                    on order1_.user_id=user2_.users_id
            where
                order1_.id = 610
                and order1_.court_department_id in (1,292,32768 )
        );

sequence scan on (default):

"Hash Join  (cost=12.88..108087.13 rows=3992515 width=177) (actual
time=2426.511..2426.515 rows=3 loops=1)"
"  Hash Cond: (this_.shipment_order_item_id = orderitem1_.id)"
"  ->  Seq Scan on shipment_order_sub_item this_  (cost=0.00..90031.15
rows=3992515 width=125) (actual time=0.022..1071.889 rows=3992110 loops=1)"
"  ->  Hash  (cost=12.87..12.87 rows=1 width=60) (actual time=0.175..0.175
rows=3 loops=1)"
"        Buckets: 1024  Batches: 1  Memory Usage: 1kB"
"        ->  Hash Semi Join  (cost=11.69..12.87 rows=1 width=60) (actual
time=0.166..0.173 rows=3 loops=1)"
"              Hash Cond: (orderitem1_.id = oi_.id)"
"              ->  Seq Scan on shipment_order_item orderitem1_
(cost=0.00..1.13 rows=13 width=52) (actual time=0.009..0.010 rows=13
loops=1)"
"              ->  Hash  (cost=11.68..11.68 rows=1 width=8) (actual
time=0.144..0.144 rows=3 loops=1)"
"                    Buckets: 1024  Batches: 1  Memory Usage: 1kB"
"                    ->  Nested Loop  (cost=0.28..11.68 rows=1 width=8)
(actual time=0.079..0.139 rows=3 loops=1)"
"                          Join Filter: (order1_.user_id =
user2_.users_id)"
"                          Rows Removed by Join Filter: 9"
"                          ->  Nested Loop  (cost=0.28..10.59 rows=1
width=16) (actual time=0.063..0.105 rows=3 loops=1)"
"                                ->  Nested Loop  (cost=0.00..2.29 rows=1
width=24) (actual time=0.027..0.052 rows=3 loops=1)"
"                                      ->  Seq Scan on shipment_order_item
oi_  (cost=0.00..1.16 rows=1 width=16) (actual time=0.007..0.010 rows=3
loops=1)"
"                                            Filter: (order_id = 610)"
"                                            Rows Removed by Filter: 10"
"                                      ->  Seq Scan on shipment_order
order1_  (cost=0.00..1.11 rows=1 width=24) (actual time=0.006..0.007 rows=1
loops=3)"
"                                            Filter: ((id = 610) AND
(court_department_id = ANY ('{1,292,32768}'::bigint[])))"
"                                            Rows Removed by Filter: 6"
"                                ->  Index Only Scan using
court_department_pkey on court_department courtdepar3_  (cost=0.28..8.29
rows=1 width=8) (actual time=0.013..0.014 rows=1 loops=3)"
"                                      Index Cond: (department_id =
order1_.court_department_id)"
"                                      Heap Fetches: 3"
"                          ->  Seq Scan on application_user user2_
(cost=0.00..1.04 rows=4 width=8) (actual time=0.003..0.004 rows=4 loops=3)"
"Planning time: 1.765 ms"
"Execution time: 2426.724 ms"


sequence scan off:

"Nested Loop  (cost=1.24..138607.34 rows=3992515 width=177) (actual
time=0.151..0.168 rows=3 loops=1)"
"  ->  Nested Loop Semi Join  (cost=0.81..45.31 rows=1 width=60) (actual
time=0.139..0.143 rows=3 loops=1)"
"        Join Filter: (orderitem1_.id = oi_.id)"
"        Rows Removed by Join Filter: 33"
"        ->  Index Scan using shipment_order_item_pkey on
shipment_order_item orderitem1_  (cost=0.14..12.33 rows=13 width=52) (actual
time=0.018..0.023 rows=13 loops=1)"
"        ->  Materialize  (cost=0.67..32.79 rows=1 width=8) (actual
time=0.004..0.007 rows=3 loops=13)"
"              ->  Nested Loop  (cost=0.67..32.78 rows=1 width=8) (actual
time=0.049..0.086 rows=3 loops=1)"
"                    ->  Nested Loop  (cost=0.54..24.62 rows=1 width=16)
(actual time=0.042..0.069 rows=3 loops=1)"
"                          ->  Nested Loop  (cost=0.27..16.32 rows=1
width=24) (actual time=0.028..0.044 rows=3 loops=1)"
"                                ->  Index Scan using
fk_fk_mt3v1s9gl7rr0lo83il8gy00d_idx on shipment_order_item oi_
(cost=0.14..8.15 rows=1 width=16) (actual time=0.014..0.017 rows=3
loops=1)"
"                                      Index Cond: (order_id = 610)"
"                                ->  Index Scan using shipment_order_pkey on
shipment_order order1_  (cost=0.13..8.15 rows=1 width=24) (actual
time=0.006..0.007 rows=1 loops=3)"
"                                      Index Cond: (id = 610)"
"                                      Filter: (court_department_id = ANY
('{1,292,32768}'::bigint[]))"
"                          ->  Index Only Scan using court_department_pkey
on court_department courtdepar3_  (cost=0.28..8.29 rows=1 width=8) (actual
time=0.005..0.006 rows=1 loops=3)"
"                                Index Cond: (department_id =
order1_.court_department_id)"
"                                Heap Fetches: 3"
"                    ->  Index Only Scan using application_user_pkey on
application_user user2_  (cost=0.13..8.15 rows=1 width=8) (actual
time=0.003..0.004 rows=1 loops=3)"
"                          Index Cond: (users_id = order1_.user_id)"
"                          Heap Fetches: 3"
"  ->  Index Scan using fk_fk_rr5k2n8n892ye3uposkh3xp6v_idx on
shipment_order_sub_item this_  (cost=0.43..98636.88 rows=3992515 width=125)
(actual time=0.006..0.006 rows=1 loops=3)"
"        Index Cond: (shipment_order_item_id = orderitem1_.id)"
"Planning time: 1.552 ms"
"Execution time: 0.311 ms"

Re: BUG #13817: Query planner strange choose while select/count small part of big table - complete

From: Feike Steenbergen
Date: 2015-12-16 17:18 GMT+01:00
Hi,

Looking at the EXPLAIN plans, it makes sense that the seq scan is preferred,
as it is expected to be cheaper.

Seq scan enabled:
Hash Join  (cost=12.88..108087.13 rows=3992515 width=177)

The main costs (83%) of this plan are:
Seq Scan on shipment_order_sub_item this_  (cost=0.00..90,031.15 rows=3,992,515 width=125)

Seq scan disabled:
Nested Loop  (cost=1.24..138607.34 rows=3992515 width=177)

The main costs (71%) of this plan are:
Index Scan using fk_fk_rr5k2n8n892ye3uposkh3xp6v_idx on shipment_order_sub_item this_  (cost=0.43..98636.88 rows=3992515 width=125)

The expected cost of the plan with the seq scan enabled is about 78% of the
cost of the plan with it disabled.
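
(For reference, the arithmetic behind those percentages: 90031.15 / 108087.13 ≈ 0.83,
98636.88 / 138607.34 ≈ 0.71, and 108087.13 / 138607.34 ≈ 0.78.)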



Questions:

- What kind of disks do you have (SSDs)?
- Is the seq scan still slow if you repeat it immediately after the first run?
- What is your current random_page_cost?
- Could you try to reissue the query after lowering the value of
  random_page_cost, like so:

    SET random_page_cost TO 1.5;

- Could you redo the explain with

    EXPLAIN (ANALYZE ON, BUFFERS ON)
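
Putting the last two suggestions together, something like this should work
(session-local; 4.0 is the default random_page_cost, and 1.5 is just a guess
that is closer to SSD behaviour):

    SHOW random_page_cost;        -- check the current setting
    SET random_page_cost TO 1.5;  -- affects this session only
    -- then prefix your original query with EXPLAIN (ANALYZE ON, BUFFERS ON) and rerun it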

regards,

Feike

Re: BUG #13817: Query planner strange choose while select/count small part of big table - complete

From: Feike Steenbergen
Date: 2015-12-17 16:00 GMT+01:00
Hi,

Analysing your plan with explain.depesz.com, we can clearly see why
the seq scan is preferred:

http://explain.depesz.com/s/Rus

This line clearly stands out in the "rows x" column:

(cost=1.24..126,838.50 rows=3,992,515 width=8) (actual time=1.242..1.265 rows=3 loops=1)

The optimizer expects 4 million rows to be returned, which would mean 4 million
index scans using fk_fk_rr5k2n8n892ye3uposkh3xp6v_idx, yet in reality only
3 rows are returned.
That's quite a difference.

- Could you ANALYZE all tables involved and reissue the query?
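
For example, something along these lines, covering the tables that appear in
the plans:

    ANALYZE shipment_order_sub_item;
    ANALYZE shipment_order_item;
    ANALYZE shipment_order;
    ANALYZE court_department;
    ANALYZE application_user;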

Looking at your query, it seems a rewrite may help somewhat.
How does the following perform, and what does its EXPLAIN ANALYZE output look like?

SELECT *
  FROM shipment_order_sub_item this_
  -- Dropping LEFT JOIN, as we are later filtering on orderitem1_.id, which
  -- would make this an INNER JOIN again
  JOIN shipment_order_item orderitem1_ ON (this_.shipment_order_item_id=orderitem1_.id)
  JOIN shipment_order_item oi_ ON (orderitem1_.id=oi_.id)
  JOIN shipment_order order1_ ON (oi_.order_id=order1_.id)
  JOIN court_department courtdepar3_ ON (order1_.court_department_id=courtdepar3_.department_id)
  JOIN application_user user2_ ON (order1_.user_id=user2_.users_id)
 WHERE order1_.id = 610
   AND order1_.court_department_id IN (1,292,32768);

regards,

Feike

Re: BUG #13817: Query planner strange choose while select/count small part of big table - complete

From: Marcin Sieńko
Date: 2015-12-17 16:19:24 +0100
Hi,

I've already done that. I've also already checked the rewritten query; no change. But I have an idea: the database is generated, and the "normal" shipment_order_item rows have 3 to 100, maybe up to 10 000, shipment_order_sub_item rows, but one has 3 992 102. Could that be the problem?

If so, then it's my fault for asking, because I believe this will not occur in production and the counting could be much faster.

regards,
Marcin


Re: BUG #13817: Query planner strange choose while select/count small part of big table - complete

From: Marcin Sieńko
Date:
Hi,

Thanks for the reply.

First, regarding your questions:
- Yes, I have an SSD, but my co-worker has the same database on a traditional disk and the behaviour is similar (just about 2 times slower).
- It is always slow, regardless of repetitions and disks.
- I think it's the default; I didn't change it. Yes, it is 4.0.
- After setting random_page_cost to 1.5: no difference.
- Here are the results with BUFFERS on:

sequence scan on:

"Aggregate  (cost=118063.42..118063.43 rows=1 width=8) (actual time=2478.105..2478.105 rows=1 loops=1)"
"  Buffers: shared hit=401 read=49722"
"  ->  Hash Join  (cost=7.88..108082.13 rows=3992515 width=8) (actual time=2478.087..2478.091 rows=3 loops=1)"
"        Hash Cond: (this_.shipment_order_item_id = orderitem1_.id)"
"        Buffers: shared hit=401 read=49722"
"        ->  Seq Scan on shipment_order_sub_item this_  (cost=0.00..90031.15 rows=3992515 width=16) (actual time=0.081..1134.453 rows=3992110 loops=1)"
"              Buffers: shared hit=384 read=49722"
"        ->  Hash  (cost=7.87..7.87 rows=1 width=16) (actual time=0.151..0.151 rows=3 loops=1)"
"              Buckets: 1024  Batches: 1  Memory Usage: 1kB"
"              Buffers: shared hit=17"
"              ->  Hash Semi Join  (cost=6.69..7.87 rows=1 width=16) (actual time=0.143..0.149 rows=3 loops=1)"
"                    Hash Cond: (orderitem1_.id = oi_.id)"
"                    Buffers: shared hit=17"
"                    ->  Seq Scan on shipment_order_item orderitem1_  (cost=0.00..1.13 rows=13 width=8) (actual time=0.005..0.006 rows=13 loops=1)"
"                          Buffers: shared hit=1"
"                    ->  Hash  (cost=6.68..6.68 rows=1 width=8) (actual time=0.122..0.122 rows=3 loops=1)"
"                          Buckets: 1024  Batches: 1  Memory Usage: 1kB"
"                          Buffers: shared hit=16"
"                          ->  Nested Loop  (cost=0.28..6.68 rows=1 width=8) (actual time=0.059..0.116 rows=3 loops=1)"
"                                Join Filter: (order1_.user_id = user2_.users_id)"
"                                Rows Removed by Join Filter: 9"
"                                Buffers: shared hit=16"
"                                ->  Nested Loop  (cost=0.28..5.59 rows=1 width=16) (actual time=0.047..0.086 rows=3 loops=1)"
"                                      Buffers: shared hit=13"
"                                      ->  Nested Loop  (cost=0.00..2.29 rows=1 width=24) (actual time=0.026..0.051 rows=3 loops=1)"
"                                            Buffers: shared hit=4"
"                                            ->  Seq Scan on shipment_order_item oi_  (cost=0.00..1.16 rows=1 width=16) (actual time=0.007..0.011 rows=3 loops=1)"
"                                                  Filter: (order_id = 610)"
"                                                  Rows Removed by Filter: 10"
"                                                  Buffers: shared hit=1"
"                                            ->  Seq Scan on shipment_order order1_  (cost=0.00..1.11 rows=1 width=24) (actual time=0.006..0.006 rows=1 loops=3)"
"                                                  Filter: ((id = 610) AND (court_department_id = ANY ('{1,292,32768}'::bigint[])))"
"                                                  Rows Removed by Filter: 6"
"                                                  Buffers: shared hit=3"
"                                      ->  Index Only Scan using court_department_pkey on court_department courtdepar3_  (cost=0.28..3.29 rows=1 width=8) (actual time=0.008..0.009 rows=1 loops=3)"
"                                            Index Cond: (department_id = order1_.court_department_id)"
"                                            Heap Fetches: 3"
"                                            Buffers: shared hit=9"
"                                ->  Seq Scan on application_user user2_  (cost=0.00..1.04 rows=4 width=8) (actual time=0.002..0.003 rows=4 loops=3)"
"                                      Buffers: shared hit=3"
"Planning time: 1.547 ms"
"Execution time: 2478.306 ms"


sequence scan off:

"Aggregate  (cost=136819.78..136819.79 rows=1 width=8) (actual time=1.274..1.274 rows=1 loops=1)"
"  Buffers: shared hit=32 read=7"
"  ->  Nested Loop  (cost=1.24..126838.50 rows=3992515 width=8) (actual time=1.242..1.265 rows=3 loops=1)"
"        Buffers: shared hit=32 read=7"
"        ->  Nested Loop Semi Join  (cost=0.81..17.81 rows=1 width=16) (actual time=1.112..1.119 rows=3 loops=1)"
"              Join Filter: (orderitem1_.id = oi_.id)"
"              Rows Removed by Join Filter: 33"
"              Buffers: shared hit=24 read=4"
"              ->  Index Only Scan using shipment_order_item_pkey on shipment_order_item orderitem1_  (cost=0.14..4.83 rows=13 width=8) (actual time=0.508..0.515 rows=13 loops=1)"
"                    Heap Fetches: 13"
"                    Buffers: shared hit=1 read=1"
"              ->  Materialize  (cost=0.67..12.79 rows=1 width=8) (actual time=0.040..0.044 rows=3 loops=13)"
"                    Buffers: shared hit=23 read=3"
"                    ->  Nested Loop  (cost=0.67..12.78 rows=1 width=8) (actual time=0.509..0.556 rows=3 loops=1)"
"                          Buffers: shared hit=23 read=3"
"                          ->  Nested Loop  (cost=0.54..9.62 rows=1 width=16) (actual time=0.217..0.253 rows=3 loops=1)"
"                                Buffers: shared hit=18 read=2"
"                                ->  Nested Loop  (cost=0.27..6.32 rows=1 width=24) (actual time=0.198..0.221 rows=3 loops=1)"
"                                      Buffers: shared hit=9 read=2"
"                                      ->  Index Scan using fk_fk_mt3v1s9gl7rr0lo83il8gy00d_idx on shipment_order_item oi_  (cost=0.14..3.15 rows=1 width=16) (actual time=0.129..0.132 rows=3 loops=1)"
"                                            Index Cond: (order_id = 610)"
"                                            Buffers: shared hit=4 read=1"
"                                      ->  Index Scan using shipment_order_pkey on shipment_order order1_  (cost=0.13..3.15 rows=1 width=24) (actual time=0.026..0.027 rows=1 loops=3)"
"                                            Index Cond: (id = 610)"
"                                            Filter: (court_department_id = ANY ('{1,292,32768}'::bigint[]))"
"                                            Buffers: shared hit=5 read=1"
"                                ->  Index Only Scan using court_department_pkey on court_department courtdepar3_  (cost=0.28..3.29 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=3)"
"                                      Index Cond: (department_id = order1_.court_department_id)"
"                                      Heap Fetches: 3"
"                                      Buffers: shared hit=9"
"                          ->  Index Only Scan using application_user_pkey on application_user user2_  (cost=0.13..3.15 rows=1 width=8) (actual time=0.097..0.098 rows=1 loops=3)"
"                                Index Cond: (users_id = order1_.user_id)"
"                                Heap Fetches: 3"
"                                Buffers: shared hit=5 read=1"
"        ->  Index Scan using fk_fk_rr5k2n8n892ye3uposkh3xp6v_idx on shipment_order_sub_item this_  (cost=0.43..86895.54 rows=3992515 width=16) (actual time=0.045..0.046 rows=1 loops=3)"
"              Index Cond: (shipment_order_item_id = orderitem1_.id)"
"              Buffers: shared hit=8 read=3"
"Planning time: 1.684 ms"
"Execution time: 1.448 ms"


I see that the calculated cost is lower for the seq scan, but I don't get how that is possible when fk_fk_rr5k2n8n892ye3uposkh3xp6v_idx on table shipment_order_sub_item is on shipment_order_item_id. If I need to pick 8 of the 4 million rows exactly via this index, why is it not used, with a sequential scan run instead? :| Disabling the seq scan makes it 2216/12 ≈ 185 times faster.

If it would help, I can send my database (about 75 MB) via FTP or wherever.

Thanks again.

Regards,
Marcin Sieńko



Re: BUG #13817: Query planner strange choose while select/count small part of big table - complete

From: "Peter J. Holzer"
Date: 2015-12-18 8:09 GMT+01:00
On 2015-12-14 11:45:27 +0000, sienkomarcin@gmail.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      13817
> Logged by:          Marcin_S
> Email address:      sienkomarcin@gmail.com
> PostgreSQL version: 9.4.5
> Operating system:   Windows 7 x64
> Description:
>
> Hi,
>
> First, sorry for the incomplete previous bug report (the browser submitted
> it too quickly :). Here is the complete version:
>
> I've checked the TODO list but couldn't find the exact problem I'm
> reporting. It seems the query planner runs an unnecessary sequential scan
> over all rows in a table when only a few rows are actually picked.

Can you try this with version 9.5 (currently in beta)? This looks very
similar to a problem I reported some time ago, which is fixed in 9.5.

    hp

--
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | hjp@hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/

Re: BUG #13817: Query planner strange choose while select/count small part of big table - complete

From: Marcin Sieńko
Date: 2015-12-18 10:28:23 +0100
Hi,

On 9.5 beta it still behaves the same. Did you see my last reply?

I've already done that. I've also already checked the rewritten query; no change. But I have an idea: the database is generated, and the "normal" shipment_order_item rows have 3 to 100, maybe up to 10 000, shipment_order_sub_item rows, but one has 3 992 102. Could that be the problem?

If so, then it's my fault for asking, because I believe this will not occur in production and the counting could be much faster.

I think this could be the problem. Additionally, I checked the number of sub items: there are 6 items, 5 of them contain 2 sub items each, and 1 contains 3 992 102 sub items. To me this is just an "abnormal data" problem, and I think it will work correctly once the data is normal.

P.S. I checked all cases on 9.5 and there is no difference.

Regards,
Marcin


Re: BUG #13817: Query planner strange choose while select/count small part of big table - complete

From: "Peter J. Holzer"
Date:
On 2015-12-17 16:19:24 +0100, Marcin Sieńko wrote:
> I've already done that. I've also already checked the rewritten query; no
> change. But I have an idea: the database is generated, and the "normal"
> shipment_order_item rows have 3 to 100, maybe up to 10 000,
> shipment_order_sub_item rows, but one has 3 992 102. Could that be the
> problem?

A very skewed distribution can be a problem, yes. If the planner doesn't
know whether you will select a shipment_order_item with 3 subitems or
one with 3 million subitems, it can only use the average number of
subitems - which may be bad for either case.

But your plan looks worse than what a single outlier of 3992102 would explain.

The nested loop semi join in line 3 of http://explain.depesz.com/s/Rus
(I'm using that as the reference because it's easier to read than your
line-wrapped plan and because the lines are numbered and easier to
reference) is expected to return 1 row. But the nested loop at line 2 is
expected to return 3992515 rows. Unless I'm missing something, this means
the planner thinks that there are *on average* 3992515 subitems per
item.

Can you post the output of these queries?

select * from pg_stats
where tablename='shipment_order_sub_item' and attname='shipment_order_item_id';

select count(*) from shipment_order_sub_item;

select count(distinct shipment_order_item_id)
from shipment_order_sub_item;
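
A per-item breakdown would also show the skew directly; just a sketch,
grouping on the FK column visible in your plans:

select shipment_order_item_id, count(*) as sub_items
from shipment_order_sub_item
group by shipment_order_item_id
order by count(*) desc;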


> If so, then it's my fault for asking, because I believe this will not occur
> in production and the counting could be much faster.

Yeah, if you have unrealistic test data, it will produce unrealistic
results and you can probably ignore them. Although in my experience it
is usually the other way around: The test data has a nice, regular
distribution and the production data is skewed a lot more than anyone
expected.

    hp

--
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | hjp@hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/

Re: BUG #13817: Query planner strange choose while select/count small part of big table - complete

From: "Peter J. Holzer"
Date:
On 2015-12-18 10:28:23 +0100, Marcin Sieńko wrote:
> On 9.5 beta it still behaves the same.

Pity.

    hp

--
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | hjp@hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/