Discussion: Query became very slow after 9.6 -> 10 upgrade

From:
Dmitry Shalashov
Date:

Hi!

I've seen a few messages like this on the mailing list and for some reason thought it probably wouldn't happen to us, but here I am, lol.

It's a "nestloop hits again" situation.

I'll try to provide the plan from 9.6 later, but right now I only have the plan from 10.1.

It used to run in under 3 seconds (our default timeout); now it runs for 12 minutes.

\d adroom: https://pastebin.com/vBrPGtxT (3800 rows)
\d adroom_stat: https://pastebin.com/CkBArCC9 (47 million rows; 1.5 million satisfy the condition on the day column)
\d domains: https://pastebin.com/65hk7YCm (73000 rows)

All three tables are analyzed.

EXPLAIN ANALYZE with nestloop off: https://pastebin.com/zX35CPCV (0.8s)
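(For reference, a plan with nested loops disabled can presumably be captured per-session with the standard planner GUC `enable_nestloop`; a minimal sketch, not the exact commands used:)

```sql
-- Disable nested loops for this transaction only, capture the plan, then revert.
BEGIN;
SET LOCAL enable_nestloop = off;
EXPLAIN ANALYZE
SELECT ... ;  -- the slow query in question
ROLLBACK;
```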

Regarding server parameters: it's a mighty beast with 2x E5-2630 v3, 192 GB of RAM, and two very, very fast NVMe server-class SSDs in RAID1.

What can I do about it?


Also maybe this will be useful:

1st query, runs in under 1 ms:
select title, id, groups->0->>'provider' provider, domain_ids from adroom where groups->0->>'provider' ~ '^target_mail_ru' and not is_paused and current_timestamp between start_ts and stop_ts

2nd query, which builds on the 1st one, runs in under 3 ms:
select distinct unnest(domain_ids) FROM (select title, id, groups->0->>'provider' provider, domain_ids from adroom where groups->0->>'provider' ~ '^target_mail_ru' and not is_paused and current_timestamp between start_ts and stop_ts) t1

3rd query, which returns 1.5 million rows, runs in about 0.6 s:
SELECT adroom_id, domain_id, shows, clicks FROM adroom_stat WHERE day between date_trunc('day', current_timestamp - interval '1 week') and date_trunc('day', current_timestamp)

BUT if I add one additional condition to the 3rd query (which is basically the 2nd query), it again runs for 12 minutes:
SELECT adroom_id, domain_id, shows, clicks
  FROM adroom_stat
 WHERE day BETWEEN date_trunc('day', current_timestamp - interval '1 week')
               AND date_trunc('day', current_timestamp)
   AND domain_id IN (SELECT DISTINCT unnest(domain_ids)
                       FROM (SELECT title, id, groups->0->>'provider' provider, domain_ids
                               FROM adroom
                              WHERE groups->0->>'provider' ~ '^target_mail_ru'
                                AND NOT is_paused
                                AND current_timestamp BETWEEN start_ts AND stop_ts) t1)

Plan of the last query:
 Nested Loop  (cost=88.63..25617.31 rows=491 width=16) (actual time=3.512..733248.271 rows=1442797 loops=1)
   ->  HashAggregate  (cost=88.06..88.07 rows=1 width=4) (actual time=3.380..13.561 rows=3043 loops=1)
         Group Key: (unnest(adroom.domain_ids))
         ->  HashAggregate  (cost=88.03..88.04 rows=1 width=4) (actual time=2.199..2.607 rows=3043 loops=1)
               Group Key: unnest(adroom.domain_ids)
               ->  ProjectSet  (cost=0.28..87.78 rows=100 width=4) (actual time=0.701..1.339 rows=3173 loops=1)
                     ->  Index Scan using adroom_active_idx on adroom  (cost=0.28..87.27 rows=1 width=167) (actual time=0.688..1.040 rows=4 loops=1)
                           Index Cond: ((CURRENT_TIMESTAMP >= start_ts) AND (CURRENT_TIMESTAMP <= stop_ts))
                           Filter: (((groups -> 0) ->> 'provider'::text) ~ '^target_mail_ru'::text)
                           Rows Removed by Filter: 41
   ->  Index Scan using adroom_stat_day_adroom_id_domain_id_url_id_is_wlabp_idx on adroom_stat  (cost=0.58..25524.33 rows=491 width=16) (actual time=104.847..240.846 rows=474 loops=3043)
         Index Cond: ((day >= date_trunc('day'::text, (CURRENT_TIMESTAMP - '7 days'::interval))) AND (day <= date_trunc('day'::text, CURRENT_TIMESTAMP)) AND (domain_id = (unnest(adroom.domain_ids))))
 Planning time: 1.580 ms
 Execution time: 733331.740 ms

Dmitry Shalashov, relap.io & surfingbird.ru
From:
"Alex Ignatov"
Date:

Hello!

What about:

select name,setting from pg_settings where name like '%_cost';

--

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From:
Dmitry Shalashov
Date:

Sure, here it goes:

         name         | setting
----------------------+---------
 cpu_index_tuple_cost | 0.005
 cpu_operator_cost    | 0.0025
 cpu_tuple_cost       | 0.01
 parallel_setup_cost  | 1000
 parallel_tuple_cost  | 0.1
 random_page_cost     | 1
 seq_page_cost        | 1


Dmitry Shalashov, relap.io & surfingbird.ru


From:
"Alex Ignatov"
Date:

Here is my select right after initdb:

postgres=# select name,setting from pg_settings where name like '%_cost';
         name         | setting
----------------------+---------
 cpu_index_tuple_cost | 0.005
 cpu_operator_cost    | 0.0025
 cpu_tuple_cost       | 0.01
 parallel_setup_cost  | 1000
 parallel_tuple_cost  | 0.1
 random_page_cost     | 4
 seq_page_cost        | 1

Can you generate a plan with random_page_cost = 4?
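(In case it helps, the setting can be tried per-session without touching postgresql.conf; a minimal sketch:)

```sql
-- Session-local experiment: restore the default cost model, capture the
-- plan, then revert.  SET LOCAL inside a transaction works as well.
SET random_page_cost = 4;
EXPLAIN
SELECT ... ;  -- the slow query from upthread
RESET random_page_cost;
```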


--

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From:
Dmitry Shalashov
Date:

I believe that with SSD disks random_page_cost should be very low, but here you go (I decided to settle on EXPLAIN without ANALYZE this time; is that good enough?):

 Sort  (cost=18410.26..18410.27 rows=1 width=63)
   Sort Key: (sum(st.shows)) DESC
   CTE a
     ->  Index Scan using adroom_active_idx on adroom  (cost=0.28..301.85 rows=1 width=233)
           Index Cond: ((CURRENT_TIMESTAMP >= start_ts) AND (CURRENT_TIMESTAMP <= stop_ts))
           Filter: (((groups -> 0) ->> 'provider'::text) ~ '^target_mail_ru'::text)
   CTE b
     ->  HashAggregate  (cost=1.28..1.29 rows=1 width=40)
           Group Key: a.provider, a.id, unnest(a.domain_ids)
           ->  ProjectSet  (cost=0.00..0.53 rows=100 width=40)
                 ->  CTE Scan on a  (cost=0.00..0.02 rows=1 width=68)
   ->  GroupAggregate  (cost=18107.09..18107.11 rows=1 width=63)
         Group Key: b.provider, d.domain
         ->  Sort  (cost=18107.09..18107.09 rows=1 width=55)
               Sort Key: b.provider, d.domain
               ->  Nested Loop  (cost=1.00..18107.08 rows=1 width=55)
                     Join Filter: ((b.id = st.adroom_id) AND (b.domain_id = st.domain_id))
                     ->  Nested Loop  (cost=0.42..8.46 rows=1 width=59)
                           ->  CTE Scan on b  (cost=0.00..0.02 rows=1 width=40)
                           ->  Index Scan using domains_pkey on domains d  (cost=0.42..8.44 rows=1 width=19)
                                 Index Cond: (id = b.domain_id)
                      ->  Index Scan using adroom_stat_day_adroom_id_domain_id_url_id_is_wlabp_idx on adroom_stat st  (cost=0.58..18091.26 rows=491 width=16)
                           Index Cond: ((day >= date_trunc('day'::text, (CURRENT_TIMESTAMP - '7 days'::interval))) AND (day <= date_trunc('day'::text, CURRENT_TIMESTAMP)) AND (domain_id = d.id))


Dmitry Shalashov, relap.io & surfingbird.ru

From:
Tomas Vondra
Date:

IMHO the problems here are due to poor cardinality estimates.

For example in the first query, the problem is here:
   ->  Nested Loop  (cost=0.42..2.46 rows=1 width=59)  (actual time=2.431..91.330 rows=3173 loops=1)
         ->  CTE Scan on b  (cost=0.00..0.02 rows=1 width=40)  (actual time=2.407..23.115 rows=3173 loops=1)
         ->  Index Scan using domains_pkey on domains d  (cost=0.42..2.44 rows=1 width=19)  (actual time=0.018..0.018 rows=1 loops=3173)

That is, the database expects the CTE to return 1 row, but it returns
3173 of them, which makes the nested loop very inefficient.

Similarly for the other query, where this happens:
 Nested Loop  (cost=88.63..25617.31 rows=491 width=16)  (actual time=3.512..733248.271 rows=1442797 loops=1)
   ->  HashAggregate  (cost=88.06..88.07 rows=1 width=4)  (actual time=3.380..13.561 rows=3043 loops=1)

That is, about 1:3000 difference in both cases.

Those estimation errors seem to be caused by a condition that is almost
impossible to estimate, because in both queries it does this:
   groups->0->>'provider' ~ '^something'

That is, it's a regexp on an expression. You might try creating an index
on the expression (which is the only way to add expression statistics),
and reformulate the condition as LIKE (which I believe we can estimate
better than regular expressions, but I haven't tried).

So something like
   CREATE INDEX ON adroom ((groups->0->>'provider'));
   WHERE groups->0->>'provider' LIKE 'something%';
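Spelled out against the query from upthread, the suggestion amounts to something like this (a sketch only; the index name and CONCURRENTLY are illustrative additions, not part of the original advice):

```sql
-- Expression index: after ANALYZE, the planner gets per-expression statistics.
CREATE INDEX CONCURRENTLY adroom_provider_idx
    ON adroom ((groups->0->>'provider'));
ANALYZE adroom;

-- The anchored regexp rewritten as a prefix LIKE, which is easier to estimate.
SELECT title, id, groups->0->>'provider' AS provider, domain_ids
  FROM adroom
 WHERE groups->0->>'provider' LIKE 'target_mail_ru%'
   AND NOT is_paused
   AND current_timestamp BETWEEN start_ts AND stop_ts;
```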

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From:
Tom Lane
Date:

Dmitry Shalashov <> writes:
> BUT if I'll add to 3rd query one additional condition, which is basically
> 2nd query, it will ran same 12 minutes:
> SELECT adroom_id, domain_id, shows, clicks FROM adroom_stat WHERE day
> between date_trunc('day', current_timestamp - interval '1 week') and
> date_trunc('day', current_timestamp) AND domain_id IN (select distinct
> unnest(domain_ids) FROM (select title, id, groups->0->>'provider' provider,
> domain_ids from adroom where groups->0->>'provider' ~ '^target_mail_ru' and
> not is_paused and current_timestamp between start_ts and stop_ts) t1)

> Plan of last query:
>  Nested Loop  (cost=88.63..25617.31 rows=491 width=16) (actual
> time=3.512..733248.271 rows=1442797 loops=1)
>    ->  HashAggregate  (cost=88.06..88.07 rows=1 width=4) (actual
> time=3.380..13.561 rows=3043 loops=1)
>          Group Key: (unnest(adroom.domain_ids))
>          ->  HashAggregate  (cost=88.03..88.04 rows=1 width=4) (actual
> time=2.199..2.607 rows=3043 loops=1)
>                Group Key: unnest(adroom.domain_ids)
>                ->  ProjectSet  (cost=0.28..87.78 rows=100 width=4) (actual
> time=0.701..1.339 rows=3173 loops=1)

Hm, seems like the problem is that that lower HashAggregate is estimated
as having only one row out, which is way off and doesn't sound like a
particularly bright default estimate anyway.  (And then we're doing an
additional HashAggregate on top of that, which is useless --- implies
that something isn't realizing that the output of the SELECT DISTINCT
is already distinct.)

I'm suspicious that this is breakage from the work that was done on
targetlist SRFs in v10, but that's just a guess at this point.

Trying simple test queries involving WHERE x IN (SELECT DISTINCT
unnest(foo) FROM ...), I do not see a behavior like this, so there is some
not-very-obvious contributing factor in your situation.  Can you put
together a self-contained test case that produces a bogus one-row
estimate?  Extra points if it produces duplicate HashAgg steps.
        regards, tom lane


From:
Dmitry Shalashov
Date:

Turns out we were on 9.5, not 9.6.

And query plan from 9.5 is:

 Sort  (cost=319008.18..319008.19 rows=1 width=556) (actual time=0.028..0.028 rows=0 loops=1)
   Sort Key: (sum(st.shows)) DESC
   Sort Method: quicksort  Memory: 25kB
   CTE a
     ->  Index Scan using adroom_active_idx on adroom  (cost=0.13..5.21 rows=1 width=584) (actual time=0.004..0.004 rows=0 loops=1)
           Index Cond: ((now() >= start_ts) AND (now() <= stop_ts))
           Filter: (((groups -> 0) ->> 'provider'::text) ~ '^target_mail_ru'::text)
   CTE b
     ->  HashAggregate  (cost=1.27..1.77 rows=100 width=68) (actual time=0.005..0.005 rows=0 loops=1)
           Group Key: a.provider, a.id, unnest(a.domain_ids)
           ->  CTE Scan on a  (cost=0.00..0.52 rows=100 width=68) (actual time=0.004..0.004 rows=0 loops=1)
   ->  HashAggregate  (cost=319001.17..319001.18 rows=1 width=556) (actual time=0.013..0.013 rows=0 loops=1)
         Group Key: b.provider, d.domain
         ->  Hash Join  (cost=16.55..319001.16 rows=1 width=556) (actual time=0.013..0.013 rows=0 loops=1)
               Hash Cond: ((st.adroom_id = b.id) AND (st.domain_id = b.domain_id))
               ->  Hash Join  (cost=13.05..318633.29 rows=48581 width=536) (never executed)
                     Hash Cond: (st.domain_id = d.id)
                     ->  Index Scan using adroom_stat_day_adroom_id_domain_id_url_id_is_wlabp_idx on adroom_stat st  (cost=0.58..313307.30 rows=1287388 width=16) (never executed)
                           Index Cond: ((day >= date_trunc('day'::text, (now() - '7 days'::interval))) AND (day <= date_trunc('day'::text, now())))
                     ->  Hash  (cost=11.10..11.10 rows=110 width=520) (never executed)
                           ->  Seq Scan on domains d  (cost=0.00..11.10 rows=110 width=520) (never executed)
               ->  Hash  (cost=2.00..2.00 rows=100 width=40) (actual time=0.007..0.007 rows=0 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 8kB
                     ->  CTE Scan on b  (cost=0.00..2.00 rows=100 width=40) (actual time=0.007..0.007 rows=0 loops=1)
 Planning time: 6.641 ms
 Execution time: 0.203 ms


Also, I prepared a test case for Tom and sent it to him.


Dmitry Shalashov, relap.io & surfingbird.ru


From:
Tom Lane
Date:

Dmitry Shalashov <> writes:
> Turns out we had not 9.6 but 9.5.

I'd managed to reproduce the weird planner behavior locally in the
regression database:

regression=# create table foo (f1 int[], f2 int);
CREATE TABLE
regression=# explain select * from tenk1 where unique2 in (select distinct unnest(f1) from foo where f2=1);
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Nested Loop  (cost=30.85..80.50 rows=6 width=244)
   ->  HashAggregate  (cost=30.57..30.63 rows=6 width=4)
         Group Key: (unnest(foo.f1))
         ->  HashAggregate  (cost=30.42..30.49 rows=6 width=4)
               Group Key: unnest(foo.f1)
               ->  ProjectSet  (cost=0.00..28.92 rows=600 width=4)
                     ->  Seq Scan on foo  (cost=0.00..25.88 rows=6 width=32)
                           Filter: (f2 = 1)
   ->  Index Scan using tenk1_unique2 on tenk1  (cost=0.29..8.30 rows=1 width=244)
         Index Cond: (unique2 = (unnest(foo.f1)))
(10 rows)

Digging into it, the reason for the duplicate HashAggregate step was that
query_supports_distinctness() punted on SRFs-in-the-targetlist, basically
on the argument that it wasn't worth extra work to handle that case.
Thinking a bit harder, it seems to me that the correct analysis is:
1. If we are proving distinctness on the grounds of a DISTINCT clause,
then it doesn't matter whether there are any SRFs, because DISTINCT
removes duplicates after tlist SRF expansion.
2. But tlist SRFs break the ability to prove distinctness on the grounds
of GROUP BY, unless all of them are within grouping columns.
It still seems like detecting the second case is harder than it's worth,
but we can trivially handle the first case, with little more than some
code rearrangement.
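[Editor's note: using the foo table from the repro above, the two cases can be seen directly in SQL. The data is made up; the point is the behavior, not the plan.]

```sql
-- Case 1: DISTINCT removes duplicates *after* the tlist SRF is expanded,
-- so the output is provably distinct even when array elements repeat.
CREATE TABLE foo (f1 int[], f2 int);
INSERT INTO foo VALUES (ARRAY[1,1,2], 1), (ARRAY[2,3], 1);
SELECT DISTINCT unnest(f1) FROM foo;   -- 1, 2, 3: each value exactly once

-- Case 2: GROUP BY does not help, because the SRF is expanded after the
-- grouping step and can reintroduce duplicates (1 twice from the first
-- array, and 2 from both groups' arrays).
SELECT unnest(f1) FROM foo GROUP BY f1, f2;
```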

The other problem is that the output rowcount of the sub-select (ie, of
the HashAggregate) is being estimated as though the SRF weren't there.
This turns out to be because estimate_num_groups() doesn't consider the
possibility of SRFs in the grouping columns.  It never has, but in 9.6 and
before the problem was masked by the fact that grouping_planner scaled up
the result rowcount by tlist_returns_set_rows() *after* performing
grouping.  Now we're effectively doing that in the other order, which is
more correct, but that means estimate_num_groups() has to apply some sort
of adjustment.  I suggest that it just multiply its old estimate by the
maximum of the SRF expansion counts.  That's likely to be an overestimate,
but it's really hard to do better without specific knowledge of the
individual SRF's behavior.
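[Editor's note: a sketch of what that multiplier means in practice. unnest() is declared with a rowcount estimate of 100 rows per call, which is consistent with the ProjectSet node above estimating 600 rows from 6 input rows; with the fix, the HashAggregate's group estimate is the old scalar-style estimate times 100, capped at the input row count.]

```sql
-- With the patch applied, the sub-select's estimated output rows should
-- now reflect the SRF expansion instead of staying at the scalar estimate
-- (here against the hypothetical foo table from the repro above).
EXPLAIN SELECT DISTINCT unnest(f1) FROM foo WHERE f2 = 1;
```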

In short, I propose the attached fixes.  I've checked this and it seems
to fix Dmitry's original problem according to the test case he sent
off-list.

            regards, tom lane

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 5b0da14..5783f90 100644
*** a/src/backend/optimizer/plan/analyzejoins.c
--- b/src/backend/optimizer/plan/analyzejoins.c
*************** rel_is_distinct_for(PlannerInfo *root, R
*** 744,751 ****
  bool
  query_supports_distinctness(Query *query)
  {
!     /* we don't cope with SRFs, see comment below */
!     if (query->hasTargetSRFs)
          return false;

      /* check for features we can prove distinctness with */
--- 744,751 ----
  bool
  query_supports_distinctness(Query *query)
  {
!     /* SRFs break distinctness except with DISTINCT, see below */
!     if (query->hasTargetSRFs && query->distinctClause == NIL)
          return false;

      /* check for features we can prove distinctness with */
*************** query_is_distinct_for(Query *query, List
*** 787,806 ****
      Assert(list_length(colnos) == list_length(opids));

      /*
-      * A set-returning function in the query's targetlist can result in
-      * returning duplicate rows, if the SRF is evaluated after the
-      * de-duplication step; so we play it safe and say "no" if there are any
-      * SRFs.  (We could be certain that it's okay if SRFs appear only in the
-      * specified columns, since those must be evaluated before de-duplication;
-      * but it doesn't presently seem worth the complication to check that.)
-      */
-     if (query->hasTargetSRFs)
-         return false;
-
-     /*
       * DISTINCT (including DISTINCT ON) guarantees uniqueness if all the
       * columns in the DISTINCT clause appear in colnos and operator semantics
!      * match.
       */
      if (query->distinctClause)
      {
--- 787,796 ----
      Assert(list_length(colnos) == list_length(opids));

      /*
       * DISTINCT (including DISTINCT ON) guarantees uniqueness if all the
       * columns in the DISTINCT clause appear in colnos and operator semantics
!      * match.  This is true even if there are SRFs in the DISTINCT columns or
!      * elsewhere in the tlist.
       */
      if (query->distinctClause)
      {
*************** query_is_distinct_for(Query *query, List
*** 820,825 ****
--- 810,825 ----
      }

      /*
+      * Otherwise, a set-returning function in the query's targetlist can
+      * result in returning duplicate rows, despite any grouping that might
+      * occur before tlist evaluation.  (If all tlist SRFs are within GROUP BY
+      * columns, it would be safe because they'd be expanded before grouping.
+      * But it doesn't currently seem worth the effort to check for that.)
+      */
+     if (query->hasTargetSRFs)
+         return false;
+
+     /*
       * Similarly, GROUP BY without GROUPING SETS guarantees uniqueness if all
       * the grouped columns appear in colnos and operator semantics match.
       */
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 4bbb4a8..edff6da 100644
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
*************** estimate_num_groups(PlannerInfo *root, L
*** 3361,3366 ****
--- 3361,3367 ----
                      List **pgset)
  {
      List       *varinfos = NIL;
+     double        srf_multiplier = 1.0;
      double        numdistinct;
      ListCell   *l;
      int            i;
*************** estimate_num_groups(PlannerInfo *root, L
*** 3394,3399 ****
--- 3395,3401 ----
      foreach(l, groupExprs)
      {
          Node       *groupexpr = (Node *) lfirst(l);
+         double        this_srf_multiplier;
          VariableStatData vardata;
          List       *varshere;
          ListCell   *l2;
*************** estimate_num_groups(PlannerInfo *root, L
*** 3402,3407 ****
--- 3404,3424 ----
          if (pgset && !list_member_int(*pgset, i++))
              continue;

+         /*
+          * Set-returning functions in grouping columns are a bit problematic.
+          * The code below will effectively ignore their SRF nature and come up
+          * with a numdistinct estimate as though they were scalar functions.
+          * We compensate by scaling up the end result by the largest SRF
+          * rowcount estimate.  (This will be an overestimate if the SRF
+          * produces multiple copies of any output value, but it seems best to
+          * assume the SRF's outputs are distinct.  In any case, it's probably
+          * pointless to worry too much about this without much better
+          * estimates for SRF output rowcounts than we have today.)
+          */
+         this_srf_multiplier = expression_returns_set_rows(groupexpr);
+         if (srf_multiplier < this_srf_multiplier)
+             srf_multiplier = this_srf_multiplier;
+
          /* Short-circuit for expressions returning boolean */
          if (exprType(groupexpr) == BOOLOID)
          {
*************** estimate_num_groups(PlannerInfo *root, L
*** 3467,3475 ****
--- 3484,3498 ----
       */
      if (varinfos == NIL)
      {
+         /* Apply SRF multiplier as we would do in the long path */
+         numdistinct *= srf_multiplier;
+         /* Round off */
+         numdistinct = ceil(numdistinct);
          /* Guard against out-of-range answers */
          if (numdistinct > input_rows)
              numdistinct = input_rows;
+         if (numdistinct < 1.0)
+             numdistinct = 1.0;
          return numdistinct;
      }

*************** estimate_num_groups(PlannerInfo *root, L
*** 3638,3643 ****
--- 3661,3670 ----
          varinfos = newvarinfos;
      } while (varinfos != NIL);

+     /* Now we can account for the effects of any SRFs */
+     numdistinct *= srf_multiplier;
+
+     /* Round off */
      numdistinct = ceil(numdistinct);

      /* Guard against out-of-range answers */

From:
Dmitry Shalashov
Date:

We tried to apply the patch on 10.1 source, but something is wrong it seems:

patch -p1 < ../1.patch
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/backend/optimizer/plan/analyzejoins.c
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/backend/utils/adt/selfuncs.c
Hunk #1 succeeded at 3270 (offset -91 lines).
Hunk #2 succeeded at 3304 (offset -91 lines).
Hunk #3 succeeded at 3313 (offset -91 lines).
Hunk #4 succeeded at 3393 (offset -91 lines).
patch unexpectedly ends in middle of line
Hunk #5 succeeded at 3570 with fuzz 1 (offset -91 lines).

Dmitry Shalashov, relap.io & surfingbird.ru
From:
Tom Lane
Date:

Dmitry Shalashov <> writes:
> We tried to apply the patch on 10.1 source, but something is wrong it seems:
> patch -p1 < ../1.patch
> (Stripping trailing CRs from patch; use --binary to disable.)
> patching file src/backend/optimizer/plan/analyzejoins.c
> (Stripping trailing CRs from patch; use --binary to disable.)
> patching file src/backend/utils/adt/selfuncs.c
> Hunk #1 succeeded at 3270 (offset -91 lines).
> Hunk #2 succeeded at 3304 (offset -91 lines).
> Hunk #3 succeeded at 3313 (offset -91 lines).
> Hunk #4 succeeded at 3393 (offset -91 lines).
> patch unexpectedly ends in middle of line
> Hunk #5 succeeded at 3570 with fuzz 1 (offset -91 lines).

The line number offsets are expected when applying to v10, but it looks
like you failed to transfer the attachment cleanly ... there were
certainly not CRs in it when I mailed it.  The output on v10
should just look like

patching file src/backend/optimizer/plan/analyzejoins.c
patching file src/backend/utils/adt/selfuncs.c
Hunk #1 succeeded at 3270 (offset -91 lines).
Hunk #2 succeeded at 3304 (offset -91 lines).
Hunk #3 succeeded at 3313 (offset -91 lines).
Hunk #4 succeeded at 3393 (offset -91 lines).
Hunk #5 succeeded at 3570 (offset -91 lines).
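[Editor's note: the CR mangling and its cure are easy to reproduce on toy files — nothing below touches the actual patch; target.txt and the one-line diff are made up for illustration.]

```shell
# Make a file and a clean one-line unified diff against it.
printf 'hello\n' > target.txt
cat > clean.patch <<'EOF'
--- target.txt
+++ target.txt
@@ -1 +1 @@
-hello
+goodbye
EOF

# Simulate a mail client converting the patch to CRLF line endings,
# then strip the CRs back out.
awk '{printf "%s\r\n", $0}' clean.patch > mangled.patch
tr -d '\r' < mangled.patch > restored.patch

# The restored patch applies cleanly, with no "Stripping trailing CRs"
# warning; -p0 uses the file names from the diff headers as-is.
patch -p0 < restored.patch
cat target.txt    # goodbye
```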
        regards, tom lane


From:
Dmitry Shalashov
Date:

> The line number offsets are expected when applying to v10, but it looks
> like you failed to transfer the attachment cleanly ...

Yes, it was some mistake on our side.

It looks that patch helps us. Tom, thank you!
I'm still testing it though, just in case.

What are PostgreSQL schedule on releasing fixes like this? Can I expect that it will be in 10.2 and when can I expect 10.2, approximately of course?

Dmitry Shalashov, relap.io & surfingbird.ru
From:
Tom Lane
Date:

Dmitry Shalashov <> writes:
> It looks that patch helps us. Tom, thank you!
> I'm still testing it though, just in case.

Excellent, please follow up if you learn anything new.

> What are PostgreSQL schedule on releasing fixes like this? Can I expect
> that it will be in 10.2 and when can I expect 10.2, approximately of course?

I haven't pushed it to the git repo yet, but I will shortly, and then
it will be in the next minor release.  That will probably be in
early February, per our release policy:
https://www.postgresql.org/developer/roadmap/
        regards, tom lane


From:
Dmitry Shalashov
Date:

> Excellent, please follow up if you learn anything new.

Sure. But my testing is over and something new might come out only incidentally now. Testing hasn't revealed anything interesting.

> That will probably be in
> early February, per our release policy:

ok, thanks. That makes me kinda hope for some security problem :)

Is it completely safe to use manually patched version in production?

Dmitry Shalashov, relap.io & surfingbird.ru
From:
Michael Paquier
Date:

On Sat, Nov 25, 2017 at 8:54 PM, Dmitry Shalashov <> wrote:
> Is it completely safe to use manually patched version in production?

Patching upstream PostgreSQL to fix a critical bug is something that
can of course be done. And to reach a state where you think something
is safe to use in production first be sure to test it thoroughly on a
stage instance. The author is also working on Postgres for 20 years,
so this gives some insurance.
-- 
Michael


From:
Dmitry Shalashov
Date:

> The author is also working on Postgres for 20 years,
> so this gives some insurance.

I know. Tom is a legend. But still I'd like to hear from him to be sure :)

Dmitry Shalashov, relap.io & surfingbird.ru
From:
Tom Lane
Date:

Michael Paquier <> writes:
> On Sat, Nov 25, 2017 at 8:54 PM, Dmitry Shalashov <> wrote:
>> Is it completely safe to use manually patched version in production?

> Patching upstream PostgreSQL to fix a critical bug is something that
> can of course be done. And to reach a state where you think something
> is safe to use in production first be sure to test it thoroughly on a
> stage instance. The author is also working on Postgres for 20 years,
> so this gives some insurance.

It's not like there's some magic dust that we sprinkle on the code at
release time ;-).  If there's a problem with that patch, it's much more
likely that you'd discover it through field testing than that we would
notice it during development (we missed the original problem after all).
So you can do that field testing now, or after 10.2 comes out.  The
former seems preferable, if you are comfortable with building a patched
copy at all.  I don't know what your normal source of Postgres executables
is, but all the common packaging technologies make it pretty easy to
rebuild a package from source with patch(es) added.  Modifying your
vendor's SRPM (or equivalent concept if you're not on Red Hat) is a
good skill to have.
        regards, tom lane


From:
Dmitry Shalashov
Date:

Ok, understood :-)

Dmitry Shalashov, relap.io & surfingbird.ru