Discussion: Performance issues


Performance issues

From: Vivekanand Joshi
Date:

Hi Team,

 

I am new to this area. We are trying to migrate a few Jasper reports from Netezza to PostgreSQL.

 

I have one report ready, but its queries are taking too much time; to be honest, it does not return any result most of the time.

 

The same query runs on Netezza in 2-3 seconds.

 

========================================================================================================

 

This is the query:

 

 

SELECT
    COUNT(DISTINCT TARGET_ID)
FROM
    S_V_F_PROMOTION_HISTORY_EMAIL PH
    INNER JOIN S_V_D_CAMPAIGN_HIERARCHY CH
        ON PH.TOUCHPOINT_EXECUTION_ID = CH.TOUCHPOINT_EXECUTION_ID
WHERE
    1=1
    AND SEND_DT >= '2014-03-13'
    AND SEND_DT <= '2015-03-14'

 

Statistics:

 

Select Count(1) from S_V_F_PROMOTION_HISTORY_EMAIL

4559289

Time: 16781.409 ms

 

Select count(1) from S_V_D_CAMPAIGN_HIERARCHY;

count

-------

45360

(1 row)

 

Time: 467869.185 ms

==================================================================

EXPLAIN PLAN FOR QUERY:

 

"Aggregate  (cost=356422.36..356422.37 rows=1 width=8)"

"  Output: count(DISTINCT base.target_id)"

"  ->  Nested Loop  (cost=68762.23..356422.36 rows=1 width=8)"

"        Output: base.target_id"

"        Join Filter: (base.touchpoint_execution_id = tp_exec.touchpoint_execution_id)"

"        ->  Nested Loop  (cost=33927.73..38232.16 rows=1 width=894)"

"              Output: camp.campaign_id, camp.campaign_name, camp.initiative, camp.objective, camp.category_id, "CATEGORY".category_name, camp_exec.campaign_execution_id, camp_exec.campaign_execution_name, camp_exec.group_id, grup.group_name, camp_exec.star (...)"

"              Join Filter: (tp_exec.touchpoint_execution_id = valid_executions.touchpoint_execution_id)"

"              CTE valid_executions"

"                ->  Merge Join  (cost=30420.45..31971.94 rows=1 width=8)"

"                      Output: s_f_touchpoint_execution_status_history_2.touchpoint_execution_id"

"                      Merge Cond: ((s_f_touchpoint_execution_status_history_2.touchpoint_execution_id = s_f_touchpoint_execution_status_history_1_1.touchpoint_execution_id) AND (s_f_touchpoint_execution_status_history_2.creation_dt = (max(s_f_touchpoint_ex (...)"

"                      ->  Sort  (cost=17196.30..17539.17 rows=137149 width=16)"

"                            Output: s_f_touchpoint_execution_status_history_2.touchpoint_execution_id, s_f_touchpoint_execution_status_history_2.creation_dt"

"                            Sort Key: s_f_touchpoint_execution_status_history_2.touchpoint_execution_id, s_f_touchpoint_execution_status_history_2.creation_dt"

"                            ->  Seq Scan on public.s_f_touchpoint_execution_status_history s_f_touchpoint_execution_status_history_2  (cost=0.00..5493.80 rows=137149 width=16)"

"                                  Output: s_f_touchpoint_execution_status_history_2.touchpoint_execution_id, s_f_touchpoint_execution_status_history_2.creation_dt"

"                                  Filter: (s_f_touchpoint_execution_status_history_2.touchpoint_execution_status_type_id = ANY ('{3,4}'::integer[]))"

"                      ->  Sort  (cost=13224.15..13398.43 rows=69715 width=16)"

"                            Output: s_f_touchpoint_execution_status_history_1_1.touchpoint_execution_id, (max(s_f_touchpoint_execution_status_history_1_1.creation_dt))"

"                            Sort Key: s_f_touchpoint_execution_status_history_1_1.touchpoint_execution_id, (max(s_f_touchpoint_execution_status_history_1_1.creation_dt))"

"                            ->  HashAggregate  (cost=6221.56..6918.71 rows=69715 width=16)"

"                                  Output: s_f_touchpoint_execution_status_history_1_1.touchpoint_execution_id, max(s_f_touchpoint_execution_status_history_1_1.creation_dt)"

"                                  Group Key: s_f_touchpoint_execution_status_history_1_1.touchpoint_execution_id"

"                                  ->  Seq Scan on public.s_f_touchpoint_execution_status_history s_f_touchpoint_execution_status_history_1_1  (cost=0.00..4766.04 rows=291104 width=16)"

"                                        Output: s_f_touchpoint_execution_status_history_1_1.touchpoint_execution_id, s_f_touchpoint_execution_status_history_1_1.touchpoint_execution_status_type_id, s_f_touchpoint_execution_status_history_1_1.status_message (...)"

"              ->  Nested Loop Left Join  (cost=1955.80..6260.19 rows=1 width=894)"

"                    Output: tp_exec.touchpoint_execution_id, tp_exec.start_dt, tp_exec.message_type_id, tp_exec.content_id, wave_exec.wave_execution_id, wave_exec.wave_execution_name, wave_exec.start_dt, camp_exec.campaign_execution_id, camp_exec.campaign_ (...)"

"                    ->  Nested Loop  (cost=1955.67..6260.04 rows=1 width=776)"

"                          Output: tp_exec.touchpoint_execution_id, tp_exec.start_dt, tp_exec.message_type_id, tp_exec.content_id, wave_exec.wave_execution_id, wave_exec.wave_execution_name, wave_exec.start_dt, camp_exec.campaign_execution_id, camp_exec.cam (...)"

"                          ->  Nested Loop Left Join  (cost=1955.54..6259.87 rows=1 width=658)"

"                                Output: tp_exec.touchpoint_execution_id, tp_exec.start_dt, tp_exec.message_type_id, tp_exec.content_id, wave_exec.wave_execution_id, wave_exec.wave_execution_name, wave_exec.start_dt, camp_exec.campaign_execution_id, camp_ex (...)"

"                                ->  Nested Loop Left Join  (cost=1955.40..6259.71 rows=1 width=340)"

"                                      Output: tp_exec.touchpoint_execution_id, tp_exec.start_dt, tp_exec.message_type_id, tp_exec.content_id, wave_exec.wave_execution_id, wave_exec.wave_execution_name, wave_exec.start_dt, camp_exec.campaign_execution_id, c (...)"

"                                      ->  Nested Loop Left Join  (cost=1955.27..6259.55 rows=1 width=222)"

"                                            Output: tp_exec.touchpoint_execution_id, tp_exec.start_dt, tp_exec.message_type_id, tp_exec.content_id, wave_exec.wave_execution_id, wave_exec.wave_execution_name, wave_exec.start_dt, camp_exec.campaign_execution (...)"

"                                            ->  Nested Loop  (cost=1954.99..6259.24 rows=1 width=197)"

"                                                  Output: tp_exec.touchpoint_execution_id, tp_exec.start_dt, tp_exec.message_type_id, tp_exec.content_id, wave_exec.wave_execution_id, wave_exec.wave_execution_name, wave_exec.start_dt, camp_exec.campaign_exe (...)"

"                                                  ->  Nested Loop  (cost=1954.71..6258.92 rows=1 width=173)"

"                                                        Output: tp_exec.touchpoint_execution_id, tp_exec.start_dt, tp_exec.message_type_id, tp_exec.content_id, wave_exec.wave_execution_id, wave_exec.wave_execution_name, wave_exec.start_dt, camp_exec.campai (...)"

"                                                        Join Filter: (camp_exec.campaign_id = wave.campaign_id)"

"                                                        ->  Nested Loop  (cost=1954.42..6254.67 rows=13 width=167)"

"                                                              Output: tp_exec.touchpoint_execution_id, tp_exec.start_dt, tp_exec.message_type_id, tp_exec.content_id, wave_exec.wave_execution_id, wave_exec.wave_execution_name, wave_exec.start_dt, wave_exec. (...)"

"                                                              ->  Hash Join  (cost=1954.13..6249.67 rows=13 width=108)"

"                                                                    Output: tp_exec.touchpoint_execution_id, tp_exec.start_dt, tp_exec.message_type_id, tp_exec.content_id, wave_exec.wave_execution_id, wave_exec.wave_execution_name, wave_exec.start_dt, wave (...)"

"                                                                    Hash Cond: ((tp_exec.touchpoint_id = tp.touchpoint_id) AND (wave_exec.wave_id = tp.wave_id))"

"                                                                    ->  Hash Join  (cost=1576.83..4595.51 rows=72956 width=90)"

"                                                                          Output: tp_exec.touchpoint_execution_id, tp_exec.start_dt, tp_exec.message_type_id, tp_exec.content_id, tp_exec.touchpoint_id, wave_exec.wave_execution_id, wave_exec.wave_execution_n (...)"

"                                                                          Hash Cond: (tp_exec.wave_execution_id = wave_exec.wave_execution_id)"

"                                                                          ->  Seq Scan on public.s_d_touchpoint_execution tp_exec  (cost=0.00..1559.56 rows=72956 width=42)"

"                                                                                Output: tp_exec.touchpoint_execution_id, tp_exec.wave_execution_id, tp_exec.touchpoint_id, tp_exec.channel_type_id, tp_exec.content_id, tp_exec.message_type_id, tp_exec.start_d (...)"

"                                                                          ->  Hash  (cost=1001.37..1001.37 rows=46037 width=56)"

"                                                                                Output: wave_exec.wave_execution_id, wave_exec.wave_execution_name, wave_exec.start_dt, wave_exec.campaign_execution_id, wave_exec.wave_id"

"                                                                                ->  Seq Scan on public.s_d_wave_execution wave_exec  (cost=0.00..1001.37 rows=46037 width=56)"

"                                                                                      Output: wave_exec.wave_execution_id, wave_exec.wave_execution_name, wave_exec.start_dt, wave_exec.campaign_execution_id, wave_exec.wave_id"

"                                                                    ->  Hash  (cost=212.72..212.72 rows=10972 width=26)"

"                                                                          Output: tp.touchpoint_id, tp.touchpoint_name, tp.wave_id, tp.channel_type_id"

"                                                                          ->  Seq Scan on public.s_d_touchpoint tp  (cost=0.00..212.72 rows=10972 width=26)"

"                                                                                Output: tp.touchpoint_id, tp.touchpoint_name, tp.wave_id, tp.channel_type_id"

"                                                              ->  Index Scan using s_d_campaign_execution_idx on public.s_d_campaign_execution camp_exec  (cost=0.29..0.37 rows=1 width=67)"

"                                                                    Output: camp_exec.campaign_execution_id, camp_exec.campaign_id, camp_exec.group_id, camp_exec.campaign_execution_name, camp_exec.start_dt, camp_exec.creation_dt"

"                                                                    Index Cond: (camp_exec.campaign_execution_id = wave_exec.campaign_execution_id)"

"                                                        ->  Index Scan using s_d_wave_pkey on public.s_d_wave wave  (cost=0.29..0.31 rows=1 width=22)"

"                                                              Output: wave.wave_id, wave.campaign_id, wave.wave_name, wave.creation_dt, wave.modified_dt"

"                                                              Index Cond: (wave.wave_id = wave_exec.wave_id)"

"                                                  ->  Index Scan using s_d_campaign_pkey on public.s_d_campaign camp  (cost=0.29..0.32 rows=1 width=40)"

"                                                        Output: camp.campaign_id, camp.campaign_name, camp.objective, camp.initiative, camp.category_id, camp.creation_dt, camp.modified_dt"

"                                                        Index Cond: (camp.campaign_id = camp_exec.campaign_id)"

"                                            ->  Index Scan using s_d_content_pkey on public.s_d_content content  (cost=0.28..0.30 rows=1 width=33)"

"                                                  Output: content.content_id, content.content_name, content.creation_dt, content.channel_type_id, content.modified_dt"

"                                                  Index Cond: (tp_exec.content_id = content.content_id)"

"                                      ->  Index Scan using s_d_message_type_pkey on public.s_d_message_type message_type  (cost=0.13..0.15 rows=1 width=120)"

"                                            Output: message_type.message_type_id, message_type.message_type_name, message_type.creation_dt, message_type.modified_dt"

"                                            Index Cond: (tp_exec.message_type_id = message_type.message_type_id)"

"                                ->  Index Scan using s_d_group_pkey on public.s_d_group grup  (cost=0.13..0.15 rows=1 width=320)"

"                                      Output: grup.group_id, grup.group_name, grup.creation_dt, grup.modified_dt"

"                                      Index Cond: (camp_exec.group_id = grup.group_id)"

"                          ->  Index Scan using d_channel_pk on public.s_d_channel_type channel  (cost=0.13..0.15 rows=1 width=120)"

"                                Output: channel.channel_type_id, channel.channel_type_name"

"                                Index Cond: (channel.channel_type_id = tp.channel_type_id)"

"                    ->  Index Scan using s_d_category_pkey on public.s_d_category "CATEGORY"  (cost=0.13..0.15 rows=1 width=120)"

"                          Output: "CATEGORY".category_id, "CATEGORY".category_name, "CATEGORY".creation_dt, "CATEGORY".modified_dt"

"                          Index Cond: (camp.category_id = "CATEGORY".category_id)"

"              ->  CTE Scan on valid_executions  (cost=0.00..0.02 rows=1 width=8)"

"                    Output: valid_executions.touchpoint_execution_id"

"        ->  Nested Loop Left Join  (cost=34834.49..318190.14 rows=2 width=148)"

"              Output: base.promo_hist_id, base.audience_member_id, base.target_id, base.touchpoint_execution_id, base.contact_group_id, base.content_version_execution_id, base.sent_ind, CASE WHEN (email.sbounce_ind IS NOT NULL) THEN (email.sbounce_ind)::in (...)"

"              CTE valid_executions"

"                ->  Nested Loop  (cost=33089.13..34834.20 rows=1 width=8)"

"                      Output: s_f_touchpoint_execution_status_history.touchpoint_execution_id"

"                      ->  Nested Loop  (cost=33088.84..34833.88 rows=1 width=16)"

"                            Output: s_f_touchpoint_execution_status_history.touchpoint_execution_id, tpe.touchpoint_id"

"                            ->  Unique  (cost=33088.42..34825.42 rows=1 width=8)"

"                                  Output: s_f_touchpoint_execution_status_history.touchpoint_execution_id"

"                                  ->  Merge Join  (cost=33088.42..34825.42 rows=1 width=8)"

"                                        Output: s_f_touchpoint_execution_status_history.touchpoint_execution_id"

"                                        Merge Cond: ((s_f_touchpoint_execution_status_history.touchpoint_execution_id = s_f_touchpoint_execution_status_history_1.touchpoint_execution_id) AND (s_f_touchpoint_execution_status_history.creation_dt = (max(s_f_t (...)"

"                                        ->  Sort  (cost=19864.28..20268.98 rows=161883 width=16)"

"                                              Output: s_f_touchpoint_execution_status_history.touchpoint_execution_id, s_f_touchpoint_execution_status_history.creation_dt"

"                                              Sort Key: s_f_touchpoint_execution_status_history.touchpoint_execution_id, s_f_touchpoint_execution_status_history.creation_dt"

"                                              ->  Seq Scan on public.s_f_touchpoint_execution_status_history  (cost=0.00..5857.68 rows=161883 width=16)"

"                                                    Output: s_f_touchpoint_execution_status_history.touchpoint_execution_id, s_f_touchpoint_execution_status_history.creation_dt"

"                                                    Filter: (s_f_touchpoint_execution_status_history.touchpoint_execution_status_type_id = ANY ('{3,4,6}'::integer[]))"

"                                        ->  Sort  (cost=13224.15..13398.43 rows=69715 width=16)"

"                                              Output: s_f_touchpoint_execution_status_history_1.touchpoint_execution_id, (max(s_f_touchpoint_execution_status_history_1.creation_dt))"

"                                              Sort Key: s_f_touchpoint_execution_status_history_1.touchpoint_execution_id, (max(s_f_touchpoint_execution_status_history_1.creation_dt))"

"                                              ->  HashAggregate  (cost=6221.56..6918.71 rows=69715 width=16)"

"                                                    Output: s_f_touchpoint_execution_status_history_1.touchpoint_execution_id, max(s_f_touchpoint_execution_status_history_1.creation_dt)"

"                                                    Group Key: s_f_touchpoint_execution_status_history_1.touchpoint_execution_id"

"                                                    ->  Seq Scan on public.s_f_touchpoint_execution_status_history s_f_touchpoint_execution_status_history_1  (cost=0.00..4766.04 rows=291104 width=16)"

"                                                          Output: s_f_touchpoint_execution_status_history_1.touchpoint_execution_id, s_f_touchpoint_execution_status_history_1.touchpoint_execution_status_type_id, s_f_touchpoint_execution_status_history_1.st (...)"

"                            ->  Index Scan using s_d_touchpoint_execution_pkey on public.s_d_touchpoint_execution tpe  (cost=0.42..8.44 rows=1 width=16)"

"                                  Output: tpe.touchpoint_execution_id, tpe.wave_execution_id, tpe.touchpoint_id, tpe.channel_type_id, tpe.content_id, tpe.message_type_id, tpe.start_dt, tpe.creation_dt"

"                                  Index Cond: (tpe.touchpoint_execution_id = s_f_touchpoint_execution_status_history.touchpoint_execution_id)"

"                      ->  Index Only Scan using s_d_touchpoint_pkey on public.s_d_touchpoint tp_1  (cost=0.29..0.32 rows=1 width=8)"

"                            Output: tp_1.touchpoint_id, tp_1.channel_type_id"

"                            Index Cond: ((tp_1.touchpoint_id = tpe.touchpoint_id) AND (tp_1.channel_type_id = 1))"

"              ->  Nested Loop  (cost=0.00..283350.22 rows=2 width=74)"

"                    Output: base.promo_hist_id, base.audience_member_id, base.target_id, base.touchpoint_execution_id, base.contact_group_id, base.content_version_execution_id, base.sent_ind, base.send_dt, base.creation_dt, base.modified_dt"

"                    Join Filter: (base.touchpoint_execution_id = valid_executions_1.touchpoint_execution_id)"

"                    ->  CTE Scan on valid_executions valid_executions_1  (cost=0.00..0.02 rows=1 width=8)"

"                          Output: valid_executions_1.touchpoint_execution_id"

"                    ->  Seq Scan on public.s_f_promotion_history base  (cost=0.00..283334.00 rows=1296 width=74)"

"                          Output: base.promo_hist_id, base.target_id, base.audience_member_id, base.touchpoint_execution_id, base.contact_group_id, base.content_version_execution_id, base.sent_ind, base.send_dt, base.creation_dt, base.modified_dt"

"                          Filter: ((base.send_dt >= '2014-03-13 00:00:00'::timestamp without time zone) AND (base.send_dt <= '2015-03-14 00:00:00'::timestamp without time zone))"

"              ->  Index Scan using s_f_promotion_history_email_pk1 on public.s_f_promotion_history_email email  (cost=0.29..2.83 rows=1 width=90)"

"                    Output: email.promo_hist_id, email.target_id, email.audience_member_id, email.touchpoint_execution_id, email.contact_group_id, email.sbounce_ind, email.hbounce_ind, email.opened_ind, email.clicked_ind, email.unsubscribe_ind, email.unsub (...)"

"                    Index Cond: (base.promo_hist_id = email.promo_hist_id)"

"                    Filter: (base.audience_member_id = email.audience_member_id)"

 

=================================================================================================

My questions are:

 

Is the query written correctly for PostgreSQL?

Am I missing anything here?

 

Total Memory : 8 GB

shared_buffers = 2GB

work_mem = 64MB

maintenance_work_mem = 700MB

effective_cache_size = 4GB
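
As a side note, work_mem and effective_cache_size can also be overridden per session for individual test runs before touching postgresql.conf; a minimal sketch, where the values are purely illustrative assumptions and not recommendations:

SET work_mem = '256MB';              -- illustrative value, not a recommendation
SET effective_cache_size = '6GB';    -- illustrative value, not a recommendation
-- then re-run the query (or its EXPLAIN) in the same session

shared_buffers, by contrast, only takes effect after a server restart.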

Any kind of help is appreciated.

 

Warm Regards,


Vivekanand Joshi
+919654227927

 

Zeta Interactive

185 Madison Ave. New York, NY 10016

www.zetainteractive.com

 


Re: Performance issues

From: Varadharajan Mukundan
Date:
Hi Vivekanand,

From the query plan, we can see that a good amount of time is spent on this line:

->  Seq Scan on public.s_f_promotion_history base  (cost=0.00..283334.00 rows=1296 width=74)

            Output: base.promo_hist_id, base.target_id, base.audience_member_id, base.touchpoint_execution_id, base.contact_group_id, base.content_version_execution_id, base.sent_ind, base.send_dt, base.creation_dt, base.modified_dt
            Filter: ((base.send_dt >= '2014-03-13 00:00:00'::timestamp without time zone) AND (base.send_dt <= '2015-03-14 00:00:00'::timestamp without time zone))



Can you try creating a (partial) index on the filter fields? (There is a good tutorial at https://devcenter.heroku.com/articles/postgresql-indexes.) Did you try running a VACUUM ANALYZE? Another approach worth trying is partitioning the public.s_f_promotion_history table by date (by the way, what are the size and row count of this table?).
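
A minimal sketch of what that might look like, assuming the filter column is send_dt on public.s_f_promotion_history as shown in the plan (the index names and the partial-index cutoff date are illustrative assumptions, not taken from this thread):

-- Plain b-tree index on the filter column (index name is illustrative):
CREATE INDEX idx_promotion_history_send_dt
    ON public.s_f_promotion_history (send_dt);

-- Or, if reports mostly look at recent data, a partial index
-- (the cutoff date is an assumed example; adjust to the workload):
CREATE INDEX idx_promotion_history_send_dt_recent
    ON public.s_f_promotion_history (send_dt)
    WHERE send_dt >= DATE '2014-01-01';

-- Refresh planner statistics afterwards:
VACUUM ANALYZE public.s_f_promotion_history;

Whether the planner actually prefers the index over a sequential scan for a full-year range depends on how selective that range is relative to the whole table, so it is worth re-checking the plan afterwards.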


On Fri, Mar 13, 2015 at 12:44 PM, Vivekanand Joshi <vjoshi@zetainteractive.com> wrote:





--
Thanks,
M. Varadharajan

------------------------------------------------

"Experience is what you get when you didn't get what you wanted"
               -By Prof. Randy Pausch in "The Last Lecture"

My Journal :- www.thinkasgeek.wordpress.com

Re: Performance issues

From: Vivekanand Joshi
Date:

There are 10 million records in the s_f_promotion_history table.

 

From: Varadharajan Mukundan [mailto:srinathsmn@gmail.com]
Sent: Friday, March 13, 2015 6:29 PM
To: vjoshi@zetainteractive.com
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issues

 


"                                                    Filter: (s_f_touchpoint_execution_status_history.touchpoint_execution_status_type_id = ANY ('{3,4,6}'::integer[]))"

"                                        ->  Sort  (cost=13224.15..13398.43 rows=69715 width=16)"

"                                              Output: s_f_touchpoint_execution_status_history_1.touchpoint_execution_id, (max(s_f_touchpoint_execution_status_history_1.creation_dt))"

"                                              Sort Key: s_f_touchpoint_execution_status_history_1.touchpoint_execution_id, (max(s_f_touchpoint_execution_status_history_1.creation_dt))"

"                                              ->  HashAggregate  (cost=6221.56..6918.71 rows=69715 width=16)"

"                                                    Output: s_f_touchpoint_execution_status_history_1.touchpoint_execution_id, max(s_f_touchpoint_execution_status_history_1.creation_dt)"

"                                                    Group Key: s_f_touchpoint_execution_status_history_1.touchpoint_execution_id"

"                                                    ->  Seq Scan on public.s_f_touchpoint_execution_status_history s_f_touchpoint_execution_status_history_1  (cost=0.00..4766.04 rows=291104 width=16)"

"                                                          Output: s_f_touchpoint_execution_status_history_1.touchpoint_execution_id, s_f_touchpoint_execution_status_history_1.touchpoint_execution_status_type_id, s_f_touchpoint_execution_status_history_1.st (...)"

"                            ->  Index Scan using s_d_touchpoint_execution_pkey on public.s_d_touchpoint_execution tpe  (cost=0.42..8.44 rows=1 width=16)"

"                                  Output: tpe.touchpoint_execution_id, tpe.wave_execution_id, tpe.touchpoint_id, tpe.channel_type_id, tpe.content_id, tpe.message_type_id, tpe.start_dt, tpe.creation_dt"

"                                  Index Cond: (tpe.touchpoint_execution_id = s_f_touchpoint_execution_status_history.touchpoint_execution_id)"

"                      ->  Index Only Scan using s_d_touchpoint_pkey on public.s_d_touchpoint tp_1  (cost=0.29..0.32 rows=1 width=8)"

"                            Output: tp_1.touchpoint_id, tp_1.channel_type_id"

"                            Index Cond: ((tp_1.touchpoint_id = tpe.touchpoint_id) AND (tp_1.channel_type_id = 1))"

"              ->  Nested Loop  (cost=0.00..283350.22 rows=2 width=74)"

"                    Output: base.promo_hist_id, base.audience_member_id, base.target_id, base.touchpoint_execution_id, base.contact_group_id, base.content_version_execution_id, base.sent_ind, base.send_dt, base.creation_dt, base.modified_dt"

"                    Join Filter: (base.touchpoint_execution_id = valid_executions_1.touchpoint_execution_id)"

"                    ->  CTE Scan on valid_executions valid_executions_1  (cost=0.00..0.02 rows=1 width=8)"

"                          Output: valid_executions_1.touchpoint_execution_id"

"                    ->  Seq Scan on public.s_f_promotion_history base  (cost=0.00..283334.00 rows=1296 width=74)"

"                          Output: base.promo_hist_id, base.target_id, base.audience_member_id, base.touchpoint_execution_id, base.contact_group_id, base.content_version_execution_id, base.sent_ind, base.send_dt, base.creation_dt, base.modified_dt"

"                          Filter: ((base.send_dt >= '2014-03-13 00:00:00'::timestamp without time zone) AND (base.send_dt <= '2015-03-14 00:00:00'::timestamp without time zone))"

"              ->  Index Scan using s_f_promotion_history_email_pk1 on public.s_f_promotion_history_email email  (cost=0.29..2.83 rows=1 width=90)"

"                    Output: email.promo_hist_id, email.target_id, email.audience_member_id, email.touchpoint_execution_id, email.contact_group_id, email.sbounce_ind, email.hbounce_ind, email.opened_ind, email.clicked_ind, email.unsubscribe_ind, email.unsub (...)"

"                    Index Cond: (base.promo_hist_id = email.promo_hist_id)"

"                    Filter: (base.audience_member_id = email.audience_member_id)"

 

=================================================================================================

Questions here are:

 

Is the query written correctly as per PostgreSQL?

Am I missing anything here?

 

Total Memory : 8 GB

shared_buffers = 2GB

work_mem = 64MB

maintenance_work_mem = 700MB

effective_cache_size = 4GB

Any kind of help is appreciated.

 

Warm Regards,


Vivekanand Joshi
+919654227927

 

Zeta Interactive

185 Madison Ave. New York, NY 10016

www.zetainteractive.com

 



 

--

Thanks,
M. Varadharajan

------------------------------------------------

"Experience is what you get when you didn't get what you wanted"
               -By Prof. Randy Pausch in "The Last Lecture"

My Journal :- www.thinkasgeek.wordpress.com

Attachments

Re: Performance issues

From
Varadharajan Mukundan
Date:
If the s_f_promotion_history table will see explosive growth, then it's worth considering partitioning by date and using constraint exclusion to speed up the queries. Otherwise, it makes sense to get started with multiple partial indexes (like having an index for each week or something like that; you may want to start with a coarse-grained timeline for the indexes and then fine-grain it based on your needs).
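
For what it's worth, a minimal sketch of both ideas on a 9.x server, using the table and column names from the plan above; the index/child-table names and the date range are purely illustrative:

-- one partial index per month (or week) on the date column
CREATE INDEX s_f_promotion_history_send_dt_2015_03_idx
    ON s_f_promotion_history (send_dt)
    WHERE send_dt >= DATE '2015-03-01' AND send_dt < DATE '2015-04-01';

-- or, for inheritance-based partitioning: one child table per month with a
-- CHECK constraint, so constraint exclusion can skip the irrelevant children
CREATE TABLE s_f_promotion_history_2015_03 (
    CHECK (send_dt >= DATE '2015-03-01' AND send_dt < DATE '2015-04-01')
) INHERITS (s_f_promotion_history);

(Routing inserts to the right child table, e.g. with a trigger, is not shown here.)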

Re: Performance issues

From
Vivekanand Joshi
Date:

I am really worried about the performance of PostgreSQL, as we have almost 1.5 billion records in the promotion history table. Do you guys really think PostgreSQL can handle this much load? We have fact tables which are more than 15 GB in size, and we have to join with those tables in almost every query.

On 13 Mar 2015 18:40, "Varadharajan Mukundan" <srinathsmn@gmail.com> wrote:
If the s_f_promotion_history table will have a explosive growth, then its worth considering partitioning by date and using constraint exclusion to speed up the queries. Else, it makes sense to get started with multiple partial index (like, have a index for each week or something like that. You may want to start with a coarse grain timeline for the index and then fine grain it based on the needs)

Re: Performance issues

From
Tomas Vondra
Date:
Hi,

On 13.3.2015 20:59, Vivekanand Joshi wrote:
> I am really worried about the performance of PostgreSQL as we have
> almost 1.5 billion records in promotion history table. Do you guys

In the previous message you claimed the post table has 10M rows ...

> really think PostgreSQL can handle this much load. We have fact
> tables which are more than 15 GB in size and we have to make joins
> with those tables in almost every query.

That depends on what performance you're looking for. You'll have to
provide considerably more information before we can help you. You might
want to check this:

   https://wiki.postgresql.org/wiki/Slow_Query_Questions

You have not provided the full query, just a query apparently
referencing views, so the actual query is way more complicated.
Also, plain EXPLAIN is not really sufficient, we need EXPLAIN ANALYZE.
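
For example, something along these lines, run against one of the views
involved or (better) against the full report query - BUFFERS is optional
but usually worth having:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM s_v_d_campaign_hierarchy;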

regards

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Performance issues

From
Vivekanand Joshi
Date:

Since I was doing it only for testing purposes, and on a development server which has only 8 GB of RAM, I used only 10m rows. But the original table has 1.5 billion rows. We will obviously be using a server with very high capacity, but I am not satisfied with the performance at all. This might be only a start, so I might get better performance later.

Yes, the view is complex and is created using almost 10 tables. The same goes for the other views as well, but this is what we are using in Netezza too, and we are getting results for the full report in less than 5 seconds. And add to that, this is only a very small part of the whole query used in the report.

I will post the result of the whole query with EXPLAIN ANALYZE tomorrow.

We might even consider taking expert advice on how to tune the queries and the server, but if Postgres is going to behave like this, I am not sure we would be able to continue with it.

Having said that, I would say again that I am completely new to this territory, so I might miss lots and lots of things.

On 14 Mar 2015 02:07, "Tomas Vondra" <tomas.vondra@2ndquadrant.com> wrote:
Hi,

On 13.3.2015 20:59, Vivekanand Joshi wrote:
> I am really worried about the performance of PostgreSQL as we have
> almost 1.5 billion records in promotion history table. Do you guys

In the previous message you claimed the post table has 10M rows ...

> really think PostgreSQL can handle this much load. We have fact
> tables which are more than 15 GB in size and we have to make joins
> with those tables in almost every query.

That depends on what performance you're looking for. You'll have to
provide considerably more information until we can help you. You might
want to check this:

   https://wiki.postgresql.org/wiki/Slow_Query_Questions

You have not provided the full query, just a query apparently
referencing views, so that the actual query is way more complicated.
Also, plain EXPLAIN is not really sufficient, we need EXPLAIN ANALYZE.

regards

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Performance issues

From
Tomas Vondra
Date:
On 13.3.2015 21:46, Vivekanand Joshi wrote:
> Since I was doing it only for the testing purposes and on a
> development server which has only  8 GB of RAM, I used only 10m rows.
> But the original table has 1.5 billion rows. We will obviously be
> using a server with very high capacity, but I am not satisfied with
> the performance at all. This might be only a start, so I might get a
> better performance later.

OK, understood.

> Yes, the view is complex and almost is created by using 10 tables. Same
> goes with other views as well but this is what we are using in Netezza
> as well. And we are getting results of the full report in less than 5
> seconds. And add to that, this is only a very little part of the whole
> query used in a report.

Well, in the very first message you asked "Is the query written
correctly as per PostgreSQL?" - how can we decide that when most of
the query is hidden in some unknown view?

> I will post the result of whole query with Explain analyze tomorrow.

Please also collect some information about the system using iostat,
vmstat and such, so that we know what the bottleneck is.

> We might even consider taking experts advice on how to tune queries
> and server, but if postgres is going to behave like this, I am not
> sure we would be able to continue with it.

That's probably a good idea.


--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Performance issues

From
Varadharajan Mukundan
Date:
> We might even consider taking experts advice on how to tune queries and
> server, but if postgres is going to behave like this, I am not sure we would
> be able to continue with it.
>
> Having said that, I would day again that I am completely new to this
> territory, so I might miss lots and lots of thing.

My two cents: Postgres out of the box might not be a good choice for
data warehouse style queries, because it is optimized to run
thousands of small queries (OLTP style processing) and not one big
monolithic query. I've faced similar problems myself before, and here
are a few tricks I followed to get my elephant to do real-time ad hoc
analysis on a table with ~45 columns and a few billion rows in it.

1. Partition your table! Use constraint exclusion to the fullest extent.
2. Fire multiple small queries distributed over partitions and
aggregate them at the application layer. This is needed because you
might want to exploit all your cores to the fullest extent (assuming
you have enough memory for an effective FS cache). If your dataset goes
beyond the capability of a single system, try something like Stado
(GridSQL).
3. Storing indexes on a RAM disk / faster disk (using tablespaces) and
using them properly makes the system blazing fast (see the sketch after
this list). CAUTION: This requires some other infrastructure setup for
backup and recovery.
4. If you're accessing a small set of columns in a big table and if
you feel compressing the data helps a lot, give this FDW a try -
https://github.com/citusdata/cstore_fdw
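
A rough sketch of trick 3; the tablespace location, its name and the
index definition are all just placeholders:

-- tablespace on faster storage (the path is hypothetical)
CREATE TABLESPACE fast_ssd LOCATION '/mnt/ssd/pg_tblspc';

-- keep a frequently used index there
CREATE INDEX s_f_promotion_history_send_dt_fast_idx
    ON s_f_promotion_history (send_dt)
    TABLESPACE fast_ssd;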


Re: Performance issues

From
Scott Marlowe
Date:
On Fri, Mar 13, 2015 at 4:03 PM, Varadharajan Mukundan
<srinathsmn@gmail.com> wrote:
>> We might even consider taking experts advice on how to tune queries and
>> server, but if postgres is going to behave like this, I am not sure we would
>> be able to continue with it.
>>
>> Having said that, I would day again that I am completely new to this
>> territory, so I might miss lots and lots of thing.
>
> My two cents: Postgres out of the box might not be a good choice for
> data warehouse style queries, that is because it is optimized to run
> thousands of small queries (OLTP style processing) and not one big
> monolithic query. I've faced similar problems myself before and here
> are few tricks i followed to get my elephant do real time adhoc
> analysis on a table with ~45 columns and few billion rows in it.
>
> 1. Partition your table! use constraint exclusion to the fullest extent
> 2. Fire multiple small queries distributed over partitions and
> aggregate them at the application layer. This is needed because, you
> might to exploit all your cores to the fullest extent (Assuming that
> you've enough memory for effective FS cache). If your dataset goes
> beyond the capability of a single system, try something like Stado
> (GridSQL)
> 3. Storing index on a RAM / faster disk disk (using tablespaces) and
> using it properly makes the system blazing fast. CAUTION: This
> requires some other infrastructure setup for backup and recovery
> 4. If you're accessing a small set of columns in a big table and if
> you feel compressing the data helps a lot, give this FDW a try -
> https://github.com/citusdata/cstore_fdw

Agreed here. If you're gonna run reporting queries against PostgreSQL
you have to optimize for fast seq scan stuff, i.e. an IO subsystem
that can read a big table at hundreds of megabytes per second -
gigabytes if you can get it. A lot of spinning drives on a fast RAID
card or good software RAID can do this on the cheap, since a lot of
the time you don't need big drives if you have a lot of them. 24 cheap
1TB drives that can each read at ~100 MB/s can gang up on the data for
roughly 2.4 GB/s in aggregate, so a 100GB scan takes well under a
minute. But you can't deny physics: if you need to read a 2TB table,
it's going to take time.

If you're only running 1 or 2 queries at a time, you can crank up
work_mem to something crazy like 1GB even on an 8GB machine. Stopping
sorts from spilling to disk, or at least giving queries a big
playground to work in, can make a huge difference. If you're gonna give
big work_mem then definitely limit connections to a handful. If you
need a lot of persistent connections then use a pooler.
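
For example (the values and the role name are only illustrative):

-- for a single ad-hoc reporting session
SET work_mem = '1GB';

-- or persisted for a dedicated reporting role
ALTER ROLE reporting_user SET work_mem = '1GB';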

The single biggest mistake people make in setting up reporting servers
on PostgreSQL is thinking that the same hardware that worked well for
transactional stuff (a handful of SSDs and lots of memory) will also do
the job when you're working with TB data sets. The hardware you need
isn't the same, and using the transactional setup for a reporting
server is gonna result in sub-optimal performance.

--
To understand recursion, one must first understand recursion.


Re: Performance issues

From
Vivekanand Joshi
Date:
Hi Guys,

So here is the full information attached as well as in the link provided
below:

http://pgsql.privatepaste.com/41207bea45

I can provide new information as well.

I would like to see if queries of this type can actually run on a Postgres
server.

If yes, what would be the minimum hardware requirements? We would like to
migrate our whole solution to PostgreSQL, as we can spend as much as we
need on hardware, but working on a proprietary appliance is becoming very
difficult for us.

Vivek


-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: Saturday, March 14, 2015 3:56 AM
To: Varadharajan Mukundan
Cc: vjoshi@zetainteractive.com; Tomas Vondra;
pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issues

On Fri, Mar 13, 2015 at 4:03 PM, Varadharajan Mukundan
<srinathsmn@gmail.com> wrote:
>> We might even consider taking experts advice on how to tune queries
>> and server, but if postgres is going to behave like this, I am not
>> sure we would be able to continue with it.
>>
>> Having said that, I would day again that I am completely new to this
>> territory, so I might miss lots and lots of thing.
>
> My two cents: Postgres out of the box might not be a good choice for
> data warehouse style queries, that is because it is optimized to run
> thousands of small queries (OLTP style processing) and not one big
> monolithic query. I've faced similar problems myself before and here
> are few tricks i followed to get my elephant do real time adhoc
> analysis on a table with ~45 columns and few billion rows in it.
>
> 1. Partition your table! use constraint exclusion to the fullest
> extent 2. Fire multiple small queries distributed over partitions and
> aggregate them at the application layer. This is needed because, you
> might to exploit all your cores to the fullest extent (Assuming that
> you've enough memory for effective FS cache). If your dataset goes
> beyond the capability of a single system, try something like Stado
> (GridSQL)
> 3. Storing index on a RAM / faster disk disk (using tablespaces) and
> using it properly makes the system blazing fast. CAUTION: This
> requires some other infrastructure setup for backup and recovery 4. If
> you're accessing a small set of columns in a big table and if you feel
> compressing the data helps a lot, give this FDW a try -
> https://github.com/citusdata/cstore_fdw

Agreed here. IF you're gonna run reporting queries against postgresql you
have to optimize for fast seq scan stuff. I.e. an IO subsystem that can read
a big table in hundreds of megabytes per second.
Gigabytes if you can get it. A lot of spinning drives on a fast RAID card or
good software raid can do this on the cheapish, since a lot of times you
don't need big drives if you have a lot. 24 cheap 1TB drives that each can
read at ~100 MB/s can gang up on the data and you can read a 100GB in a few
seconds. But you can't deny physics. If you need to read a 2TB table it's
going to take time.

If you're only running 1 or 2 queries at a time, you can crank up the
work_mem to something crazy like 1GB even on an 8GB machine. Stopping sorts
from spilling to disk, or at least giving queries a big playground to work
in can make a huge difference. If you're gonna give big work_mem then
definitely limit connections to a handful. If you need a lot of persistent
connections then use a pooler.

The single biggest mistake people make in setting up reporting servers on
postgresql is thinking that the same hardware that worked well for
transactional stuff (a handful of SSDs and lots of memory) might not help
when you're working with TB data sets. The hardware you need isn't the same,
and using that for a reporting server is gonna result in sub-optimal
performance.

--
To understand recursion, one must first understand recursion.

Attachments

Re: Performance issues

From
Tomas Vondra
Date:
On 14.3.2015 00:28, Vivekanand Joshi wrote:
> Hi Guys,
>
> So here is the full information attached as well as in the link
> provided below:
>
> http://pgsql.privatepaste.com/41207bea45
>
> I can provide new information as well.

Thanks.

We still don't have EXPLAIN ANALYZE - how long was the query running (I
assume it got killed at some point)? It's really difficult to give you
any advice because we don't know where the problem is.

If EXPLAIN ANALYZE really takes too long (say, it does not complete
after an hour / overnight), you'll have to break the query into parts
and first tweak those independently.

For example in the first message you mentioned that select from the
S_V_D_CAMPAIGN_HIERARCHY view takes ~9 minutes, so start with that. Give
us EXPLAIN ANALYZE for that query.

Few more comments:

(1) You're using CTEs - be aware that CTEs are not just aliases, but
    impact planning / optimization, and in some cases may prevent
    proper optimization. Try replacing them with plain views.

(2) Varadharajan Mukundan already recommended that you create an index on
    s_f_promotion_history.send_dt. Have you tried that? You may also
    try creating an index on all the columns needed by the query, so
    that an "Index Only Scan" is possible (see the sketch after this
    list).

(3) There are probably additional indexes that might be useful here.
    What I'd try is adding indexes on all columns that are either a
    foreign key or used in a WHERE condition. This might be
    overkill in some cases, but let's see.

(4) I suspect many of the relations referenced in the views are not
    actually needed in the query, i.e. the join is performed but
    then it's just discarded because those columns are not used.
    Try to simplify the views as much as possible - remove all the
    tables that are not really necessary to run the query. If two
    queries need different tables, maybe defining two views is
    a better approach.

(5) The vmstat / iostat data are pretty useless - what you provided are
    averages since the machine was started, but we need a few samples
    collected when the query is running. I.e. start the query, and then
    give us a few samples from these commands:

    iostat -x -k 1
    vmstat 1
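
A minimal sketch for (2), using column names visible in the plan; the
index names are made up, and the wider index only pays off if the query
really needs just those columns:

-- plain index on the date column used in the WHERE clause
CREATE INDEX s_f_promotion_history_send_dt_ix
    ON s_f_promotion_history (send_dt);

-- covering variant aiming at an index-only scan (column list illustrative)
CREATE INDEX s_f_promotion_history_send_dt_cov_ix
    ON s_f_promotion_history
    (send_dt, touchpoint_execution_id, target_id, audience_member_id);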

> Would like to see if queries of these type can actually run in
> postgres server?

Why not? We're running DWH applications on tens/hundreds of GBs.

> If yes, what would be the minimum requirements for hardware? We would
> like to migrate our whole solution on PostgreSQL as we can spend on
> hardware as much as we can but working on a proprietary appliance is
> becoming very difficult for us.

That's difficult to say, because we really don't know where the problem
is and how much the queries can be optimized.


--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Performance issues

From
Gavin Flower
Date:
On 14/03/15 13:12, Tomas Vondra wrote:
> On 14.3.2015 00:28, Vivekanand Joshi wrote:
>> Hi Guys,
>>
>> So here is the full information attached as well as in the link
>> provided below:
>>
>> http://pgsql.privatepaste.com/41207bea45
>>
>> I can provide new information as well.
> Thanks.
>
> We still don't have EXPLAIN ANALYZE - how long was the query running (I
> assume it got killed at some point)? It's really difficult to give you
> any advices because we don't know where the problem is.
>
> If EXPLAIN ANALYZE really takes too long (say, it does not complete
> after an hour / over night), you'll have to break the query into parts
> and first tweak those independently.
>
> For example in the first message you mentioned that select from the
> S_V_D_CAMPAIGN_HIERARCHY view takes ~9 minutes, so start with that. Give
> us EXPLAIN ANALYZE for that query.
>
> Few more comments:
>
> (1) You're using CTEs - be aware that CTEs are not just aliases, but
>      impact planning / optimization, and in some cases may prevent
>      proper optimization. Try replacing them with plain views.
>
> (2) Varadharajan Mukundan already recommended you to create index on
>      s_f_promotion_history.send_dt. Have you tried that? You may also
>      try creating an index on all the columns needed by the query, so
>      that "Index Only Scan" is possible.
>
> (3) There are probably additional indexes that might be useful here.
>      What I'd try is adding indexes on all columns that are either a
>      foreign key or used in a WHERE condition. This might be an
>      overkill in some cases, but let's see.
>
> (4) I suspect many of the relations referenced in the views are not
>      actually needed in the query, i.e. the join is performed but
>      then it's just discarded because those columns are not used.
>      Try to simplify the views as much has possible - remove all the
>      tables that are not really necessary to run the query. If two
>      queries need different tables, maybe defining two views is
>      a better approach.
>
> (5) The vmstat / iostat data are pretty useless - what you provided are
>      averages since the machine was started, but we need a few samples
>      collected when the query is running. I.e. start the query, and then
>      give us a few samples from these commands:
>
>      iostat -x -k 1
>      vmstat 1
>
>> Would like to see if queries of these type can actually run in
>> postgres server?
> Why not? We're running DWH applications on tens/hundreds of GBs.
>
>> If yes, what would be the minimum requirements for hardware? We would
>> like to migrate our whole solution on PostgreSQL as we can spend on
>> hardware as much as we can but working on a proprietary appliance is
>> becoming very difficult for us.
> That's difficult to say, because we really don't know where the problem
> is and how much the queries can be optimized.
>
>
I notice that no one appears to have suggested changing the default
settings in postgresql.conf - these need changing, as they are initially
set up for small machines, and to let PostgreSQL take anywhere near full
advantage of a box with large amounts of RAM, you need to change some of
the configuration settings!

For example 'temp_buffers' (default 8MB) and 'maintenance_work_mem'
(default 16MB) should be drastically increased, and there are other
settings that need changing. The precise values depend on many factors,
but the initial values set by default are definitely far too small for
your usage.
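
For instance, in postgresql.conf (illustrative values only - the right
numbers depend on the available RAM and the workload):

temp_buffers = 256MB            # default is 8MB
maintenance_work_mem = 1GB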

Am assuming that you are looking at PostgreSQL 9.4.



Cheers,
Gavin




Re: Performance issues

From
Varadharajan Mukundan
Date:
Hi Gavin,

Vivekanand in his first mail itself mentioned the below configuration
of postgresql.conf. It looks good enough to me.

Total Memory : 8 GB

shared_buffers = 2GB

work_mem = 64MB

maintenance_work_mem = 700MB

effective_cache_size = 4GB

On Sat, Mar 14, 2015 at 10:06 PM, Gavin Flower
<GavinFlower@archidevsys.co.nz> wrote:
> On 14/03/15 13:12, Tomas Vondra wrote:
>>
>> On 14.3.2015 00:28, Vivekanand Joshi wrote:
>>>
>>> Hi Guys,
>>>
>>> So here is the full information attached as well as in the link
>>> provided below:
>>>
>>> http://pgsql.privatepaste.com/41207bea45
>>>
>>> I can provide new information as well.
>>
>> Thanks.
>>
>> We still don't have EXPLAIN ANALYZE - how long was the query running (I
>> assume it got killed at some point)? It's really difficult to give you
>> any advices because we don't know where the problem is.
>>
>> If EXPLAIN ANALYZE really takes too long (say, it does not complete
>> after an hour / over night), you'll have to break the query into parts
>> and first tweak those independently.
>>
>> For example in the first message you mentioned that select from the
>> S_V_D_CAMPAIGN_HIERARCHY view takes ~9 minutes, so start with that. Give
>> us EXPLAIN ANALYZE for that query.
>>
>> Few more comments:
>>
>> (1) You're using CTEs - be aware that CTEs are not just aliases, but
>>      impact planning / optimization, and in some cases may prevent
>>      proper optimization. Try replacing them with plain views.
>>
>> (2) Varadharajan Mukundan already recommended you to create index on
>>      s_f_promotion_history.send_dt. Have you tried that? You may also
>>      try creating an index on all the columns needed by the query, so
>>      that "Index Only Scan" is possible.
>>
>> (3) There are probably additional indexes that might be useful here.
>>      What I'd try is adding indexes on all columns that are either a
>>      foreign key or used in a WHERE condition. This might be an
>>      overkill in some cases, but let's see.
>>
>> (4) I suspect many of the relations referenced in the views are not
>>      actually needed in the query, i.e. the join is performed but
>>      then it's just discarded because those columns are not used.
>>      Try to simplify the views as much has possible - remove all the
>>      tables that are not really necessary to run the query. If two
>>      queries need different tables, maybe defining two views is
>>      a better approach.
>>
>> (5) The vmstat / iostat data are pretty useless - what you provided are
>>      averages since the machine was started, but we need a few samples
>>      collected when the query is running. I.e. start the query, and then
>>      give us a few samples from these commands:
>>
>>      iostat -x -k 1
>>      vmstat 1
>>
>>> Would like to see if queries of these type can actually run in
>>> postgres server?
>>
>> Why not? We're running DWH applications on tens/hundreds of GBs.
>>
>>> If yes, what would be the minimum requirements for hardware? We would
>>> like to migrate our whole solution on PostgreSQL as we can spend on
>>> hardware as much as we can but working on a proprietary appliance is
>>> becoming very difficult for us.
>>
>> That's difficult to say, because we really don't know where the problem
>> is and how much the queries can be optimized.
>>
>>
> I notice that no one appears to have suggested the default setting in
> postgresql.conf - these need changing as they are initially set up for small
> machines, and to let PostgreSQL take anywhere near full advantage of a box
> have large amounts of RAM, you need to change some of the configuration
> settings!
>
> For example 'temp_buffers' (default 8MB) and 'maintenance_work_mem' (default
> 16MB) should be drastically increased,  and there are other settings that
> need changing.  The precise values depend on many factors, but the initial
> values set by default are definitely far too small for your usage.
>
> Am assuming that you are looking at PostgreSQL 9.4.
>
>
>
> Cheers,
> Gavin
>
>



--
Thanks,
M. Varadharajan

------------------------------------------------

"Experience is what you get when you didn't get what you wanted"
               -By Prof. Randy Pausch in "The Last Lecture"

My Journal :- www.thinkasgeek.wordpress.com


Re: Performance issues

From
Gavin Flower
Date:
On 15/03/15 10:23, Varadharajan Mukundan wrote:
> Hi Gavin,
>
> Vivekanand is his first mail itself mentioned the below configuration
> of postgresql.conf. It looks good enough to me.
>
> Total Memory : 8 GB
>
> shared_buffers = 2GB
>
> work_mem = 64MB
>
> maintenance_work_mem = 700MB
>
> effective_cache_size = 4GB


Sorry, it didn't register when I read it!
(Probably reading too fast)
>
> On Sat, Mar 14, 2015 at 10:06 PM, Gavin Flower
> <GavinFlower@archidevsys.co.nz> wrote:
>> On 14/03/15 13:12, Tomas Vondra wrote:
>>> On 14.3.2015 00:28, Vivekanand Joshi wrote:
>>>> Hi Guys,
>>>>
>>>> So here is the full information attached as well as in the link
>>>> provided below:
>>>>
>>>> http://pgsql.privatepaste.com/41207bea45
>>>>
>>>> I can provide new information as well.
>>> Thanks.
>>>
>>> We still don't have EXPLAIN ANALYZE - how long was the query running (I
>>> assume it got killed at some point)? It's really difficult to give you
>>> any advices because we don't know where the problem is.
>>>
>>> If EXPLAIN ANALYZE really takes too long (say, it does not complete
>>> after an hour / over night), you'll have to break the query into parts
>>> and first tweak those independently.
>>>
>>> For example in the first message you mentioned that select from the
>>> S_V_D_CAMPAIGN_HIERARCHY view takes ~9 minutes, so start with that. Give
>>> us EXPLAIN ANALYZE for that query.
>>>
>>> Few more comments:
>>>
>>> (1) You're using CTEs - be aware that CTEs are not just aliases, but
>>>       impact planning / optimization, and in some cases may prevent
>>>       proper optimization. Try replacing them with plain views.
>>>
>>> (2) Varadharajan Mukundan already recommended you to create index on
>>>       s_f_promotion_history.send_dt. Have you tried that? You may also
>>>       try creating an index on all the columns needed by the query, so
>>>       that "Index Only Scan" is possible.
>>>
>>> (3) There are probably additional indexes that might be useful here.
>>>       What I'd try is adding indexes on all columns that are either a
>>>       foreign key or used in a WHERE condition. This might be an
>>>       overkill in some cases, but let's see.
>>>
>>> (4) I suspect many of the relations referenced in the views are not
>>>       actually needed in the query, i.e. the join is performed but
>>>       then it's just discarded because those columns are not used.
>>>       Try to simplify the views as much has possible - remove all the
>>>       tables that are not really necessary to run the query. If two
>>>       queries need different tables, maybe defining two views is
>>>       a better approach.
>>>
>>> (5) The vmstat / iostat data are pretty useless - what you provided are
>>>       averages since the machine was started, but we need a few samples
>>>       collected when the query is running. I.e. start the query, and then
>>>       give us a few samples from these commands:
>>>
>>>       iostat -x -k 1
>>>       vmstat 1
>>>
>>>> Would like to see if queries of these type can actually run in
>>>> postgres server?
>>> Why not? We're running DWH applications on tens/hundreds of GBs.
>>>
>>>> If yes, what would be the minimum requirements for hardware? We would
>>>> like to migrate our whole solution on PostgreSQL as we can spend on
>>>> hardware as much as we can but working on a proprietary appliance is
>>>> becoming very difficult for us.
>>> That's difficult to say, because we really don't know where the problem
>>> is and how much the queries can be optimized.
>>>
>>>
>> I notice that no one appears to have suggested the default setting in
>> postgresql.conf - these need changing as they are initially set up for small
>> machines, and to let PostgreSQL take anywhere near full advantage of a box
>> have large amounts of RAM, you need to change some of the configuration
>> settings!
>>
>> For example 'temp_buffers' (default 8MB) and 'maintenance_work_mem' (default
>> 16MB) should be drastically increased,  and there are other settings that
>> need changing.  The precise values depend on many factors, but the initial
>> values set by default are definitely far too small for your usage.
>>
>> Am assuming that you are looking at PostgreSQL 9.4.
>>
>>
>>
>> Cheers,
>> Gavin
>>
>>
>
>
>



Re: Performance issues

From
Vivekanand Joshi
Date:
Hi Team,

This is the EXPLAIN ANALYZE for one of the views: S_V_D_CAMPAIGN_HIERARCHY:

===========================================


Nested Loop  (cost=33666.96..37971.39 rows=1 width=894) (actual
time=443.556..966558.767 rows=45360 loops=1)
   Join Filter: (tp_exec.touchpoint_execution_id =
valid_executions.touchpoint_execution_id)
   Rows Removed by Join Filter: 3577676116
   CTE valid_executions
     ->  Hash Join  (cost=13753.53..31711.17 rows=1 width=8) (actual
time=232.571..357.749 rows=52997 loops=1)
           Hash Cond:
((s_f_touchpoint_execution_status_history_1.touchpoint_execution_id =
s_f_touchpoint_execution_status_history.touchpoint_execution_id) AND ((max(s
_f_touchpoint_execution_status_history_1.creation_dt)) =
s_f_touchpoint_execution_status_history.creation_dt))
           ->  HashAggregate  (cost=6221.56..6905.66 rows=68410 width=16)
(actual time=139.713..171.340 rows=76454 loops=1)
                 ->  Seq Scan on s_f_touchpoint_execution_status_history
s_f_touchpoint_execution_status_history_1  (cost=0.00..4766.04 rows=291104
width=16) (actual ti
me=0.006..38.582 rows=291104 loops=1)
           ->  Hash  (cost=5493.80..5493.80 rows=135878 width=16) (actual
time=92.737..92.737 rows=136280 loops=1)
                 Buckets: 16384  Batches: 1  Memory Usage: 6389kB
                 ->  Seq Scan on s_f_touchpoint_execution_status_history
(cost=0.00..5493.80 rows=135878 width=16) (actual time=0.012..55.078
rows=136280 loops=1)
                       Filter: (touchpoint_execution_status_type_id = ANY
('{3,4}'::integer[]))
                       Rows Removed by Filter: 154824
   ->  Nested Loop Left Join  (cost=1955.80..6260.19 rows=1 width=894)
(actual time=31.608..3147.015 rows=67508 loops=1)
         ->  Nested Loop  (cost=1955.67..6260.04 rows=1 width=776) (actual
time=31.602..2912.625 rows=67508 loops=1)
               ->  Nested Loop Left Join  (cost=1955.54..6259.87 rows=1
width=658) (actual time=31.595..2713.696 rows=72427 loops=1)
                     ->  Nested Loop Left Join  (cost=1955.40..6259.71
rows=1 width=340) (actual time=31.589..2532.926 rows=72427 loops=1)
                           ->  Nested Loop Left Join  (cost=1955.27..6259.55
rows=1 width=222) (actual time=31.581..2354.662 rows=72427 loops=1)
                                 ->  Nested Loop  (cost=1954.99..6259.24
rows=1 width=197) (actual time=31.572..2090.104 rows=72427 loops=1)
                                       ->  Nested Loop
(cost=1954.71..6258.92 rows=1 width=173) (actual time=31.562..1802.857
rows=72427 loops=1)
                                             Join Filter:
(camp_exec.campaign_id = wave.campaign_id)
                                             Rows Removed by Join Filter:
243
                                             ->  Nested Loop
(cost=1954.42..6254.67 rows=13 width=167) (actual time=31.551..1468.718
rows=72670 loops=1)
                                                   ->  Hash Join
(cost=1954.13..6249.67 rows=13 width=108) (actual time=31.525..402.039
rows=72670 loops=1)
                                                         Hash Cond:
((tp_exec.touchpoint_id = tp.touchpoint_id) AND (wave_exec.wave_id =
tp.wave_id))
                                                         ->  Hash Join
(cost=1576.83..4595.51 rows=72956 width=90) (actual time=26.254..256.328
rows=72956 loops=1)
                                                               Hash Cond:
(tp_exec.wave_execution_id = wave_exec.wave_execution_id)
                                                               ->  Seq Scan
on s_d_touchpoint_execution tp_exec  (cost=0.00..1559.56 rows=72956
width=42) (actual time=0.005..76.099 rows=72956 loops=1)
                                                               ->  Hash
(cost=1001.37..1001.37 rows=46037 width=56) (actual time=26.178..26.178
rows=46037 loops=1)
                                                                     Buckets:
8192  Batches: 1  Memory Usage: 4104kB
                                                                     ->  Seq
Scan on s_d_wave_execution wave_exec  (cost=0.00..1001.37 rows=46037
width=56) (actual time=0.006..10.388 rows=46037 loops=1)
                                                         ->  Hash
(cost=212.72..212.72 rows=10972 width=26) (actual time=5.252..5.252
rows=10972 loops=1)
                                                               Buckets: 2048
Batches: 1  Memory Usage: 645kB
                                                               ->  Seq Scan
on s_d_touchpoint tp  (cost=0.00..212.72 rows=10972 width=26) (actual
time=0.012..2.319 rows=10972 loops=1)
                                                   ->  Index Scan using
s_d_campaign_execution_idx on s_d_campaign_execution camp_exec
(cost=0.29..0.37 rows=1 width=67) (actual time=0.013..0.013 rows=1
loops=72670)
                                                         Index Cond:
(campaign_execution_id = wave_exec.campaign_execution_id)
                                             ->  Index Scan using
s_d_wave_pkey on s_d_wave wave  (cost=0.29..0.31 rows=1 width=22) (actual
time=0.003..0.003 rows=1 loops=72670)
                                                   Index Cond: (wave_id =
wave_exec.wave_id)
                                       ->  Index Scan using
s_d_campaign_pkey on s_d_campaign camp  (cost=0.29..0.32 rows=1 width=40)
(actual time=0.003..0.003 rows=1 loops=72427)
                                             Index Cond: (campaign_id =
camp_exec.campaign_id)
                                 ->  Index Scan using s_d_content_pkey on
s_d_content content  (cost=0.28..0.30 rows=1 width=33) (actual
time=0.002..0.003 rows=1 loops=72427)
                                       Index Cond: (tp_exec.content_id =
content_id)
                           ->  Index Scan using s_d_message_type_pkey on
s_d_message_type message_type  (cost=0.13..0.15 rows=1 width=120) (actual
time=0.001..0.002 rows=1 loops=72427)
                                 Index Cond: (tp_exec.message_type_id =
message_type_id)
                     ->  Index Scan using s_d_group_pkey on s_d_group grup
(cost=0.13..0.15 rows=1 width=320) (actual time=0.001..0.002 rows=1
loops=72427)
                           Index Cond: (camp_exec.group_id = group_id)
               ->  Index Scan using d_channel_pk on s_d_channel_type channel
(cost=0.13..0.15 rows=1 width=120) (actual time=0.001..0.002 rows=1
loops=72427)
                     Index Cond: (channel_type_id = tp.channel_type_id)
         ->  Index Scan using s_d_category_pkey on s_d_category "CATEGORY"
(cost=0.13..0.15 rows=1 width=120) (actual time=0.001..0.002 rows=1
loops=67508)
               Index Cond: (camp.category_id = category_id)
   ->  CTE Scan on valid_executions  (cost=0.00..0.02 rows=1 width=8)
(actual time=0.004..6.803 rows=52997 loops=67508)
 Total runtime: 966566.574 ms

========================================================

Can you please look at it and let me know where the issue is?


-----Original Message-----
From: Gavin Flower [mailto:GavinFlower@archidevsys.co.nz]
Sent: Sunday, March 15, 2015 3:02 AM
To: Varadharajan Mukundan
Cc: Tomas Vondra; vjoshi@zetainteractive.com; Scott Marlowe;
pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issues

On 15/03/15 10:23, Varadharajan Mukundan wrote:
> Hi Gavin,
>
> Vivekanand is his first mail itself mentioned the below configuration
> of postgresql.conf. It looks good enough to me.
>
> Total Memory : 8 GB
>
> shared_buffers = 2GB
>
> work_mem = 64MB
>
> maintenance_work_mem = 700MB
>
> effective_cache_size = 4GB


Sorry, it didn't register when I read it!
(Probably reading too fast)
>
> On Sat, Mar 14, 2015 at 10:06 PM, Gavin Flower
> <GavinFlower@archidevsys.co.nz> wrote:
>> On 14/03/15 13:12, Tomas Vondra wrote:
>>> On 14.3.2015 00:28, Vivekanand Joshi wrote:
>>>> Hi Guys,
>>>>
>>>> So here is the full information attached as well as in the link
>>>> provided below:
>>>>
>>>> http://pgsql.privatepaste.com/41207bea45
>>>>
>>>> I can provide new information as well.
>>> Thanks.
>>>
>>> We still don't have EXPLAIN ANALYZE - how long was the query running
>>> (I assume it got killed at some point)? It's really difficult to
>>> give you any advices because we don't know where the problem is.
>>>
>>> If EXPLAIN ANALYZE really takes too long (say, it does not complete
>>> after an hour / over night), you'll have to break the query into
>>> parts and first tweak those independently.
>>>
>>> For example in the first message you mentioned that select from the
>>> S_V_D_CAMPAIGN_HIERARCHY view takes ~9 minutes, so start with that.
>>> Give us EXPLAIN ANALYZE for that query.
>>>
>>> Few more comments:
>>>
>>> (1) You're using CTEs - be aware that CTEs are not just aliases, but
>>>       impact planning / optimization, and in some cases may prevent
>>>       proper optimization. Try replacing them with plain views.
>>>
>>> (2) Varadharajan Mukundan already recommended you to create index on
>>>       s_f_promotion_history.send_dt. Have you tried that? You may also
>>>       try creating an index on all the columns needed by the query, so
>>>       that "Index Only Scan" is possible.
>>>
>>> (3) There are probably additional indexes that might be useful here.
>>>       What I'd try is adding indexes on all columns that are either a
>>>       foreign key or used in a WHERE condition. This might be an
>>>       overkill in some cases, but let's see.
>>>
>>> (4) I suspect many of the relations referenced in the views are not
>>>       actually needed in the query, i.e. the join is performed but
>>>       then it's just discarded because those columns are not used.
>>>       Try to simplify the views as much has possible - remove all the
>>>       tables that are not really necessary to run the query. If two
>>>       queries need different tables, maybe defining two views is
>>>       a better approach.
>>>
>>> (5) The vmstat / iostat data are pretty useless - what you provided are
>>>       averages since the machine was started, but we need a few samples
>>>       collected when the query is running. I.e. start the query, and
>>> then
>>>       give us a few samples from these commands:
>>>
>>>       iostat -x -k 1
>>>       vmstat 1
>>>
>>>> Would like to see if queries of these type can actually run in
>>>> postgres server?
>>> Why not? We're running DWH applications on tens/hundreds of GBs.
>>>
>>>> If yes, what would be the minimum requirements for hardware? We
>>>> would like to migrate our whole solution on PostgreSQL as we can
>>>> spend on hardware as much as we can but working on a proprietary
>>>> appliance is becoming very difficult for us.
>>> That's difficult to say, because we really don't know where the
>>> problem is and how much the queries can be optimized.
>>>
>>>
>> I notice that no one appears to have suggested the default setting in
>> postgresql.conf - these need changing as they are initially set up
>> for small machines, and to let PostgreSQL take anywhere near full
>> advantage of a box have large amounts of RAM, you need to change some
>> of the configuration settings!
>>
>> For example 'temp_buffers' (default 8MB) and 'maintenance_work_mem'
>> (default
>> 16MB) should be drastically increased,  and there are other settings
>> that need changing.  The precise values depend on many factors, but
>> the initial values set by default are definitely far too small for your
>> usage.
>>
>> Am assuming that you are looking at PostgreSQL 9.4.
>>
>>
>>
>> Cheers,
>> Gavin
>>
>>
>
>
>


Re: Performance issues

From
Marc Mamin
Date:
>Hi Team,
>
>This is the EXPLAIN ANALYZE for one of the view : S_V_D_CAMPAIGN_HIERARCHY:


    >Rows Removed by Join Filter: 3577676116

    That's quite a lot.
    You're possibly missing a clause in a join, resulting in a cross join.
    It is also helpful to put your result here:
    http://explain.depesz.com/
    regards,

    Marc Mamin


>
>===========================================
>
>
>Nested Loop  (cost=33666.96..37971.39 rows=1 width=894) (actual
>time=443.556..966558.767 rows=45360 loops=1)
>   Join Filter: (tp_exec.touchpoint_execution_id =
>valid_executions.touchpoint_execution_id)
>   Rows Removed by Join Filter: 3577676116
>   CTE valid_executions
>     ->  Hash Join  (cost=13753.53..31711.17 rows=1 width=8) (actual
>time=232.571..357.749 rows=52997 loops=1)
>           Hash Cond:
>((s_f_touchpoint_execution_status_history_1.touchpoint_execution_id =
>s_f_touchpoint_execution_status_history.touchpoint_execution_id) AND ((max(s
>_f_touchpoint_execution_status_history_1.creation_dt)) =
>s_f_touchpoint_execution_status_history.creation_dt))
>           ->  HashAggregate  (cost=6221.56..6905.66 rows=68410 width=16)
>(actual time=139.713..171.340 rows=76454 loops=1)
>                 ->  Seq Scan on s_f_touchpoint_execution_status_history
>s_f_touchpoint_execution_status_history_1  (cost=0.00..4766.04 rows=291104
>width=16) (actual ti
>me=0.006..38.582 rows=291104 loops=1)
>           ->  Hash  (cost=5493.80..5493.80 rows=135878 width=16) (actual
>time=92.737..92.737 rows=136280 loops=1)
>                 Buckets: 16384  Batches: 1  Memory Usage: 6389kB
>                 ->  Seq Scan on s_f_touchpoint_execution_status_history
>(cost=0.00..5493.80 rows=135878 width=16) (actual time=0.012..55.078
>rows=136280 loops=1)
>                       Filter: (touchpoint_execution_status_type_id = ANY
>('{3,4}'::integer[]))
>                       Rows Removed by Filter: 154824
>   ->  Nested Loop Left Join  (cost=1955.80..6260.19 rows=1 width=894)
>(actual time=31.608..3147.015 rows=67508 loops=1)
>         ->  Nested Loop  (cost=1955.67..6260.04 rows=1 width=776) (actual
>time=31.602..2912.625 rows=67508 loops=1)
>               ->  Nested Loop Left Join  (cost=1955.54..6259.87 rows=1
>width=658) (actual time=31.595..2713.696 rows=72427 loops=1)
>                     ->  Nested Loop Left Join  (cost=1955.40..6259.71
>rows=1 width=340) (actual time=31.589..2532.926 rows=72427 loops=1)
>                           ->  Nested Loop Left Join  (cost=1955.27..6259.55
>rows=1 width=222) (actual time=31.581..2354.662 rows=72427 loops=1)
>                                 ->  Nested Loop  (cost=1954.99..6259.24
>rows=1 width=197) (actual time=31.572..2090.104 rows=72427 loops=1)
>                                       ->  Nested Loop
>(cost=1954.71..6258.92 rows=1 width=173) (actual time=31.562..1802.857
>rows=72427 loops=1)
>                                             Join Filter:
>(camp_exec.campaign_id = wave.campaign_id)
>                                             Rows Removed by Join Filter:
>243
>                                             ->  Nested Loop
>(cost=1954.42..6254.67 rows=13 width=167) (actual time=31.551..1468.718
>rows=72670 loops=1)
>                                                   ->  Hash Join
>(cost=1954.13..6249.67 rows=13 width=108) (actual time=31.525..402.039
>rows=72670 loops=1)
>                                                         Hash Cond:
>((tp_exec.touchpoint_id = tp.touchpoint_id) AND (wave_exec.wave_id =
>tp.wave_id))
>                                                         ->  Hash Join
>(cost=1576.83..4595.51 rows=72956 width=90) (actual time=26.254..256.328
>rows=72956 loops=1)
>                                                               Hash Cond:
>(tp_exec.wave_execution_id = wave_exec.wave_execution_id)
>                                                               ->  Seq Scan
>on s_d_touchpoint_execution tp_exec  (cost=0.00..1559.56 rows=72956
>width=42) (actual time=0.005..76.099 rows=72956 loops=1)
>                                                               ->  Hash
>(cost=1001.37..1001.37 rows=46037 width=56) (actual time=26.178..26.178
>rows=46037 loops=1)
>                                                                     Buckets:
>8192  Batches: 1  Memory Usage: 4104kB
>                                                                     ->  Seq
>Scan on s_d_wave_execution wave_exec  (cost=0.00..1001.37 rows=46037
>width=56) (actual time=0.006..10.388 rows=46037 loops=1)
>                                                         ->  Hash
>(cost=212.72..212.72 rows=10972 width=26) (actual time=5.252..5.252
>rows=10972 loops=1)
>                                                               Buckets: 2048
>Batches: 1  Memory Usage: 645kB
>                                                               ->  Seq Scan
>on s_d_touchpoint tp  (cost=0.00..212.72 rows=10972 width=26) (actual
>time=0.012..2.319 rows=10972 loops=1)
>                                                   ->  Index Scan using
>s_d_campaign_execution_idx on s_d_campaign_execution camp_exec
>(cost=0.29..0.37 rows=1 width=67) (actual time=0.013..0.013 rows=1
>loops=72670)
>                                                         Index Cond:
>(campaign_execution_id = wave_exec.campaign_execution_id)
>                                             ->  Index Scan using
>s_d_wave_pkey on s_d_wave wave  (cost=0.29..0.31 rows=1 width=22) (actual
>time=0.003..0.003 rows=1 loops=72670)
>                                                   Index Cond: (wave_id =
>wave_exec.wave_id)
>                                       ->  Index Scan using
>s_d_campaign_pkey on s_d_campaign camp  (cost=0.29..0.32 rows=1 width=40)
>(actual time=0.003..0.003 rows=1 loops=72427)
>                                             Index Cond: (campaign_id =
>camp_exec.campaign_id)
>                                 ->  Index Scan using s_d_content_pkey on
>s_d_content content  (cost=0.28..0.30 rows=1 width=33) (actual
>time=0.002..0.003 rows=1 loops=72427)
>                                       Index Cond: (tp_exec.content_id =
>content_id)
>                           ->  Index Scan using s_d_message_type_pkey on
>s_d_message_type message_type  (cost=0.13..0.15 rows=1 width=120) (actual
>time=0.001..0.002 rows=1 loops=72427)
>                                 Index Cond: (tp_exec.message_type_id =
>message_type_id)
>                     ->  Index Scan using s_d_group_pkey on s_d_group grup
>(cost=0.13..0.15 rows=1 width=320) (actual time=0.001..0.002 rows=1
>loops=72427)
>                           Index Cond: (camp_exec.group_id = group_id)
>               ->  Index Scan using d_channel_pk on s_d_channel_type channel
>(cost=0.13..0.15 rows=1 width=120) (actual time=0.001..0.002 rows=1
>loops=72427)
>                     Index Cond: (channel_type_id = tp.channel_type_id)
>         ->  Index Scan using s_d_category_pkey on s_d_category "CATEGORY"
>(cost=0.13..0.15 rows=1 width=120) (actual time=0.001..0.002 rows=1
>loops=67508)
>               Index Cond: (camp.category_id = category_id)
>   ->  CTE Scan on valid_executions  (cost=0.00..0.02 rows=1 width=8)
>(actual time=0.004..6.803 rows=52997 loops=67508)
> Total runtime: 966566.574 ms
>
>========================================================
>
>Can you please take a look and let me know where the issue is?
>
>
>-----Original Message-----
>From: Gavin Flower [mailto:GavinFlower@archidevsys.co.nz]
>Sent: Sunday, March 15, 2015 3:02 AM
>To: Varadharajan Mukundan
>Cc: Tomas Vondra; vjoshi@zetainteractive.com; Scott Marlowe;
>pgsql-performance@postgresql.org
>Subject: Re: [PERFORM] Performance issues
>
>On 15/03/15 10:23, Varadharajan Mukundan wrote:
>> Hi Gavin,
>>
>> Vivekanand in his first mail itself mentioned the below configuration
>> of postgresql.conf. It looks good enough to me.
>>
>> Total Memory : 8 GB
>>
>> shared_buffers = 2GB
>>
>> work_mem = 64MB
>>
>> maintenance_work_mem = 700MB
>>
>> effective_cache_size = 4GB
>
>
>Sorry, it didn't register when I read it!
>(Probably reading too fast)
>>
>> On Sat, Mar 14, 2015 at 10:06 PM, Gavin Flower
>> <GavinFlower@archidevsys.co.nz> wrote:
>>> On 14/03/15 13:12, Tomas Vondra wrote:
>>>> On 14.3.2015 00:28, Vivekanand Joshi wrote:
>>>>> Hi Guys,
>>>>>
>>>>> So here is the full information attached as well as in the link
>>>>> provided below:
>>>>>
>>>>> http://pgsql.privatepaste.com/41207bea45
>>>>>
>>>>> I can provide new information as well.
>>>> Thanks.
>>>>
>>>> We still don't have EXPLAIN ANALYZE - how long was the query running
>>>> (I assume it got killed at some point)? It's really difficult to
>>>> give you any advices because we don't know where the problem is.
>>>>
>>>> If EXPLAIN ANALYZE really takes too long (say, it does not complete
>>>> after an hour / over night), you'll have to break the query into
>>>> parts and first tweak those independently.
>>>>
>>>> For example in the first message you mentioned that select from the
>>>> S_V_D_CAMPAIGN_HIERARCHY view takes ~9 minutes, so start with that.
>>>> Give us EXPLAIN ANALYZE for that query.
>>>>
>>>> Few more comments:
>>>>
>>>> (1) You're using CTEs - be aware that CTEs are not just aliases, but
>>>>       impact planning / optimization, and in some cases may prevent
>>>>       proper optimization. Try replacing them with plain views.
>>>>
>>>> (2) Varadharajan Mukundan already recommended you to create index on
>>>>       s_f_promotion_history.send_dt. Have you tried that? You may also
>>>>       try creating an index on all the columns needed by the query, so
>>>>       that "Index Only Scan" is possible.
>>>>
>>>> (3) There are probably additional indexes that might be useful here.
>>>>       What I'd try is adding indexes on all columns that are either a
>>>>       foreign key or used in a WHERE condition. This might be an
>>>>       overkill in some cases, but let's see.
>>>>
>>>> (4) I suspect many of the relations referenced in the views are not
>>>>       actually needed in the query, i.e. the join is performed but
>>>>       then it's just discarded because those columns are not used.
>>>>       Try to simplify the views as much has possible - remove all the
>>>>       tables that are not really necessary to run the query. If two
>>>>       queries need different tables, maybe defining two views is
>>>>       a better approach.
>>>>
>>>> (5) The vmstat / iostat data are pretty useless - what you provided are
>>>>       averages since the machine was started, but we need a few samples
>>>>       collected when the query is running. I.e. start the query, and
>>>> then
>>>>       give us a few samples from these commands:
>>>>
>>>>       iostat -x -k 1
>>>>       vmstat 1
>>>>
>>>>> Would like to see if queries of these type can actually run in
>>>>> postgres server?
>>>> Why not? We're running DWH applications on tens/hundreds of GBs.
>>>>
>>>>> If yes, what would be the minimum requirements for hardware? We
>>>>> would like to migrate our whole solution on PostgreSQL as we can
>>>>> spend on hardware as much as we can but working on a proprietary
>>>>> appliance is becoming very difficult for us.
>>>> That's difficult to say, because we really don't know where the
>>>> problem is and how much the queries can be optimized.
>>>>
>>>>
>>> I notice that no one appears to have suggested the default settings in
>>> postgresql.conf - these need changing as they are initially set up
>>> for small machines, and to let PostgreSQL take anywhere near full
>>> advantage of a box with large amounts of RAM, you need to change some
>>> of the configuration settings!
>>>
>>> For example 'temp_buffers' (default 8MB) and 'maintenance_work_mem'
>>> (default
>>> 16MB) should be drastically increased,  and there are other settings
>>> that need changing.  The precise values depend on many factors, but
>>> the initial values set by default are definitely far too small for your
>>> usage.
>>>
>>> Am assuming that you are looking at PostgreSQL 9.4.
>>>
>>>
>>>
>>> Cheers,
>>> Gavin
>>>
>>>
>>
>>
>>
>
>
>--
>Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-performance
>

Re: Performance issues

От
Tomas Vondra
Дата:
On 16.3.2015 18:49, Marc Mamin wrote:
>
>> Hi Team,
>>
>> This is the EXPLAIN ANALYZE for one of the view : S_V_D_CAMPAIGN_HIERARCHY:

FWIW, this is a somewhat more readable version of the plan:

    http://explain.depesz.com/s/nbB

In the future, please do two things:

(1) Attach the plan as a text file, because the mail clients tend to
    screw things up (wrapping long lines). Unless the plan is trivial,
    of course - but pgsql-performance usually deals with complex stuff.

(2) Putting the plan on explain.depesz.com helps too, because it's
    considerably more readable (but always do 1, because resources
    placed somewhere else tend to disappear, and the posts then make
    very little sense, which is bad when searching in the archives)

(3) Same for stuff pasted somewhere else - always attach it to the
    message. For example I'd like to give you more accurate advice, but
    I can't as http://pgsql.privatepaste.com/41207bea45 is unavailable.

>
>
>     >Rows Removed by Join Filter: 3577676116
>
>     That's quite a lot.
>     You're possibly missing a clause in a join, resulting in a cross join.
>     It is also helpful to put your result here:
>     http://explain.depesz.com/
>     regards,

IMHO this is merely a consequence of using the CTE, which produces 52997
rows and is scanned 67508x as the inner relation of a nested loop. That
gives you 3577721476 tuples in total, and only 45360 are kept (hence
3577676116 are removed).

This is a prime example of why CTEs are not just aliases for subqueries,
but may actually cause serious trouble.

There are other issues (e.g. the row count estimate of the CTE is
seriously off, most likely because of the HashAggregate in the outer
branch), but that's a secondary issue IMHO.

Vivekanand, try this (in the order of intrusiveness):

(1) Get rid of the CTE, and just replace it with subselect in the FROM
    part of the query, so instead of this:

    WITH valid_executions AS (...)
    SELECT ... FROM ... JOIN valid_executions ON (...)

    you'll have something like this:

    SELECT ... FROM ... JOIN (...) AS valid_executions ON (...)

    This way the subselect will be optimized properly.


(2) Replace the CTE with a materialized view, or a temporary table.
    This has both advantages and disadvantages - the main advantage is
    that you can create indexes, collect statistics. Disadvantage is
    you have to refresh the MV, fill temporary table etc.

I expect (1) to improve the performance significantly, and (2) might
improve it even further by fixing the misestimates.
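
For illustration, a minimal sketch of option (2) could look like the
following (the SELECT body here is only a placeholder - replace it with
whatever the valid_executions CTE actually computes):

    CREATE MATERIALIZED VIEW valid_executions_mv AS
    SELECT touchpoint_execution_id
      FROM s_f_touchpoint_execution_status_history
     GROUP BY touchpoint_execution_id;

    CREATE INDEX ON valid_executions_mv (touchpoint_execution_id);
    ANALYZE valid_executions_mv;

    -- after each load of the underlying table
    REFRESH MATERIALIZED VIEW valid_executions_mv;

The planner then works with real statistics for the materialized rows
instead of guessing the CTE output.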


regards

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Performance issues

От
Vivekanand Joshi
Дата:

Hey guys, thanks a lot.

This is really helping. I am learning a lot. BTW, I changed the CTE into a subquery and it improved the performance by miles. I am getting the result in less than 3 seconds, though I am using a 24 GB RAM server. It is still a great turnaround time compared to the previous execution time.

Now I will look into the bigger query. I read explain analyze and that helped a lot. I will be coming up with more questions tomorrow, as the bigger query still has some problems.

On 16 Mar 2015 23:55, "Tomas Vondra" <tomas.vondra@2ndquadrant.com> wrote:
On 16.3.2015 18:49, Marc Mamin wrote:
>
>> Hi Team,
>>
>> This is the EXPLAIN ANALYZE for one of the view : S_V_D_CAMPAIGN_HIERARCHY:

FWIW, this is a somewhat more readable version of the plan:

    http://explain.depesz.com/s/nbB

In the future, please do two things:

(1) Attach the plan as a text file, because the mail clients tend to
    screw things up (wrapping long lines). Unless the plan is trivial,
    of course - but pgsql-performance usually deals with complex stuff.

(2) Put the plan on explain.depesz.com helps too, because it's
    considerably more readable (but always do 1, because resorces
    placed somewhere else tends to disappear, and the posts then make
    very little sense, which is bad when searching in the archives)

(3) Same for stuff pasted somewhere else - always attach it to the
    message. For example I'd like to give you more accurate advice, but
    I can't as http://pgsql.privatepaste.com/41207bea45 is unavailable.

>
>
>       >Rows Removed by Join Filter: 3577676116
>
>       That's quite a lot.
>       You're possibly missing a clause in a join, resulting in a cross join.
>       It is also helpful to put your result here:
>       http://explain.depesz.com/
>       regards,

IMHO this is merely a consequence of using the CTE, which produces 52997
rows and is scanned 67508x as the inner relation of a nested loop. That
gives you 3577721476 tuples in total, and only 45360 are kept (hence
3577676116 are removed).

This is a prime example of why CTEs are not just aliases for subqueries,
but may actually cause serious trouble.

There are other issues (e.g. the row count estimate of the CTE is
seriously off, most likely because of the HashAggregate in the outer
branch), but that's a secondary issue IMHO.

Vivekanand, try this (in the order of intrusiveness):

(1) Get rid of the CTE, and just replace it with subselect in the FROM
    part of the query, so instead of this:

    WITH valid_executions AS (...)
    SELECT ... FROM ... JOIN valid_executions ON (...)

    you'll have something like this:

    SELECT ... FROM ... JOIN (...) AS valid_executions ON (...)

    This way the subselect will optimized properly.


(2) Replace the CTE with a materialized view, or a temporary table.
    This has both advantages and disadvantages - the main advantage is
    that you can create indexes, collect statistics. Disadvantage is
    you have to refresh the MV, fill temporary table etc.

I expect (1) to improve the performance significantly, and (2) might
improve it even further by fixing the misestimates.


regards

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Performance issues

От
Jim Nasby
Дата:
On 3/13/15 7:12 PM, Tomas Vondra wrote:
> (4) I suspect many of the relations referenced in the views are not
>      actually needed in the query, i.e. the join is performed but
>      then it's just discarded because those columns are not used.
>      Try to simplify the views as much has possible - remove all the
>      tables that are not really necessary to run the query. If two
>      queries need different tables, maybe defining two views is
>      a better approach.

A better alternative with multi-purpose views is to use an outer join
instead of an inner join. With an outer join if you ultimately don't
refer to any of the columns in a particular table Postgres will remove
the table from the query completely.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: Performance issues

От
Tomas Vondra
Дата:
On 16.3.2015 20:43, Jim Nasby wrote:
> On 3/13/15 7:12 PM, Tomas Vondra wrote:
>> (4) I suspect many of the relations referenced in the views are not
>>      actually needed in the query, i.e. the join is performed but
>>      then it's just discarded because those columns are not used.
>>      Try to simplify the views as much has possible - remove all the
>>      tables that are not really necessary to run the query. If two
>>      queries need different tables, maybe defining two views is
>>      a better approach.
>
> A better alternative with multi-purpose views is to use an outer
> join instead of an inner join. With an outer join if you ultimately
> don't refer to any of the columns in a particular table Postgres will
> remove the table from the query completely.

Really? Because a quick test suggests otherwise:

db=# create table test_a (id int);
CREATE TABLE
db=# create table test_b (id int);
CREATE TABLE
db=# explain select test_a.* from test_a left join test_b using (id);
                              QUERY PLAN
----------------------------------------------------------------------
 Merge Left Join  (cost=359.57..860.00 rows=32512 width=4)
   Merge Cond: (test_a.id = test_b.id)
   ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
         Sort Key: test_a.id
         ->  Seq Scan on test_a  (cost=0.00..35.50 rows=2550 width=4)
   ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
         Sort Key: test_b.id
         ->  Seq Scan on test_b  (cost=0.00..35.50 rows=2550 width=4)
(8 rows)

Also, how would that work with duplicate rows in the referenced table?


--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Performance issues

От
Jim Nasby
Дата:
On 3/16/15 3:59 PM, Tomas Vondra wrote:
> On 16.3.2015 20:43, Jim Nasby wrote:
>> On 3/13/15 7:12 PM, Tomas Vondra wrote:
>>> (4) I suspect many of the relations referenced in the views are not
>>>       actually needed in the query, i.e. the join is performed but
>>>       then it's just discarded because those columns are not used.
>>>       Try to simplify the views as much has possible - remove all the
>>>       tables that are not really necessary to run the query. If two
>>>       queries need different tables, maybe defining two views is
>>>       a better approach.
>>
>> A better alternative with multi-purpose views is to use an outer
>> join instead of an inner join. With an outer join if you ultimately
>> don't refer to any of the columns in a particular table Postgres will
>> remove the table from the query completely.
>
> Really? Because a quick test suggests otherwise:
>
> db=# create table test_a (id int);
> CREATE TABLE
> db=# create table test_b (id int);
> CREATE TABLE
> db=# explain select test_a.* from test_a left join test_b using (id);
>                                QUERY PLAN
> ----------------------------------------------------------------------
>   Merge Left Join  (cost=359.57..860.00 rows=32512 width=4)
>     Merge Cond: (test_a.id = test_b.id)
>     ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
>           Sort Key: test_a.id
>           ->  Seq Scan on test_a  (cost=0.00..35.50 rows=2550 width=4)
>     ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
>           Sort Key: test_b.id
>           ->  Seq Scan on test_b  (cost=0.00..35.50 rows=2550 width=4)
> (8 rows)
>
> Also, how would that work with duplicate rows in the referenced table?

Right, I neglected to mention that the omitted table must also be unique
on the join key:

decibel@decina.attlocal=# create table a(a_id serial primary key);
CREATE TABLE
decibel@decina.attlocal=# create table b(a_id int);
CREATE TABLE
decibel@decina.attlocal=# explain analyze select a.* from a left join b
using(a_id);
                                                 QUERY PLAN

-----------------------------------------------------------------------------------------------------------
  Hash Right Join  (cost=67.38..137.94 rows=2550 width=4) (actual
time=0.035..0.035 rows=0 loops=1)
    Hash Cond: (b.a_id = a.a_id)
    ->  Seq Scan on b  (cost=0.00..35.50 rows=2550 width=4) (never executed)
    ->  Hash  (cost=35.50..35.50 rows=2550 width=4) (actual
time=0.002..0.002 rows=0 loops=1)
          Buckets: 4096  Batches: 1  Memory Usage: 32kB
          ->  Seq Scan on a  (cost=0.00..35.50 rows=2550 width=4)
(actual time=0.001..0.001 rows=0 loops=1)
  Planning time: 0.380 ms
  Execution time: 0.086 ms
(8 rows)

decibel@decina.attlocal=# alter table b add primary key(a_id);
ALTER TABLE
decibel@decina.attlocal=# explain analyze select a.* from a left join b
using(a_id);
                                           QUERY PLAN

-----------------------------------------------------------------------------------------------
  Seq Scan on a  (cost=0.00..35.50 rows=2550 width=4) (actual
time=0.001..0.001 rows=0 loops=1)
  Planning time: 0.247 ms
  Execution time: 0.029 ms
(3 rows)

decibel@decina.attlocal=# alter table a drop constraint a_pkey;
ALTER TABLE
decibel@decina.attlocal=# explain analyze select a.* from a left join b
using(a_id);
                                           QUERY PLAN

-----------------------------------------------------------------------------------------------
  Seq Scan on a  (cost=0.00..35.50 rows=2550 width=4) (actual
time=0.001..0.001 rows=0 loops=1)
  Planning time: 0.098 ms
  Execution time: 0.011 ms
(3 rows)
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: Performance issues

От
Vivekanand Joshi
Дата:
Hi Guys,

Next level of query is following:

If this works, I guess 90% of the problem will be solved.

SELECT
                    COUNT(DISTINCT TARGET_ID)
                FROM
                    S_V_F_PROMOTION_HISTORY_EMAIL PH
                    INNER JOIN S_V_D_CAMPAIGN_HIERARCHY CH
                        ON PH.TOUCHPOINT_EXECUTION_ID =
CH.TOUCHPOINT_EXECUTION_ID
                WHERE
                    1=1
                    AND SEND_DT >= '2014-03-13'
                    AND SEND_DT <= '2015-03-14'


In this query, I am joining two views which were made earlier with CTEs. I
have replaced the CTEs with subqueries. The views were giving me output in
around 5-10 minutes and now I am getting the same result in around 3-4
seconds.

But when I executed the query written above, I got stuck again. I am
attaching the query plan as well as the link.

http://explain.depesz.com/s/REeu

I can see that most of the time is spent inside a nested loop, and the total
cost comes out to cost=338203.81..338203.82.

How do I take care of this? I need to run this query in a report, so I cannot
create a table like "select * from views" and then join against that table. If I do
that, I get the answer to the whole big query in some 6-7 seconds. But
that is not feasible: a report in Jasper can have only one single (big or small)
query.

Let me know if you need any other information.

Thanks a ton!
Vivek


-----Original Message-----
From: Jim Nasby [mailto:Jim.Nasby@BlueTreble.com]
Sent: Tuesday, March 17, 2015 5:36 AM
To: Tomas Vondra; vjoshi@zetainteractive.com; Scott Marlowe; Varadharajan
Mukundan
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issues

On 3/16/15 3:59 PM, Tomas Vondra wrote:
> On 16.3.2015 20:43, Jim Nasby wrote:
>> On 3/13/15 7:12 PM, Tomas Vondra wrote:
>>> (4) I suspect many of the relations referenced in the views are not
>>>       actually needed in the query, i.e. the join is performed but
>>>       then it's just discarded because those columns are not used.
>>>       Try to simplify the views as much has possible - remove all the
>>>       tables that are not really necessary to run the query. If two
>>>       queries need different tables, maybe defining two views is
>>>       a better approach.
>>
>> A better alternative with multi-purpose views is to use an outer join
>> instead of an inner join. With an outer join if you ultimately don't
>> refer to any of the columns in a particular table Postgres will
>> remove the table from the query completely.
>
> Really? Because a quick test suggests otherwise:
>
> db=# create table test_a (id int);
> CREATE TABLE
> db=# create table test_b (id int);
> CREATE TABLE
> db=# explain select test_a.* from test_a left join test_b using (id);
>                                QUERY PLAN
> ----------------------------------------------------------------------
>   Merge Left Join  (cost=359.57..860.00 rows=32512 width=4)
>     Merge Cond: (test_a.id = test_b.id)
>     ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
>           Sort Key: test_a.id
>           ->  Seq Scan on test_a  (cost=0.00..35.50 rows=2550 width=4)
>     ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
>           Sort Key: test_b.id
>           ->  Seq Scan on test_b  (cost=0.00..35.50 rows=2550 width=4)
> (8 rows)
>
> Also, how would that work with duplicate rows in the referenced table?

Right, I neglected to mention that the omitted table must also be unique
on the join key:

decibel@decina.attlocal=# create table a(a_id serial primary key); CREATE
TABLE decibel@decina.attlocal=# create table b(a_id int); CREATE TABLE
decibel@decina.attlocal=# explain analyze select a.* from a left join b
using(a_id);
                                                 QUERY PLAN

--------------------------------------------------------------------------
---------------------------------
  Hash Right Join  (cost=67.38..137.94 rows=2550 width=4) (actual
time=0.035..0.035 rows=0 loops=1)
    Hash Cond: (b.a_id = a.a_id)
    ->  Seq Scan on b  (cost=0.00..35.50 rows=2550 width=4) (never
executed)
    ->  Hash  (cost=35.50..35.50 rows=2550 width=4) (actual
time=0.002..0.002 rows=0 loops=1)
          Buckets: 4096  Batches: 1  Memory Usage: 32kB
          ->  Seq Scan on a  (cost=0.00..35.50 rows=2550 width=4) (actual
time=0.001..0.001 rows=0 loops=1)
  Planning time: 0.380 ms
  Execution time: 0.086 ms
(8 rows)

decibel@decina.attlocal=# alter table b add primary key(a_id); ALTER TABLE
decibel@decina.attlocal=# explain analyze select a.* from a left join b
using(a_id);
                                           QUERY PLAN

--------------------------------------------------------------------------
---------------------
  Seq Scan on a  (cost=0.00..35.50 rows=2550 width=4) (actual
time=0.001..0.001 rows=0 loops=1)
  Planning time: 0.247 ms
  Execution time: 0.029 ms
(3 rows)

decibel@decina.attlocal=# alter table a drop constraint a_pkey; ALTER
TABLE decibel@decina.attlocal=# explain analyze select a.* from a left
join b using(a_id);
                                           QUERY PLAN

--------------------------------------------------------------------------
---------------------
  Seq Scan on a  (cost=0.00..35.50 rows=2550 width=4) (actual
time=0.001..0.001 rows=0 loops=1)
  Planning time: 0.098 ms
  Execution time: 0.011 ms
(3 rows)
--
Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it
in Treble! http://BlueTreble.com

Вложения

Re: Performance issues

От
Vivekanand Joshi
Дата:
EXPLAIN ANALYZE didn't give a result even after three hours.


-----Original Message-----
From: Vivekanand Joshi [mailto:vjoshi@zetainteractive.com]
Sent: Tuesday, March 17, 2015 1:11 PM
To: 'Jim Nasby'; 'Tomas Vondra'; 'Scott Marlowe'; 'Varadharajan Mukundan'
Cc: 'pgsql-performance@postgresql.org'
Subject: RE: [PERFORM] Performance issues

Hi Guys,

Next level of query is following:

If this works, I guess 90% of the problem will be solved.

SELECT
                    COUNT(DISTINCT TARGET_ID)
                FROM
                    S_V_F_PROMOTION_HISTORY_EMAIL PH
                    INNER JOIN S_V_D_CAMPAIGN_HIERARCHY CH
                        ON PH.TOUCHPOINT_EXECUTION_ID =
CH.TOUCHPOINT_EXECUTION_ID
                WHERE
                    1=1
                    AND SEND_DT >= '2014-03-13'
                    AND SEND_DT <= '2015-03-14'


In this query, I am joining two views which were made earlier with CTEs. I
have replaced the CTE's with subqueries. The view were giving me output in
around 5-10 minutes and now I am getting the same result in around 3-4
seconds.

But when I executed the query written above, I am again stuck. I am
attaching the query plan as well the link.

http://explain.depesz.com/s/REeu

I can see most of the time is spending inside a nested loop and total
costs comes out be cost=338203.81..338203.82.

How to take care of this? I need to run this query in a report so I cannot
create a table like select * from views and then join the table. If I do
that I am getting the answer of whole big query in some 6-7 seconds. But
that is not feasible. A report (Jasper can have only one single (big/small
query).

Let me know if you need any other information.

Thanks a ton!
Vivek


-----Original Message-----
From: Jim Nasby [mailto:Jim.Nasby@BlueTreble.com]
Sent: Tuesday, March 17, 2015 5:36 AM
To: Tomas Vondra; vjoshi@zetainteractive.com; Scott Marlowe; Varadharajan
Mukundan
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issues

On 3/16/15 3:59 PM, Tomas Vondra wrote:
> On 16.3.2015 20:43, Jim Nasby wrote:
>> On 3/13/15 7:12 PM, Tomas Vondra wrote:
>>> (4) I suspect many of the relations referenced in the views are not
>>>       actually needed in the query, i.e. the join is performed but
>>>       then it's just discarded because those columns are not used.
>>>       Try to simplify the views as much has possible - remove all the
>>>       tables that are not really necessary to run the query. If two
>>>       queries need different tables, maybe defining two views is
>>>       a better approach.
>>
>> A better alternative with multi-purpose views is to use an outer join
>> instead of an inner join. With an outer join if you ultimately don't
>> refer to any of the columns in a particular table Postgres will
>> remove the table from the query completely.
>
> Really? Because a quick test suggests otherwise:
>
> db=# create table test_a (id int);
> CREATE TABLE
> db=# create table test_b (id int);
> CREATE TABLE
> db=# explain select test_a.* from test_a left join test_b using (id);
>                                QUERY PLAN
> ----------------------------------------------------------------------
>   Merge Left Join  (cost=359.57..860.00 rows=32512 width=4)
>     Merge Cond: (test_a.id = test_b.id)
>     ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
>           Sort Key: test_a.id
>           ->  Seq Scan on test_a  (cost=0.00..35.50 rows=2550 width=4)
>     ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
>           Sort Key: test_b.id
>           ->  Seq Scan on test_b  (cost=0.00..35.50 rows=2550 width=4)
> (8 rows)
>
> Also, how would that work with duplicate rows in the referenced table?

Right, I neglected to mention that the omitted table must also be unique
on the join key:

decibel@decina.attlocal=# create table a(a_id serial primary key); CREATE
TABLE decibel@decina.attlocal=# create table b(a_id int); CREATE TABLE
decibel@decina.attlocal=# explain analyze select a.* from a left join b
using(a_id);
                                                 QUERY PLAN

--------------------------------------------------------------------------
---------------------------------
  Hash Right Join  (cost=67.38..137.94 rows=2550 width=4) (actual
time=0.035..0.035 rows=0 loops=1)
    Hash Cond: (b.a_id = a.a_id)
    ->  Seq Scan on b  (cost=0.00..35.50 rows=2550 width=4) (never
executed)
    ->  Hash  (cost=35.50..35.50 rows=2550 width=4) (actual
time=0.002..0.002 rows=0 loops=1)
          Buckets: 4096  Batches: 1  Memory Usage: 32kB
          ->  Seq Scan on a  (cost=0.00..35.50 rows=2550 width=4) (actual
time=0.001..0.001 rows=0 loops=1)
  Planning time: 0.380 ms
  Execution time: 0.086 ms
(8 rows)

decibel@decina.attlocal=# alter table b add primary key(a_id); ALTER TABLE
decibel@decina.attlocal=# explain analyze select a.* from a left join b
using(a_id);
                                           QUERY PLAN

--------------------------------------------------------------------------
---------------------
  Seq Scan on a  (cost=0.00..35.50 rows=2550 width=4) (actual
time=0.001..0.001 rows=0 loops=1)
  Planning time: 0.247 ms
  Execution time: 0.029 ms
(3 rows)

decibel@decina.attlocal=# alter table a drop constraint a_pkey; ALTER
TABLE decibel@decina.attlocal=# explain analyze select a.* from a left
join b using(a_id);
                                           QUERY PLAN

--------------------------------------------------------------------------
---------------------
  Seq Scan on a  (cost=0.00..35.50 rows=2550 width=4) (actual
time=0.001..0.001 rows=0 loops=1)
  Planning time: 0.098 ms
  Execution time: 0.011 ms
(3 rows)
--
Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it
in Treble! http://BlueTreble.com


Re: Performance issues

От
Tomas Vondra
Дата:
Hi,

On 17.3.2015 08:41, Vivekanand Joshi wrote:
> Hi Guys,
>
> Next level of query is following:
>
> If this works, I guess 90% of the problem will be solved.
>
> SELECT
>                     COUNT(DISTINCT TARGET_ID)
>                 FROM
>                     S_V_F_PROMOTION_HISTORY_EMAIL PH
>                     INNER JOIN S_V_D_CAMPAIGN_HIERARCHY CH
>                         ON PH.TOUCHPOINT_EXECUTION_ID =
> CH.TOUCHPOINT_EXECUTION_ID
>                 WHERE
>                     1=1
>                     AND SEND_DT >= '2014-03-13'
>                     AND SEND_DT <= '2015-03-14'
>
>
> In this query, I am joining two views which were made earlier with CTEs. I
> have replaced the CTE's with subqueries. The view were giving me output in
> around 5-10 minutes and now I am getting the same result in around 3-4
> seconds.
>
> But when I executed the query written above, I am again stuck. I am
> attaching the query plan as well the link.
>
> http://explain.depesz.com/s/REeu
>
> I can see most of the time is spending inside a nested loop and total
> costs comes out be cost=338203.81..338203.82.

Most of that cost comes from this:

Seq Scan on s_f_promotion_history base (cost=0.00..283,333.66 rows=1
width=32)
  Filter: ((send_dt >= '2014-03-13 00:00:00'::timestamp without time
zone) AND (send_dt <= '2015-03-14 00:00:00'::timestamp without time


That's a bit weird, I guess. If you analyze this part of the query
separately, i.e.

EXPLAIN ANALYZE SELECT * FROM s_f_promotion_history
 WHERE (send_dt >= '2014-03-13 00:00:00')
   AND (send_dt <= '2015-03-14 00:00:00')

what do you get?

I suspect it's used in EXISTS, i.e. something like this:

... WHERE EXISTS (SELECT * FROM s_f_promotion_history
                   WHERE ... send_dt conditions ...
                     AND touchpoint_execution_id =
     s_f_touchpoint_execution_status_history_1.touchpoint_execution_id)

and this is transformed into a nested loop join. If there's a
misestimate, this may be quite expensive - try to create an index on

   s_f_promotion_history (touchpoint_execution_id, send_dt)
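
For example (the index name is arbitrary):

   CREATE INDEX idx_promo_hist_tpe_send_dt
       ON s_f_promotion_history (touchpoint_execution_id, send_dt);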


regards

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Performance issues

От
Tomas Vondra
Дата:
On 17.3.2015 12:07, Vivekanand Joshi wrote:
> EXPLAIN ANALYZE didn't give result even after three hours.

In that case the only thing you can do is 'slice' the query into smaller
parts (representing subtrees of the plan), and analyze those first. Look
for misestimates (significant differences between estimated and actual
row counts, and very expensive parts).

We can't do that, because we don't have your data or queries, and
without the explain analyze it's difficult to give advice.


--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Performance issues

От
Vivekanand Joshi
Дата:
Hi Tomas,

This is what I am getting,


EXPLAIN ANALYZE SELECT * FROM s_f_promotion_history WHERE (send_dt >=
'2014-03-13 00:00:00') AND (send_dt <= '2015-03-14 00:00:00');

                                                                   QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on s_f_promotion_history  (cost=0.00..283333.66 rows=1 width=74)
(actual time=711.023..1136.393 rows=1338 loops=1)
   Filter: ((send_dt >= '2014-03-13 00:00:00'::timestamp without time zone)
AND (send_dt <= '2015-03-14 00:00:00'::timestamp without time zone))
   Rows Removed by Filter: 9998662
 Total runtime: 1170.682 ms


CREATE INDEX idx_pr_history ON
S_F_PROMOTION_HISTORY(touchpoint_execution_id, send_dt);

 After Creating Index:

                                                                      QUERY
PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_pr_history on s_f_promotion_history
(cost=0.43..254028.45 rows=1 width=74) (actual time=375.796..604.587
rows=1338 loops=1)
   Index Cond: ((send_dt >= '2014-03-13 00:00:00'::timestamp without time
zone) AND (send_dt <= '2015-03-14 00:00:00'::timestamp without time zone))
 Total runtime: 604.733 ms


The query I gave you is the smallest query; it uses two views, and I have
changed both views to use subqueries instead of CTEs. When I join
these two views, it does not complete at all.

Explain analyze plan for view  s_v_f_promotion_history_email:
http://explain.depesz.com/s/ure
Explain analyze plan for view  s_v_d_campaign_hierarchy :
http://explain.depesz.com/s/WxI


Regards,
Vivek






-----Original Message-----
From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]
Sent: Tuesday, March 17, 2015 5:15 PM
To: vjoshi@zetainteractive.com; Jim Nasby; Scott Marlowe; Varadharajan
Mukundan
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issues

On 17.3.2015 12:07, Vivekanand Joshi wrote:
> EXPLAIN ANALYZE didn't give result even after three hours.

In that case the only thing you can do is 'slice' the query into smaller
parts (representing subtrees of the plan), and analyze those first. Look
for misestimates (significant differences between estimated and actual
row counts, and very expensive parts).

We can't do that, because we don't have your data or queries, and
without the explain analyze it's difficult to give advices.


--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Performance issues

От
Vivekanand Joshi
Дата:
Attaching explain analyze file as well.

Vivek


-----Original Message-----
From: Vivekanand Joshi [mailto:vjoshi@zetainteractive.com]
Sent: Tuesday, March 17, 2015 5:36 PM
To: 'Tomas Vondra'; 'Jim Nasby'; 'Scott Marlowe'; 'Varadharajan Mukundan'
Cc: 'pgsql-performance@postgresql.org'
Subject: RE: [PERFORM] Performance issues

Hi Tomas,

This is what I am getting,


EXPLAIN ANALYZE SELECT * FROM s_f_promotion_history  WHERE (send_dt >=
'2014-03-13 00:00:00');

                                                                   QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on s_f_promotion_history  (cost=0.00..283333.66 rows=1 width=74)
(actual time=711.023..1136.393 rows=1338 loops=1)
   Filter: ((send_dt >= '2014-03-13 00:00:00'::timestamp without time zone)
AND (send_dt <= '2015-03-14 00:00:00'::timestamp without time zone))
   Rows Removed by Filter: 9998662
 Total runtime: 1170.682 ms


CREATE INDEX idx_pr_history ON
S_F_PROMOTION_HISTORY(touchpoint_execution_id, send_dt);

 After Creating Index:

                                                                      QUERY
PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_pr_history on s_f_promotion_history
(cost=0.43..254028.45 rows=1 width=74) (actual time=375.796..604.587
rows=1338 loops=1)
   Index Cond: ((send_dt >= '2014-03-13 00:00:00'::timestamp without time
zone) AND (send_dt <= '2015-03-14 00:00:00'::timestamp without time zone))
Total runtime: 604.733 ms


The query I gave you is the smallest query, it is using two views and both
the views I have changed by using subqueries instead of CTEs.  When I join
these two views, it is not getting completed at all.

Explain analyze plan for view  s_v_f_promotion_history_email:
http://explain.depesz.com/s/ure Explain analyze plan for view
s_v_d_campaign_hierarchy : http://explain.depesz.com/s/WxI


Regards,
Vivek






-----Original Message-----
From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]
Sent: Tuesday, March 17, 2015 5:15 PM
To: vjoshi@zetainteractive.com; Jim Nasby; Scott Marlowe; Varadharajan
Mukundan
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issues

On 17.3.2015 12:07, Vivekanand Joshi wrote:
> EXPLAIN ANALYZE didn't give result even after three hours.

In that case the only thing you can do is 'slice' the query into smaller
parts (representing subtrees of the plan), and analyze those first. Look for
misestimates (significant differences between estimated and actual row
counts, and very expensive parts).

We can't do that, because we don't have your data or queries, and without
the explain analyze it's difficult to give advices.


--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: Performance issues

От
Tomas Vondra
Дата:
Just as I feared, the attached explain analyze results show significant
misestimates, like this for example:

Nested Loop  (cost=32782.19..34504.16 rows=1 width=16)
        (actual time=337.484..884.438 rows=46454 loops=1)

Nested Loop  (cost=18484.94..20366.29 rows=1 width=776)
      (actual time=2445.487..3741.049 rows=45360 loops=1)

Hash Left Join  (cost=34679.90..37396.37 rows=11644 width=148)
          (actual time=609.472..9070.675 rows=4559289 loops=1)

There are plenty of nested loop joins - the optimizer believes there will
be only a few rows in the outer relation, but gets orders of magnitude
more tuples. And nested loops are terrible in that case.

In case of the first view, it seems to be caused by this:

Merge Cond:
((s_f_touchpoint_execution_status_history.touchpoint_execution_id =
s_f_touchpoint_execution_status_history_1.touchpoint_ex
ecution_id) AND (s_f_touchpoint_execution_status_history.creation_dt =
(max(s_f_touchpoint_execution_status_history_1.creation_dt))))

especially the ':id = max(:id)' condition is probably giving the
optimizer a hard time. This is a conceptually difficult problem (i.e.
fixing this at the optimizer level is unlikely to happen any time soon,
because it effectively means you have to predict the statistical
properties of the aggregation).

You may try increasing the statistical target, which makes the stats a
bit more detailed (the default on 9.4 is 100):

    SET default_statistics_target = 10000;
    ANALYZE;

But I don't really believe this might really fix the problem.

But maybe it's possible to rewrite the query somehow?

Let's experiment a bit - remove the aggregation, i.e. join directly to
s_f_touchpoint_execution_status_history. It'll return wrong results, but
the estimates should be better, so let's see what happens.

You may also try disabling nested loops - the other join algorithms
usually perform better with large row counts.

    SET enable_nestloop = false;

This is not a production-suitable solution, but for experimenting that's OK.

ISTM what the aggregation (or the whole mergejoin) does is selecting the
last s_f_touchpoint_execution_status_history record for each
touchpoint_execution_id.

There are better ways to determine that, IMHO. For example:

 (1) adding an 'is_last' flag to s_f_touchpoint_execution_status_history

     This however requires maintaining that flag somehow, but the join
     would not be needed at all.

     The "last IDs" might be maintained in a separate table - the join
     would still be necessary, but it might be less intrusive and
     cheaper to maintain (a minimal sketch follows after this list).

 (2) using window functions, e.g. like this:

     SELECT * FROM (
       SELECT *,
            ROW_NUMBER() OVER (PARTITION BY touchpoint_execution_id
                               ORDER BY FROM max_creation_dt) AS rn
       FROM s_f_touchpoint_execution_status_history
     ) foo WHERE rn = 1

     But estimating this is also rather difficult ...

 (3) Using temporary table / MV - this really depends on your
     requirements, load schedule, how you run the queries etc. It would
     however fix the estimation errors (probably).
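
To make option (1) a bit more concrete, the "last IDs" side table could
be as simple as this (the table name is made up, and it would have to be
refreshed or maintained by a trigger whenever the history table changes):

    CREATE TABLE s_f_touchpoint_execution_last_status AS
    SELECT touchpoint_execution_id,
           max(creation_dt) AS last_creation_dt
      FROM s_f_touchpoint_execution_status_history
     GROUP BY touchpoint_execution_id;

    CREATE UNIQUE INDEX ON s_f_touchpoint_execution_last_status
        (touchpoint_execution_id);
    ANALYZE s_f_touchpoint_execution_last_status;

The query then joins to this small, analyzable table instead of
computing max() inside the join condition.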

The 2nd view seems to suffer because of the same issue (underestimates
leading to choice of nested loops), but caused by something else:

->  Hash Join  (cost=1954.13..6249.67 rows=13 width=108)
         (actual time=31.777..210.346 rows=72670 loops=1)
      Hash Cond: ((tp_exec.touchpoint_id = tp.touchpoint_id)
              AND (wave_exec.wave_id = tp.wave_id))

Estimating cardinality of joins with multi-column conditions is
difficult, no idea how to fix that at the moment.


--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Performance issues

От
Thomas Kellerer
Дата:
Tomas Vondra schrieb am 17.03.2015 um 14:55:
>  (2) using window functions, e.g. like this:
>
>      SELECT * FROM (
>        SELECT *,
>             ROW_NUMBER() OVER (PARTITION BY touchpoint_execution_id
>                                ORDER BY FROM max_creation_dt) AS rn
>        FROM s_f_touchpoint_execution_status_history
>      ) foo WHERE rn = 1
>
>      But estimating this is also rather difficult ...


From my experience rewriting something like the above using DISTINCT ON is usually faster.

e.g.:

select distinct on (touchpoint_execution_id) *
from s_f_touchpoint_execution_status_history
order by touchpoint_execution_id, max_creation_dt;




Re: Performance issues

От
Tomas Vondra
Дата:
On 17.3.2015 15:19, Thomas Kellerer wrote:
> Tomas Vondra schrieb am 17.03.2015 um 14:55:
>>  (2) using window functions, e.g. like this:
>>
>>      SELECT * FROM (
>>        SELECT *,
>>             ROW_NUMBER() OVER (PARTITION BY touchpoint_execution_id
>>                                ORDER BY FROM max_creation_dt) AS rn
>>        FROM s_f_touchpoint_execution_status_history
>>      ) foo WHERE rn = 1
>>
>>      But estimating this is also rather difficult ...
>
>
> From my experience rewriting something like the above using DISTINCT
> ON is usually faster.

How do you get the last record (with respect to a timestamp column)
using a DISTINCT ON?


--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Performance issues

От
Vivekanand Joshi
Дата:
The confusion for me here is that :


I am getting results from the view in around 3 seconds
(S_V_D_CAMPAIGN_HIERARCHY) and 25 seconds (S_V_F_PROMOTION_HISTORY_EMAIL)

But when I am using these two views as the joining tables in the query, it
doesn't give any result. As per my understanding, the planner is making a new
plan, which is costly, instead of using the output from the views, which is
actually understandable.

Is there a way we can do anything about it?

I hope I am making some sense here.

Regards,
Vivek

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Tomas Vondra
Sent: Tuesday, March 17, 2015 8:13 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issues

On 17.3.2015 15:19, Thomas Kellerer wrote:
> Tomas Vondra schrieb am 17.03.2015 um 14:55:
>>  (2) using window functions, e.g. like this:
>>
>>      SELECT * FROM (
>>        SELECT *,
>>             ROW_NUMBER() OVER (PARTITION BY touchpoint_execution_id
>>                                ORDER BY FROM max_creation_dt) AS rn
>>        FROM s_f_touchpoint_execution_status_history
>>      ) foo WHERE rn = 1
>>
>>      But estimating this is also rather difficult ...
>
>
> From my experience rewriting something like the above using DISTINCT
> ON is usually faster.

How do you get the last record (with respect to a timestamp column) using a
DISTINCT ON?


--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: Performance issues

От
Vivekanand Joshi
Дата:
This is the explain for a simple query:

explain Select * from S_V_F_PROMOTION_HISTORY_EMAIL a inner join
S_V_D_CAMPAIGN_HIERARCHY b on a.touchpoint_execution_id =
b.touchpoint_execution_id;


http://explain.depesz.com/s/gse

I am wondering why the total cost here is low, yet the result is still not
coming out.

Regards,
Vivek

-----Original Message-----
From: Vivekanand Joshi [mailto:vjoshi@zetainteractive.com]
Sent: Tuesday, March 17, 2015 8:40 PM
To: 'Tomas Vondra'; 'pgsql-performance@postgresql.org'
Subject: RE: [PERFORM] Performance issues

The confusion for me here is that :


I am getting results from the view in around 3 seconds
(S_V_D_CAMPAIGN_HIERARCHY) and 25 seconds (S_V_F_PROMOTION_HISTORY_EMAIL)

But when I am using these two views in the query as the joining tables, it
doesn't give any result.  As per my understanding, the planner is making new
plan and that is costly instead of using output from the view, which is
actually understandable.

Is there a way, we can do anything about it?

I hope I am making some sense here.

Regards,
Vivek

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Tomas Vondra
Sent: Tuesday, March 17, 2015 8:13 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issues

On 17.3.2015 15:19, Thomas Kellerer wrote:
> Tomas Vondra schrieb am 17.03.2015 um 14:55:
>>  (2) using window functions, e.g. like this:
>>
>>      SELECT * FROM (
>>        SELECT *,
>>             ROW_NUMBER() OVER (PARTITION BY touchpoint_execution_id
>>                                ORDER BY FROM max_creation_dt) AS rn
>>        FROM s_f_touchpoint_execution_status_history
>>      ) foo WHERE rn = 1
>>
>>      But estimating this is also rather difficult ...
>
>
> From my experience rewriting something like the above using DISTINCT
> ON is usually faster.

How do you get the last record (with respect to a timestamp column) using a
DISTINCT ON?


--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Вложения

Re: Performance issues

От
Thomas Kellerer
Дата:
Tomas Vondra schrieb am 17.03.2015 um 15:43:
> On 17.3.2015 15:19, Thomas Kellerer wrote:
>> Tomas Vondra schrieb am 17.03.2015 um 14:55:
>>>  (2) using window functions, e.g. like this:
>>>
>>>      SELECT * FROM (
>>>        SELECT *,
>>>             ROW_NUMBER() OVER (PARTITION BY touchpoint_execution_id
>>>                                ORDER BY FROM max_creation_dt) AS rn
>>>        FROM s_f_touchpoint_execution_status_history
>>>      ) foo WHERE rn = 1
>>>
>>>      But estimating this is also rather difficult ...
>>
>>
>> From my experience rewriting something like the above using DISTINCT
>> ON is usually faster.
>
> How do you get the last record (with respect to a timestamp column)
> using a DISTINCT ON?

You need to use "order by ... desc". See here: http://sqlfiddle.com/#!15/d4846/2
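
Spelled out against the history table (the column name is taken from the
plans earlier in the thread, i.e. creation_dt):

    select distinct on (touchpoint_execution_id) *
      from s_f_touchpoint_execution_status_history
     order by touchpoint_execution_id, creation_dt desc;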

Btw: your row_number() usage wouldn't return the "latest" row either.
It would return the "oldest" row.








Re: Performance issues

От
Tomas Vondra
Дата:
On 17.3.2015 16:10, Vivekanand Joshi wrote:
> The confusion for me here is that :
>
>
> I am getting results from the view in around 3 seconds
> (S_V_D_CAMPAIGN_HIERARCHY) and 25 seconds (S_V_F_PROMOTION_HISTORY_EMAIL)
>
> But when I am using these two views in the query as the joining
> tables, it doesn't give any result. As per my understanding, the
> planner is making new plan and that is costly instead of using
> output from the view, which is actually understandable.

In general, yes. The problem is that the plan is constructed based on
the estimates, and those are very inaccurate in this case.

The planner may do various changes, but let's assume that does not
happen and the plans are executed and the results are joined.

For example what might happen is this:

   for each row in 's_v_d_campaign_hierarchy' (1 row expected):
      execute s_v_f_promotion_history_email & join (11644 rows exp.)

But then it gets 45k rows from s_v_d_campaign_hierarchy, and ~400x more
rows from s_v_f_promotion_history_email (I'm neglecting the join
condition here, but that's not really significant). Kaboooom!

In reality, the plan is reorganized (e.g. different join order), but the
misestimates are still lurking there.

> Is there a way, we can do anything about it?

Rephrasing the query so that the planner can estimate it more accurately.



--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Performance issues

От
Tomas Vondra
Дата:
On 17.3.2015 16:24, Thomas Kellerer wrote:
> Tomas Vondra schrieb am 17.03.2015 um 15:43:
>> On 17.3.2015 15:19, Thomas Kellerer wrote:
>>> Tomas Vondra schrieb am 17.03.2015 um 14:55:
>>>>  (2) using window functions, e.g. like this:
>>>>
>>>>      SELECT * FROM (
>>>>        SELECT *,
>>>>             ROW_NUMBER() OVER (PARTITION BY touchpoint_execution_id
>>>>                                ORDER BY FROM max_creation_dt) AS rn
>>>>        FROM s_f_touchpoint_execution_status_history
>>>>      ) foo WHERE rn = 1
>>>>
>>>>      But estimating this is also rather difficult ...
>>>
>>>
>>> From my experience rewriting something like the above using DISTINCT
>>> ON is usually faster.
>>
>> How do you get the last record (with respect to a timestamp column)
>> using a DISTINCT ON?
>
> You need to use "order by ... desc". See here: http://sqlfiddle.com/#!15/d4846/2

Nice, thanks!

>
> Btw: your row_number() usage wouldn't return the "latest" row either.
> It would return the "oldest" row.

Oh, right. I forgot the DESC in the window.


--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Performance issues

От
Vivekanand Joshi
Дата:
So, here is the first taste of success, which gives me the confidence
that, if properly worked out with good hardware and proper tuning,
PostgreSQL could be a good replacement.

Out of the 9 reports which need to be migrated to PostgreSQL, 3 are now
running.

Report 4 was giving an issue and I will look at it tomorrow.

Just to inform you guys: the thing that helped most was setting
enable_nestloop to false. Plans are no longer miscalculated.

But this is not a production-suitable setting. So what do you think is the
best way to work around this?


Regards,
Vivek

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Tomas Vondra
Sent: Tuesday, March 17, 2015 9:00 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issues

On 17.3.2015 16:24, Thomas Kellerer wrote:
> Tomas Vondra schrieb am 17.03.2015 um 15:43:
>> On 17.3.2015 15:19, Thomas Kellerer wrote:
>>> Tomas Vondra schrieb am 17.03.2015 um 14:55:
>>>>  (2) using window functions, e.g. like this:
>>>>
>>>>      SELECT * FROM (
>>>>        SELECT *,
>>>>             ROW_NUMBER() OVER (PARTITION BY touchpoint_execution_id
>>>>                                ORDER BY FROM max_creation_dt) AS rn
>>>>        FROM s_f_touchpoint_execution_status_history
>>>>      ) foo WHERE rn = 1
>>>>
>>>>      But estimating this is also rather difficult ...
>>>
>>>
>>> From my experience rewriting something like the above using DISTINCT
>>> ON is usually faster.
>>
>> How do you get the last record (with respect to a timestamp column)
>> using a DISTINCT ON?
>
> You need to use "order by ... desc". See here:
> http://sqlfiddle.com/#!15/d4846/2

Nice, thanks!

>
> Btw: your row_number() usage wouldn't return the "latest" row either.
> It would return the "oldest" row.

Oh, right. I forgot the DESC in the window.


--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: Performance issues

От
Felipe Santos
Дата:


2015-03-18 14:31 GMT-03:00 Vivekanand Joshi <vjoshi@zetainteractive.com>:
So, here is the first taste of success and which gives me the confidence
that if properly worked out with a good hardware and proper tuning,
PostgreSQL could be a good replacement.

Out of the 9 reports which needs to be migrated in PostgreSQL, 3 are now
running.

Report 4 was giving an issue and I will see it tomorrow.

Just to inform you guys that, the thing that helped most is setting
enable_nestloops to false worked. Plans are now not miscalculated.




Regards,
Vivek

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Tomas Vondra
Sent: Tuesday, March 17, 2015 9:00 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issues

On 17.3.2015 16:24, Thomas Kellerer wrote:
> Tomas Vondra schrieb am 17.03.2015 um 15:43:
>> On 17.3.2015 15:19, Thomas Kellerer wrote:
>>> Tomas Vondra schrieb am 17.03.2015 um 14:55:
>>>>  (2) using window functions, e.g. like this:
>>>>
>>>>      SELECT * FROM (
>>>>        SELECT *,
>>>>             ROW_NUMBER() OVER (PARTITION BY touchpoint_execution_id
>>>>                                ORDER BY FROM max_creation_dt) AS rn
>>>>        FROM s_f_touchpoint_execution_status_history
>>>>      ) foo WHERE rn = 1
>>>>
>>>>      But estimating this is also rather difficult ...
>>>
>>>
>>> From my experience rewriting something like the above using DISTINCT
>>> ON is usually faster.
>>
>> How do you get the last record (with respect to a timestamp column)
>> using a DISTINCT ON?
>
> You need to use "order by ... desc". See here:
> http://sqlfiddle.com/#!15/d4846/2

Nice, thanks!

>
> Btw: your row_number() usage wouldn't return the "latest" row either.
> It would return the "oldest" row.

Oh, right. I forgot the DESC in the window.


--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




"But this is not a production-suitable setting. So what do you think how to
get a work around this?"

What about creating a read-only replica and applying this setting there?
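
(That would mean changing it only in the replica's postgresql.conf, e.g.:

    # on the reporting replica only
    enable_nestloop = off

while the primary keeps the default.)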

Re: Performance issues

От
Jerry Sievers
Дата:
Vivekanand Joshi <vjoshi@zetainteractive.com> writes:

> So, here is the first taste of success, which gives me the confidence
> that, if properly worked out with good hardware and proper tuning,
> PostgreSQL could be a good replacement.
>
> Out of the 9 reports which need to be migrated to PostgreSQL, 3 are now
> running.
>
> Report 4 was giving an issue and I will look at it tomorrow.
>
> Just to inform you guys: the thing that helped most was setting
> enable_nestloops to false. Plans are no longer miscalculated.
>
> But this is not a production-suitable setting. So how do you think we
> can work around this?

Consider disabling that setting for just the one or few odd queries that
are known to plan badly.

begin;
set local enable_nestloop to false;
select ...;
commit/abort;

I'd say never make that sort of setting DB or cluster-wide.
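
(Spelled out a bit more, and noting that the parameter is actually named
enable_nestloop, without a trailing "s"; the report query itself is left as a
placeholder.)

    BEGIN;
    SET LOCAL enable_nestloop = off;   -- scoped to this transaction only
    SHOW enable_nestloop;              -- "off" inside the transaction
    -- ... run the problematic report query here ...
    COMMIT;                            -- or ROLLBACK; either way the setting reverts
    SHOW enable_nestloop;              -- back to the previous value afterwards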



--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800


Re: Performance issues

From:
Vivekanand Joshi
Date:
The issue here is that the queries are running inside Jasper Reports, so
we cannot set this for just a single query.

We are accessing our reports from a web browser, which in turn runs the
report from the application server (Jasper). This server connects to the
PostgreSQL server.

Inside a JRXML (Jasper report) file we cannot set this parameter.

I am attaching a JRXML file to give you a feel for it. You can open this
file in Notepad. I don't think we can set a server-level property in this
file, so how about a workaround?

Vivek



-----Original Message-----
From: Jerry Sievers [mailto:gsievers19@comcast.net]
Sent: Thursday, March 19, 2015 12:06 AM
To: vjoshi@zetainteractive.com
Cc: Tomas Vondra; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issues


Consider disabling that setting for just the one or few odd queries that
are known to plan badly.

begin;
set local enable_nestloop to false;
select ...;
commit/abort;

I'd say never make that sort of setting DB or cluster-wide.



--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800

Attachments

Re: Performance issues

From:
Vitalii Tymchyshyn
Date:

You can set it for the DB user or use a stored procedure.
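
(A minimal sketch of the stored-proc route, with hypothetical names; some_report_view
and event_dt stand in for the real report SQL. The setting attached to the function
applies only while the function runs.)

    CREATE OR REPLACE FUNCTION report_query(p_from date, p_to date)
    RETURNS bigint
    LANGUAGE sql
    STABLE
    SET enable_nestloop = off        -- in effect only inside this function
    AS $$
        SELECT count(*)
        FROM some_report_view        -- hypothetical; the real report query goes here
        WHERE event_dt BETWEEN p_from AND p_to;
    $$;

    -- callers (e.g. the Jasper report) would then run:
    -- SELECT report_query(DATE '2015-01-01', DATE '2015-02-01');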

Best regards, Vitalii Tymchyshyn

Wed, 18 Mar 2015 14:48 Vivekanand Joshi <vjoshi@zetainteractive.com> writes:
The issue here is that the queries are running inside Jasper Reports, so
we cannot set this for just a single query.

We are accessing our reports from a web browser, which in turn runs the
report from the application server (Jasper). This server connects to the
PostgreSQL server.

Inside a JRXML (Jasper report) file we cannot set this parameter.

I am attaching a JRXML file to give you a feel for it. You can open this
file in Notepad. I don't think we can set a server-level property in this
file, so how about a workaround?

Vivek




Re: Performance issues

From:
Tomas Vondra
Date:
Hi,

On 18.3.2015 18:31, Vivekanand Joshi wrote:
> So, here is the first taste of success and which gives me the
> confidence that if properly worked out with a good hardware and
> proper tuning, PostgreSQL could be a good replacement.
>
> Out of the 9 reports which needs to be migrated in PostgreSQL, 3 are
> now running.
>
> Report 4 was giving an issue and I will see it tomorrow.
>
> Just to inform you guys that, the thing that helped most is setting
> enable_nestloops to false worked. Plans are now not miscalculated.

The estimates are still miscalculated, but you're forcing the database
not to use nested loops. The problem is that a nested loop may be
appropriate in some cases (maybe only in a few places of the plan), so
this is a really coarse-grained solution.

> But this is not a production-suitable setting. So how do you think we
> can work around this?

(a) Try to identify why the queries are poorly estimated, and rephrase
    them somehow. This is the best solution, but takes time, expertise
    and may not be feasible in some cases.

(b) Tweak the database structure, possibly introducing intermediate
    tables, materialized views (or tables maintained by triggers - this
    might work for the 'latest record' subquery; a sketch follows below),
    etc.

(c) Try to tweak the cost parameters, to make the nested loops more
    expensive (and thus less likely to be selected), but in a more
    gradual way than enable_nestloops=false.
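
(A minimal sketch of (b), assuming PostgreSQL 9.3 or later: precompute the
'latest record' per touchpoint execution in a materialized view, using the
column names from the earlier discussion.)

    CREATE MATERIALIZED VIEW mv_latest_touchpoint_status AS
    SELECT DISTINCT ON (touchpoint_execution_id) *
    FROM s_f_touchpoint_execution_status_history
    ORDER BY touchpoint_execution_id, creation_dt DESC;

    -- the unique index lets REFRESH ... CONCURRENTLY work (9.4+) and speeds up joins
    CREATE UNIQUE INDEX ON mv_latest_touchpoint_status (touchpoint_execution_id);

    -- refresh after each data load:
    REFRESH MATERIALIZED VIEW CONCURRENTLY mv_latest_touchpoint_status;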

regards

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Performance issues

From:
Josh Krupka
Date:
The other approaches (fixing the estimates, cost parameters, etc.) are the right way to fix it.  *However*, if you need a quick fix for just this report and can't find a way to set it in Jaspersoft for just that report (I don't think it will let you run multiple SQL statements by default, maybe not at all), there are still a couple more options.  You can define a new datasource in Jasper, point this report to that datasource, and have that new datasource configured not to use nested loops.  You could do that either by making the new datasource use a different user than everything else and disabling nested loops for that user in Postgres, or you could probably have the datasource initialization process disable nested loops.
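
(On the Postgres side, the 'different user' variant is only a few statements; the
role and database names below are hypothetical, and the parameter is named
enable_nestloop.)

    CREATE ROLE jasper_report4 LOGIN PASSWORD 'change_me';   -- hypothetical role for this datasource
    GRANT CONNECT ON DATABASE reportdb TO jasper_report4;    -- plus the usual schema/table grants
    ALTER ROLE jasper_report4 SET enable_nestloop = off;     -- every new session of this role gets it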

Re: Performance issues

From:
Vivekanand Joshi
Date:

Is there any documentation on how to configure postgresql.conf settings per individual user?


Re: Performance issues

From:
Tomas Vondra
Date:
On 22.3.2015 22:50, Vivekanand Joshi wrote:
> Is there any documentation on how to configure postgresql.conf settings
> per individual user?

That can't be done in postgresql.conf, but it can be done with ALTER ROLE
commands.
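
(For example, with hypothetical role and database names; the setting can be
limited to one database, inspected, and removed again.)

    ALTER ROLE jasper_reports IN DATABASE reportdb SET enable_nestloop = off;
    -- per-role / per-database settings are stored in pg_db_role_setting:
    SELECT * FROM pg_db_role_setting;
    -- and to undo it later:
    ALTER ROLE jasper_reports IN DATABASE reportdb RESET enable_nestloop;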



--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services