Re: Join runs for > 10 hours and then fills up >1.3TB of disk space
От | kevin kempter |
---|---|
Тема | Re: Join runs for > 10 hours and then fills up >1.3TB of disk space |
Дата | |
Msg-id | 614B0322-C9A7-45B2-9BB0-828075B041BE@kevinkempterllc.com обсуждение исходный текст |
Ответ на | Join runs for > 10 hours and then fills up >1.3TB of disk space (kevin kempter <kevin@kevinkempterllc.com>) |
Список | pgsql-performance |
Sorry I goofed on the query text Here's the correct query: select f14.xpublisher_dim_id, f14.xtime_dim_id, f14.xlocation_dim_id, f14.xreferrer_dim_id, f14.xsite_dim_id, f14.xsystem_cfg_dim_id, f14.xaffiliate_dim_id, f14.customer_id, f14.pf_dts_id, f14.episode_id, f14.sessionid, f14.bytes_received, f14.bytes_transmitted, f14.total_played_time_sec, segdim.xsegment_dim_id as episode_level_segid from bigtab_stats_fact_tmp14 f14, xsegment_dim segdim where f14.customer_id = segdim.customer_srcid and f14.show_id = segdim.show_srcid and f14.season_id = segdim.season_srcid and f14.episode_id = segdim.episode_srcid and segdim.segment_srcid is NULL; On May 16, 2008, at 12:31 AM, kevin kempter wrote: > Hi List; > > I have a table with 9,961,914 rows in it (see the describe of > bigtab_stats_fact_tmp14 below) > > I also have a table with 7,785 rows in it (see the describe of > xsegment_dim below) > > I'm running the join shown below and it takes > 10 hours and > eventually runs out of disk space on a 1.4TB file system > > I've included below a describe of both tables, the join and an > explain plan, any help / suggestions would be much appreciated ! > > I need to get this beast to run as quickly as possible (without > filling up my file system) > > > Thanks in advance... > > > > > > > > > > > > select > f14.xpublisher_dim_id, > f14.xtime_dim_id, > f14.xlocation_dim_id, > f14.xreferrer_dim_id, > f14.xsite_dim_id, > f14.xsystem_cfg_dim_id, > f14.xaffiliate_dim_id, > f14.customer_id, > pf_dts_id, > episode_id, > sessionid, > bytes_received, > bytes_transmitted, > total_played_time_sec, > segdim.xsegment_dim_id as episode_level_segid > from > bigtab_stats_fact_tmp14 f14, > xsegment_dim segdim > where > f14.customer_id = segdim.customer_srcid > and f14.show_id = segdim.show_srcid > and f14.season_id = segdim.season_srcid > and f14.episode_id = segdim.episode_srcid > and segdim.segment_srcid is NULL; > > > > > > > QUERY PLAN > --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Merge Join (cost=1757001.74..73569676.49 rows=3191677219 width=118) > Merge Cond: ((segdim.episode_srcid = f14.episode_id) AND > (segdim.customer_srcid = f14.customer_id) AND (segdim.show_srcid = > f14.show_id) AND (segdim.season_srcid = f14.season_id)) > -> Sort (cost=1570.35..1579.46 rows=3643 width=40) > Sort Key: segdim.episode_srcid, segdim.customer_srcid, > segdim.show_srcid, segdim.season_srcid > -> Seq Scan on xsegment_dim segdim (cost=0.00..1354.85 rows=3643 > width=40) > Filter: (segment_srcid IS NULL) > -> Sort (cost=1755323.26..1780227.95 rows=9961874 width=126) > Sort Key: f14.episode_id, f14.customer_id, f14.show_id, f14.season_id > -> Seq Scan on bigtab_stats_fact_tmp14 f14 (cost=0.00..597355.74 > rows=9961874 width=126) > (9 rows) > > > > > > > > > > # \d bigtab_stats_fact_tmp14 > Table "public.bigtab_stats_fact_tmp14" > Column | Type | Modifiers > --------------------------+-----------------------------+----------- > pf_dts_id | bigint | > pf_device_id | bigint | > segment_id | bigint | > cdn_id | bigint | > collector_id | bigint | > digital_envoy_id | bigint | > maxmind_id | bigint | > quova_id | bigint | > website_id | bigint | > referrer_id | bigint | > affiliate_id | bigint | > custom_info_id | bigint | > start_dt | timestamp without time zone | > total_played_time_sec | numeric(18,5) | > bytes_received | bigint | > bytes_transmitted | bigint | > stall_count | integer | > stall_duration_sec | numeric(18,5) | > hiccup_count | integer | > hiccup_duration_sec | numeric(18,5) | > watched_duration_sec | numeric(18,5) | > rewatched_duration_sec | numeric(18,5) | > requested_start_position | numeric(18,5) | > requested_stop_position | numeric(18,5) | > post_position | numeric(18,5) | > is_vod | numeric(1,0) | > sessionid | bigint | > create_dt | timestamp without time zone | > segment_type_id | bigint | > customer_id | bigint | > content_publisher_id | bigint | > content_owner_id | bigint | > episode_id | bigint | > duration_sec | numeric(18,5) | > device_id | bigint | > os_id | bigint | > browser_id | bigint | > cpu_id | bigint | > xsystem_cfg_dim_id | bigint | > xreferrer_dim_id | bigint | > xaffiliate_dim_id | bigint | > xsite_dim_id | bigint | > xpublisher_dim_id | bigint | > season_id | bigint | > show_id | bigint | > xsegment_dim_id | bigint | > location_id | bigint | > zipcode | character varying(20) | > xlocation_dim_id | bigint | > location_srcid | bigint | > timezone | real | > xtime_dim_id | bigint | > Indexes: > "bigtab_stats_fact_tmp14_idx1" btree (customer_id) > "bigtab_stats_fact_tmp14_idx2" btree (show_id) > "bigtab_stats_fact_tmp14_idx3" btree (season_id) > "bigtab_stats_fact_tmp14_idx4" btree (episode_id) > > > > > > > # \d xsegment_dim > Table "public.xsegment_dim" > Column | Type > | Modifiers > ----------------------+----------------------------- > +------------------------------------------------------------- > xsegment_dim_id | bigint | not null default > nextval('xsegment_dim_seq'::regclass) > customer_srcid | bigint | not null > show_srcid | bigint | not null > show_name | character varying(500) | not null > season_srcid | bigint | not null > season_name | character varying(500) | not null > episode_srcid | bigint | not null > episode_name | character varying(500) | not null > segment_type_id | integer | > segment_type | character varying(500) | > segment_srcid | bigint | > segment_name | character varying(500) | > effective_dt | timestamp without time zone | not null > default now() > inactive_dt | timestamp without time zone | > last_update_dt | timestamp without time zone | not null > default now() > Indexes: > "xsegment_dim_pk" PRIMARY KEY, btree (xsegment_dim_id) > "seg1" btree (customer_srcid) > "seg2" btree (show_srcid) > "seg3" btree (season_srcid) > "seg4" btree (episode_srcid) > "seg5" btree (segment_srcid) > "xsegment_dim_ix1" btree (customer_srcid) > > > > > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org > ) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance
В списке pgsql-performance по дате отправления:
Предыдущее
От: kevin kempterДата:
Сообщение: Re: Join runs for > 10 hours and then fills up >1.3TB of disk space
Следующее
От: Simon RiggsДата:
Сообщение: Re: Join runs for > 10 hours and then fills up >1.3TB of disk space