Join runs for > 10 hours and then fills up >1.3TB of disk space
От | kevin kempter |
---|---|
Тема | Join runs for > 10 hours and then fills up >1.3TB of disk space |
Дата | |
Msg-id | CEFBA6FF-0E0A-41A0-959E-7DBBDA296825@kevinkempterllc.com обсуждение исходный текст |
Ответы |
Re: Join runs for > 10 hours and then fills up >1.3TB of disk space
("Claus Guttesen" <kometen@gmail.com>)
Re: Join runs for > 10 hours and then fills up >1.3TB of disk space (kevin kempter <kevin@kevinkempterllc.com>) Re: Join runs for > 10 hours and then fills up >1.3TB of disk space (Simon Riggs <simon@2ndquadrant.com>) Re: Join runs for > 10 hours and then fills up >1.3TB of disk space (Richard Huxton <dev@archonet.com>) Re: Join runs for > 10 hours and then fills up >1.3TB of disk space (Richard Huxton <dev@archonet.com>) |
Список | pgsql-performance |
Hi List; I have a table with 9,961,914 rows in it (see the describe of bigtab_stats_fact_tmp14 below) I also have a table with 7,785 rows in it (see the describe of xsegment_dim below) I'm running the join shown below and it takes > 10 hours and eventually runs out of disk space on a 1.4TB file system I've included below a describe of both tables, the join and an explain plan, any help / suggestions would be much appreciated ! I need to get this beast to run as quickly as possible (without filling up my file system) Thanks in advance... select f14.xpublisher_dim_id, f14.xtime_dim_id, f14.xlocation_dim_id, f14.xreferrer_dim_id, f14.xsite_dim_id, f14.xsystem_cfg_dim_id, f14.xaffiliate_dim_id, f14.customer_id, pf_dts_id, episode_id, sessionid, bytes_received, bytes_transmitted, total_played_time_sec, segdim.xsegment_dim_id as episode_level_segid from bigtab_stats_fact_tmp14 f14, xsegment_dim segdim where f14.customer_id = segdim.customer_srcid and f14.show_id = segdim.show_srcid and f14.season_id = segdim.season_srcid and f14.episode_id = segdim.episode_srcid and segdim.segment_srcid is NULL; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=1757001.74..73569676.49 rows=3191677219 width=118) Merge Cond: ((segdim.episode_srcid = f14.episode_id) AND (segdim.customer_srcid = f14.customer_id) AND (segdim.show_srcid = f14.show_id) AND (segdim.season_srcid = f14.season_id)) -> Sort (cost=1570.35..1579.46 rows=3643 width=40) Sort Key: segdim.episode_srcid, segdim.customer_srcid, segdim.show_srcid, segdim.season_srcid -> Seq Scan on xsegment_dim segdim (cost=0.00..1354.85 rows=3643 width=40) Filter: (segment_srcid IS NULL) -> Sort (cost=1755323.26..1780227.95 rows=9961874 width=126) Sort Key: f14.episode_id, f14.customer_id, f14.show_id, f14.season_id -> Seq Scan on bigtab_stats_fact_tmp14 f14 (cost=0.00..597355.74 rows=9961874 width=126) (9 rows) # \d bigtab_stats_fact_tmp14 Table "public.bigtab_stats_fact_tmp14" Column | Type | Modifiers --------------------------+-----------------------------+----------- pf_dts_id | bigint | pf_device_id | bigint | segment_id | bigint | cdn_id | bigint | collector_id | bigint | digital_envoy_id | bigint | maxmind_id | bigint | quova_id | bigint | website_id | bigint | referrer_id | bigint | affiliate_id | bigint | custom_info_id | bigint | start_dt | timestamp without time zone | total_played_time_sec | numeric(18,5) | bytes_received | bigint | bytes_transmitted | bigint | stall_count | integer | stall_duration_sec | numeric(18,5) | hiccup_count | integer | hiccup_duration_sec | numeric(18,5) | watched_duration_sec | numeric(18,5) | rewatched_duration_sec | numeric(18,5) | requested_start_position | numeric(18,5) | requested_stop_position | numeric(18,5) | post_position | numeric(18,5) | is_vod | numeric(1,0) | sessionid | bigint | create_dt | timestamp without time zone | segment_type_id | bigint | customer_id | bigint | content_publisher_id | bigint | content_owner_id | bigint | episode_id | bigint | duration_sec | numeric(18,5) | device_id | bigint | os_id | bigint | browser_id | bigint | cpu_id | bigint | xsystem_cfg_dim_id | bigint | xreferrer_dim_id | bigint | xaffiliate_dim_id | bigint | xsite_dim_id | bigint | xpublisher_dim_id | bigint | season_id | bigint | show_id | bigint | xsegment_dim_id | bigint | location_id | bigint | zipcode | character varying(20) | xlocation_dim_id | bigint | location_srcid | bigint | timezone | real | xtime_dim_id | bigint | Indexes: "bigtab_stats_fact_tmp14_idx1" btree (customer_id) "bigtab_stats_fact_tmp14_idx2" btree (show_id) "bigtab_stats_fact_tmp14_idx3" btree (season_id) "bigtab_stats_fact_tmp14_idx4" btree (episode_id) # \d xsegment_dim Table "public.xsegment_dim" Column | Type | Modifiers ----------------------+----------------------------- +------------------------------------------------------------- xsegment_dim_id | bigint | not null default nextval('xsegment_dim_seq'::regclass) customer_srcid | bigint | not null show_srcid | bigint | not null show_name | character varying(500) | not null season_srcid | bigint | not null season_name | character varying(500) | not null episode_srcid | bigint | not null episode_name | character varying(500) | not null segment_type_id | integer | segment_type | character varying(500) | segment_srcid | bigint | segment_name | character varying(500) | effective_dt | timestamp without time zone | not null default now() inactive_dt | timestamp without time zone | last_update_dt | timestamp without time zone | not null default now() Indexes: "xsegment_dim_pk" PRIMARY KEY, btree (xsegment_dim_id) "seg1" btree (customer_srcid) "seg2" btree (show_srcid) "seg3" btree (season_srcid) "seg4" btree (episode_srcid) "seg5" btree (segment_srcid) "xsegment_dim_ix1" btree (customer_srcid)
В списке pgsql-performance по дате отправления:
Следующее
От: "Claus Guttesen"Дата:
Сообщение: Re: Join runs for > 10 hours and then fills up >1.3TB of disk space