Re: oracle to psql migration - slow query in postgres

Поиск
Список
Период
Сортировка
От Igor Neyman
Тема Re: oracle to psql migration - slow query in postgres
Дата
Msg-id F4C27E77F7A33E4CA98C19A9DC6722A2069F3459@EXCHANGE.corp.perceptron.com
обсуждение исходный текст
Ответ на oracle to psql migration - slow query in postgres  (Tony Capobianco <tcapobianco@prospectiv.com>)
Ответы Re: oracle to psql migration - slow query in postgres
Список pgsql-performance

> -----Original Message-----
> From: Tony Capobianco [mailto:tcapobianco@prospectiv.com]
> Sent: Thursday, October 14, 2010 3:43 PM
> To: pgsql-performance@postgresql.org
> Subject: oracle to psql migration - slow query in postgres
>
> We are in the process of testing migration of our oracle data
> warehouse over to postgres.  A potential showstopper are full
> table scans on our members table.  We can't function on
> postgres effectively unless index scans are employed.  I'm
> thinking I don't have something set correctly in my
> postgresql.conf file, but I'm not sure what.
>
> This table has approximately 300million rows.
>
> Version:
> SELECT version();
>
> version
> --------------------------------------------------------------
> ----------------------------------------------------
>  PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC
> gcc (GCC)
> 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit
>
> We have 4 quad-core processors and 32GB of RAM.  The below
> query uses the members_sorted_idx_001 index in oracle, but in
> postgres, the optimizer chooses a sequential scan.
>
> explain analyze create table tmp_srcmem_emws1 as select
> emailaddress, websiteid
>   from members
>  where emailok = 1
>    and emailbounced = 0;
>                                                           QUERY
> PLAN
> --------------------------------------------------------------
> ----------------------------------------------------------------
>  Seq Scan on members  (cost=0.00..14137154.64 rows=238177981
> width=29) (actual time=0.052..685834.785 rows=236660930 loops=1)
>    Filter: ((emailok = 1::numeric) AND (emailbounced =
> 0::numeric))  Total runtime: 850306.220 ms
> (3 rows)
>
> show shared_buffers ;
>  shared_buffers
> ----------------
>  7680MB
> (1 row)
>
> show effective_cache_size ;
>  effective_cache_size
> ----------------------
>  22GB
> (1 row)
>
> show work_mem ;
>  work_mem
> ----------
>  768MB
> (1 row)
>
> show enable_seqscan ;
>  enable_seqscan
> ----------------
>  on
> (1 row)
>
> Below are the data definitions for the table/indexes in question:
>
> \d members
>                      Table "members"
>        Column        |            Type             | Modifiers
> ---------------------+-----------------------------+-----------
>  memberid            | numeric                     | not null
>  firstname           | character varying(50)       |
>  lastname            | character varying(50)       |
>  emailaddress        | character varying(50)       |
>  password            | character varying(50)       |
>  address1            | character varying(50)       |
>  address2            | character varying(50)       |
>  city                | character varying(50)       |
>  statecode           | character varying(50)       |
>  zipcode             | character varying(50)       |
>  birthdate           | date                        |
>  emailok             | numeric(2,0)                |
>  gender              | character varying(1)        |
>  addeddate           | timestamp without time zone |
>  emailbounced        | numeric(2,0)                |
>  changedate          | timestamp without time zone |
>  optoutsource        | character varying(100)      |
>  websiteid           | numeric                     |
>  promotionid         | numeric                     |
>  sourceid            | numeric                     |
>  siteid              | character varying(64)       |
>  srcwebsiteid        | numeric                     |
>  homephone           | character varying(20)       |
>  homeareacode        | character varying(10)       |
>  campaignid          | numeric                     |
>  srcmemberid         | numeric                     |
>  optoutdate          | date                        |
>  regcomplete         | numeric(1,0)                |
>  regcompletesourceid | numeric                     |
>  ipaddress           | character varying(25)       |
>  pageid              | numeric                     |
>  streetaddressstatus | numeric(1,0)                |
>  middlename          | character varying(50)       |
>  optinprechecked     | numeric(1,0)                |
>  optinposition       | numeric                     |
>  homephonestatus     | numeric                     |
>  addeddate_id        | numeric                     |
>  changedate_id       | numeric                     |
>  rpmindex            | numeric                     |
>  optmode             | numeric(1,0)                |
>  countryid           | numeric                     |
>  confirmoptin        | numeric(2,0)                |
>  bouncedate          | date                        |
>  memberageid         | numeric                     |
>  sourceid2           | numeric                     |
>  remoteuserid        | character varying(50)       |
>  goal                | numeric(1,0)                |
>  flowdepth           | numeric                     |
>  pagetype            | numeric                     |
>  savepassword        | character varying(50)       |
>  customerprofileid   | numeric                     |
> Indexes:
>     "email_website_unq" UNIQUE, btree (emailaddress,
> websiteid), tablespace "members_idx"
>     "member_addeddateid_idx" btree (addeddate_id), tablespace
> "members_idx"
>     "member_changedateid_idx" btree (changedate_id),
> tablespace "members_idx"
>     "members_fdate_idx" btree
> (to_char_year_month(addeddate)), tablespace "esave_idx"
>     "members_memberid_idx" btree (memberid), tablespace "members_idx"
>     "members_mid_emailok_idx" btree (memberid, emailaddress,
> zipcode, firstname, emailok), tablespace "members_idx"
>     "members_sorted_idx_001" btree (websiteid, emailok,
> emailbounced, addeddate, memberid, zipcode, statecode,
> emailaddress), tablespace "members_idx"
>     "members_src_idx" btree (websiteid, emailbounced,
> sourceid), tablespace "members_idx"
>     "members_wid_idx" btree (websiteid), tablespace "members_idx"
>
> select tablename, indexname, tablespace, indexdef from
> pg_indexes where tablename = 'members'; -[ RECORD
> 1
> ]-------------------------------------------------------------
> --------------------------------------------------------------
> -------------------------
> tablename  | members
> indexname  | members_fdate_idx
> tablespace | esave_idx
> indexdef   | CREATE INDEX members_fdate_idx ON members USING btree
> (to_char_year_month(addeddate))
> -[ RECORD
> 2
> ]-------------------------------------------------------------
> --------------------------------------------------------------
> -------------------------
> tablename  | members
> indexname  | member_changedateid_idx
> tablespace | members_idx
> indexdef   | CREATE INDEX member_changedateid_idx ON members
> USING btree
> (changedate_id)
> -[ RECORD
> 3
> ]-------------------------------------------------------------
> --------------------------------------------------------------
> -------------------------
> tablename  | members
> indexname  | member_addeddateid_idx
> tablespace | members_idx
> indexdef   | CREATE INDEX member_addeddateid_idx ON members
> USING btree
> (addeddate_id)
> -[ RECORD
> 4
> ]-------------------------------------------------------------
> --------------------------------------------------------------
> -------------------------
> tablename  | members
> indexname  | members_wid_idx
> tablespace | members_idx
> indexdef   | CREATE INDEX members_wid_idx ON members USING btree
> (websiteid)
> -[ RECORD
> 5
> ]-------------------------------------------------------------
> --------------------------------------------------------------
> -------------------------
> tablename  | members
> indexname  | members_src_idx
> tablespace | members_idx
> indexdef   | CREATE INDEX members_src_idx ON members USING btree
> (websiteid, emailbounced, sourceid)
> -[ RECORD
> 6
> ]-------------------------------------------------------------
> --------------------------------------------------------------
> -------------------------
> tablename  | members
> indexname  | members_sorted_idx_001
> tablespace | members_idx
> indexdef   | CREATE INDEX members_sorted_idx_001 ON members
> USING btree
> (websiteid, emailok, emailbounced, addeddate, memberid,
> zipcode, statecode, emailaddress) -[ RECORD
> 7
> ]-------------------------------------------------------------
> --------------------------------------------------------------
> -------------------------
> tablename  | members
> indexname  | members_mid_emailok_idx
> tablespace | members_idx
> indexdef   | CREATE INDEX members_mid_emailok_idx ON members
> USING btree
> (memberid, emailaddress, zipcode, firstname, emailok) -[ RECORD
> 8
> ]-------------------------------------------------------------
> --------------------------------------------------------------
> -------------------------
> tablename  | members
> indexname  | members_memberid_idx
> tablespace | members_idx
> indexdef   | CREATE INDEX members_memberid_idx ON members USING btree
> (memberid)
> -[ RECORD
> 9
> ]-------------------------------------------------------------
> --------------------------------------------------------------
> -------------------------
> tablename  | members
> indexname  | email_website_unq
> tablespace | members_idx
> indexdef   | CREATE UNIQUE INDEX email_website_unq ON members USING
> btree (emailaddress, websiteid)
>
>
> This table has also been vacuumed analyzed as well:
>
> select * from pg_stat_all_tables where relname = 'members';
> -[ RECORD 1 ]----+------------------------------
> relid            | 3112786
> schemaname       | xxxxx
> relname          | members
> seq_scan         | 298
> seq_tup_read     | 42791828896
> idx_scan         | 31396925
> idx_tup_fetch    | 1083796963
> n_tup_ins        | 291308316
> n_tup_upd        | 0
> n_tup_del        | 4188020
> n_tup_hot_upd    | 0
> n_live_tup       | 285364632
> n_dead_tup       | 109658
> last_vacuum      | 2010-10-12 20:26:01.227393-04
> last_autovacuum  |
> last_analyze     | 2010-10-12 20:28:01.105656-04
> last_autoanalyze | 2010-09-16 20:50:00.712418-04
>
>


Tony,
For your query:

> select
> emailaddress, websiteid
>   from members
>  where emailok = 1
>    and emailbounced = 0;

your table doesn't have any indexes where "emailok" or "emailbounced"
are leading columns.
That's why existing indexes can not be used.

If you specified "websiteid" in the "where" clause then (most probably)
the index members_sorted_idx_001 will be used (based on selectivity and
statistics known to optimizer).

If this query (as is - without "websiteid") is important for your app,
create another index on (emailok, emailbounced) which should help, of
course if selectivity of your where clause is good enough (not to
perform full table scan).

Regards,
Igor Neyman

В списке pgsql-performance по дате отправления:

Предыдущее
От: Brandon Casci
Дата:
Сообщение: help with understanding EXPLAIN
Следующее
От: Tony Capobianco
Дата:
Сообщение: Re: oracle to psql migration - slow query in postgres