Обсуждение: oracle to psql migration - slow query in postgres

Поиск
Список
Период
Сортировка

oracle to psql migration - slow query in postgres

От
Tony Capobianco
Дата:
We are in the process of testing migration of our oracle data warehouse
over to postgres.  A potential showstopper are full table scans on our
members table.  We can't function on postgres effectively unless index
scans are employed.  I'm thinking I don't have something set correctly
in my postgresql.conf file, but I'm not sure what.

This table has approximately 300million rows.

Version:
SELECT version();

version
------------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit

We have 4 quad-core processors and 32GB of RAM.  The below query uses
the members_sorted_idx_001 index in oracle, but in postgres, the
optimizer chooses a sequential scan.

explain analyze create table tmp_srcmem_emws1
as
select emailaddress, websiteid
  from members
 where emailok = 1
   and emailbounced = 0;
                                                          QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on members  (cost=0.00..14137154.64 rows=238177981 width=29)
(actual time=0.052..685834.785 rows=236660930 loops=1)
   Filter: ((emailok = 1::numeric) AND (emailbounced = 0::numeric))
 Total runtime: 850306.220 ms
(3 rows)

show shared_buffers ;
 shared_buffers
----------------
 7680MB
(1 row)

show effective_cache_size ;
 effective_cache_size
----------------------
 22GB
(1 row)

show work_mem ;
 work_mem
----------
 768MB
(1 row)

show enable_seqscan ;
 enable_seqscan
----------------
 on
(1 row)

Below are the data definitions for the table/indexes in question:

\d members
                     Table "members"
       Column        |            Type             | Modifiers
---------------------+-----------------------------+-----------
 memberid            | numeric                     | not null
 firstname           | character varying(50)       |
 lastname            | character varying(50)       |
 emailaddress        | character varying(50)       |
 password            | character varying(50)       |
 address1            | character varying(50)       |
 address2            | character varying(50)       |
 city                | character varying(50)       |
 statecode           | character varying(50)       |
 zipcode             | character varying(50)       |
 birthdate           | date                        |
 emailok             | numeric(2,0)                |
 gender              | character varying(1)        |
 addeddate           | timestamp without time zone |
 emailbounced        | numeric(2,0)                |
 changedate          | timestamp without time zone |
 optoutsource        | character varying(100)      |
 websiteid           | numeric                     |
 promotionid         | numeric                     |
 sourceid            | numeric                     |
 siteid              | character varying(64)       |
 srcwebsiteid        | numeric                     |
 homephone           | character varying(20)       |
 homeareacode        | character varying(10)       |
 campaignid          | numeric                     |
 srcmemberid         | numeric                     |
 optoutdate          | date                        |
 regcomplete         | numeric(1,0)                |
 regcompletesourceid | numeric                     |
 ipaddress           | character varying(25)       |
 pageid              | numeric                     |
 streetaddressstatus | numeric(1,0)                |
 middlename          | character varying(50)       |
 optinprechecked     | numeric(1,0)                |
 optinposition       | numeric                     |
 homephonestatus     | numeric                     |
 addeddate_id        | numeric                     |
 changedate_id       | numeric                     |
 rpmindex            | numeric                     |
 optmode             | numeric(1,0)                |
 countryid           | numeric                     |
 confirmoptin        | numeric(2,0)                |
 bouncedate          | date                        |
 memberageid         | numeric                     |
 sourceid2           | numeric                     |
 remoteuserid        | character varying(50)       |
 goal                | numeric(1,0)                |
 flowdepth           | numeric                     |
 pagetype            | numeric                     |
 savepassword        | character varying(50)       |
 customerprofileid   | numeric                     |
Indexes:
    "email_website_unq" UNIQUE, btree (emailaddress, websiteid),
tablespace "members_idx"
    "member_addeddateid_idx" btree (addeddate_id), tablespace
"members_idx"
    "member_changedateid_idx" btree (changedate_id), tablespace
"members_idx"
    "members_fdate_idx" btree (to_char_year_month(addeddate)),
tablespace "esave_idx"
    "members_memberid_idx" btree (memberid), tablespace "members_idx"
    "members_mid_emailok_idx" btree (memberid, emailaddress, zipcode,
firstname, emailok), tablespace "members_idx"
    "members_sorted_idx_001" btree (websiteid, emailok, emailbounced,
addeddate, memberid, zipcode, statecode, emailaddress), tablespace
"members_idx"
    "members_src_idx" btree (websiteid, emailbounced, sourceid),
tablespace "members_idx"
    "members_wid_idx" btree (websiteid), tablespace "members_idx"

select tablename, indexname, tablespace, indexdef from pg_indexes where
tablename = 'members';
-[ RECORD
1
]----------------------------------------------------------------------------------------------------------------------------------------------------
tablename  | members
indexname  | members_fdate_idx
tablespace | esave_idx
indexdef   | CREATE INDEX members_fdate_idx ON members USING btree
(to_char_year_month(addeddate))
-[ RECORD
2
]----------------------------------------------------------------------------------------------------------------------------------------------------
tablename  | members
indexname  | member_changedateid_idx
tablespace | members_idx
indexdef   | CREATE INDEX member_changedateid_idx ON members USING btree
(changedate_id)
-[ RECORD
3
]----------------------------------------------------------------------------------------------------------------------------------------------------
tablename  | members
indexname  | member_addeddateid_idx
tablespace | members_idx
indexdef   | CREATE INDEX member_addeddateid_idx ON members USING btree
(addeddate_id)
-[ RECORD
4
]----------------------------------------------------------------------------------------------------------------------------------------------------
tablename  | members
indexname  | members_wid_idx
tablespace | members_idx
indexdef   | CREATE INDEX members_wid_idx ON members USING btree
(websiteid)
-[ RECORD
5
]----------------------------------------------------------------------------------------------------------------------------------------------------
tablename  | members
indexname  | members_src_idx
tablespace | members_idx
indexdef   | CREATE INDEX members_src_idx ON members USING btree
(websiteid, emailbounced, sourceid)
-[ RECORD
6
]----------------------------------------------------------------------------------------------------------------------------------------------------
tablename  | members
indexname  | members_sorted_idx_001
tablespace | members_idx
indexdef   | CREATE INDEX members_sorted_idx_001 ON members USING btree
(websiteid, emailok, emailbounced, addeddate, memberid, zipcode,
statecode, emailaddress)
-[ RECORD
7
]----------------------------------------------------------------------------------------------------------------------------------------------------
tablename  | members
indexname  | members_mid_emailok_idx
tablespace | members_idx
indexdef   | CREATE INDEX members_mid_emailok_idx ON members USING btree
(memberid, emailaddress, zipcode, firstname, emailok)
-[ RECORD
8
]----------------------------------------------------------------------------------------------------------------------------------------------------
tablename  | members
indexname  | members_memberid_idx
tablespace | members_idx
indexdef   | CREATE INDEX members_memberid_idx ON members USING btree
(memberid)
-[ RECORD
9
]----------------------------------------------------------------------------------------------------------------------------------------------------
tablename  | members
indexname  | email_website_unq
tablespace | members_idx
indexdef   | CREATE UNIQUE INDEX email_website_unq ON members USING
btree (emailaddress, websiteid)


This table has also been vacuumed analyzed as well:

select * from pg_stat_all_tables where relname = 'members';
-[ RECORD 1 ]----+------------------------------
relid            | 3112786
schemaname       | xxxxx
relname          | members
seq_scan         | 298
seq_tup_read     | 42791828896
idx_scan         | 31396925
idx_tup_fetch    | 1083796963
n_tup_ins        | 291308316
n_tup_upd        | 0
n_tup_del        | 4188020
n_tup_hot_upd    | 0
n_live_tup       | 285364632
n_dead_tup       | 109658
last_vacuum      | 2010-10-12 20:26:01.227393-04
last_autovacuum  |
last_analyze     | 2010-10-12 20:28:01.105656-04
last_autoanalyze | 2010-09-16 20:50:00.712418-04



Re: oracle to psql migration - slow query in postgres

От
bricklen
Дата:
On Thu, Oct 14, 2010 at 12:43 PM, Tony Capobianco
<tcapobianco@prospectiv.com> wrote:
> We have 4 quad-core processors and 32GB of RAM.  The below query uses
> the members_sorted_idx_001 index in oracle, but in postgres, the
> optimizer chooses a sequential scan.
>
> explain analyze create table tmp_srcmem_emws1
> as
> select emailaddress, websiteid
>  from members
>  where emailok = 1
>   and emailbounced = 0;


Maybe a couple indexes to try:

create index members_emailok_emailbounced_idx on members (emailok,emailbounced);

or a functional index (will likely be smaller, depending on the
contents of your table):
create index members_emailok_emailbounced_idx on members
(emailok,emailbounced) where emailok = 1 and emailbounced = 0; -- if
you use that combination of 1 and 0 regularly

Re: oracle to psql migration - slow query in postgres

От
Ivan Voras
Дата:
On 10/14/10 21:43, Tony Capobianco wrote:


> We have 4 quad-core processors and 32GB of RAM.  The below query uses
> the members_sorted_idx_001 index in oracle, but in postgres, the
> optimizer chooses a sequential scan.
>
> explain analyze create table tmp_srcmem_emws1
> as
> select emailaddress, websiteid
>    from members
>   where emailok = 1
>     and emailbounced = 0;
>                                                            QUERY
> PLAN
>
------------------------------------------------------------------------------------------------------------------------------
>   Seq Scan on members  (cost=0.00..14137154.64 rows=238177981 width=29)
> (actual time=0.052..685834.785 rows=236660930 loops=1)
>     Filter: ((emailok = 1::numeric) AND (emailbounced = 0::numeric))
>   Total runtime: 850306.220 ms
> (3 rows)

> Indexes:
>      "email_website_unq" UNIQUE, btree (emailaddress, websiteid),
> tablespace "members_idx"
>      "member_addeddateid_idx" btree (addeddate_id), tablespace
> "members_idx"
>      "member_changedateid_idx" btree (changedate_id), tablespace
> "members_idx"
>      "members_fdate_idx" btree (to_char_year_month(addeddate)),
> tablespace "esave_idx"
>      "members_memberid_idx" btree (memberid), tablespace "members_idx"
>      "members_mid_emailok_idx" btree (memberid, emailaddress, zipcode,
> firstname, emailok), tablespace "members_idx"
>      "members_sorted_idx_001" btree (websiteid, emailok, emailbounced,
> addeddate, memberid, zipcode, statecode, emailaddress), tablespace
> "members_idx"
>      "members_src_idx" btree (websiteid, emailbounced, sourceid),
> tablespace "members_idx"
>      "members_wid_idx" btree (websiteid), tablespace "members_idx"

PostgreSQL doesn't fetch data directly from indexes, so there is no way
for it to reasonably use an index declared like:

"members_sorted_idx_001" btree (websiteid, emailok, emailbounced,
addeddate, memberid, zipcode, statecode, emailaddress)

You need a direct index on the fields you are using in your query, i.e.
an index on (emailok, emailbounced).

OTOH, those columns look boolean-like. It depends on what your data set
is, but if the majority of records contain (emailok=1 and
emailbounced=0) an index may not help you much.

Re: oracle to psql migration - slow query in postgres

От
Cédric Villemain
Дата:
2010/10/14 Tony Capobianco <tcapobianco@prospectiv.com>:
> We are in the process of testing migration of our oracle data warehouse
> over to postgres.  A potential showstopper are full table scans on our
> members table.  We can't function on postgres effectively unless index
> scans are employed.  I'm thinking I don't have something set correctly
> in my postgresql.conf file, but I'm not sure what.
>
> This table has approximately 300million rows.

and your query grab rows=236 660 930 of them. An index might be
useless in this situation.

>
> Version:
> SELECT version();
>
> version
> ------------------------------------------------------------------------------------------------------------------
>  PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
> 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit
>
> We have 4 quad-core processors and 32GB of RAM.  The below query uses
> the members_sorted_idx_001 index in oracle, but in postgres, the
> optimizer chooses a sequential scan.
>
> explain analyze create table tmp_srcmem_emws1
> as
> select emailaddress, websiteid
>  from members
>  where emailok = 1
>   and emailbounced = 0;
>                                                          QUERY
> PLAN
>
------------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on members  (cost=0.00..14137154.64 rows=238177981 width=29)
> (actual time=0.052..685834.785 rows=236660930 loops=1)
>   Filter: ((emailok = 1::numeric) AND (emailbounced = 0::numeric))
>  Total runtime: 850306.220 ms
> (3 rows)
>
> show shared_buffers ;
>  shared_buffers
> ----------------
>  7680MB
> (1 row)
>
> show effective_cache_size ;
>  effective_cache_size
> ----------------------
>  22GB
> (1 row)
>
> show work_mem ;
>  work_mem
> ----------
>  768MB
> (1 row)
>
> show enable_seqscan ;
>  enable_seqscan
> ----------------
>  on
> (1 row)
>
> Below are the data definitions for the table/indexes in question:
>
> \d members
>                     Table "members"
>       Column        |            Type             | Modifiers
> ---------------------+-----------------------------+-----------
>  memberid            | numeric                     | not null
>  firstname           | character varying(50)       |
>  lastname            | character varying(50)       |
>  emailaddress        | character varying(50)       |
>  password            | character varying(50)       |
>  address1            | character varying(50)       |
>  address2            | character varying(50)       |
>  city                | character varying(50)       |
>  statecode           | character varying(50)       |
>  zipcode             | character varying(50)       |
>  birthdate           | date                        |
>  emailok             | numeric(2,0)                |
>  gender              | character varying(1)        |
>  addeddate           | timestamp without time zone |
>  emailbounced        | numeric(2,0)                |
>  changedate          | timestamp without time zone |
>  optoutsource        | character varying(100)      |
>  websiteid           | numeric                     |
>  promotionid         | numeric                     |
>  sourceid            | numeric                     |
>  siteid              | character varying(64)       |
>  srcwebsiteid        | numeric                     |
>  homephone           | character varying(20)       |
>  homeareacode        | character varying(10)       |
>  campaignid          | numeric                     |
>  srcmemberid         | numeric                     |
>  optoutdate          | date                        |
>  regcomplete         | numeric(1,0)                |
>  regcompletesourceid | numeric                     |
>  ipaddress           | character varying(25)       |
>  pageid              | numeric                     |
>  streetaddressstatus | numeric(1,0)                |
>  middlename          | character varying(50)       |
>  optinprechecked     | numeric(1,0)                |
>  optinposition       | numeric                     |
>  homephonestatus     | numeric                     |
>  addeddate_id        | numeric                     |
>  changedate_id       | numeric                     |
>  rpmindex            | numeric                     |
>  optmode             | numeric(1,0)                |
>  countryid           | numeric                     |
>  confirmoptin        | numeric(2,0)                |
>  bouncedate          | date                        |
>  memberageid         | numeric                     |
>  sourceid2           | numeric                     |
>  remoteuserid        | character varying(50)       |
>  goal                | numeric(1,0)                |
>  flowdepth           | numeric                     |
>  pagetype            | numeric                     |
>  savepassword        | character varying(50)       |
>  customerprofileid   | numeric                     |
> Indexes:
>    "email_website_unq" UNIQUE, btree (emailaddress, websiteid),
> tablespace "members_idx"
>    "member_addeddateid_idx" btree (addeddate_id), tablespace
> "members_idx"
>    "member_changedateid_idx" btree (changedate_id), tablespace
> "members_idx"
>    "members_fdate_idx" btree (to_char_year_month(addeddate)),
> tablespace "esave_idx"
>    "members_memberid_idx" btree (memberid), tablespace "members_idx"
>    "members_mid_emailok_idx" btree (memberid, emailaddress, zipcode,
> firstname, emailok), tablespace "members_idx"
>    "members_sorted_idx_001" btree (websiteid, emailok, emailbounced,
> addeddate, memberid, zipcode, statecode, emailaddress), tablespace
> "members_idx"
>    "members_src_idx" btree (websiteid, emailbounced, sourceid),
> tablespace "members_idx"
>    "members_wid_idx" btree (websiteid), tablespace "members_idx"
>
> select tablename, indexname, tablespace, indexdef from pg_indexes where
> tablename = 'members';
> -[ RECORD
> 1
]----------------------------------------------------------------------------------------------------------------------------------------------------
> tablename  | members
> indexname  | members_fdate_idx
> tablespace | esave_idx
> indexdef   | CREATE INDEX members_fdate_idx ON members USING btree
> (to_char_year_month(addeddate))
> -[ RECORD
> 2
]----------------------------------------------------------------------------------------------------------------------------------------------------
> tablename  | members
> indexname  | member_changedateid_idx
> tablespace | members_idx
> indexdef   | CREATE INDEX member_changedateid_idx ON members USING btree
> (changedate_id)
> -[ RECORD
> 3
]----------------------------------------------------------------------------------------------------------------------------------------------------
> tablename  | members
> indexname  | member_addeddateid_idx
> tablespace | members_idx
> indexdef   | CREATE INDEX member_addeddateid_idx ON members USING btree
> (addeddate_id)
> -[ RECORD
> 4
]----------------------------------------------------------------------------------------------------------------------------------------------------
> tablename  | members
> indexname  | members_wid_idx
> tablespace | members_idx
> indexdef   | CREATE INDEX members_wid_idx ON members USING btree
> (websiteid)
> -[ RECORD
> 5
]----------------------------------------------------------------------------------------------------------------------------------------------------
> tablename  | members
> indexname  | members_src_idx
> tablespace | members_idx
> indexdef   | CREATE INDEX members_src_idx ON members USING btree
> (websiteid, emailbounced, sourceid)
> -[ RECORD
> 6
]----------------------------------------------------------------------------------------------------------------------------------------------------
> tablename  | members
> indexname  | members_sorted_idx_001
> tablespace | members_idx
> indexdef   | CREATE INDEX members_sorted_idx_001 ON members USING btree
> (websiteid, emailok, emailbounced, addeddate, memberid, zipcode,
> statecode, emailaddress)
> -[ RECORD
> 7
]----------------------------------------------------------------------------------------------------------------------------------------------------
> tablename  | members
> indexname  | members_mid_emailok_idx
> tablespace | members_idx
> indexdef   | CREATE INDEX members_mid_emailok_idx ON members USING btree
> (memberid, emailaddress, zipcode, firstname, emailok)
> -[ RECORD
> 8
]----------------------------------------------------------------------------------------------------------------------------------------------------
> tablename  | members
> indexname  | members_memberid_idx
> tablespace | members_idx
> indexdef   | CREATE INDEX members_memberid_idx ON members USING btree
> (memberid)
> -[ RECORD
> 9
]----------------------------------------------------------------------------------------------------------------------------------------------------
> tablename  | members
> indexname  | email_website_unq
> tablespace | members_idx
> indexdef   | CREATE UNIQUE INDEX email_website_unq ON members USING
> btree (emailaddress, websiteid)
>
>
> This table has also been vacuumed analyzed as well:
>
> select * from pg_stat_all_tables where relname = 'members';
> -[ RECORD 1 ]----+------------------------------
> relid            | 3112786
> schemaname       | xxxxx
> relname          | members
> seq_scan         | 298
> seq_tup_read     | 42791828896
> idx_scan         | 31396925
> idx_tup_fetch    | 1083796963
> n_tup_ins        | 291308316
> n_tup_upd        | 0
> n_tup_del        | 4188020
> n_tup_hot_upd    | 0
> n_live_tup       | 285364632
> n_dead_tup       | 109658
> last_vacuum      | 2010-10-12 20:26:01.227393-04
> last_autovacuum  |
> last_analyze     | 2010-10-12 20:28:01.105656-04
> last_autoanalyze | 2010-09-16 20:50:00.712418-04
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>



--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

Re: oracle to psql migration - slow query in postgres

От
"Pierre C"
Дата:
>  emailok             | numeric(2,0)                |

Note that NUMERIC is meant for
- really large numbers with lots of digits
- or controlled precision and rounding (ie, order total isn't
99.999999999999 $)

Accordingly, NUMERIC is a lot slower in all operations, and uses a lot
more space, than all the other numeric types.

I see many columns in your table that are declared as NUMERIC but should
be BOOLs, or SMALLINTs, or INTs, or BIGINTs.

Perhaps Oracle handles these differently, I dunno.

Re: oracle to psql migration - slow query in postgres

От
Jon Nelson
Дата:
Just my take on this.

The first thing I'd do is think real hard about whether you really
really want 'numeric' instead of boolean, smallint, or integer.  The
second thing is that none of your indices (which specify a whole bunch
of fields, by the way) have only just emailok, emailbounced, or only
the pair of them. Without knowing the needs of your app, I would
reconsider your index choices and go with fewer columns per index.

For this particular query I would think either two indexes (depending
on the cardinality of the data, one for each of emailok, emailbounced)
or one index (containing both emailok, emailbounced) would make quite
a bit of difference. Consider creating the indexes using a WITH
clause, for example:

CREATE INDEX members_just_an_example_idx ON members (emailok,
emailbounced) WHERE emailok = 1 AND emailbounced = 0;

Obviously that index is only useful in situations where both fields
are specified with those values. Furthermore, if the result is such
that a very high percentage of the table has those conditions a
sequential scan is going to be cheaper, anyway.

--
Jon

Re: oracle to psql migration - slow query in postgres

От
Mladen Gogala
Дата:
  On 10/14/2010 4:10 PM, Jon Nelson wrote:
> The first thing I'd do is think real hard about whether you really
> really want 'numeric' instead of boolean, smallint, or integer.  The
> second thing is that none of your indices (which specify a whole bunch
> of fields, by the way) have only just emailok, emailbounced, or only
> the pair of them. Without knowing the needs of your app, I would
> reconsider your index choices and go with fewer columns per index.
>
Also, make sure that the statistics is good, that histograms are large
enough and that Geico (the genetic query optimizer) will really work
hard to save you 15% or more on the query execution time. You can also
make sure that any index existing index is used,  by disabling the
sequential scan and then activating and de-activating indexes with the
dummy expressions,  just as it was done with Oracle's rule based optimizer.
I agree that a good data model is even more crucial for Postgres than is
the case with Oracle. Oracle, because of its rich assortment of tweaking
& hacking tools and parameters, can be made to perform, even if the
model is designed by someone who didn't apply the rules of good design.
Postgres is much more susceptible to bad models and it is much harder to
work around a badly designed model in Postgres than in Oracle. What
people do not understand is that every application in the world will
become badly designed after years of maintenance, adding columns,
creating additional indexes, views, tables and triggers and than
deploying various tools to design applications.  As noted by Murphy,
things develop from bad to worse. Keep Postgres models simple and
separated, because it's much easier to keep clearly defined models
simple and effective than to keep models with 700 tables and 350 views,
frequently with conflicting names, different columns named the same and
same columns named differently. And monitor, monitor, monitor. Use
strace, ltrace,  pgstatspack, auto_explain, pgfouine, pgadmin, top, sar,
iostat and all tools you can get hold of. Without the event interface,
it's frequently a guessing game.  It is, however, possible to manage
things.  If working with partitioning, be very aware that PostgreSQL
optimizer has certain problems with partitions, especially with group
functions. If you want speed, everything must be prefixed with
partitioning column: indexes, expressions, joins. There is no explicit
star schema and creating hash indexes will not buy you much, as a matter
of fact, Postgres community is extremely suspicious of the hash indexes
and I don't see them widely used.
Having said that, I was able to solve the problems with my speed and
partitioning.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


Re: oracle to psql migration - slow query in postgres

От
Samuel Gendler
Дата:


On Thu, Oct 14, 2010 at 8:59 PM, Mladen Gogala <mladen.gogala@vmsinfo.com> wrote:
 If working with partitioning, be very aware that PostgreSQL optimizer has certain problems with partitions, especially with group functions. If you want speed, everything must be prefixed with partitioning column: indexes, expressions, joins. There is no explicit star schema and creating hash indexes will not buy you much, as a matter of fact, Postgres community is extremely suspicious of the hash indexes and I don't see them widely used.
Having said that, I was able to solve the problems with my speed and partitioning.


Could you elaborate on this, please? What do you mean by 'everythin must be prefixed with partitioning column?'

--sam

Re: oracle to psql migration - slow query in postgres

От
Mladen Gogala
Дата:
Samuel Gendler wrote:
>
>
> On Thu, Oct 14, 2010 at 8:59 PM, Mladen Gogala
> <mladen.gogala@vmsinfo.com <mailto:mladen.gogala@vmsinfo.com>> wrote:
>
>      If working with partitioning, be very aware that PostgreSQL
>     optimizer has certain problems with partitions, especially with
>     group functions. If you want speed, everything must be prefixed
>     with partitioning column: indexes, expressions, joins. There is no
>     explicit star schema and creating hash indexes will not buy you
>     much, as a matter of fact, Postgres community is extremely
>     suspicious of the hash indexes and I don't see them widely used.
>     Having said that, I was able to solve the problems with my speed
>     and partitioning.
>
>
> Could you elaborate on this, please? What do you mean by 'everythin
> must be prefixed with partitioning column?'
>
> --sam
If you have partitioned table part_tab, partitioned on the column
item_date and if there is a global primary key in Oracle, let's call it
item_id, then queries like "select * from part_tab where item_id=12345"
will perform worse than queries with item_date"

select * from part_tab where item_id=12345 and item_date='2010-10-15'

This also applies to inserts and updates. Strictly speaking, the
item_date column in the query above is not necessary, after all, the
item_id column is the primary key. However, with range scans you will
get much better results if you include the item_date column than if you
use combination of columns without. The term "prefixed indexes" is
borrowed from Oracle RDBMS and means that the beginning column in the
index is the column on which the table is partitioned. Oracle, as
opposed to Postgres, has global indexes, the indexes that span all
partitions. PostgreSQL only maintains indexes on each of the partitions
separately.  Oracle calls such indexes "local indexes" and defines them
on the partitioned table level. Here is a brief and rather succinct
explanation of the terminology:

http://www.oracle-base.com/articles/8i/PartitionedTablesAndIndexes.php


Of, course, there are other differences between Oracle partitioning and
PostgreSQL partitioning. The main difference is $10000/CPU.
I am talking from experience:

news=> \d moreover_documents
              Table "moreover.moreover_documents"
        Column        |            Type             | Modifiers
----------------------+-----------------------------+-----------
 document_id          | bigint                      | not null
 dre_reference        | bigint                      | not null
 headline             | character varying(4000)     |
 author               | character varying(200)      |
 url                  | character varying(1000)     |
 rank                 | bigint                      |
 content              | text                        |
 stories_like_this    | character varying(1000)     |
 internet_web_site_id | bigint                      | not null
 harvest_time         | timestamp without time zone |
 valid_time           | timestamp without time zone |
 keyword              | character varying(200)      |
 article_id           | bigint                      | not null
 media_type           | character varying(20)       |
 source_type          | character varying(20)       |
 created_at           | timestamp without time zone |
 autonomy_fed_at      | timestamp without time zone |
 language             | character varying(150)      |
Indexes:
    "moreover_documents_pkey" PRIMARY KEY, btree (document_id)
Triggers:
    insert_moreover_trigger BEFORE INSERT ON moreover_documents FOR EACH
ROW EXE
CUTE PROCEDURE moreover_insert_trgfn()
Number of child tables: 8 (Use \d+ to list them.)

The child tables are, of course, partitions.

Here is the original:


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> desc moreover_documents
 Name                       Null?    Type
 ----------------------------------------- --------
----------------------------
 DOCUMENT#                   NOT NULL NUMBER
 DRE_REFERENCE                   NOT NULL NUMBER
 HEADLINE                        VARCHAR2(4000)
 AUTHOR                         VARCHAR2(200)
 URL                            VARCHAR2(1000)
 RANK                            NUMBER
 CONTENT                        CLOB
 STORIES_LIKE_THIS                    VARCHAR2(1000)
 INTERNET_WEB_SITE#               NOT NULL NUMBER
 HARVEST_TIME                        DATE
 VALID_TIME                        DATE
 KEYWORD                        VARCHAR2(200)
 ARTICLE_ID                   NOT NULL NUMBER
 MEDIA_TYPE                        VARCHAR2(20)
 CREATED_AT                        DATE
 SOURCE_TYPE                        VARCHAR2(50)
 PUBLISH_DATE                        DATE
 AUTONOMY_FED_AT                    DATE
 LANGUAGE                        VARCHAR2(150)

SQL>



I must say that it took me some time to get things right.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




Re: oracle to psql migration - slow query in postgres

От
Merlin Moncure
Дата:
On Thu, Oct 14, 2010 at 3:43 PM, Tony Capobianco
<tcapobianco@prospectiv.com> wrote:
> explain analyze create table tmp_srcmem_emws1
> as
> select emailaddress, websiteid
>  from members
>  where emailok = 1
>   and emailbounced = 0;

*) as others have noted, none of your indexes will back this
expression.  For an index to match properly the index must have all
the fields matched in the 'where' clause in left to right order.  you
could rearrange indexes you already have and probably get things to
work properly.

*) If you want things to go really fast, and the combination of
emailok, emailbounced is a small percentage (say, less than 5) in the
table, and you are not interested in the schema level changes your
table is screaming, and the (1,0) combination is what you want to
frequently match and you should consider:

create function email_interesting(ok numeric, bounced numeric) returns bool as
$$
  select $1 = 1 and $2 = 0;
$$ language sql immutable;

create function members_email_interesting_idx on
  members(email_interesting(emailok, emailbounced)) where email_interesting();

This will build a partial index which you can query via:
select emailaddress, websiteid
 from members
 where email_interesting(emailok, emailbounced);

merlin

Re: oracle to psql migration - slow query in postgres

От
Tony Capobianco
Дата:
The recommendations on the numeric columns are fantastic.  Thank you
very much.  We will revisit our methods of assigning datatypes when we
migrate our data over from Oracle.
Regarding the full table scans; it appears inevitable that full table
scans are necessary for the volume of data involved and the present
design of our indexes.  Over time, indexes were added/removed to satisfy
particular functionality.  Considering this is our most important table,
I will research exactly how this table is queried to better
optimize/reorganize our indexes.

Thanks for your help.
Tony


On Thu, 2010-10-14 at 23:59 -0400, Mladen Gogala wrote:
> On 10/14/2010 4:10 PM, Jon Nelson wrote:
> > The first thing I'd do is think real hard about whether you really
> > really want 'numeric' instead of boolean, smallint, or integer.  The
> > second thing is that none of your indices (which specify a whole bunch
> > of fields, by the way) have only just emailok, emailbounced, or only
> > the pair of them. Without knowing the needs of your app, I would
> > reconsider your index choices and go with fewer columns per index.
> >
> Also, make sure that the statistics is good, that histograms are large
> enough and that Geico (the genetic query optimizer) will really work
> hard to save you 15% or more on the query execution time. You can also
> make sure that any index existing index is used,  by disabling the
> sequential scan and then activating and de-activating indexes with the
> dummy expressions,  just as it was done with Oracle's rule based optimizer.
> I agree that a good data model is even more crucial for Postgres than is
> the case with Oracle. Oracle, because of its rich assortment of tweaking
> & hacking tools and parameters, can be made to perform, even if the
> model is designed by someone who didn't apply the rules of good design.
> Postgres is much more susceptible to bad models and it is much harder to
> work around a badly designed model in Postgres than in Oracle. What
> people do not understand is that every application in the world will
> become badly designed after years of maintenance, adding columns,
> creating additional indexes, views, tables and triggers and than
> deploying various tools to design applications.  As noted by Murphy,
> things develop from bad to worse. Keep Postgres models simple and
> separated, because it's much easier to keep clearly defined models
> simple and effective than to keep models with 700 tables and 350 views,
> frequently with conflicting names, different columns named the same and
> same columns named differently. And monitor, monitor, monitor. Use
> strace, ltrace,  pgstatspack, auto_explain, pgfouine, pgadmin, top, sar,
> iostat and all tools you can get hold of. Without the event interface,
> it's frequently a guessing game.  It is, however, possible to manage
> things.  If working with partitioning, be very aware that PostgreSQL
> optimizer has certain problems with partitions, especially with group
> functions. If you want speed, everything must be prefixed with
> partitioning column: indexes, expressions, joins. There is no explicit
> star schema and creating hash indexes will not buy you much, as a matter
> of fact, Postgres community is extremely suspicious of the hash indexes
> and I don't see them widely used.
> Having said that, I was able to solve the problems with my speed and
> partitioning.
>
> --
> Mladen Gogala
> Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> www.vmsinfo.com
>
>



Re: oracle to psql migration - slow query in postgres

От
Maciek Sakrejda
Дата:
>> This table has approximately 300million rows.
>
> and your query grab rows=236 660 930 of them. An index might be
> useless in this situation.

I want to point out that this is probably the most important comment
here. A couple of people have noted out that the index won't work for
this query, but more importantly, an index is (probably) not desirable
for this query. As an analogy (since everyone loves half-baked
programming analogies), if you want to find a couple of bakeries to
sponsor your MySQL Data Integrity Issues Awareness Walk by donating
scones, you use the yellow pages. If you want to hit up every business
in the area to donate whatever they can, you're better off canvasing
the neighborhood.
---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

Re: oracle to psql migration - slow query in postgres

От
"Igor Neyman"
Дата:

> -----Original Message-----
> From: Tony Capobianco [mailto:tcapobianco@prospectiv.com]
> Sent: Thursday, October 14, 2010 3:43 PM
> To: pgsql-performance@postgresql.org
> Subject: oracle to psql migration - slow query in postgres
>
> We are in the process of testing migration of our oracle data
> warehouse over to postgres.  A potential showstopper are full
> table scans on our members table.  We can't function on
> postgres effectively unless index scans are employed.  I'm
> thinking I don't have something set correctly in my
> postgresql.conf file, but I'm not sure what.
>
> This table has approximately 300million rows.
>
> Version:
> SELECT version();
>
> version
> --------------------------------------------------------------
> ----------------------------------------------------
>  PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC
> gcc (GCC)
> 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit
>
> We have 4 quad-core processors and 32GB of RAM.  The below
> query uses the members_sorted_idx_001 index in oracle, but in
> postgres, the optimizer chooses a sequential scan.
>
> explain analyze create table tmp_srcmem_emws1 as select
> emailaddress, websiteid
>   from members
>  where emailok = 1
>    and emailbounced = 0;
>                                                           QUERY
> PLAN
> --------------------------------------------------------------
> ----------------------------------------------------------------
>  Seq Scan on members  (cost=0.00..14137154.64 rows=238177981
> width=29) (actual time=0.052..685834.785 rows=236660930 loops=1)
>    Filter: ((emailok = 1::numeric) AND (emailbounced =
> 0::numeric))  Total runtime: 850306.220 ms
> (3 rows)
>
> show shared_buffers ;
>  shared_buffers
> ----------------
>  7680MB
> (1 row)
>
> show effective_cache_size ;
>  effective_cache_size
> ----------------------
>  22GB
> (1 row)
>
> show work_mem ;
>  work_mem
> ----------
>  768MB
> (1 row)
>
> show enable_seqscan ;
>  enable_seqscan
> ----------------
>  on
> (1 row)
>
> Below are the data definitions for the table/indexes in question:
>
> \d members
>                      Table "members"
>        Column        |            Type             | Modifiers
> ---------------------+-----------------------------+-----------
>  memberid            | numeric                     | not null
>  firstname           | character varying(50)       |
>  lastname            | character varying(50)       |
>  emailaddress        | character varying(50)       |
>  password            | character varying(50)       |
>  address1            | character varying(50)       |
>  address2            | character varying(50)       |
>  city                | character varying(50)       |
>  statecode           | character varying(50)       |
>  zipcode             | character varying(50)       |
>  birthdate           | date                        |
>  emailok             | numeric(2,0)                |
>  gender              | character varying(1)        |
>  addeddate           | timestamp without time zone |
>  emailbounced        | numeric(2,0)                |
>  changedate          | timestamp without time zone |
>  optoutsource        | character varying(100)      |
>  websiteid           | numeric                     |
>  promotionid         | numeric                     |
>  sourceid            | numeric                     |
>  siteid              | character varying(64)       |
>  srcwebsiteid        | numeric                     |
>  homephone           | character varying(20)       |
>  homeareacode        | character varying(10)       |
>  campaignid          | numeric                     |
>  srcmemberid         | numeric                     |
>  optoutdate          | date                        |
>  regcomplete         | numeric(1,0)                |
>  regcompletesourceid | numeric                     |
>  ipaddress           | character varying(25)       |
>  pageid              | numeric                     |
>  streetaddressstatus | numeric(1,0)                |
>  middlename          | character varying(50)       |
>  optinprechecked     | numeric(1,0)                |
>  optinposition       | numeric                     |
>  homephonestatus     | numeric                     |
>  addeddate_id        | numeric                     |
>  changedate_id       | numeric                     |
>  rpmindex            | numeric                     |
>  optmode             | numeric(1,0)                |
>  countryid           | numeric                     |
>  confirmoptin        | numeric(2,0)                |
>  bouncedate          | date                        |
>  memberageid         | numeric                     |
>  sourceid2           | numeric                     |
>  remoteuserid        | character varying(50)       |
>  goal                | numeric(1,0)                |
>  flowdepth           | numeric                     |
>  pagetype            | numeric                     |
>  savepassword        | character varying(50)       |
>  customerprofileid   | numeric                     |
> Indexes:
>     "email_website_unq" UNIQUE, btree (emailaddress,
> websiteid), tablespace "members_idx"
>     "member_addeddateid_idx" btree (addeddate_id), tablespace
> "members_idx"
>     "member_changedateid_idx" btree (changedate_id),
> tablespace "members_idx"
>     "members_fdate_idx" btree
> (to_char_year_month(addeddate)), tablespace "esave_idx"
>     "members_memberid_idx" btree (memberid), tablespace "members_idx"
>     "members_mid_emailok_idx" btree (memberid, emailaddress,
> zipcode, firstname, emailok), tablespace "members_idx"
>     "members_sorted_idx_001" btree (websiteid, emailok,
> emailbounced, addeddate, memberid, zipcode, statecode,
> emailaddress), tablespace "members_idx"
>     "members_src_idx" btree (websiteid, emailbounced,
> sourceid), tablespace "members_idx"
>     "members_wid_idx" btree (websiteid), tablespace "members_idx"
>
> select tablename, indexname, tablespace, indexdef from
> pg_indexes where tablename = 'members'; -[ RECORD
> 1
> ]-------------------------------------------------------------
> --------------------------------------------------------------
> -------------------------
> tablename  | members
> indexname  | members_fdate_idx
> tablespace | esave_idx
> indexdef   | CREATE INDEX members_fdate_idx ON members USING btree
> (to_char_year_month(addeddate))
> -[ RECORD
> 2
> ]-------------------------------------------------------------
> --------------------------------------------------------------
> -------------------------
> tablename  | members
> indexname  | member_changedateid_idx
> tablespace | members_idx
> indexdef   | CREATE INDEX member_changedateid_idx ON members
> USING btree
> (changedate_id)
> -[ RECORD
> 3
> ]-------------------------------------------------------------
> --------------------------------------------------------------
> -------------------------
> tablename  | members
> indexname  | member_addeddateid_idx
> tablespace | members_idx
> indexdef   | CREATE INDEX member_addeddateid_idx ON members
> USING btree
> (addeddate_id)
> -[ RECORD
> 4
> ]-------------------------------------------------------------
> --------------------------------------------------------------
> -------------------------
> tablename  | members
> indexname  | members_wid_idx
> tablespace | members_idx
> indexdef   | CREATE INDEX members_wid_idx ON members USING btree
> (websiteid)
> -[ RECORD
> 5
> ]-------------------------------------------------------------
> --------------------------------------------------------------
> -------------------------
> tablename  | members
> indexname  | members_src_idx
> tablespace | members_idx
> indexdef   | CREATE INDEX members_src_idx ON members USING btree
> (websiteid, emailbounced, sourceid)
> -[ RECORD
> 6
> ]-------------------------------------------------------------
> --------------------------------------------------------------
> -------------------------
> tablename  | members
> indexname  | members_sorted_idx_001
> tablespace | members_idx
> indexdef   | CREATE INDEX members_sorted_idx_001 ON members
> USING btree
> (websiteid, emailok, emailbounced, addeddate, memberid,
> zipcode, statecode, emailaddress) -[ RECORD
> 7
> ]-------------------------------------------------------------
> --------------------------------------------------------------
> -------------------------
> tablename  | members
> indexname  | members_mid_emailok_idx
> tablespace | members_idx
> indexdef   | CREATE INDEX members_mid_emailok_idx ON members
> USING btree
> (memberid, emailaddress, zipcode, firstname, emailok) -[ RECORD
> 8
> ]-------------------------------------------------------------
> --------------------------------------------------------------
> -------------------------
> tablename  | members
> indexname  | members_memberid_idx
> tablespace | members_idx
> indexdef   | CREATE INDEX members_memberid_idx ON members USING btree
> (memberid)
> -[ RECORD
> 9
> ]-------------------------------------------------------------
> --------------------------------------------------------------
> -------------------------
> tablename  | members
> indexname  | email_website_unq
> tablespace | members_idx
> indexdef   | CREATE UNIQUE INDEX email_website_unq ON members USING
> btree (emailaddress, websiteid)
>
>
> This table has also been vacuumed analyzed as well:
>
> select * from pg_stat_all_tables where relname = 'members';
> -[ RECORD 1 ]----+------------------------------
> relid            | 3112786
> schemaname       | xxxxx
> relname          | members
> seq_scan         | 298
> seq_tup_read     | 42791828896
> idx_scan         | 31396925
> idx_tup_fetch    | 1083796963
> n_tup_ins        | 291308316
> n_tup_upd        | 0
> n_tup_del        | 4188020
> n_tup_hot_upd    | 0
> n_live_tup       | 285364632
> n_dead_tup       | 109658
> last_vacuum      | 2010-10-12 20:26:01.227393-04
> last_autovacuum  |
> last_analyze     | 2010-10-12 20:28:01.105656-04
> last_autoanalyze | 2010-09-16 20:50:00.712418-04
>
>


Tony,
For your query:

> select
> emailaddress, websiteid
>   from members
>  where emailok = 1
>    and emailbounced = 0;

your table doesn't have any indexes where "emailok" or "emailbounced"
are leading columns.
That's why existing indexes can not be used.

If you specified "websiteid" in the "where" clause then (most probably)
the index members_sorted_idx_001 will be used (based on selectivity and
statistics known to optimizer).

If this query (as is - without "websiteid") is important for your app,
create another index on (emailok, emailbounced) which should help, of
course if selectivity of your where clause is good enough (not to
perform full table scan).

Regards,
Igor Neyman

Re: oracle to psql migration - slow query in postgres

От
Tony Capobianco
Дата:
Thanks for all your responses. What's interesting is that an index is
used when this query is executed in Oracle.  It appears to do some
parallel processing:

SQL> set line 200
delete from plan_table;
explain plan for
select websiteid, emailaddress
  from members
 where emailok = 1
   and emailbounced = 0;

SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
SQL>
3 rows deleted.

SQL>   2    3    4    5
Explained.

SQL> SQL>
PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4247959398


-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name                   | Rows  | Bytes
| Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |

-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                        |   237M|
7248M|   469K  (2)| 01:49:33 |        |      |            |
|   1 |  PX COORDINATOR         |                        |       |
|            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10000               |   237M|
7248M|   469K  (2)| 01:49:33 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR    |                        |   237M|
7248M|   469K  (2)| 01:49:33 |  Q1,00 | PCWC |            |
|*  4 |     INDEX FAST FULL SCAN| MEMBERS_SORTED_IDX_001 |   237M|
7248M|   469K  (2)| 01:49:33 |  Q1,00 | PCWP |            |

-------------------------------------------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("EMAILBOUNCED"=0 AND "EMAILOK"=1)

16 rows selected.


On Fri, 2010-10-15 at 13:43 -0400, Igor Neyman wrote:
>
> > -----Original Message-----
> > From: Tony Capobianco [mailto:tcapobianco@prospectiv.com]
> > Sent: Thursday, October 14, 2010 3:43 PM
> > To: pgsql-performance@postgresql.org
> > Subject: oracle to psql migration - slow query in postgres
> >
> > We are in the process of testing migration of our oracle data
> > warehouse over to postgres.  A potential showstopper are full
> > table scans on our members table.  We can't function on
> > postgres effectively unless index scans are employed.  I'm
> > thinking I don't have something set correctly in my
> > postgresql.conf file, but I'm not sure what.
> >
> > This table has approximately 300million rows.
> >
> > Version:
> > SELECT version();
> >
> > version
> > --------------------------------------------------------------
> > ----------------------------------------------------
> >  PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC
> > gcc (GCC)
> > 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit
> >
> > We have 4 quad-core processors and 32GB of RAM.  The below
> > query uses the members_sorted_idx_001 index in oracle, but in
> > postgres, the optimizer chooses a sequential scan.
> >
> > explain analyze create table tmp_srcmem_emws1 as select
> > emailaddress, websiteid
> >   from members
> >  where emailok = 1
> >    and emailbounced = 0;
> >                                                           QUERY
> > PLAN
> > --------------------------------------------------------------
> > ----------------------------------------------------------------
> >  Seq Scan on members  (cost=0.00..14137154.64 rows=238177981
> > width=29) (actual time=0.052..685834.785 rows=236660930 loops=1)
> >    Filter: ((emailok = 1::numeric) AND (emailbounced =
> > 0::numeric))  Total runtime: 850306.220 ms
> > (3 rows)
> >
> > show shared_buffers ;
> >  shared_buffers
> > ----------------
> >  7680MB
> > (1 row)
> >
> > show effective_cache_size ;
> >  effective_cache_size
> > ----------------------
> >  22GB
> > (1 row)
> >
> > show work_mem ;
> >  work_mem
> > ----------
> >  768MB
> > (1 row)
> >
> > show enable_seqscan ;
> >  enable_seqscan
> > ----------------
> >  on
> > (1 row)
> >
> > Below are the data definitions for the table/indexes in question:
> >
> > \d members
> >                      Table "members"
> >        Column        |            Type             | Modifiers
> > ---------------------+-----------------------------+-----------
> >  memberid            | numeric                     | not null
> >  firstname           | character varying(50)       |
> >  lastname            | character varying(50)       |
> >  emailaddress        | character varying(50)       |
> >  password            | character varying(50)       |
> >  address1            | character varying(50)       |
> >  address2            | character varying(50)       |
> >  city                | character varying(50)       |
> >  statecode           | character varying(50)       |
> >  zipcode             | character varying(50)       |
> >  birthdate           | date                        |
> >  emailok             | numeric(2,0)                |
> >  gender              | character varying(1)        |
> >  addeddate           | timestamp without time zone |
> >  emailbounced        | numeric(2,0)                |
> >  changedate          | timestamp without time zone |
> >  optoutsource        | character varying(100)      |
> >  websiteid           | numeric                     |
> >  promotionid         | numeric                     |
> >  sourceid            | numeric                     |
> >  siteid              | character varying(64)       |
> >  srcwebsiteid        | numeric                     |
> >  homephone           | character varying(20)       |
> >  homeareacode        | character varying(10)       |
> >  campaignid          | numeric                     |
> >  srcmemberid         | numeric                     |
> >  optoutdate          | date                        |
> >  regcomplete         | numeric(1,0)                |
> >  regcompletesourceid | numeric                     |
> >  ipaddress           | character varying(25)       |
> >  pageid              | numeric                     |
> >  streetaddressstatus | numeric(1,0)                |
> >  middlename          | character varying(50)       |
> >  optinprechecked     | numeric(1,0)                |
> >  optinposition       | numeric                     |
> >  homephonestatus     | numeric                     |
> >  addeddate_id        | numeric                     |
> >  changedate_id       | numeric                     |
> >  rpmindex            | numeric                     |
> >  optmode             | numeric(1,0)                |
> >  countryid           | numeric                     |
> >  confirmoptin        | numeric(2,0)                |
> >  bouncedate          | date                        |
> >  memberageid         | numeric                     |
> >  sourceid2           | numeric                     |
> >  remoteuserid        | character varying(50)       |
> >  goal                | numeric(1,0)                |
> >  flowdepth           | numeric                     |
> >  pagetype            | numeric                     |
> >  savepassword        | character varying(50)       |
> >  customerprofileid   | numeric                     |
> > Indexes:
> >     "email_website_unq" UNIQUE, btree (emailaddress,
> > websiteid), tablespace "members_idx"
> >     "member_addeddateid_idx" btree (addeddate_id), tablespace
> > "members_idx"
> >     "member_changedateid_idx" btree (changedate_id),
> > tablespace "members_idx"
> >     "members_fdate_idx" btree
> > (to_char_year_month(addeddate)), tablespace "esave_idx"
> >     "members_memberid_idx" btree (memberid), tablespace "members_idx"
> >     "members_mid_emailok_idx" btree (memberid, emailaddress,
> > zipcode, firstname, emailok), tablespace "members_idx"
> >     "members_sorted_idx_001" btree (websiteid, emailok,
> > emailbounced, addeddate, memberid, zipcode, statecode,
> > emailaddress), tablespace "members_idx"
> >     "members_src_idx" btree (websiteid, emailbounced,
> > sourceid), tablespace "members_idx"
> >     "members_wid_idx" btree (websiteid), tablespace "members_idx"
> >
> > select tablename, indexname, tablespace, indexdef from
> > pg_indexes where tablename = 'members'; -[ RECORD
> > 1
> > ]-------------------------------------------------------------
> > --------------------------------------------------------------
> > -------------------------
> > tablename  | members
> > indexname  | members_fdate_idx
> > tablespace | esave_idx
> > indexdef   | CREATE INDEX members_fdate_idx ON members USING btree
> > (to_char_year_month(addeddate))
> > -[ RECORD
> > 2
> > ]-------------------------------------------------------------
> > --------------------------------------------------------------
> > -------------------------
> > tablename  | members
> > indexname  | member_changedateid_idx
> > tablespace | members_idx
> > indexdef   | CREATE INDEX member_changedateid_idx ON members
> > USING btree
> > (changedate_id)
> > -[ RECORD
> > 3
> > ]-------------------------------------------------------------
> > --------------------------------------------------------------
> > -------------------------
> > tablename  | members
> > indexname  | member_addeddateid_idx
> > tablespace | members_idx
> > indexdef   | CREATE INDEX member_addeddateid_idx ON members
> > USING btree
> > (addeddate_id)
> > -[ RECORD
> > 4
> > ]-------------------------------------------------------------
> > --------------------------------------------------------------
> > -------------------------
> > tablename  | members
> > indexname  | members_wid_idx
> > tablespace | members_idx
> > indexdef   | CREATE INDEX members_wid_idx ON members USING btree
> > (websiteid)
> > -[ RECORD
> > 5
> > ]-------------------------------------------------------------
> > --------------------------------------------------------------
> > -------------------------
> > tablename  | members
> > indexname  | members_src_idx
> > tablespace | members_idx
> > indexdef   | CREATE INDEX members_src_idx ON members USING btree
> > (websiteid, emailbounced, sourceid)
> > -[ RECORD
> > 6
> > ]-------------------------------------------------------------
> > --------------------------------------------------------------
> > -------------------------
> > tablename  | members
> > indexname  | members_sorted_idx_001
> > tablespace | members_idx
> > indexdef   | CREATE INDEX members_sorted_idx_001 ON members
> > USING btree
> > (websiteid, emailok, emailbounced, addeddate, memberid,
> > zipcode, statecode, emailaddress) -[ RECORD
> > 7
> > ]-------------------------------------------------------------
> > --------------------------------------------------------------
> > -------------------------
> > tablename  | members
> > indexname  | members_mid_emailok_idx
> > tablespace | members_idx
> > indexdef   | CREATE INDEX members_mid_emailok_idx ON members
> > USING btree
> > (memberid, emailaddress, zipcode, firstname, emailok) -[ RECORD
> > 8
> > ]-------------------------------------------------------------
> > --------------------------------------------------------------
> > -------------------------
> > tablename  | members
> > indexname  | members_memberid_idx
> > tablespace | members_idx
> > indexdef   | CREATE INDEX members_memberid_idx ON members USING btree
> > (memberid)
> > -[ RECORD
> > 9
> > ]-------------------------------------------------------------
> > --------------------------------------------------------------
> > -------------------------
> > tablename  | members
> > indexname  | email_website_unq
> > tablespace | members_idx
> > indexdef   | CREATE UNIQUE INDEX email_website_unq ON members USING
> > btree (emailaddress, websiteid)
> >
> >
> > This table has also been vacuumed analyzed as well:
> >
> > select * from pg_stat_all_tables where relname = 'members';
> > -[ RECORD 1 ]----+------------------------------
> > relid            | 3112786
> > schemaname       | xxxxx
> > relname          | members
> > seq_scan         | 298
> > seq_tup_read     | 42791828896
> > idx_scan         | 31396925
> > idx_tup_fetch    | 1083796963
> > n_tup_ins        | 291308316
> > n_tup_upd        | 0
> > n_tup_del        | 4188020
> > n_tup_hot_upd    | 0
> > n_live_tup       | 285364632
> > n_dead_tup       | 109658
> > last_vacuum      | 2010-10-12 20:26:01.227393-04
> > last_autovacuum  |
> > last_analyze     | 2010-10-12 20:28:01.105656-04
> > last_autoanalyze | 2010-09-16 20:50:00.712418-04
> >
> >
>
>
> Tony,
> For your query:
>
> > select
> > emailaddress, websiteid
> >   from members
> >  where emailok = 1
> >    and emailbounced = 0;
>
> your table doesn't have any indexes where "emailok" or "emailbounced"
> are leading columns.
> That's why existing indexes can not be used.
>
> If you specified "websiteid" in the "where" clause then (most probably)
> the index members_sorted_idx_001 will be used (based on selectivity and
> statistics known to optimizer).
>
> If this query (as is - without "websiteid") is important for your app,
> create another index on (emailok, emailbounced) which should help, of
> course if selectivity of your where clause is good enough (not to
> perform full table scan).
>
> Regards,
> Igor Neyman
>



Re: oracle to psql migration - slow query in postgres

От
"Igor Neyman"
Дата:
> -----Original Message-----
> From: Tony Capobianco [mailto:tcapobianco@prospectiv.com]
> Sent: Friday, October 15, 2010 2:14 PM
> To: pgsql-performance@postgresql.org
> Subject: Re: oracle to psql migration - slow query in postgres
>
> Thanks for all your responses. What's interesting is that an
> index is used when this query is executed in Oracle.  It
> appears to do some parallel processing:
>
> SQL> set line 200
> delete from plan_table;
> explain plan for
> select websiteid, emailaddress
>   from members
>  where emailok = 1
>    and emailbounced = 0;
>
> SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
> SQL>
> 3 rows deleted.
>
> SQL>   2    3    4    5
> Explained.
>
> SQL> SQL>
> PLAN_TABLE_OUTPUT
> --------------------------------------------------------------
> --------------------------------------------------------------
> --------------------------------------------------------------
> --------------
> Plan hash value: 4247959398
>
> --------------------------------------------------------------
> -----------------------------------------------------------------
> | Id  | Operation               | Name                   |
> Rows  | Bytes
> | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
> --------------------------------------------------------------
> -----------------------------------------------------------------
> |   0 | SELECT STATEMENT        |                        |   237M|
> 7248M|   469K  (2)| 01:49:33 |        |      |            |
> |   1 |  PX COORDINATOR         |                        |       |
> |            |          |        |      |            |
> |   2 |   PX SEND QC (RANDOM)   | :TQ10000               |   237M|
> 7248M|   469K  (2)| 01:49:33 |  Q1,00 | P->S | QC (RAND)  |
> |   3 |    PX BLOCK ITERATOR    |                        |   237M|
> 7248M|   469K  (2)| 01:49:33 |  Q1,00 | PCWC |            |
> |*  4 |     INDEX FAST FULL SCAN| MEMBERS_SORTED_IDX_001 |   237M|
> 7248M|   469K  (2)| 01:49:33 |  Q1,00 | PCWP |            |
> --------------------------------------------------------------
> -----------------------------------------------------------------
>
> PLAN_TABLE_OUTPUT
> --------------------------------------------------------------
> --------------------------------------------------------------
> --------------------------------------------------------------
> --------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
>    4 - filter("EMAILBOUNCED"=0 AND "EMAILOK"=1)
>
> 16 rows selected.
>
>

1. Postgres doesn't have "FAST FULL SCAN" because even if all the info
is in the index, it need to visit the row in the table ("visibility"
issue).

2. Postgres doesn't have parallel executions.

BUT, it's free anf has greate community support, as you already saw.

Regards,
Igor Neyman

Re: oracle to psql migration - slow query in postgres

От
Tony Capobianco
Дата:
Very true Igor!  Free is my favorite price.
I'll figure a way around this issue.

Thanks for your help.
Tony

On Fri, 2010-10-15 at 14:54 -0400, Igor Neyman wrote:
> > -----Original Message-----
> > From: Tony Capobianco [mailto:tcapobianco@prospectiv.com]
> > Sent: Friday, October 15, 2010 2:14 PM
> > To: pgsql-performance@postgresql.org
> > Subject: Re: oracle to psql migration - slow query in postgres
> >
> > Thanks for all your responses. What's interesting is that an
> > index is used when this query is executed in Oracle.  It
> > appears to do some parallel processing:
> >
> > SQL> set line 200
> > delete from plan_table;
> > explain plan for
> > select websiteid, emailaddress
> >   from members
> >  where emailok = 1
> >    and emailbounced = 0;
> >
> > SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
> > SQL>
> > 3 rows deleted.
> >
> > SQL>   2    3    4    5
> > Explained.
> >
> > SQL> SQL>
> > PLAN_TABLE_OUTPUT
> > --------------------------------------------------------------
> > --------------------------------------------------------------
> > --------------------------------------------------------------
> > --------------
> > Plan hash value: 4247959398
> >
> > --------------------------------------------------------------
> > -----------------------------------------------------------------
> > | Id  | Operation               | Name                   |
> > Rows  | Bytes
> > | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
> > --------------------------------------------------------------
> > -----------------------------------------------------------------
> > |   0 | SELECT STATEMENT        |                        |   237M|
> > 7248M|   469K  (2)| 01:49:33 |        |      |            |
> > |   1 |  PX COORDINATOR         |                        |       |
> > |            |          |        |      |            |
> > |   2 |   PX SEND QC (RANDOM)   | :TQ10000               |   237M|
> > 7248M|   469K  (2)| 01:49:33 |  Q1,00 | P->S | QC (RAND)  |
> > |   3 |    PX BLOCK ITERATOR    |                        |   237M|
> > 7248M|   469K  (2)| 01:49:33 |  Q1,00 | PCWC |            |
> > |*  4 |     INDEX FAST FULL SCAN| MEMBERS_SORTED_IDX_001 |   237M|
> > 7248M|   469K  (2)| 01:49:33 |  Q1,00 | PCWP |            |
> > --------------------------------------------------------------
> > -----------------------------------------------------------------
> >
> > PLAN_TABLE_OUTPUT
> > --------------------------------------------------------------
> > --------------------------------------------------------------
> > --------------------------------------------------------------
> > --------------
> >
> > Predicate Information (identified by operation id):
> > ---------------------------------------------------
> >
> >    4 - filter("EMAILBOUNCED"=0 AND "EMAILOK"=1)
> >
> > 16 rows selected.
> >
> >
>
> 1. Postgres doesn't have "FAST FULL SCAN" because even if all the info
> is in the index, it need to visit the row in the table ("visibility"
> issue).
>
> 2. Postgres doesn't have parallel executions.
>
> BUT, it's free anf has greate community support, as you already saw.
>
> Regards,
> Igor Neyman
>