Re: oracle to psql migration - slow query in postgres

Поиск
Список
Период
Сортировка
От Mladen Gogala
Тема Re: oracle to psql migration - slow query in postgres
Дата
Msg-id 4CB86A84.1000101@vmsinfo.com
обсуждение исходный текст
Ответ на Re: oracle to psql migration - slow query in postgres  (Samuel Gendler <sgendler@ideasculptor.com>)
Список pgsql-performance
Samuel Gendler wrote:
>
>
> On Thu, Oct 14, 2010 at 8:59 PM, Mladen Gogala
> <mladen.gogala@vmsinfo.com <mailto:mladen.gogala@vmsinfo.com>> wrote:
>
>      If working with partitioning, be very aware that PostgreSQL
>     optimizer has certain problems with partitions, especially with
>     group functions. If you want speed, everything must be prefixed
>     with partitioning column: indexes, expressions, joins. There is no
>     explicit star schema and creating hash indexes will not buy you
>     much, as a matter of fact, Postgres community is extremely
>     suspicious of the hash indexes and I don't see them widely used.
>     Having said that, I was able to solve the problems with my speed
>     and partitioning.
>
>
> Could you elaborate on this, please? What do you mean by 'everythin
> must be prefixed with partitioning column?'
>
> --sam
If you have partitioned table part_tab, partitioned on the column
item_date and if there is a global primary key in Oracle, let's call it
item_id, then queries like "select * from part_tab where item_id=12345"
will perform worse than queries with item_date"

select * from part_tab where item_id=12345 and item_date='2010-10-15'

This also applies to inserts and updates. Strictly speaking, the
item_date column in the query above is not necessary, after all, the
item_id column is the primary key. However, with range scans you will
get much better results if you include the item_date column than if you
use combination of columns without. The term "prefixed indexes" is
borrowed from Oracle RDBMS and means that the beginning column in the
index is the column on which the table is partitioned. Oracle, as
opposed to Postgres, has global indexes, the indexes that span all
partitions. PostgreSQL only maintains indexes on each of the partitions
separately.  Oracle calls such indexes "local indexes" and defines them
on the partitioned table level. Here is a brief and rather succinct
explanation of the terminology:

http://www.oracle-base.com/articles/8i/PartitionedTablesAndIndexes.php


Of, course, there are other differences between Oracle partitioning and
PostgreSQL partitioning. The main difference is $10000/CPU.
I am talking from experience:

news=> \d moreover_documents
              Table "moreover.moreover_documents"
        Column        |            Type             | Modifiers
----------------------+-----------------------------+-----------
 document_id          | bigint                      | not null
 dre_reference        | bigint                      | not null
 headline             | character varying(4000)     |
 author               | character varying(200)      |
 url                  | character varying(1000)     |
 rank                 | bigint                      |
 content              | text                        |
 stories_like_this    | character varying(1000)     |
 internet_web_site_id | bigint                      | not null
 harvest_time         | timestamp without time zone |
 valid_time           | timestamp without time zone |
 keyword              | character varying(200)      |
 article_id           | bigint                      | not null
 media_type           | character varying(20)       |
 source_type          | character varying(20)       |
 created_at           | timestamp without time zone |
 autonomy_fed_at      | timestamp without time zone |
 language             | character varying(150)      |
Indexes:
    "moreover_documents_pkey" PRIMARY KEY, btree (document_id)
Triggers:
    insert_moreover_trigger BEFORE INSERT ON moreover_documents FOR EACH
ROW EXE
CUTE PROCEDURE moreover_insert_trgfn()
Number of child tables: 8 (Use \d+ to list them.)

The child tables are, of course, partitions.

Here is the original:


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> desc moreover_documents
 Name                       Null?    Type
 ----------------------------------------- --------
----------------------------
 DOCUMENT#                   NOT NULL NUMBER
 DRE_REFERENCE                   NOT NULL NUMBER
 HEADLINE                        VARCHAR2(4000)
 AUTHOR                         VARCHAR2(200)
 URL                            VARCHAR2(1000)
 RANK                            NUMBER
 CONTENT                        CLOB
 STORIES_LIKE_THIS                    VARCHAR2(1000)
 INTERNET_WEB_SITE#               NOT NULL NUMBER
 HARVEST_TIME                        DATE
 VALID_TIME                        DATE
 KEYWORD                        VARCHAR2(200)
 ARTICLE_ID                   NOT NULL NUMBER
 MEDIA_TYPE                        VARCHAR2(20)
 CREATED_AT                        DATE
 SOURCE_TYPE                        VARCHAR2(50)
 PUBLISH_DATE                        DATE
 AUTONOMY_FED_AT                    DATE
 LANGUAGE                        VARCHAR2(150)

SQL>



I must say that it took me some time to get things right.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




В списке pgsql-performance по дате отправления:

Предыдущее
От: Devrim GÜNDÜZ
Дата:
Сообщение: Re: Slow count(*) again...
Следующее
От: Merlin Moncure
Дата:
Сообщение: Re: oracle to psql migration - slow query in postgres