Select count(*), the sequel

Поиск
Список
Период
Сортировка
От	Mladen Gogala
Тема	Select count(*), the sequel
Дата	18 октября 2010 г. 03:04:06
Msg-id	4CB9D82F.9040505@gmail.com обсуждение исходный текст
Ответы	Re: Select count(*), the sequel
Список	pgsql-performance
Дерево обсуждения
There was some doubt as for the speed of doing the select count(*) in
PostgreSQL and Oracle.
To that end, I copied the most part of the Oracle table I used before to
Postgres. Although the copy
wasn't complete, the resulting table is already significantly larger
than the table it was copied from. The result still shows that Oracle is
significantly faster:
Oracle result:

SQL> alter system flush buffer_cache;

System altered.

SQL> select /*+ full(NO) noparallel */ count(*) from ni_occurrence no;

   COUNT(*)
----------
  402062638

Elapsed: 00:03:16.45



Hints are necessary because Oracle table is declared as parallel and I
didn't want the PK index to be used for counting. Oracle has a good
habit of using PK's for counting, if available.


SQL> select bytes/1048576 as MB
   2  from user_segments
   3  where segment_name='NI_OCCURRENCE';

     MB
----------
      35329

Elapsed: 00:00:00.85
SQL>

So, oracle stores 402 million records in 35GB and counts them in 3
minutes 16.45 seconds  The very same table was partially copied to
Postgres, copying died with ORA-01555 snapshot too old sometimes this
morning. I ran vacuumdb -f -z on the database after the copy completed
and the results are below.

mgogala=# select count(*) from ni_occurrence;
    count
-----------
  382400476
(1 row)

Time: 221716.466 ms
mgogala=#
mgogala=# select 221/60::real;
      ?column?
------------------
  3.68333333333333
(1 row)

Time: 0.357 ms
mgogala=#
mgogala=# select pg_size_pretty(pg_table_size('ni_occurrence'));
  pg_size_pretty
----------------
  46 GB
(1 row)

Time: 0.420 ms
mgogala=#

The database wasn't restarted, no caches were flushed, the comparison
was done with a serious advantage for PostgreSQL. Postgres needed 3.68
minutes to complete the count which is about the same Oracle but still
somewhat slower. Also, I am worried about the sizes. Postgres table is
11GB larger than the original, despite having less data. That was an
unfair and unbalanced comparison because Oracle's cache was flushed and
Oracle was artificially restrained to use the full table scan without
the aid of parallelism. Here is the same result, with no hints and the
autotrace on, which shows what happens if I turn the hints off:

SQL> select count(*) from ni_occurrence no;

   COUNT(*)
----------
  402062638

Elapsed: 00:00:52.61

Execution Plan
----------------------------------------------------------
Plan hash value: 53476935

--------------------------------------------------------------------------------
----------------------------------------

| Id  | Operation          | Name          | Rows  | Cost (%CPU)|
  Time      |    TQ  |IN-OUT| PQ Distrib |

--------------------------------------------------------------------------------
----------------------------------------

|   0 | SELECT STATEMENT      |              |    1 | 54001  (19)|
  00:01:08 |       |      |           |

|   1 |  SORT AGGREGATE       |              |    1 |           |
       |       |      |           |

|   2 |   PX COORDINATOR      |              |      |           |
       |       |      |           |

|   3 |    PX SEND QC (RANDOM)      | :TQ10000          |    1 |           |
       |  Q1,00 | P->S | QC (RAND)  |

|   4 |     SORT AGGREGATE      |              |    1 |           |
       |  Q1,00 | PCWP |           |

|   5 |      PX BLOCK ITERATOR      |              |   402M| 54001  (19)|
  00:01:08 |  Q1,00 | PCWC |           |

|   6 |       INDEX FAST FULL SCAN| IDX_NI_OCCURRENCE_PID |   402M|
54001  (19)|
  00:01:08 |  Q1,00 | PCWP |           |

--------------------------------------------------------------------------------
----------------------------------------

It took just 52 seconds to count everything, but Oracle didn't even scan
the table, it scanned a unique index, in parallel. That is the
algorithmic advantage that forced me to restrict the execution plan with
hints. My conclusion is that the speed of the full scan is OK, about the
same as Oracle speed.  There are, however, three significant algorithm
advantages on the Oracle's side:

1) Oracle can use indexes to calculate "select count"
2) Oracle can use parallelism.
3) Oracle can use indexes in combination with the parallel processing.



Here are the descriptions:

SQL> desc ni_occurrence
  Name                       Null?    Type
  ----------------------------------------- --------
----------------------------
  ID                       NOT NULL NUMBER(22)
  PERMANENT_ID                   NOT NULL VARCHAR2(12)
  CALL_LETTERS                   NOT NULL VARCHAR2(5)
  AIRDATE                   NOT NULL DATE
  DURATION                   NOT NULL NUMBER(4)
  PROGRAM_TITLE                        VARCHAR2(360)
  COST                            NUMBER(15)
  ASSETID                        NUMBER(12)
  MARKET_ID                        NUMBER
  GMT_TIME                        DATE
  ORIG_ST_OCC_ID                     NUMBER
  EPISODE                        VARCHAR2(450)
  IMPRESSIONS                        NUMBER

SQL>
mgogala=# \d ni_occurrence
                Table "public.ni_occurrence"
      Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
  id             | bigint                      | not null
  permanent_id   | character varying(12)       | not null
  call_letters   | character varying(5)        | not null
  airdate        | timestamp without time zone | not null
  duration       | smallint                    | not null
  program_title  | character varying(360)      |
  cost           | bigint                      |
  assetid        | bigint                      |
  market_id      | bigint                      |
  gmt_time       | timestamp without time zone |
  orig_st_occ_id | bigint                      |
  episode        | character varying(450)      |
  impressions    | bigint                      |
Indexes:
     "ni_occurrence_pk" PRIMARY KEY, btree (id)

mgogala=#

Oracle block is 16k, version is 10.2.0.5 RAC, 64 bit (is anybody still
using 32bit db servers?) . Postgres is 9.0.1, 64 bit. Both machines are
running Red Hat 5.5:


[mgogala@lpo-postgres-d01 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
[mgogala@lpo-postgres-d01 ~]$

Linux lpo-postgres-d01 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT
2010 x86_64 x86_64 x86_64 GNU/Linux
[mgogala@lpo-postgres-d01 ~]$



--
http://mgogala.freehostia.com
В списке pgsql-performance по дате отправления:
Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Select count(*), the sequel