Thread: Query seem to slow if table have more than 200 million rows

Query seem to slow if table have more than 200 million rows

From:
"Ahmad Fajar"
Date:

If I do a simple query like:

Select ids, keywords from dict where keywords='blabla'; ('blabla' is a single word)

The table has 200 million rows, and I have indexed the keywords field. The first time I run the query it is slow, taking about 15-60 seconds to return the result, but if I repeat the query I get the result quickly. My question is: why is the query so slow the first time?
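A quick way to see where the time goes, just as a sketch using the table and column names above, is to run the query under EXPLAIN ANALYZE once on a cold cache (right after a restart) and once again on a warm one:

-- The runtime difference between the two runs is roughly the disk I/O cost
-- of fetching the index and heap pages for this keyword.
explain analyze
select ids, keywords from dict where keywords='blabla';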

Table structure is quite simple:

Ids bigint, keywords varchar(150), weight varchar(1), dpos int.
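A minimal sketch of that schema, assuming the table is called dict as in the query above (the index name below is only an assumption; the real index names appear later in the thread):

-- Column definitions are taken from the message; everything else is assumed.
create table dict (
    ids      bigint,
    keywords varchar(150),
    weight   varchar(1),
    dpos     int
);

-- Plain btree index on the search column (hypothetical name).
create index idx_dict_keywords on dict (keywords);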

 

I use the latest pgAdmin3 to test all queries. My Linux box is Red Hat 4 AS, kernel 2.6.9-11, PostgreSQL version 8.0.3, with 2x200 GB SATA 7200 RPM drives configured as RAID0 with an ext3 file system for the PostgreSQL data only, and an 80 GB EIDE 7200 RPM drive with ext3 for the OS only. The server has 2 GB RAM and a P4 3.2 GHz.

 

If I run this query on MS SQL Server, with the same hardware specification and the same data, MS SQL Server beats PostgreSQL: the query takes about 0-4 seconds to return the result. What is wrong with my PostgreSQL?

 

wassalam,

ahmad fajar

 

Re: Query seem to slow if table have more than 200 million rows

From:
"Qingqing Zhou"
Date:
""Ahmad Fajar"" <gendowo@konphalindo.or.id> wrote
>
> Select ids, keywords from dict where keywords='blabla'; ('blabla' is a
> single word)
>
> The table has 200 million rows, and I have indexed the keywords field. The
> first time I run the query it is slow, taking about 15-60 seconds to return
> the result, but if I repeat the query I get the result quickly. My question
> is: why is the query so slow the first time?
>
> Table structure is quite simple:
>
> Ids bigint, keywords varchar(150), weight varchar(1), dpos int.
>

The slowness on the first run is obviously caused by disk I/O; the second run
is faster because all the data pages it needs are already in the buffer pool.
200 million rows is not a problem for a btree index: even if your client tool
appends some spaces to your keywords at insertion time, the btree should
ideally be at most 5 or 6 levels deep. Can you show the I/O statistics for the
index from your statistics views?
http://www.postgresql.org/docs/8.0/static/monitoring-stats.html#MONITORING-STATS-VIEWS
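As a rough back-of-envelope check (the per-page fanout is an assumption, not measured from the actual index): with 8 KB pages and index entries of a few dozen bytes, each btree page holds on the order of 200 entries, so:

-- Approximate depth of a btree over 200 million keys with fanout ~200;
-- wider or space-padded keys reduce the fanout and may add a level.
select ceil(ln(200000000.0) / ln(200.0)) as approx_btree_levels;  -- => 4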

Regards,
Qingqing



Re: Query seem to slow if table have more than 200 million rows

From:
"Ahmad Fajar"
Date:
Hi Qingqing,

I don't know whether the statistics I got are good or bad; here they are:
scooby=# select a.relid, a.relname, b.indexrelid, b.indexrelname, c.idx_scan, c.idx_tup_read, c.idx_tup_fetch,
scooby-# a.heap_blks_read, a.heap_blks_hit, a.idx_blks_read, a.idx_blks_hit,
scooby-# a.toast_blks_read, a.toast_blks_hit, a.tidx_blks_read, a.tidx_blks_hit, b.idx_blks_read, b.idx_blks_hit
scooby-# from pg_statio_user_tables a, pg_statio_user_indexes b, pg_stat_all_indexes c
scooby-# where a.relid=b.relid and a.relid=c.relid and b.indexrelid=c.indexrelid and a.relname=b.relname and
scooby-# a.relname=c.relname and a.relname='fti_dict1';
  relid   |  relname  | indexrelid | indexrelname | idx_scan | idx_tup_read | idx_tup_fetch | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit | idx_blks_read | idx_blks_hit
----------+-----------+------------+--------------+----------+--------------+---------------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------+---------------+--------------
 22880226 | fti_dict1 |   22880231 | idx_dict3    |        0 |            0 |             0 |              0 |             0 |             0 |            0 |                 |                |                |               |             0 |            0
 22880226 | fti_dict1 |   22880230 | idx_dict2    |        7 |       592799 |        592799 |              0 |             0 |             0 |            0 |                 |                |                |               |             0 |            0
 22880226 | fti_dict1 |   22880229 | idx_dict1    |        0 |            0 |             0 |              0 |             0 |             0 |            0 |                 |                |                |               |             0 |            0
(3 rows)

I have tried the query below several times with different keywords, but only
idx_tup_read and idx_tup_fetch changed; the others stayed at zero.
The indexes are:
ids (idx_dict1),
keywords (idx_dict2, varchar_ops),
keywords (idx_dict3, varchar_pattern_ops) ==> I use this index for queries like ...
keywords like 'blabla%', just for testing purposes
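For reference, a sketch of how indexes with those operator classes are declared (fti_dict1 is the table name taken from the statistics output above; the original DDL is not shown in the thread, so treat this as an assumption):

create index idx_dict1 on fti_dict1 (ids);
create index idx_dict2 on fti_dict1 (keywords varchar_ops);
-- varchar_pattern_ops lets the planner use the index for LIKE 'blabla%'
-- prefix searches when the database locale is not C.
create index idx_dict3 on fti_dict1 (keywords varchar_pattern_ops);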

Regards,
ahmad fajar

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Qingqing Zhou
Sent: Tuesday, 27 September 2005 8:43
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query seem to slow if table have more than 200
million rows


""Ahmad Fajar"" <gendowo@konphalindo.or.id> wrote
>
> Select ids, keywords from dict where keywords='blabla'; ('blabla' is a
> single word)
>
> The table has 200 million rows, and I have indexed the keywords field. The
> first time I run the query it is slow, taking about 15-60 seconds to return
> the result, but if I repeat the query I get the result quickly. My question
> is: why is the query so slow the first time?
>
> Table structure is quite simple:
>
> Ids bigint, keywords varchar(150), weight varchar(1), dpos int.
>

The slowness on the first run is obviously caused by disk I/O; the second run
is faster because all the data pages it needs are already in the buffer pool.
200 million rows is not a problem for a btree index: even if your client tool
appends some spaces to your keywords at insertion time, the btree should
ideally be at most 5 or 6 levels deep. Can you show the I/O statistics for the
index from your statistics views?
http://www.postgresql.org/docs/8.0/static/monitoring-stats.html#MONITORING-STATS-VIEWS

Regards,
Qingqing





Re: Query seem to slow if table have more than 200 million rows

From:
"Qingqing Zhou"
Date:
""Ahmad Fajar"" <gendowo@konphalindo.or.id> wrote
> Hi Qingqing,
>
> I don't know whether the statistics I got are good or bad; here they are:

Please do it in this way:

1. Start postmaster with "stats_start_collector=true" and
"stats_block_level=true".

2. Connect with psql and do something like this:

test=# select pg_stat_reset();
 pg_stat_reset
---------------
 t
(1 row)

test=# select * from pg_statio_user_indexes;
 relid | indexrelid | schemaname | relname | indexrelname | idx_blks_read | idx_blks_hit
-------+------------+------------+---------+--------------+---------------+--------------
 16385 |      16390 | public     | test    | test_idx     |             0 |            0
(1 row)

test=# select count(*) from test where a <= 1234;
 count
-------
  7243
(1 row)

test=# select * from pg_statio_user_indexes;
 relid | indexrelid | schemaname | relname | indexrelname | idx_blks_read | idx_blks_hit
-------+------------+------------+---------+--------------+---------------+--------------
 16385 |      16390 | public     | test    | test_idx     |            55 |            0
(1 row)


This tells us that to answer "select count(*) from test where a <= 1234", I had
to read 55 index blocks (there were no index block hits, since I had just
restarted the postmaster, so the buffer pool was empty).
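Applied to the table from the original post, a sketch looks like this (the table and index names are the ones that appear earlier in the thread; adjust them to the real object names):

-- Reset the counters, run the slow query once on a cold cache, then check how
-- many index and heap blocks were read from disk rather than hit in the
-- buffer pool.
select pg_stat_reset();

select ids, keywords from dict where keywords='blabla';

select indexrelname, idx_blks_read, idx_blks_hit
from pg_statio_user_indexes where relname='fti_dict1';

select heap_blks_read, heap_blks_hit
from pg_statio_user_tables where relname='fti_dict1';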


Regards,
Qingqing