Various performance questions

Поиск
Список
Период
Сортировка
От Dror Matalon
Тема Various performance questions
Дата
Msg-id 20031026194449.GD2979@rlx11.zapatec.com
обсуждение исходный текст
Ответы Re: Various performance questions  (Greg Stark <gsstark@mit.edu>)
Re: Various performance questions  (Tarhon-Onu Victor <mituc@iasi.rdsnet.ro>)
Список pgsql-performance
Hi,

We're in the process of setting up a new database server. The
application is an online rss aggregator which you can see at
www.fastbuzz.com (still running with the old hardware).

The new machine is a dual Xeon with 2 Gigs of ram

The OS is freebsd 4.9.

shared_buffers = 10000
sort_mem = 32768
effective_cache_size = 25520  -- freebsd forumla: vfs.hibufspace / 8192

1. While it seems to work correctly, I'm unclear on why this number is
correct.  25520*8 = 204160 or 200 Megs. On a machine with 2 Gigs it
seems like the number should be more like 1 - 1.5 Gigs.

2.  The main performance challenges are with the items table which has around
five million rows and grows at the rate of more than 100,000 rows a day.

If I do a select count(*) from the items table it take 55 - 60 seconds
to execute. I find it interesting that it takes that long whether it's
doing it the first time and fetching the pages from disk or on
subsequent request where it fetches the pages from memory.
I know that it's not touching the disks because I'm running an iostat in
a different window. Here's the explain analyze:

explain analyze select count(*) from items;
                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=245377.53..245377.53 rows=1 width=0) (actual time=55246.035..55246.040 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..233100.62 rows=4910762 width=0)
(actual time=0.054..30220.641 rows=4910762 loops=1)
 Total runtime: 55246.129 ms
(3 rows)

and the number of pages:

select relpages from pg_class where relname = 'items';
 relpages
----------
   183993


So does it make sense that it would take close to a minute to count the 5 million rows
even if all pages are in memory?

3. Relpages is 183993 so file size should be  183993*8192 = 1507270656,
roughly 1.5 gig. The actual disk size is 1073741824 or roughly 1 gig.
Why the difference?



4. If I put a certain filter/condition on the query it tells me that it's doing
a sequential scan, and yet it takes less time than a full sequential
scan:

explain analyze select count(*) from items where channel < 5000;
                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=249141.54..249141.54 rows=1 width=0) (actual time=26224.603..26224.608 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..245377.52 rows=1505605 width=0) (actual time=7.599..17686.869 rows=1632057
loops=1)
         Filter: (channel < 5000)
 Total runtime: 26224.703 ms


How can it do a sequential scan and apply a filter to it in less time
than the full sequential scan? Is it actually using an index without
really telling me?


Here's the structure of the items table

    Column     |           Type           | Modifiers
---------------+--------------------------+-----------
 articlenumber | integer                  | not null
 channel       | integer                  | not null
 title         | character varying        |
 link          | character varying        |
 description   | character varying        |
 comments      | character varying(500)   |
 dtstamp       | timestamp with time zone |
 signature     | character varying(32)    |
 pubdate       | timestamp with time zone |
Indexes:
    "item_channel_link" btree (channel, link)
    "item_created" btree (dtstamp)
    "item_signature" btree (signature)
    "items_channel_article" btree (channel, articlenumber)
    "items_channel_tstamp" btree (channel, dtstamp)


5. Any other comments/suggestions on the above setup.

Thanks,

Dror

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

В списке pgsql-performance по дате отправления:

Предыдущее
От: Reece Hart
Дата:
Сообщение: explicit casting required for index use
Следующее
От: Yonatan Goraly
Дата:
Сообщение: Re: Slow performance with no apparent reason