improve select performance...

From: Steve Holdoway
Subject: improve select performance...
Date:
Msg-id: 20070516110810.ba3554fe.steve.holdoway@firetrust.com
Replies: Re: improve select performance...  ("Phillip Smith" <phillip.smith@weatherbeeta.com.au>)
         Re: improve select performance...  (Steve Holdoway <steve.holdoway@firetrust.com>)
List: pgsql-admin
Can anyone out there help me? I'm having a bit of a problem trying to improve the performance of a php script that does
a data load.

At the moment, the script checks the database to see if an entry is present and, if not, adds it. The table has
about 400,000 rows, and I'm checking a bulk load of about 50,000 entries. The table is defined as follows (not my
design!):

Welcome to psql 8.2.4, the PostgreSQL interactive terminal.

db=> \d badurls
                     Table "public.badurls"
   Column    |            Type             |     Modifiers
-------------+-----------------------------+--------------------
 url         | character varying           | not null
 ip          | character(20)               |
 dateadded   | timestamp without time zone | not null
 type        | character(1)                |
 publishdate | timestamp without time zone |
 status      | integer                     | not null default 1
 version     | integer                     | default 0
 edited      | integer                     | default 0
 category    | character(4)[]              |
Indexes:
    "idxbadurls_url" hash (url)
    "idxbadurls_version" btree (version)


and I'm accessing the url column. (I've tried a btree index as well... not much difference)

In an attempt to improve the performance, I'm getting the list of data that's already there in a single hit, and not
attempting to insert those entries (I'm still checking the rest first, just in case...).
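
The dedup step described above can be sketched like this (my own sketch, not the poster's code; it uses a simplified table and Python's built-in sqlite3 so the example is self-contained, whereas the real table is in PostgreSQL and would be accessed via a PHP/Postgres driver with proper parameter quoting):

```python
import sqlite3

def load_missing(conn, urls):
    """Fetch the already-present subset of `urls` in one SELECT ... IN,
    then insert only the difference. Sketch of the batch-dedup approach."""
    placeholders = ",".join("?" * len(urls))
    cur = conn.execute(
        f"SELECT url FROM badurls WHERE url IN ({placeholders})", urls)
    existing = {row[0] for row in cur}
    missing = [u for u in urls if u not in existing]
    conn.executemany(
        "INSERT INTO badurls (url, dateadded, status) "
        "VALUES (?, datetime('now'), 1)",
        [(u,) for u in missing])
    return missing

# Toy schema (the real badurls table has more columns and constraints).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE badurls (url TEXT NOT NULL, dateadded TEXT, status INTEGER)")
conn.execute("INSERT INTO badurls VALUES ('a.example', datetime('now'), 1)")
added = load_missing(conn, ["a.example", "b.example", "c.example"])
```

Here only `b.example` and `c.example` get inserted; the existence check for the whole batch costs a single round trip instead of one query per URL.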

So, I'm trying to run a select url from badurls where url in ( .... ) to get this list. However, this is also
desperately slow...

1000 in the list: 14 sec
5000 in the list: 2:40
10000 in the list: 5:50

(run from a client machine over a 100Mbit LAN).
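
Since the per-query time above grows faster than linearly with the size of the IN list, one common workaround (my suggestion, not from the thread) is to split the list into fixed-size batches and issue one query per batch. A sketch of the chunking and placeholder-building; the `%s` placeholder style is illustrative, and real code should use the driver's own parameter binding:

```python
def chunks(seq, size):
    """Split the IN-list into fixed-size batches so each statement stays small."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def in_query(batch):
    """Build a parameterized SELECT ... IN (...) for one batch."""
    return ("SELECT url FROM badurls WHERE url IN (%s)"
            % ",".join(["%s"] * len(batch)))

# e.g. 2500 URLs become three statements of at most 1000 parameters each
batches = list(chunks([f"u{i}" for i in range(2500)], 1000))
```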

Anyway, that's enough to prove to me that I need to rejig stuff to get this to work!!!

The server is a dual-Xeon machine with hardware RAID, running Debian. It has 4GB of memory; 1GB is currently not even
used for I/O buffers.

The postgres instance is configured as standard, except for increasing the shared_buffers to 200MB.

Explain plan for about 50 entries in the 'in' clause returns

Bitmap Heap Scan on badurls  (cost=4531.22..15748.67 rows=60462 width=32)
   Recheck Cond: ((url)::text = ANY (('{.....}'::character varying[])::text[]))
   ->  Bitmap Index Scan on idxbadurls_url  (cost=0.00..4516.10 rows=60462 width=0)
         Index Cond: ((url)::text = ANY (('{.....}'::character varying[])::text[]))
(4 rows)


Can anyone suggest what I should be reconfiguring???

Cheers,


Steve
