Small Queries Really Fast, Large Queries Really Slow...

From: Tom Wilcox
Subject: Small Queries Really Fast, Large Queries Really Slow...
Date:
Msg-id: AANLkTikDkh7JsXoG_xBm36lD4W75tcqwRYC2p_sZ8Qn_@mail.gmail.com
Replies: Re: Small Queries Really Fast, Large Queries Really Slow...  (tv@fuzzy.cz)
List: pgsql-performance
Hi again!

I have finally got my Ubuntu VirtualBox VM running PostgreSQL with PL/Python and am now looking at performance.

So here's the scenario:

We have a great big table:

cse=# \d nlpg.match_data
                                         Table "nlpg.match_data"
      Column       |   Type   |                                Modifiers                                
-------------------+----------+--------------------------------------------------------------------------
 premise_id        | integer  |
 usrn              | bigint   |
 org               | text     |
 sao               | text     |
 level             | text     |
 pao               | text     |
 name              | text     |
 street            | text     |
 town              | text     |
 postcode          | text     |
 match_data_id     | integer  | not null default nextval('nlpg.match_data_match_data_id_seq1'::regclass)
 addr_str          | text     |
 tssearch_name     | tsvector |
 tssearch_street   | tsvector |
 tssearch_addr_str | tsvector |
Indexes:
    "match_data_pkey1" PRIMARY KEY, btree (match_data_id)
    "index_match_data_mid" btree (match_data_id)
    "index_match_data_pid" btree (premise_id)
    "index_match_data_tssearch_addr_str" gin (tssearch_addr_str)
    "index_match_data_tssearch_name" gin (tssearch_name)
    "index_match_data_tssearch_street" gin (tssearch_street)
    "index_match_data_usrn" btree (usrn)

KEY NOTE:
nlpg.match_data has approximately 27,000,000 rows.
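
For reference, the on-disk footprint of the table plus its indexes can be checked with the standard size functions; a quick sketch:

-- Total size of nlpg.match_data including indexes and TOAST data
SELECT pg_size_pretty(pg_total_relation_size('nlpg.match_data'));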

Running this query:

EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org WHERE match_data_id < 1000000;

I get this:

"Index Scan using match_data_pkey1 on match_data  (cost=0.00..1452207.14 rows=1913756 width=302) (actual time=23.448..61559.652 rows=999999 loops=1)"
"  Index Cond: (match_data_id < 1000000)"
"Total runtime: 403855.675 ms"

I copied a chunk of the table like this:

CREATE TABLE nlpg.md_copy AS SELECT * FROM nlpg.match_data WHERE match_data_id < 1000000;
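
A freshly created table has no planner statistics until autovacuum reaches it, so I assume an explicit ANALYZE makes the comparison fairer:

-- Collect planner statistics for the new copy
ANALYZE nlpg.md_copy;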

Then ran the same query on the smaller copy table:

EXPLAIN ANALYZE UPDATE nlpg.md_copy SET org = org WHERE match_data_id < 1000000;

And get this:

"Seq Scan on md_copy  (cost=0.00..96935.99 rows=999899 width=301) (actual time=26.745..33944.923 rows=999999 loops=1)"
"  Filter: (match_data_id < 1000000)"
"Total runtime: 57169.419 ms"

As you can see, this is much faster per row on the smaller chunk: roughly 0.057 ms/row (57,169 ms / 999,999 rows) versus roughly 0.40 ms/row (403,856 ms / 999,999 rows) on the full table, about a 7x difference. I then tried the first query again with ten times the number of rows:

EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org WHERE match_data_id < 10000000;

This takes a massive amount of time (it is still running), and the growth in run time is clearly non-linear compared with the previous query.

EXPLAIN UPDATE nlpg.match_data SET org = org WHERE match_data_id < 10000000;
"Seq Scan on match_data  (cost=0.00..3980053.11 rows=19172782 width=302)"
"  Filter: (match_data_id < 10000000)"

Any suggestions on what I can do to speed things up? I presume that if I turn off sequential scans the planner might default to an index scan, as sketched below. Is there anything else?
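
Something like this is what I have in mind (enable_seqscan is the standard planner setting; SET affects only the current session):

-- Discourage sequential scans for this session, then re-check the plan
SET enable_seqscan = off;
EXPLAIN UPDATE nlpg.match_data SET org = org WHERE match_data_id < 10000000;
-- Restore the default afterwards
RESET enable_seqscan;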

Cheers,
Tom
