Thread: slow select in big table

slow select in big table

From: rafalak
Date:
Hello, I have a big table:
80 million records, ~6 GB of data, 2 columns (int, int)

If I run the query
select count(col1) from tab where col2=1234;
and it matches only a few records (1-10), the time is good: 30-40 ms.
But when it matches more than 1000 records, the time is over 12 s.


How can I increase performance?


my postgresql.conf
shared_buffers = 810MB
temp_buffers = 128MB
work_mem = 512MB
maintenance_work_mem = 256MB
max_stack_depth = 7MB
effective_cache_size = 800MB


DB: PostgreSQL 8.3.7
Server: Athlon dual-core 2.0 GHz, 2 GB RAM, SATA


Re: slow select in big table

From: Abbas
Date:


On Fri, Apr 3, 2009 at 2:18 AM, rafalak <rafalak@gmail.com> wrote:
> Hello, I have a big table:
> 80 million records, ~6 GB of data, 2 columns (int, int)
>
> If I run the query
> select count(col1) from tab where col2=1234;
> and it matches only a few records (1-10), the time is good: 30-40 ms.
> But when it matches more than 1000 records, the time is over 12 s.
>
> How can I increase performance?
>
> my postgresql.conf
> shared_buffers = 810MB
> temp_buffers = 128MB
> work_mem = 512MB
> maintenance_work_mem = 256MB
> max_stack_depth = 7MB
> effective_cache_size = 800MB
>
> DB: PostgreSQL 8.3.7
> Server: Athlon dual-core 2.0 GHz, 2 GB RAM, SATA


Does the table have an index on the column you filter on?
Decreasing work_mem may also improve performance.
Check the effect of these changes by looking at the query plan with EXPLAIN ANALYZE.
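
For example, a rough sketch (the index name tab_col2_idx is made up
here; tab, col1 and col2 are the placeholders from the original post):

  -- add an index on the filter column, if one does not already exist
  CREATE INDEX tab_col2_idx ON tab (col2);

  -- then look at the plan and the actual timings
  EXPLAIN ANALYZE SELECT count(col1) FROM tab WHERE col2 = 1234;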

Regards,
Abbas.

Re: slow select in big table

From: Scott Marlowe
Date:
On Thu, Apr 2, 2009 at 2:48 PM, rafalak <rafalak@gmail.com> wrote:
> Hello, I have a big table:
> 80 million records, ~6 GB of data, 2 columns (int, int)
>
> If I run the query
> select count(col1) from tab where col2=1234;
> and it matches only a few records (1-10), the time is good: 30-40 ms.
> But when it matches more than 1000 records, the time is over 12 s.
>
>
> How can I increase performance?
>
>
> my postgresql.conf
> shared_buffers = 810MB
> temp_buffers = 128MB
> work_mem = 512MB
> maintenance_work_mem = 256MB
> max_stack_depth = 7MB
> effective_cache_size = 800MB

Try lowering random_page_cost close to the setting of seq_page_cost
(i.e. just over 1 with the default seq_page_cost) and see if that helps.
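
A sketch of how to try this for just the current session (assuming the
default seq_page_cost of 1.0):

  SET random_page_cost = 1.1;  -- close to seq_page_cost, per the suggestion above
  EXPLAIN ANALYZE SELECT count(col1) FROM tab WHERE col2 = 1234;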

Re: slow select in big table

From: rafalak
Date:
> shared_buffers = 810MB
> temp_buffers = 128MB
> work_mem = 512MB
> maintenance_work_mem = 256MB
> max_stack_depth = 7MB
> effective_cache_size = 800MB

QUERY PLAN without changes
Aggregate  (cost=98018.96..98018.97 rows=1 width=4) (actual time=64049.326..64049.328 rows=1 loops=1)
  ->  Bitmap Heap Scan on tbl_photos_keywords  (cost=533.23..97940.02 rows=31577 width=4) (actual time=157.787..63905.939 rows=119154 loops=1)
        Recheck Cond: (keyword_id = 14)
        ->  Bitmap Index Scan on keyword_id  (cost=0.00..525.33 rows=31577 width=0) (actual time=120.876..120.876 rows=119154 loops=1)
              Index Cond: (keyword_id = 14)

Total runtime: 64049.686 ms
--------------------------------------------
shared_buffers = 810MB
temp_buffers = 128MB
work_mem = 12MB
maintenance_work_mem = 256MB
max_stack_depth = 7MB
enable_seqscan = off
seq_page_cost = 1.0
random_page_cost = 1.0
effective_cache_size = 800MB

QUERY PLAN with changes
Aggregate  (cost=30546.30..30546.31 rows=1 width=4) (actual time=1710.100..1710.102 rows=1 loops=1)
  ->  Index Scan using keyword_id on tbl_photos_keywords  (cost=0.00..30467.36 rows=31577 width=4) (actual time=0.150..1573.843 rows=119154 loops=1)
        Index Cond: (keyword_id = 14)

Total runtime: 1710.185 ms

What else can be changed?
Thanks for the help.

Re: slow select in big table

From: Sam Mason
Date:
On Fri, Apr 03, 2009 at 01:20:33AM -0700, rafalak wrote:
> QUERY PLAN without changes
> Aggregate  (cost=98018.96..98018.97 rows=1 width=4) (actual time=64049.326..64049.328 rows=1 loops=1)
>   ->  Bitmap Heap Scan on tbl_photos_keywords  (cost=533.23..97940.02 rows=31577 width=4) (actual time=157.787..63905.939 rows=119154 loops=1)
>         Recheck Cond: (keyword_id = 14)
>         ->  Bitmap Index Scan on keyword_id  (cost=0.00..525.33 rows=31577 width=0) (actual time=120.876..120.876 rows=119154 loops=1)
>               Index Cond: (keyword_id = 14)

Pulling in 120k rows from disk is always going to take a long time.

> Total runtime: 64049.686 ms

If they're scattered randomly across the table it's going to take a
lot longer even than this; assuming an 8 ms average seek time that's
about 30 minutes (two seeks for each row), which is why PG preferred a
bitmap heap scan over an index scan.  The reason it only took one
minute is that the data isn't randomly distributed.
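
Spelled out, using the row count from the plan above:

  119154 rows × 2 seeks/row × 8 ms/seek = 1906464 ms ≈ 1906 s ≈ 32 minutes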

> QUERY PLAN with changes
> Aggregate  (cost=30546.30..30546.31 rows=1 width=4) (actual time=1710.100..1710.102 rows=1 loops=1)
>   ->  Index Scan using keyword_id on tbl_photos_keywords  (cost=0.00..30467.36 rows=31577 width=4) (actual time=0.150..1573.843 rows=119154 loops=1)
>         Index Cond: (keyword_id = 14)
>
> Total runtime: 1710.185 ms

Is the cache hot now? i.e. did you run the tests in that order and hence
the first run pulled the data off disk and into memory where the second
test could use it without hitting the disk?

If you're running a recent Linux, you can tell it to drop the disk
cache:

  http://linux-mm.org/Drop_Caches

If you do this and then restart PG (because PG has its own caches) you
should be able to see what's going on better.
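
Roughly, as root (a sketch; the data directory path and the restart
mechanism depend on your installation):

  sync                                # flush dirty pages to disk first
  echo 3 > /proc/sys/vm/drop_caches   # drop the OS page cache, dentries and inodes
  pg_ctl -D /path/to/data restart     # restart PG to clear its shared buffers too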

> What else can be changed?

Disks are slow and it's the seek time which is killing you; as your
dataset isn't too big you could upgrade the memory in the machine so the
entire table can be in memory at once and then you won't have a problem.
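
To see how much RAM that would need, something like this
(pg_total_relation_size includes the indexes):

  SELECT pg_size_pretty(pg_total_relation_size('tbl_photos_keywords'));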

If you want to optimise this case you could also look at CLUSTERing this
table on the keyword.
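
A sketch, assuming the index name from the plans above (note that
CLUSTER rewrites the whole table and holds an exclusive lock while
doing so):

  CLUSTER tbl_photos_keywords USING keyword_id;
  ANALYZE tbl_photos_keywords;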

--
  Sam  http://samason.me.uk/

Re: slow select in big table

From: Tom Lane
Date:
rafalak <rafalak@gmail.com> writes:
> QUERY PLAN without changes
> Aggregate  (cost=98018.96..98018.97 rows=1 width=4) (actual time=64049.326..64049.328 rows=1 loops=1)
>   ->  Bitmap Heap Scan on tbl_photos_keywords  (cost=533.23..97940.02 rows=31577 width=4) (actual time=157.787..63905.939 rows=119154 loops=1)
>         Recheck Cond: (keyword_id = 14)
>         ->  Bitmap Index Scan on keyword_id  (cost=0.00..525.33 rows=31577 width=0) (actual time=120.876..120.876 rows=119154 loops=1)
>               Index Cond: (keyword_id = 14)

> enable_seqscan = off

This is a bad idea (and it did not affect your plan anyway).

> random_page_cost = 1.0

This might or might not be a good idea, depending on whether your
database fits in RAM or not.

> QUERY PLAN with changes
> Aggregate  (cost=30546.30..30546.31 rows=1 width=4) (actual time=1710.100..1710.102 rows=1 loops=1)
>   ->  Index Scan using keyword_id on tbl_photos_keywords  (cost=0.00..30467.36 rows=31577 width=4) (actual time=0.150..1573.843 rows=119154 loops=1)
>         Index Cond: (keyword_id = 14)

It's hard to believe that this plan actually beats the other one on a
consistent basis, and certainly not by that much.  I think what really
happened here is that the data was all cached in the second test,
because the first test had already read it all in.  You need to test
both cases (cached and uncached) to get a clearer picture of what
you're doing.
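
One way to see whether PG found the blocks in its own buffer cache (a
sketch; pg_statio_user_tables cannot see the OS page cache, so it only
accounts for reads issued by PG itself):

  SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
    FROM pg_statio_user_tables
   WHERE relname = 'tbl_photos_keywords';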

            regards, tom lane