Re: Expected performance of querying 5k records from 4 million records?

Поиск
Список
Период
Сортировка
От Jeff Janes
Тема Re: Expected performance of querying 5k records from 4 million records?
Дата
Msg-id CAMkU=1zn+Z9iw96nEqJjqSzysv53DT8Y0Kba2GTw9a9qHtEC6w@mail.gmail.com
обсуждение исходный текст
Ответ на Expected performance of querying 5k records from 4 million records?  (Anish Kejariwal <anishkej@gmail.com>)
Ответы Re: Expected performance of querying 5k records from 4 million records?  (Anish Kejariwal <anishkej@gmail.com>)
Список pgsql-performance
On Fri, Jun 15, 2012 at 9:17 AM, Anish Kejariwal <anishkej@gmail.com> wrote:
>
> Below are the tables, queries, and execution plans with my questions with
> more detail.  (Since I have 250 partitions, I can query one partition after
> the other to ensure that I'm not pulling results form the cache)

Doesn't that explain why it is slow?  If you have 15000 rpm drives and
each row is in a different block and uncached, it would take 20
seconds to read them all in.  You are getting 10 times better than
that, either due to caching or because your rows are clustered, or
because effective_io_concurrency is doing its thing.

>
> explain analyze select data_id, dataset_id, stat from data_part_201 where
> dataset_id = 201
> and stat_id = 6 and data_id>=50544630 and data_id<=50549979;

What does "explain (analyze, buffers)" show?


> QUESTION 1: you can see that the query is very simple.  is this the optimal
> execution plan? any tips on what to look into to increase performance?
>
> I then tried adding the following multi-column index:
> "data_part_202_dataset_regionset_data_idx" btree (dataset_id, data_id,
> stat_id)

Since you query stat_id for equality and data_id for range, you should
probably reverse the order of those columns in the index.


>
> QUESTION 3:
> If I do the following:  reindex table data_part_204 the query now takes
> 50-70 milliseconds.  Is this because the table is getting cached?  How do I
> know if a particular query is coming from the cache?

Using explain (analyze, buffers) will show you if it is coming from
the shared_buffers cache.

It is harder to see if it is coming from the file system cache.  If
the server is mostly idle other than your stuff, you can run vmstat
and see how much physical IO is caused by your activity.

Cheers,

Jeff

В списке pgsql-performance по дате отправления:

Предыдущее
От: Josh Berkus
Дата:
Сообщение: Re: Expected performance of querying 5k records from 4 million records?
Следующее
От: Benedict Holland
Дата:
Сообщение: Update blocking a select count(*)?