Re: Help in avoiding a query 'Warm-Up' period/shared buffer cache

Поиск
Список
Период
Сортировка
От Mark Liberman
Тема Re: Help in avoiding a query 'Warm-Up' period/shared buffer cache
Дата
Msg-id 200601051815.36181.mliberman@mixedsignals.com
обсуждение исходный текст
Ответ на Re: Help in avoiding a query 'Warm-Up' period/shared buffer cache  ("Qingqing Zhou" <zhouqq@cs.toronto.edu>)
Ответы Re: Help in avoiding a query 'Warm-Up' period/shared buffer  (David Lang <dlang@invendra.net>)
Список pgsql-performance
On Thursday 05 January 2006 15:12, Qingqing Zhou wrote:
> "Mark Liberman" <mliberman@mixedsignals.com> wrote
>
> > First run, after a night of inactivity:
> >
> >                     ->  Bitmap Index Scan on
> > 1min_events_file_id_begin_idx (cost=0.00..37.85 rows=3670 width=0)
> > (actual time=313.468..313.468 rows=11082
> > loops=1)
> >                           Index Cond: (file_id = 137271)
> > Total runtime: 313.643 ms
> >
> > Second run, after that:
> >
> >                     ->  Bitmap Index Scan on
> > 1min_events_file_id_begin_idx (cost=0.00..37.85 rows=3670 width=0)
> > (actual time=2.106..2.106 rows=11082 loops=1)
> >                           Index Cond: (file_id = 137271)
> > Total runtime: 2.276 ms
>
> It is clear that the first query takes longer time because of the IO time
> of index 1min_events_file_id_begin_idx (see 313.468 vs. 2.106). I am afraid
> currently there is no easy solution for this situation, unless you could
> predicate which part of relation/index your query will use, then you can
> preload or "warm-up" cache for it.
>
> Regards,
> Qingqing


Thanks Qingqing,

this actually helped me determine that the compound index,
1min_events_file_id_begin_idx, is not the proper index to use as it is based
on file_id and begin_time - the later of which is not involved in the where
clause.  It is only using that index to "filter" out the listed file_id.

Now, my follow-up question / assumption.  I am assuming that the  IO time is
so long on that index because it has to read the entire index (for that
file_id) into memory (because it cannot just scan the rows with a certain
date range because we are not using begin_time in the where clause).

But, if I replaced that compound index with the proper compound index of
file_id / end_time, it would give similar performance results to the scan on
1min_events_end_idx (which was < 1 ms).  E.g. the latest rows that were
updated are more likely to be in the cache - and it is smart enough to only
read the index rows that it needs.

Alternatively, I could create a single index on file_id (and rely upon the new
bitmap scan capabilities in 1.2).  But, I fear that, although this will be
smaller than the erroneous compound index on file_id / begin_time, it will
still display the same behavior in that it will need to read all rows from
that index for the appropriate file_id - and since the data goes back every
minute for 60 days, that IO might be large.

Obviously, I will be testing this - but it might take a few days, as I haven't
figure out how to simulate the "period of inactivity" to get the data flushed
out of the cache ... so I have to run this each morning.  But, any
confirmation / corrections to my assumptions are greatly appreciated. E.g. is
the compound index the way to go, or the solo index on file_id?

Thanks,

Mark


В списке pgsql-performance по дате отправления:

Предыдущее
От: "Jim C. Nasby"
Дата:
Сообщение: Re: improving write performance for logging
Следующее
От: David Lang
Дата:
Сообщение: Re: Help in avoiding a query 'Warm-Up' period/shared buffer