Re: GiST, caching, and consistency

От: Matthew Wakeling
Тема: Re: GiST, caching, and consistency
Дата: ,
Msg-id: alpine.DEB.2.00.0908051104590.18938@aragorn.flymine.org
(см: обсуждение, исходный текст)
Ответ на: Re: GiST, caching, and consistency  (Greg Stark)
Ответы: Re: GiST, caching, and consistency  (Robert Haas)
Re: GiST, caching, and consistency  (Tom Lane)
Список: pgsql-performance

Скрыть дерево обсуждения

GiST, caching, and consistency  (Matthew Wakeling, )
 Re: GiST, caching, and consistency  (Robert Haas, )
  Re: GiST, caching, and consistency  (Greg Stark, )
   Re: GiST, caching, and consistency  (Matthew Wakeling, )
    Re: GiST, caching, and consistency  (Robert Haas, )
     Re: GiST, caching, and consistency  (Matthew Wakeling, )
    Re: GiST, caching, and consistency  (Tom Lane, )

On Wed, 5 Aug 2009, Greg Stark wrote:
> On Tue, Aug 4, 2009 at 11:56 PM, Robert Haas<> wrote:
>> Beats me.  It looks like the first few queries are pulling stuff into
>> cache, and then after that it settles down, but I'm not sure why it
>> takes 5 repetitions to do that.  Is the plan changing?
>
> Yeah, we're just guessing without the explain analyze output.
>
> But as long as we're guessing, perhaps it's doing a sequential scan on
> one of the tables and each query is reading in new parts of the table
> until the whole table is in cache. Is this a machine with lots of RAM
> but a small setting for shared_buffers?

modmine_overlap_test=# explain analyse select count(*) from (select * FROM
locatedsequencefeatureoverlappingfeatures limit 1000000) AS a;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=478847.24..478847.25 rows=1 width=0)
             (actual time=27546.424..27546.428 rows=1 loops=1)
    ->  Limit  (cost=0.01..466347.23 rows=1000000 width=8)
               (actual time=0.104..24349.407 rows=1000000 loops=1)
          ->  Nested Loop
                       (cost=0.01..9138533.31 rows=19595985 width=8)
                       (actual time=0.099..17901.571 rows=1000000 loops=1)
                Join Filter: (l1.subjectid <> l2.subjectid)
                ->  Seq Scan on location l1
                       (cost=0.00..90092.22 rows=4030122 width=16)
                       (actual time=0.013..11.467 rows=3396 loops=1)
                ->  Index Scan using location_object_bioseg on location l2
                       (cost=0.01..1.46 rows=35 width=16)
                       (actual time=0.130..3.339 rows=295 loops=3396)
                      Index Cond: ((l2.objectid = l1.objectid) AND (bioseg_create(l1.intermine_start, l1.intermine_end)
&&bioseg_create(l2.intermine_start, l2.intermine_end))) 
  Total runtime: 27546.534 ms
(8 rows)

Time: 27574.164 ms

It is certainly doing a sequential scan. So are you saying that it will
start a sequential scan from a different part of the table each time, even
in the absence of other simultaneous sequential scans? Looks like I'm
going to have to remove the limit to get sensible results - I only added
that to make the query return in a sensible time for performance testing.

Some trivial testing with "select * from location limit 10;" indicates
that it starts the sequential scan in the same place each time - but is
this different from the above query?

To answer your question:

shared_buffers = 450MB
Machine has 16GB or RAM
The location table is 389 MB
The location_object_bioseg index is 182 MB

Matthew

--
 What goes up must come down. Ask any system administrator.


В списке pgsql-performance по дате сообщения:

От: Merlin Moncure
Дата:
Сообщение: Re: PostgreSQL 8.4 performance tuning questions
От: Scott Carey
Дата:
Сообщение: Re: PostgreSQL 8.4 performance tuning questions