Re: Seq scans roadmap

From: CK Tan
Subject: Re: Seq scans roadmap
Date:
Message-ID: 30E8D12C-C5C1-48DA-BF06-08353C398C35@greenplum.com
In reply to: Re: Seq scans roadmap  (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses: Re: Seq scans roadmap  ("Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at>)
List: pgsql-hackers

Hi,

In reference to the seq scans roadmap, I have just submitted a patch  
that addresses some of the concerns.

The patch does this:

1. For a small relation (smaller than 60% of the buffer pool), use the
   current logic.
2. For a big relation:
   - use a ring buffer in the heap scan
   - pin the first 12 pages when the scan starts
   - on consumption of every 4 pages, read and pin the next 4 pages
   - invalidate the used pages in the scan so they do not force out
     other useful pages
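
To make the steps above concrete, here is a minimal simulation of the scan policy in Python. It is only a sketch: the function and constant names are hypothetical and do not come from the patch, and it models pin counts rather than real buffer-manager calls. Only the numbers 60%, 12 and 4 are taken from the description above.

```python
from collections import deque

# Illustrative names only; not identifiers from the patch.
BIG_RELATION_FRACTION = 0.60  # threshold relative to the buffer pool
INITIAL_PIN = 12              # pages pinned when the scan starts
STEP = 4                      # pages consumed before the next read


def use_ring_buffer(rel_pages, pool_pages):
    """Return True when the relation counts as 'big' under the 60% rule."""
    return rel_pages >= BIG_RELATION_FRACTION * pool_pages


def ring_scan(rel_pages):
    """Simulate the big-relation scan.

    Returns (order, peak): the order in which pages were consumed and
    the largest number of pages pinned at any one time.
    """
    pinned = deque()   # page numbers currently pinned by the scan
    next_to_read = 0   # next page number to read from the relation

    # Pin the first 12 pages (or fewer, if the relation is shorter).
    while next_to_read < min(INITIAL_PIN, rel_pages):
        pinned.append(next_to_read)
        next_to_read += 1

    order = []
    peak = len(pinned)
    while len(order) < rel_pages:
        # Consume the next group of up to 4 pinned pages.
        group = [pinned[i] for i in range(min(STEP, len(pinned)))]
        order.extend(group)
        # Invalidate the used pages so they leave the ring.
        for _ in group:
            pinned.popleft()
        # Read and pin the next 4 pages, if any remain.
        for _ in range(STEP):
            if next_to_read < rel_pages:
                pinned.append(next_to_read)
                next_to_read += 1
        peak = max(peak, len(pinned))
    return order, peak


if __name__ == "__main__":
    order, peak = ring_scan(30)
    print(order == list(range(30)), peak)
```

In this model the scan never holds more than 12 pages at once, however large the relation is, which is the property that keeps a big scan from pushing other pages out of the buffer pool.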

4 files changed:
bufmgr.c, bufmgr.h, heapam.c, relscan.h

If there is interest, I can submit another scan patch that returns
N tuples at a time instead of the current one-at-a-time interface. This
improves code locality and further improves performance by another
10-20%.
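
The shape of an N-tuples-at-a-time interface can be sketched as follows. This is an assumption-laden illustration in Python, not the proposed patch: `scan_in_batches` and `count_rows` are hypothetical names, and the sketch shows only the calling convention, not the locality or performance effect.

```python
from itertools import islice


def scan_in_batches(tuples, n):
    """Yield lists of up to n tuples instead of one tuple per call."""
    if n < 1:
        raise ValueError("batch size must be at least 1")
    it = iter(tuples)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch


def count_rows(batches):
    """Consumer that processes a whole batch per call, e.g. for count(*)."""
    return sum(len(batch) for batch in batches)


if __name__ == "__main__":
    print(list(scan_in_batches(range(10), 4)))
    print(count_rows(scan_in_batches(range(10), 4)))
```

The consumer is entered once per batch rather than once per tuple, which is where the locality gain would come from in a real executor.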

For TPC-H 1 GB tables, we are seeing more than 20% improvement in scans
on the same hardware.

-------------------------------------------------------------------------
----- PATCHED VERSION
-------------------------------------------------------------------------
gptest=# select count(*) from lineitem;
  count
---------
 6001215
(1 row)

Time: 2117.025 ms

-------------------------------------------------------------------------
----- ORIGINAL CVS HEAD VERSION
-------------------------------------------------------------------------
gptest=# select count(*) from lineitem;
  count
---------
 6001215
(1 row)

Time: 2722.441 ms
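
For reference, the two timings above work out to roughly 22% less elapsed time, or about a 1.29x speedup. The short Python sketch below shows the arithmetic. The variable names are illustrative, and the figures are a single run each as reported above, not an averaged benchmark.

```python
# Timings and row count copied from the two psql runs above.
patched_ms = 2117.025
original_ms = 2722.441
rows = 6001215

# Fraction of elapsed time saved by the patched version.
time_saved = (original_ms - patched_ms) / original_ms

# Speedup factor of the patched version over CVS HEAD.
speedup = original_ms / patched_ms

# Scan rate in rows per second for each version.
rate_patched = rows / (patched_ms / 1000.0)
rate_original = rows / (original_ms / 1000.0)

if __name__ == "__main__":
    print(f"time saved: {time_saved:.1%}")
    print(f"speedup:    {speedup:.3f}x")
    print(f"rows/s:     {rate_original:,.0f} -> {rate_patched:,.0f}")
```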


Suggestions for improvement are welcome.

Regards,
-cktan
Greenplum, Inc.

On May 8, 2007, at 5:57 AM, Heikki Linnakangas wrote:

> Luke Lonergan wrote:
>>> What do you mean with using readahead inside the heapscan?  
>>> Starting an async read request?
>> Nope - just reading N buffers ahead for seqscans.  Subsequent calls
>> use previously read pages.  The objective is to issue contiguous
>> reads to the OS in sizes greater than the PG page size (which is much
>> smaller than what is needed for fast sequential I/O).
>
> Are you filling multiple buffers in the buffer cache with a single  
> read-call? The OS should be doing readahead for us anyway, so I  
> don't see how just issuing multiple ReadBuffers one after each  
> other helps.
>
>> Yes, I think the ring buffer strategy should be used when the table
>> size is > 1 x bufcache and the ring buffer should be of a fixed size
>> smaller than L2 cache (32KB - 128KB seems to work well).
>
> I think we want to let the ring grow larger than that for updating  
> transactions and vacuums, though, to avoid the WAL flush problem.
>
> -- 
>   Heikki Linnakangas
>   EnterpriseDB   http://www.enterprisedb.com



