Thread: Add free-behind capability for large sequential scans


Add free-behind capability for large sequential scans

From:
Amit Kumar Khare
Date:
Hi All,

(1) I am Amit Kumar Khare. I am doing an MCS at UIUC (USA), off campus
from India.

(2) We have been asked to enhance PostgreSQL in one of our assignments,
so I have chosen "Add free-behind capability for large sequential
scans" from the TODO list. Many thanks to Mr. Bruce Momjian, who helped
me out and suggested that I make a patch for this problem.

(3) As explained to me by Mr. Bruce, the problem is that if, say, the
cache size is 1 MB and a sequential scan is done through a 2 MB file
over and over again, the cache becomes useless, because by the time the
second read of the table happens the first 1 MB has already been forced
out of the cache. Thus the idea is not to cache very large sequential
scans, but to keep caching index scans and small sequential scans.

(4) I think the problem arises because of the default LRU page
replacement policy, so we may have to make use of MRU or LRU-K page
replacement policies instead.
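
For instance, here is a small standalone sketch (not PostgreSQL code;
the pool and table sizes are arbitrary) that replays a looping
sequential scan of a table twice the size of the buffer pool, once
under plain LRU and once under MRU, and counts buffer hits. LRU ends up
with essentially no hits, because every page is evicted before the scan
comes back to it, while MRU keeps a prefix of the table resident:

/*
 * Toy model of a buffer pool servicing a looping sequential scan.
 * pool[i] holds the table page cached in slot i (-1 = empty) and
 * last_use[i] holds the tick of that slot's most recent access.
 */
#include <stdio.h>
#include <string.h>

#define POOL_PAGES   128     /* buffer pool, e.g. "1 MB" of 8 KB pages */
#define TABLE_PAGES  256     /* table being scanned, e.g. "2 MB"       */
#define SCAN_PASSES  10

static int  pool[POOL_PAGES];
static long last_use[POOL_PAGES];

static long
simulate(int evict_mru)
{
    long tick = 0, hits = 0;

    memset(pool, -1, sizeof(pool));
    memset(last_use, 0, sizeof(last_use));

    for (int pass = 0; pass < SCAN_PASSES; pass++)
        for (int page = 0; page < TABLE_PAGES; page++)
        {
            int found = -1, victim = -1;

            tick++;

            /* is the page already cached? */
            for (int i = 0; i < POOL_PAGES; i++)
                if (pool[i] == page)
                {
                    found = i;
                    break;
                }
            if (found >= 0)
            {
                hits++;
                last_use[found] = tick;
                continue;
            }

            /* miss: prefer an empty slot ... */
            for (int i = 0; i < POOL_PAGES; i++)
                if (pool[i] == -1)
                {
                    victim = i;
                    break;
                }

            /* ... otherwise evict the least (LRU) or most (MRU)
             * recently used page */
            if (victim < 0)
            {
                victim = 0;
                for (int i = 1; i < POOL_PAGES; i++)
                    if (evict_mru ? last_use[i] > last_use[victim]
                                  : last_use[i] < last_use[victim])
                        victim = i;
            }
            pool[victim] = page;
            last_use[victim] = tick;
        }

    return hits;
}

int
main(void)
{
    long total = (long) TABLE_PAGES * SCAN_PASSES;

    printf("LRU hits: %ld of %ld accesses\n", simulate(0), total);
    printf("MRU hits: %ld of %ld accesses\n", simulate(1), total);
    return 0;
}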

(5) But I am not sure, and I would like more input on the problem
description from you all. I have started reading the buffer manager
code, and I found that freelist.c may need to be modified, and maybe
some other files too, since we have to identify the large sequential
scans.

Please help me out.

Regards
Amit Kumar Khare



Re: Add free-behind capability for large sequential scans

From:
Tom Lane
Date:
Amit Kumar Khare <skamit2000@yahoo.com> writes:
> (4) I think the problem arises because of the default LRU page
> replacement policy, so we may have to make use of MRU or LRU-K page
> replacement policies instead.

> (5) But I am not sure, and I would like more input on the problem
> description from you all. I have started reading the buffer manager
> code, and I found that freelist.c may need to be modified, and maybe
> some other files too, since we have to identify the large sequential
> scans.

I do not think it's a good idea for the buffer manager to directly try
to recognize sequential scans; any such attempt will fall down on the
fundamental problem that there may be more than one backend accessing
the same table concurrently.  Plus it would introduce undesirable
coupling between the buffer manager and higher-level code.  I like the
idea of using LRU-K or other advanced page replacement policies,
instead of plain LRU.

I did some experimentation with LRU-2 awhile back and didn't see any
measurable performance improvement in the small number of test cases
I tried.  But I was not looking at the issue of cache-flushing caused
by large seqscans (the test cases I tried probably didn't do any
seqscans at all).  It's quite possible that that's a sufficient reason
to adopt LRU-2 anyway.
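
For concreteness, here is a toy sketch of the LRU-2 victim rule; it is
only an illustration of the idea (the struct and field names are made
up for the example, and this is not the code I experimented with):

#include <limits.h>
#include <stdio.h>

typedef struct BufDesc
{
    long ref_last;   /* time of the most recent reference            */
    long ref_prev;   /* time of the reference before that; 0 if none */
} BufDesc;

/*
 * LRU-2 evicts the buffer whose second-most-recent reference is oldest.
 * Buffers referenced only once (ref_prev == 0) are treated as the
 * coldest of all, so a page touched exactly once by a big seqscan is
 * the first to go even if it was read just now.  Ties are broken by the
 * most recent reference time.
 */
static int
lru2_choose_victim(const BufDesc *buf, int nbuffers)
{
    int  victim = 0;
    long victim_key = LONG_MAX;

    for (int i = 0; i < nbuffers; i++)
    {
        long key = buf[i].ref_prev;   /* 0 sorts before any real time */

        if (key < victim_key ||
            (key == victim_key && buf[i].ref_last < buf[victim].ref_last))
        {
            victim_key = key;
            victim = i;
        }
    }
    return victim;
}

int
main(void)
{
    BufDesc buf[3] = {
        {100,  99},   /* hot page: two recent references     */
        { 90,  10},   /* warm page: old second reference     */
        {101,   0},   /* seqscan page: referenced only once  */
    };

    /* prints 2: the seqscan page loses despite being the newest */
    printf("victim = buffer %d\n", lru2_choose_victim(buf, 3));
    return 0;
}
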
        regards, tom lane


Re: Add free-behind capability for large sequential scans

From:
Amit Kumar Khare
Date:
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Kumar Khare <skamit2000@yahoo.com> writes:
> > (4) I think the problem arises because of the default LRU page
> > replacement policy, so we may have to make use of MRU or LRU-K page
> > replacement policies instead.
>
> > (5) But I am not sure, and I would like more input on the problem
> > description from you all. I have started reading the buffer manager
> > code, and I found that freelist.c may need to be modified, and maybe
> > some other files too, since we have to identify the large sequential
> > scans.
>
> I do not think it's a good idea for the buffer manager to directly try
> to recognize sequential scans; any such attempt will fall down on the
> fundamental problem that there may be more than one backend accessing
> the same table concurrently.  Plus it would introduce undesirable
> coupling between the buffer manager and higher-level code.  I like the
> idea of using LRU-K or other advanced page replacement policies,
> instead of plain LRU.

Sir, what I have in mind is something like what Hong-Tai Chou and
David J. DeWitt propose in their paper "An Evaluation of Buffer
Management Strategies for Relational Database Systems", where they
describe their "DBMIN" algorithm.

The problem they address is the same as ours (please correct me if I
am wrong). Like Professor Stonebraker, they identify certain access
patterns in database systems (clustered sequential, looping sequential,
etc.) and recommend a "composite" page replacement policy. But how will
the buffer manager know which policy has to be applied? They say: "When
a file is opened, its associated locality set size and REPLACEMENT
POLICY are given to the buffer manager."

I am thinking of implementing a similar strategy. Since the
planner/optimizer hands the execution plan over to the executor, it
could also pass along information about page replacement. The executor
could then pass this information through heapam -> relcache -> smgr ->
bufmgr, and finally to freelist.c. (I may be wrong about the sequence;
this conclusion is from my first reading of the code, and I know I will
have to go through it again and again to get the hang of it.)

Hence, as you said, we can avoid the undesirable coupling between the
buffer manager and higher-level code.
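
To make the idea concrete, here is a very rough sketch of the kind of
hint I have in mind. None of these names are existing PostgreSQL
symbols; they are invented purely for illustration, and the real call
chain may well look different:

#include <stdbool.h>
#include <stdio.h>

/* Access pattern the planner/optimizer expects the scan to exhibit. */
typedef enum ScanAccessPattern
{
    ACCESS_DEFAULT,          /* no hint: normal LRU treatment             */
    ACCESS_LARGE_SEQSCAN,    /* big one-off sequential scan               */
    ACCESS_LOOPING_SEQSCAN   /* relation scanned repeatedly (nested loop) */
} ScanAccessPattern;

/* Stand-in for a buffer descriptor; the real structure is richer. */
typedef struct BufferSlot
{
    int  buf_id;
    bool on_freelist;
} BufferSlot;

/* Stubs standing in for freelist.c primitives. */
static void
AddToFreelistHead(BufferSlot *buf)   /* reused (evicted) soonest */
{
    printf("buffer %d: freelist head (recycle soon)\n", buf->buf_id);
}

static void
AddToFreelistTail(BufferSlot *buf)   /* reused last */
{
    printf("buffer %d: freelist tail (keep around)\n", buf->buf_id);
}

/*
 * Called when a scan is done with a buffer.  Pages read by a large
 * sequential scan go where they will be recycled almost immediately, so
 * the scan cannot flush the rest of the cache ("free-behind").  The
 * hint travels down with the request, so the buffer manager never has
 * to guess what other backends are doing.
 */
static void
ReleaseBufferWithHint(BufferSlot *buf, ScanAccessPattern pattern)
{
    if (pattern == ACCESS_LARGE_SEQSCAN)
        AddToFreelistHead(buf);
    else
        AddToFreelistTail(buf);
    buf->on_freelist = true;
}

int
main(void)
{
    BufferSlot a = {1, false};
    BufferSlot b = {2, false};

    ReleaseBufferWithHint(&a, ACCESS_LARGE_SEQSCAN);   /* recycled quickly */
    ReleaseBufferWithHint(&b, ACCESS_DEFAULT);         /* normal treatment */
    return 0;
}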

Sir, is this scheme feasible? If not, can you guide me?

Amit Khare
>
> I did some experimentation with LRU-2 awhile back and didn't see any
> measurable performance improvement in the small number of test cases
> I tried.  But I was not looking at the issue of cache-flushing caused
> by large seqscans (the test cases I tried probably didn't do any
> seqscans at all).  It's quite possible that that's a sufficient reason
> to adopt LRU-2 anyway.
>
>             regards, tom lane


Re: Add free-behind capability for large sequential scans

From:
Neil Padgett
Date:
On Wed, 2002-02-13 at 03:11, Amit Kumar Khare wrote:

[clip]

> The problem they address is the same as ours (please correct me if I
> am wrong). Like Professor Stonebraker, they identify certain access
> patterns in database systems (clustered sequential, looping
> sequential, etc.) and recommend a "composite" page replacement
> policy. But how will the buffer manager know which policy has to be
> applied? They say: "When a file is opened, its associated locality
> set size and REPLACEMENT POLICY are given to the buffer manager."

This is indeed one possibility. However, the problem, as pointed out in
[1], is that in multi-user situations it is hard to get sane results
from query execution analysis. The real problem is: how do you handle
the interaction of multiple simultaneous queries with the buffer pool?

This problem led to a search for new approaches, which in turn led to
simpler algorithms, like LRU-K [1] and 2Q [2]. I'd much rather we
pursue algorithms of this type.
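
For anyone who wants a feel for how these behave before reading the
papers, here is a minimal sketch of the simplified 2Q scheme from [2].
It is only an illustration: the queue sizes and the toy workload are
made up, and the full algorithm also keeps a small "ghost" queue of
recently evicted page numbers (A1out) that is omitted here.

#include <stdio.h>
#include <string.h>

#define A1_SIZE  32    /* FIFO of pages seen exactly once        */
#define AM_SIZE  96    /* LRU of pages referenced more than once */

static int a1[A1_SIZE], a1_len;
static int am[AM_SIZE], am_len;   /* index 0 is the most recently used */

static int
find(const int *q, int len, int page)
{
    for (int i = 0; i < len; i++)
        if (q[i] == page)
            return i;
    return -1;
}

static void
remove_at(int *q, int *len, int i)
{
    memmove(&q[i], &q[i + 1], (size_t) (*len - i - 1) * sizeof(int));
    (*len)--;
}

/* Insert at the front, dropping the last element if the queue is full. */
static void
push_front(int *q, int *len, int size, int page)
{
    if (*len == size)
        (*len)--;
    memmove(&q[1], &q[0], (size_t) *len * sizeof(int));
    q[0] = page;
    (*len)++;
}

/* Returns 1 on a buffer hit, 0 on a miss. */
static int
access_page(int page)
{
    int i;

    if ((i = find(am, am_len, page)) >= 0)   /* hit in Am: make it MRU */
    {
        remove_at(am, &am_len, i);
        push_front(am, &am_len, AM_SIZE, page);
        return 1;
    }
    if ((i = find(a1, a1_len, page)) >= 0)   /* second touch: promote  */
    {
        remove_at(a1, &a1_len, i);
        push_front(am, &am_len, AM_SIZE, page);
        return 1;
    }
    push_front(a1, &a1_len, A1_SIZE, page);  /* first touch: A1 only   */
    return 0;
}

int
main(void)
{
    int hits = 0, n = 0;

    /*
     * Workload: a hot set (pages 0..49, each read twice in a row so it
     * gets promoted into Am) interleaved with a big one-off scan
     * (pages 1000 and up, each read exactly once).
     */
    for (int pass = 0; pass < 20; pass++)
    {
        for (int p = 0; p < 50; p++, n += 2)
        {
            hits += access_page(p);
            hits += access_page(p);
        }
        for (int p = 0; p < 200; p++, n++)
            hits += access_page(1000 + pass * 200 + p);
    }
    printf("hits: %d of %d accesses\n", hits, n);
    return 0;
}

The scan pages only ever pass through A1, so they can never displace
the hot set once it has been promoted into Am; that is exactly the
cache-flushing protection being discussed in this thread.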

Neil

[1] E. J. O'Neil, P. E. O'Neil, and G. Weikum. The LRU-K page
replacement algorithm for database disk buffering. In _Proceedings of
the 1993 ACM SIGMOD International Conference on Management of Data_,
pages 297-306, 1993.

[2] Theodore Johnson and Dennis Shasha. 2Q: A Low Overhead High
Performance Buffer Management Replacement Algorithm. In _Proceedings of
the 20th VLDB Conference_, pages 439-450, 1994.

-- 
Neil Padgett
Red Hat Canada Ltd.                       E-Mail:  npadgett@redhat.com
2323 Yonge Street, Suite #300, 
Toronto, ON  M4P 2C9



Re: Add free-behind capability for large sequential scans

From:
Neil Padgett
Date:
On Wed, 2002-02-13 at 12:44, Neil Padgett wrote:
> 
> [1] E. J. O'Neil, P. E. O'Neil, and G. Weikum. The LRU-K page
> replacement algorithm for database disk buffering. In _Proceedings of
> the 1993 ACM SIGMOD International Conference on Management of Data_,
> pages 297-306, 1993.
>
> [2] Theodore Johnson and Dennis Shasha. 2Q: A Low Overhead High
> Performance Buffer Management Replacement Algorithm. In _Proceedings
> of the 20th VLDB Conference_, pages 439-450, 1994.

I've received some mail expressing interest in reading these papers. So,
I've grabbed some electronic copies from CiteSeer, and I've made them
available at:

http://people.redhat.com/npadgett/buffering-papers/

Neil

-- 
Neil Padgett
Red Hat Canada Ltd.                       E-Mail:  npadgett@redhat.com
2323 Yonge Street, Suite #300, 
Toronto, ON  M4P 2C9