Обсуждение: postgresql shared buffers

Поиск
Список
Период
Сортировка

postgresql shared buffers

От
Praveen Kumar N
Дата:
hai...
        I have installed postgresql from sourcecode.I would like to know
how pages are replaced in the bufferpool when we join two relations.I 
tried to trace it by editing files 
pgsql/src/backend/storage/buffer/(bufmgr.c,freelist.c).But i feel still I
am missing some information after observing extracted information abt 
buffer replacement.My input datasize is 10times more than main memory/RAM 
size.When I joined two relations,postgresql accessed both relations 
sequentially one by one and that too only once.Then how is it joining two 
relations by accessing only once? Is it storing that accessed relations 
some where other than main memory/bufferpool(Becos they cant fit into main 
memory).

So can anybdy tell me is there is ne thing I am 
missing? Is there any concept like postgresql cache similar to kernel 
cache otherthan sharedbuffers.If so how can we figure it out.

Is there any way by which postgresql is accessing database 
relations through,other than rotines in bufmgr.c nd freelist.c(I 
mean any other routines like ReadBuffer,StrategyGet etc.)

thanks in anticipation.

--  N Praveen Kumar Btech-IV CSE IIIT,Hyd AP,India
    Imagination is more important than knowledge...                                --Albert Einstein


Re: postgresql shared buffers

От
Heikki Linnakangas
Дата:
Praveen Kumar N wrote:
>         I have installed postgresql from sourcecode.I would like to know
> how pages are replaced in the bufferpool when we join two relations.I 
> tried to trace it by editing files 
> pgsql/src/backend/storage/buffer/(bufmgr.c,freelist.c).But i feel still I
> am missing some information after observing extracted information abt 
> buffer replacement.My input datasize is 10times more than main 
> memory/RAM size.When I joined two relations,postgresql accessed both 
> relations sequentially one by one and that too only once.Then how is 
> it joining two relations by accessing only once? Is it storing that 
> accessed relations some where other than main memory/bufferpool(Becos 
> they cant fit into main memory).

What kind of a join is it? If it's a merge join, using indexes, it would 
only have to visit each heap page once.

> So can anybdy tell me is there is ne thing I am missing? Is there any 
> concept like postgresql cache similar to kernel cache otherthan 
> sharedbuffers.If so how can we figure it out.

No. All access to relations (except temporary relations) go through 
bufmgr and the shared memory buffer cache.

> Is there any way by which postgresql is accessing database relations 
> through,other than rotines in bufmgr.c nd freelist.c(I mean any other 
> routines like ReadBuffer,StrategyGet etc.)

No.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com



Re: postgresql shared buffers

От
Gregory Stark
Дата:
Praveen Kumar N <praveen_n@students.iiit.net> writes:

> hai...
>
>         I have installed postgresql from sourcecode.I would like to know
> how pages are replaced in the bufferpool when we join two relations.I tried to
> trace it by editing files
> pgsql/src/backend/storage/buffer/(bufmgr.c,freelist.c).But i feel still I
> am missing some information after observing extracted information abt buffer
> replacement.

Try

explain select ...

The output may be enlightening.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: postgresql shared buffers

От
Praveen Kumar N
Дата:
Let me explain once more.

I have two relations which are 10 times more than bufferpool 
size.I have observed the following things when joined that two 
relations(it using merge join to join both relations)

1.It first accessed system catalog tables
2.Relation 1
3.Relation 2

my doubt is one whole relation cant fit in the main memory.That too when 
we use merge join, it should keep some part of 1st relations and should 
scan second relation as bufferpool size is less compared to size 
of each relation.similarly for the remainin part of 1st  relation.But 
it is not happening here.First whole Relation1 is scanned and then 
Relation 2 is scanned. Then how is it joining two relations using merge 
join? Am I missing something?
I traced scanning of relation by editing the functions ReadBuffer() and 
BufferAlloc(),StrategyGetBuffer().

I hope now it is clear.

thanks in anticipation.

bye


On Fri, 8 Sep 2006, Heikki Linnakangas wrote:

> Date: Fri, 08 Sep 2006 14:30:01 +0100
> From: Heikki Linnakangas <heikki@enterprisedb.com>
> To: Praveen Kumar N <praveen_n@students.iiit.net>
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] postgresql shared buffers
> 
> Praveen Kumar N wrote:
>>         I have installed postgresql from sourcecode.I would like to know
>> how pages are replaced in the bufferpool when we join two relations.I tried 
>> to trace it by editing files 
>> pgsql/src/backend/storage/buffer/(bufmgr.c,freelist.c).But i feel still I
>> am missing some information after observing extracted information abt 
>> buffer replacement.My input datasize is 10times more than main memory/RAM 
>> size.When I joined two relations,postgresql accessed both relations 
>> sequentially one by one and that too only once.Then how is it joining two 
>> relations by accessing only once? Is it storing that accessed relations 
>> some where other than main memory/bufferpool(Becos they cant fit into main 
>> memory).
>
> What kind of a join is it? If it's a merge join, using indexes, it would only 
> have to visit each heap page once.
>
>> So can anybdy tell me is there is ne thing I am missing? Is there any 
>> concept like postgresql cache similar to kernel cache otherthan 
>> sharedbuffers.If so how can we figure it out.
>
> No. All access to relations (except temporary relations) go through bufmgr 
> and the shared memory buffer cache.
>
>> Is there any way by which postgresql is accessing database relations 
>> through,other than rotines in bufmgr.c nd freelist.c(I mean any other 
>> routines like ReadBuffer,StrategyGet etc.)
>
> No.
>
>

--  N Praveen Kumar Btech-IV CSE IIIT,Hyd AP,India
    Imagination is more important than knowledge...                                --Albert Einstein



Re: postgresql shared buffers

От
Heikki Linnakangas
Дата:
Praveen Kumar N wrote:
> Let me explain once more.
>
> I have two relations which are 10 times more than bufferpool size.I 
> have observed the following things when joined that two relations(it 
> using merge join to join both relations)
>
> 1.It first accessed system catalog tables
> 2.Relation 1
> 3.Relation 2
>
> my doubt is one whole relation cant fit in the main memory.That too 
> when we use merge join, it should keep some part of 1st relations and 
> should scan second relation as bufferpool size is less compared to 
> size of each relation.similarly for the remainin part of 1st  
> relation.But it is not happening here.First whole Relation1 is scanned 
> and then Relation 2 is scanned. Then how is it joining two relations 
> using merge join? Am I missing something?

Hmm. A hash join, maybe? You should do EXPLAIN on the query to see what 
it really does, otherwise we're just guessing.

> I traced scanning of relation by editing the functions ReadBuffer() 
> and BufferAlloc(),StrategyGetBuffer().

That sounds valid.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com



Re: postgresql shared buffers

От
Alvaro Herrera
Дата:
Heikki Linnakangas wrote:
> Praveen Kumar N wrote:
> >Let me explain once more.
> >
> >I have two relations which are 10 times more than bufferpool size.I 
> >have observed the following things when joined that two relations(it 
> >using merge join to join both relations)
> >
> >1.It first accessed system catalog tables
> >2.Relation 1
> >3.Relation 2
> >
> >my doubt is one whole relation cant fit in the main memory.That too 
> >when we use merge join, it should keep some part of 1st relations and 
> >should scan second relation as bufferpool size is less compared to 
> >size of each relation.similarly for the remainin part of 1st  
> >relation.But it is not happening here.First whole Relation1 is scanned 
> >and then Relation 2 is scanned. Then how is it joining two relations 
> >using merge join? Am I missing something?
> 
> Hmm. A hash join, maybe? You should do EXPLAIN on the query to see what 
> it really does, otherwise we're just guessing.

Another option would be that the tuples are written to a sort tape, I
think.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: postgresql shared buffers

От
Praveen Kumar N
Дата:
Following is the output of query i have executed.

praveen=# explain select count(*) from a_9000_0,b_9000_0 where a2=b2;                                     QUERY PLAN
------------------------------------------------------------------------------------ Aggregate
(cost=1924635.42..1924635.43rows=1 width=0)   ->  Merge Join  (cost=109515.42..1665435.42 rows=103680000 width=0)
 Merge Cond: (a_9000_0.a2 = b_9000_0.b2)         ->  Sort  (cost=54757.71..55117.71 rows=144000 width=260)
Sort Key: a_9000_0.a2               ->  Seq Scan on a_9000_0  (cost=0.00..6979.00 rows=144000 
 
width=260)         ->  Sort  (cost=54757.71..55117.71 rows=144000 width=260)               Sort Key: b_9000_0.b2
      ->  Seq Scan on b_9000_0  (cost=0.00..6979.00 rows=144000 
 
width=260)
(9 rows)




On Fri, 8 Sep 2006, Heikki Linnakangas wrote:

> Date: Fri, 08 Sep 2006 14:57:57 +0100
> From: Heikki Linnakangas <heikki@enterprisedb.com>
> To: Praveen Kumar N <praveen_n@students.iiit.net>
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] postgresql shared buffers
> 
> Praveen Kumar N wrote:
>> Let me explain once more.
>> 
>> I have two relations which are 10 times more than bufferpool size.I have 
>> observed the following things when joined that two relations(it using merge 
>> join to join both relations)
>> 
>> 1.It first accessed system catalog tables
>> 2.Relation 1
>> 3.Relation 2
>> 
>> my doubt is one whole relation cant fit in the main memory.That too when we 
>> use merge join, it should keep some part of 1st relations and should scan 
>> second relation as bufferpool size is less compared to size of each 
>> relation.similarly for the remainin part of 1st  relation.But it is not 
>> happening here.First whole Relation1 is scanned and then Relation 2 is 
>> scanned. Then how is it joining two relations using merge join? Am I 
>> missing something?
>
> Hmm. A hash join, maybe? You should do EXPLAIN on the query to see what it 
> really does, otherwise we're just guessing.
>
>> I traced scanning of relation by editing the functions ReadBuffer() and 
>> BufferAlloc(),StrategyGetBuffer().
>
> That sounds valid.
>
>

--  N Praveen Kumar Btech-IV CSE IIIT,Hyd AP,India
    Imagination is more important than knowledge...                                --Albert Einstein



Re: postgresql shared buffers

От
Heikki Linnakangas
Дата:
Praveen Kumar N wrote:
>
> Following is the output of query i have executed.

Looks like Alvaro guessed right. It reads both relations in sequence, 
sorts them in temporary storage, outside bufmgr, and then does a merge 
join on the sorted inputs.

If you want to see the behavior you expected, I think you need to define 
indexes on a2 and b2, if you don't have them already, and coerce the 
planner to choose a nested loop join. I'd suggest using "SET 
enable_seqscan=false; SET enable_bitmapscan=false;" and see if that gets 
you a nested loop join.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com



Re: postgresql shared buffers

От
Praveen Kumar N
Дата:
On Fri, 8 Sep 2006, Heikki Linnakangas wrote:

> Date: Fri, 08 Sep 2006 15:22:19 +0100
> From: Heikki Linnakangas <heikki@enterprisedb.com>
> To: Praveen Kumar N <praveen_n@students.iiit.net>
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] postgresql shared buffers
> 
> Praveen Kumar N wrote:
>> 
>> Following is the output of query i have executed.
>
> Looks like Alvaro guessed right. It reads both relations in sequence, sorts 
> them in temporary storage, outside bufmgr, and then does a merge join on the 
> sorted inputs.

could you tell me how can we trace that? I mean which functions shall I 
checkout for that.


>
> If you want to see the behavior you expected, I think you need to define 
> indexes on a2 and b2, if you don't have them already, and coerce the planner 
> to choose a nested loop join. I'd suggest using "SET enable_seqscan=false; 
> SET enable_bitmapscan=false;" and see if that gets you a nested loop join.
>
>

--  N Praveen Kumar Btech-IV CSE IIIT,Hyd AP,India
    Imagination is more important than knowledge...                                --Albert Einstein



Re: postgresql shared buffers

От
Heikki Linnakangas
Дата:
Praveen Kumar N wrote:
> On Fri, 8 Sep 2006, Heikki Linnakangas wrote:
>
>> Looks like Alvaro guessed right. It reads both relations in sequence, 
>> sorts them in temporary storage, outside bufmgr, and then does a 
>> merge join on the sorted inputs.
>
> could you tell me how can we trace that? I mean which functions shall 
> I checkout for that.

The sort code is in src/backend/utils/sort/tuplesort.c and logtape.c. 
Can't remember function names from the top of my head.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com



Re: postgresql shared buffers

От
Praveen Kumar N
Дата:
one more doubt.in the following example we dont need sort right.But in 
this case also relations were scanned sequentially one by one.So is it the 
case that any relation is accessed only once from database while executing 
a given query?



praveen=# explain select count(*) from a_9000_0,b_9000_0;                                    QUERY PLAN
---------------------------------------------------------------------------------- Aggregate
(cost=537566595.00..537566595.01rows=1 width=0)   ->  Nested Loop  (cost=7616.00..485726595.00 rows=20736000000
width=0)        ->  Seq Scan on a_9000_0  (cost=0.00..6979.00 rows=144000 
 
width=0)         ->  Materialize  (cost=7616.00..9549.00 rows=144000 width=0)               ->  Seq Scan on b_9000_0
(cost=0.00..6979.00rows=144000 
 
width=0)
(5 rows)


Regards,
Praveen


On Fri, 8 Sep 2006, Heikki Linnakangas wrote:

> Date: Fri, 08 Sep 2006 15:30:37 +0100
> From: Heikki Linnakangas <heikki@enterprisedb.com>
> To: Praveen Kumar N <praveen_n@students.iiit.net>
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] postgresql shared buffers
> 
> Praveen Kumar N wrote:
>> On Fri, 8 Sep 2006, Heikki Linnakangas wrote:
>> 
>>> Looks like Alvaro guessed right. It reads both relations in sequence, 
>>> sorts them in temporary storage, outside bufmgr, and then does a merge 
>>> join on the sorted inputs.
>> 
>> could you tell me how can we trace that? I mean which functions shall I 
>> checkout for that.
>
> The sort code is in src/backend/utils/sort/tuplesort.c and logtape.c. Can't 
> remember function names from the top of my head.
>
>

--  N Praveen Kumar Btech-IV CSE IIIT,Hyd AP,India
    Imagination is more important than knowledge...                                --Albert Einstein



Re: postgresql shared buffers

От
Alvaro Herrera
Дата:
Praveen Kumar N wrote:
> 
> one more doubt.in the following example we dont need sort right.But in 
> this case also relations were scanned sequentially one by one.

In this case it's because the result from one seqscan was materialized.

> So is it the 
> case that any relation is accessed only once from database while executing 
> a given query?

Not in general -- you'll see that behavior only in particular cases.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.