Обсуждение: explanation of some configs

Поиск
Список
Период
Сортировка

explanation of some configs

От
Thomas Finneid
Дата:
Hi

I am having some problem understanding the details of a couple of
configuration directives. Here  is what I think I understand it so far:

effective_cache_size
    - specifies in how many B/KB/MB will be used to cache data
      between pg and the OS?
    - what is actually cached and how it is actually used by the
           planner and how does it affect the planner and the queries?
      -in other words, when and why do I need to set this parameter?

checkpoint_segments
    - specifies the number of segments?
      size: number*16MB?
    - means when number of WAL segments has filled up and matches
           checkpoint segment, dirty data is written to disk?
    - so it specifies how much data is stored in memory/wal before
           written to disk and therefore affects recoverability?

regards

thomas

Re: explanation of some configs

От
"Joshua D. Drake"
Дата:
On Sat, 2009-02-07 at 01:52 +0100, Thomas Finneid wrote:
> Hi
>
> I am having some problem understanding the details of a couple of
> configuration directives. Here  is what I think I understand it so far:
>
> effective_cache_size
>     - specifies in how many B/KB/MB will be used to cache data
>       between pg and the OS?
>     - what is actually cached and how it is actually used by the
>            planner and how does it affect the planner and the queries?
>       -in other words, when and why do I need to set this parameter?

This is just a hint to tell the planner how much cache will generally be
available. The number should be reflective of your shared buffers +
available operating system cache. If you database is postgresql only you
can generally set this very high 75% of available ram. If not then you
need to tone it down.

>
> checkpoint_segments
>     - specifies the number of segments?

The number of segments that will be used before a checkpoint is forced.

>       size: number*16MB?

Yes but they are not preallocated.

>     - means when number of WAL segments has filled up and matches
>            checkpoint segment, dirty data is written to disk?

A checkpoint occurs.

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


Re: explanation of some configs

От
Thomas Finneid
Дата:
Joshua D. Drake wrote:
>> On Sat, 2009-02-07 at 01:52 +0100, Thomas Finneid wrote:
 >>
>> effective_cache_size
 >>
> This is just a hint to tell the planner how much cache will generally be
> available.

ok, but available for what?

> The number should be reflective of your shared buffers +
> available operating system cache. If you database is postgresql only you
> can generally set this very high 75% of available ram. If not then you
> need to tone it down.

So that means, if I have 8GB ram and have set shared buffer to e.g. 4GB,
I should set the effective_cache_size to at least 4GB otherwise the
planner will assume I dont have as much memory available so it would be
sort of pointless so set shared_buffer to 4GB?


>> checkpoint_segments
>>     - specifies the number of segments?
>
> The number of segments that will be used before a checkpoint is forced.

So to sum it up:

- Specifies the number of memory segments the WAL will use before a
checkpoint occur. (A checkpoint guarantees the data has been written to
disk, including dirty pages.)
- A segment is 16MB and the number of actually used segments are dynamic.
- If this number is too low or the transaction is large, PG will spend
more time on performing checkpoint operations which decreases performance.

Q1:

So checkpoint_time is then just another way of expressing the same?
I.e. to ensure that if the segments have not been filled, which would
foce a checkpoint, a checkpoint is at least forced at the specified time
lapse?

Q2:

So how does this relate to WAL buffers? It seems to me that wal_buffers
are not needed. Based on the above explanation.


regards

thomas

Re: explanation of some configs

От
justin
Дата:
Thomas Finneid wrote:
> Joshua D. Drake wrote:
>>> On Sat, 2009-02-07 at 01:52 +0100, Thomas Finneid wrote:
> >>
>>> effective_cache_size
> >>
>> This is just a hint to tell the planner how much cache will generally be
>> available.
>
> ok, but available for what?
for storing the data/tables/rows in memory so it does not have the disk
subsystem.
>
>> The number should be reflective of your shared buffers +
>> available operating system cache. If you database is postgresql only you
>> can generally set this very high 75% of available ram. If not then you
>> need to tone it down.
>
> So that means, if I have 8GB ram and have set shared buffer to e.g.
> 4GB, I should set the effective_cache_size to at least 4GB otherwise
> the planner will assume I dont have as much memory available so it
> would be sort of pointless so set shared_buffer to 4GB?
No because other parts of Postgresql use the shared_buffer to store
tables/data in memory.  If shared_buffer is set low then the system will
be during more disk IO as it can't fit that much into memory.  Another
critical setting to look at is work_mem where all the complex sorting,
and joins are done which is not related to shared buffers.  So leave
room in memory for these processes
>
>
>>> checkpoint_segments
>>>     - specifies the number of segments?
>>
>> The number of segments that will be used before a checkpoint is forced.
>
> So to sum it up:
>
> - Specifies the number of memory segments the WAL will use before a
> checkpoint occur. (A checkpoint guarantees the data has been written
> to disk, including dirty pages.)
> - A segment is 16MB and the number of actually used segments are dynamic.
> - If this number is too low or the transaction is large, PG will spend
> more time on performing checkpoint operations which decreases
> performance.
>
> Q1:
>
> So checkpoint_time is then just another way of expressing the same?
> I.e. to ensure that if the segments have not been filled, which would
> foce a checkpoint, a checkpoint is at least forced at the specified
> time lapse?
Yes and No  Checkpoint_time does forces a check point regardless if the
segment is full or not.  Checkpoint_segment is used to force a check
point  based on size.  In a big databases a checkpoint could get very
large before time had elapsed and if server cashed all that work would
be rolled back.

>
> Q2:
>
> So how does this relate to WAL buffers? It seems to me that
> wal_buffers are not needed. Based on the above explanation.
This is  number of pages in shared memory the Postgresql uses before WAL
is written to disk this is used to improve performance for large writes.

>
>
> regards
>
> thomas
>

Re: explanation of some configs

От
Robert Haas
Дата:
>>> effective_cache_size
>>
>> This is just a hint to tell the planner how much cache will generally be
>> available.
>
> ok, but available for what?

The documentation on these parameters is really very good.

http://www.postgresql.org/docs/8.3/interactive/runtime-config-query.html
http://www.postgresql.org/docs/8.3/interactive/runtime-config-resource.html

effective_cache_size doesn't actually reserve any memory, but it
enables the planner to know something about what will probably happen
when attempting to execute queries (and therefore generate better
plans).

>> The number should be reflective of your shared buffers +
>> available operating system cache. If you database is postgresql only you
>> can generally set this very high 75% of available ram. If not then you
>> need to tone it down.
>
> So that means, if I have 8GB ram and have set shared buffer to e.g. 4GB, I
> should set the effective_cache_size to at least 4GB otherwise the planner
> will assume I dont have as much memory available so it would be sort of
> pointless so set shared_buffer to 4GB?

You might want to look at pgtune as a starting point for tuning these settings.

http://pgfoundry.org/projects/pgtune/

But, yes, if you read the documentation (links above) it says to make
effective_cache_size equal to shared_buffers plus however much of the
operating system disk cache you expect PostgreSQL to get.

>>> checkpoint_segments
>>>        - specifies the number of segments?
>>
>> The number of segments that will be used before a checkpoint is forced.
>
> So to sum it up:
>
> - Specifies the number of memory segments the WAL will use before a
> checkpoint occur. (A checkpoint guarantees the data has been written to
> disk, including dirty pages.)
> - A segment is 16MB and the number of actually used segments are dynamic.
> - If this number is too low or the transaction is large, PG will spend more
> time on performing checkpoint operations which decreases performance.

I think this is pretty much right.  The WAL logs are always flushed to
disk right away (unless you fool with the asynchronous_commit or fsync
parameters), so you are not at risk of losing data even if the server
crashes before the next checkpoint.  But the checkpoints keep you from
accumulating too much WAL (which eats disk space and makes recovery
slower in the event of a crash).

> Q1:
>
> So checkpoint_time is then just another way of expressing the same?
> I.e. to ensure that if the segments have not been filled, which would foce a
> checkpoint, a checkpoint is at least forced at the specified time lapse?

http://www.postgresql.org/docs/8.3/interactive/wal-configuration.html

"The server's background writer process will automatically perform a
checkpoint every so often. A checkpoint is created every
checkpoint_segments log segments, or every checkpoint_timeout seconds,
whichever comes first."

> Q2:
>
> So how does this relate to WAL buffers? It seems to me that wal_buffers are
> not needed. Based on the above explanation.

wal_buffers are in-memory buffers that hold WAL that has not yet been
flushed to disk.  WAL segments are files hold the WAL that has been
written to disk but not yet recycled (perhaps because the
corresponding data blocks haven't yet been written out).

...Robert

Re: explanation of some configs

От
Matthew Wakeling
Дата:
On Sat, 7 Feb 2009, justin wrote:
> In a big databases a checkpoint could get very large before time had
> elapsed and if server cashed all that work would be rolled back.

No. Once you commit a transaction, it is safe (unless you play with fsync
or asynchronous commit). The size of the checkpoint is irrelevant.

You see, Postgres writes the data twice. First it writes the data to the
end of the WAL. WAL_buffers are used to buffer this. Then Postgres calls
fsync on the WAL when you commit the transaction. This makes the
transaction safe, and is usually fast because it will be sequential writes
on a disc. Once fsync returns, Postgres starts the (lower priority) task
of copying the data from the WAL into the data tables. All the un-copied
data in the WAL needs to be held in memory, and that is what
checkpoint_segments is for. When that gets full, then Postgres needs to
stop writes until the copying has freed up the checkpoint segments again.

Matthew

--
 Don't worry!  The world can't end today because it's already tomorrow
 in Australia.

Re: explanation of some configs

От
justin
Дата:
Matthew Wakeling wrote:
> On Sat, 7 Feb 2009, justin wrote:
>> In a big databases a checkpoint could get very large before time had
>> elapsed and if server cashed all that work would be rolled back.
>
> No. Once you commit a transaction, it is safe (unless you play with
> fsync or asynchronous commit). The size of the checkpoint is irrelevant.
>
> You see, Postgres writes the data twice. First it writes the data to
> the end of the WAL. WAL_buffers are used to buffer this. Then Postgres
> calls fsync on the WAL when you commit the transaction. This makes the
> transaction safe, and is usually fast because it will be sequential
> writes on a disc. Once fsync returns, Postgres starts the (lower
> priority) task of copying the data from the WAL into the data tables.
> All the un-copied data in the WAL needs to be held in memory, and that
> is what checkpoint_segments is for. When that gets full, then Postgres
> needs to stop writes until the copying has freed up the checkpoint
> segments again.
>
> Matthew
>
Well then we have conflicting instructions in places on
wiki.postgresql.org which links to this
http://www.varlena.com/GeneralBits/Tidbits/annotated_conf_e.html

Re: explanation of some configs

От
Robert Haas
Дата:
On Mon, Feb 9, 2009 at 10:44 AM, justin <justin@emproshunts.com> wrote:
> Matthew Wakeling wrote:
>>
>> On Sat, 7 Feb 2009, justin wrote:
>>>
>>> In a big databases a checkpoint could get very large before time had
>>> elapsed and if server cashed all that work would be rolled back.
>>
>> No. Once you commit a transaction, it is safe (unless you play with fsync
>> or asynchronous commit). The size of the checkpoint is irrelevant.
>>
>> You see, Postgres writes the data twice. First it writes the data to the
>> end of the WAL. WAL_buffers are used to buffer this. Then Postgres calls
>> fsync on the WAL when you commit the transaction. This makes the transaction
>> safe, and is usually fast because it will be sequential writes on a disc.
>> Once fsync returns, Postgres starts the (lower priority) task of copying the
>> data from the WAL into the data tables. All the un-copied data in the WAL
>> needs to be held in memory, and that is what checkpoint_segments is for.
>> When that gets full, then Postgres needs to stop writes until the copying
>> has freed up the checkpoint segments again.
>>
>> Matthew
>>
> Well then we have conflicting instructions in places on wiki.postgresql.org
> which links to this
> http://www.varlena.com/GeneralBits/Tidbits/annotated_conf_e.html

Yes, I think the explanation of checkpoint_segments on that page is
simply wrong (though it could be true to a limited extent if you have
synchronous_commit turned off).

...Robert

Re: explanation of some configs

От
Matthew Wakeling
Дата:
On Mon, 9 Feb 2009, justin wrote:
> Well then we have conflicting instructions in places on wiki.postgresql.org
> which links to this
> http://www.varlena.com/GeneralBits/Tidbits/annotated_conf_e.html

Could you be a little more specific as to which sections conflict?

Matthew

--
 The only secure computer is one that's unplugged, locked in a safe,
 and buried 20 feet under the ground in a secret location...and i'm not
 even too sure about that one.                         --Dennis Huges, FBI

Re: explanation of some configs

От
justin
Дата:
<br /> checkpoint_segments<br /> Maximum distance between automatic WAL checkpoints, in log file segments (each segment
isnormally 16 megabytes).<br /><br /> Increase these settings if your database has lots of large batch writes to
decreasethe frequency of checkpoints (and thus lower disk activity). Decrease them if you are short on disk space or
yourenvironment has a significant risk of <u><b>unexpected power-outs, as any un-checkpointed transactions will dropped
onrestart.</b></u><br /><br /> Matthew Wakeling wrote: <blockquote
cite="mid:alpine.DEB.2.00.0902101208460.3995@aragorn.flymine.org"type="cite">On Mon, 9 Feb 2009, justin wrote: <br
/><blockquotetype="cite">Well then we have conflicting instructions in places on wiki.postgresql.org which links to
this<br /><a class="moz-txt-link-freetext"
href="http://www.varlena.com/GeneralBits/Tidbits/annotated_conf_e.html">http://www.varlena.com/GeneralBits/Tidbits/annotated_conf_e.html</a><br
/></blockquote><br/> Could you be a little more specific as to which sections conflict? <br /><br /> Matthew <br /><br
/></blockquote>

Re: explanation of some configs

От
Greg Smith
Дата:
On Tue, 10 Feb 2009, justin wrote:

> http://www.varlena.com/GeneralBits/Tidbits/annotated_conf_e.html
> checkpoint_segments
> Maximum distance between automatic WAL checkpoints, in log file segments (each
> segment is normally 16 megabytes).
>
> Increase these settings if your database has lots of large batch writes to
> decrease the frequency of checkpoints (and thus lower disk activity). Decrease
> them if you are short on disk space or your environment has a significant risk
> of unexpected power-outs, as any un-checkpointed transactions will dropped on
> restart.

You mentioned getting to there via
http://wiki.postgresql.org/wiki/Performance_Optimization

You'll note that link is disclaimed with "older version of material
covered in the GUC tour", and it's aimed at PostgreSQL 7.4.  If you read
the current version of that document at
http://www.pgcon.org/2008/schedule/attachments/44_annotated_gucs_draft1.pdf
you'll see that text you've noted is incorrect isn't there anymore.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: explanation of some configs

От
justin
Дата:
Greg Smith wrote:
> On Tue, 10 Feb 2009, justin wrote:
>
>> http://www.varlena.com/GeneralBits/Tidbits/annotated_conf_e.html
>> checkpoint_segments
>> Maximum distance between automatic WAL checkpoints, in log file
>> segments (each
>> segment is normally 16 megabytes).
>>
>> Increase these settings if your database has lots of large batch
>> writes to
>> decrease the frequency of checkpoints (and thus lower disk activity).
>> Decrease
>> them if you are short on disk space or your environment has a
>> significant risk
>> of unexpected power-outs, as any un-checkpointed transactions will
>> dropped on
>> restart.
>
> You mentioned getting to there via
> http://wiki.postgresql.org/wiki/Performance_Optimization
>
> You'll note that link is disclaimed with "older version of material
> covered in the GUC tour", and it's aimed at PostgreSQL 7.4.  If you
> read the current version of that document at
> http://www.pgcon.org/2008/schedule/attachments/44_annotated_gucs_draft1.pdf
>
> you'll see that text you've noted is incorrect isn't there anymore.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Not to be overly nick picking where is the version called out that it
applies to.  Stating Older version is vague

If this new version annotated config file is correct then I can add a
new page detailing out the the different annotated config for the
different versions to the wiki.

Re: explanation of some configs

От
Greg Smith
Дата:
On Tue, 10 Feb 2009, justin wrote:

> Not to be overly nick picking where is the version called out that it applies
> to.  Stating Older version is vague

It's at the bottom of the document.  I just updated the "Performance
Optimization" page to reflect that.  One of those things I keep meaning to
do is tag all of the documents on that page with the associated verison
number that was current when they were written; many of them cover V8.0 or
before.  It's not trivial to figure that out in all cases, for some you
need to know a bit about how the configuration parameters changed in order
to guess which version the advice applied to.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: explanation of some configs

От
justin
Дата:
Greg Smith wrote:
> On Tue, 10 Feb 2009, justin wrote:
>
>> Not to be overly nick picking where is the version called out that it
>> applies to.  Stating Older version is vague
>
> It's at the bottom of the document.  I just updated the "Performance
> Optimization" page to reflect that.  One of those things I keep
> meaning to do is tag all of the documents on that page with the
> associated verison number that was current when they were written;
> many of them cover V8.0 or before.  It's not trivial to figure that
> out in all cases, for some you need to know a bit about how the
> configuration parameters changed in order to guess which version the
> advice applied to.
I was thinking something like this
http://wiki.postgresql.org/wiki/Annotated_config

So in the future all we have to do is add new entries to this page once
annotated config file is updated with the correct advice.