Thread: probably cause (and fix) for floating-point assist faults on itanium

probably cause (and fix) for floating-point assist faults on itanium

From: Greg Matthews
Date:
Hi folks,

I'm running PG 8.3.15 on an Itanium box and was seeing lots of
floating-point assist faults reported by the kernel. I searched around
and found a couple of earlier references/discussions:

http://archives.postgresql.org/pgsql-general/2008-08/msg00244.php
http://archives.postgresql.org/pgsql-performance/2011-06/msg00093.php
http://archives.postgresql.org/pgsql-performance/2011-06/msg00102.php

I took up Tom's challenge and found that the buffer allocation prediction
code in BgBufferSync() is the likely culprit:

     if (smoothed_alloc <= (float) recent_alloc)
         smoothed_alloc = recent_alloc;
     else
         smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
             smoothing_samples;

smoothed_alloc (float) is moving towards 0 during any extended period of
time when recent_alloc (uint32) remains 0. In my case it takes just a
minute or two before it becomes small enough to start triggering the
fault.
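
To illustrate the decay described above, here is a minimal standalone sketch (not PostgreSQL code). It repeats the smoothing step with recent_alloc held at 0 and reports how many cycles pass before the value becomes subnormal, which is the range where such faults would be expected to start. The helper name, the starting value, and the smoothing factor of 16 used in the usage note are assumptions made for illustration.

```c
#include <float.h>
#include <stdint.h>

/*
 * Hypothetical helper, not PostgreSQL code: apply the smoothing step with
 * recent_alloc stuck at 0 and return the number of cycles until
 * smoothed_alloc first becomes subnormal (nonzero but below FLT_MIN).
 * Returns -1 if that has not happened within max_cycles.
 */
static int
cycles_until_subnormal(float smoothed_alloc, float smoothing_samples,
                       int max_cycles)
{
    const uint32_t recent_alloc = 0;    /* idle system: nothing allocated */

    for (int cycle = 1; cycle <= max_cycles; cycle++)
    {
        /* Same update as the "else" branch quoted above. */
        smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
            smoothing_samples;

        /* Subnormal check done by comparison, so no libm is needed. */
        if (smoothed_alloc > 0.0f && smoothed_alloc < FLT_MIN)
            return cycle;
    }
    return -1;
}
```

Calling cycles_until_subnormal(1000.0f, 16.0f, 100000) gives a cycle count; multiplying it by the bgwriter delay in effect gives a rough idea of how long an idle system takes to reach this state.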

Given how smoothed_alloc is used just after this place in the code it
seems overkill to allow it to continue to shrink so small, so I made a
little mod:

     if (smoothed_alloc <= (float) recent_alloc)
         smoothed_alloc = recent_alloc;
     else if (smoothed_alloc >= 0.00001)
         smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
             smoothing_samples;


This seems to have done the trick. From what I can tell this section of
code is unchanged in 9.1.1 - perhaps in a future version a similar mod
could be made?

FWIW, I don't think it's really much of a performance impact for the
database, because if recent_alloc remains 0 for a long while it probably
means the DB isn't doing much anyway. However it is annoying when system
logs fill up, and the extra floating point handling may affect some other
process(es).

-Greg

Re: probably cause (and fix) for floating-point assist faults on itanium

From: Claudio Freire
Date:
On Thu, Nov 17, 2011 at 10:07 PM, Greg Matthews
<gregory.a.matthews@nasa.gov> wrote:
>        if (smoothed_alloc <= (float) recent_alloc)
>                smoothed_alloc = recent_alloc;
>        else if (smoothed_alloc >= 0.00001)
>                smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
>                        smoothing_samples;
>

I don't think that logic is sound.

Rather,

       if (smoothed_alloc <= (float) recent_alloc) {
               smoothed_alloc = recent_alloc;
       } else {
               if (smoothed_alloc < 0.000001)
                   smoothed_alloc = 0;
               smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
                       smoothing_samples;
       }
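
For comparison, the two threshold variants quoted above can be run side by side in a small standalone sketch. The function names, the driver, and the smoothing factor of 16 are assumptions made for illustration; only the bodies of the two step functions are taken from the snippets in this thread.

```c
#include <stdint.h>

/* Greg's variant: stop decaying once the value drops below the cutoff. */
static float
step_freeze(float smoothed_alloc, uint32_t recent_alloc,
            float smoothing_samples)
{
    if (smoothed_alloc <= (float) recent_alloc)
        smoothed_alloc = recent_alloc;
    else if (smoothed_alloc >= 0.00001)
        smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
            smoothing_samples;
    return smoothed_alloc;
}

/* Claudio's variant: snap tiny values to zero before the update. */
static float
step_snap(float smoothed_alloc, uint32_t recent_alloc,
          float smoothing_samples)
{
    if (smoothed_alloc <= (float) recent_alloc)
        smoothed_alloc = recent_alloc;
    else
    {
        if (smoothed_alloc < 0.000001)
            smoothed_alloc = 0;
        smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
            smoothing_samples;
    }
    return smoothed_alloc;
}

/* Hypothetical driver: run n idle cycles (recent_alloc == 0). */
static float
run_idle(float (*step) (float, uint32_t, float), float start, int n)
{
    float value = start;

    for (int i = 0; i < n; i++)
        value = step(value, 0, 16.0f);
    return value;
}
```

One observable difference in this sketch: after a long idle period the first variant is left holding a small nonzero value just under its cutoff, while the second reaches exactly zero. Both stay out of the subnormal range.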

Re: probably cause (and fix) for floating-point assist faults on itanium

From: Tom Lane
Date:
Claudio Freire <klaussfreire@gmail.com> writes:
> On Thu, Nov 17, 2011 at 10:07 PM, Greg Matthews
> <gregory.a.matthews@nasa.gov> wrote:
>>        if (smoothed_alloc <= (float) recent_alloc)
>>                smoothed_alloc = recent_alloc;
>>        else if (smoothed_alloc >= 0.00001)
>>                smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
>>                        smoothing_samples;
>>

> I don't think that logic is sound.

> Rather,

>        if (smoothed_alloc <= (float) recent_alloc) {
>                smoothed_alloc = recent_alloc;
>        } else {
>                if (smoothed_alloc < 0.000001)
>                    smoothed_alloc = 0;
>                smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
>                        smoothing_samples;
>        }

The real problem with either of these is the cutoff number is totally
arbitrary.  I'm thinking of something like this:

    /*
     * Track a moving average of recent buffer allocations.  Here, rather than
     * a true average we want a fast-attack, slow-decline behavior: we
     * immediately follow any increase.
     */
    if (smoothed_alloc <= (float) recent_alloc)
        smoothed_alloc = recent_alloc;
    else
        smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
            smoothing_samples;

    /* Scale the estimate by a GUC to allow more aggressive tuning. */
    upcoming_alloc_est = smoothed_alloc * bgwriter_lru_multiplier;

+    /*
+     * If recent_alloc remains at zero for many cycles,
+     * smoothed_alloc will eventually underflow to zero, and the
+     * underflows produce annoying kernel warnings on some platforms.
+     * Once upcoming_alloc_est has gone to zero, there's no point in
+     * tracking smaller and smaller values of smoothed_alloc, so just
+     * reset it to exactly zero to avoid this syndrome.
+     */
+    if (upcoming_alloc_est == 0)
+        smoothed_alloc = 0;

    /*
     * Even in cases where there's been little or no buffer allocation
     * activity, we want to make a small amount of progress through the buffer


            regards, tom lane
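
The effect of the proposed reset can be checked outside the server with a small standalone sketch. This is not the PostgreSQL source: the struct, the function names, the smoothing factor of 16, and the multiplier of 2.0 are assumptions made for illustration, and the sketch also assumes that upcoming_alloc_est is an integer, so the scaled value truncates to zero long before smoothed_alloc gets anywhere near the subnormal range.

```c
#include <float.h>
#include <stdint.h>

/* Hypothetical stand-in for the bgwriter state; not the real structures. */
typedef struct
{
    float   smoothed_alloc;
    int     upcoming_alloc_est;
    int     saw_subnormal;      /* set if smoothed_alloc was ever subnormal */
} AllocEstimate;

static void
estimate_step(AllocEstimate *est, uint32_t recent_alloc,
              float smoothing_samples, double lru_multiplier)
{
    /* Fast-attack, slow-decline moving average, as in the quoted code. */
    if (est->smoothed_alloc <= (float) recent_alloc)
        est->smoothed_alloc = recent_alloc;
    else
        est->smoothed_alloc += ((float) recent_alloc - est->smoothed_alloc) /
            smoothing_samples;

    /* Assumption: the estimate is an integer, so the product truncates. */
    est->upcoming_alloc_est = (int) (est->smoothed_alloc * lru_multiplier);

    /* The proposed reset: no arbitrary cutoff constant is needed. */
    if (est->upcoming_alloc_est == 0)
        est->smoothed_alloc = 0;

    /* Bookkeeping for the sketch only: record any subnormal value. */
    if (est->smoothed_alloc > 0.0f && est->smoothed_alloc < FLT_MIN)
        est->saw_subnormal = 1;
}

/* Hypothetical driver: run n idle cycles (recent_alloc == 0). */
static AllocEstimate
run_idle_cycles(float start, int n)
{
    AllocEstimate est = {start, 0, 0};

    for (int i = 0; i < n; i++)
        estimate_step(&est, 0, 16.0f, 2.0);
    return est;
}
```

Under these assumptions, run_idle_cycles(1000.0f, 5000) ends with smoothed_alloc at exactly zero and saw_subnormal still clear, and the cutoff comes from the integer truncation rather than from a hand-picked constant.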

Re: probably cause (and fix) for floating-point assist faults on itanium

From: Tom Lane
Date:
Greg Matthews <gregory.a.matthews@nasa.gov> writes:
> Looks good to me. I built PG with this change, no kernel warnings after
> ~10 minutes of running. I'll continue to monitor but I think this fixes
> the syndrome. Thanks Tom.

Patch committed -- thanks for checking it.

            regards, tom lane

Re: probably cause (and fix) for floating-point assist faults on itanium

From: Greg Matthews
Date:
Looks good to me. I built PG with this change, no kernel warnings after
~10 minutes of running. I'll continue to monitor but I think this fixes
the syndrome. Thanks Tom.

-Greg


On Fri, 18 Nov 2011, Tom Lane wrote:

> [...]