Обсуждение: Re: posix advises ...

Поиск
Список
Период
Сортировка

Re: posix advises ...

От
Abhijit Menon-Sen
Дата:
Hi Zoltán.

I was reading through your posix_fadvise patch, and I wanted to ask
about this change in particular:

> --- a/src/backend/executor/nodeIndexscan.c
> +++ b/src/backend/executor/nodeIndexscan.c
> @@ -290,7 +290,7 @@ ExecIndexEvalArrayKeys(ExprContext *econtext,
>         /* We want to keep the arrays in per-tuple memory */
>         oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
>  
> -       for (j = 0; j < numArrayKeys; j++)
> +       for (j = numArrayKeys-1; j >= 0; j--)
>         {
>                 ScanKey         scan_key = arrayKeys[j].scan_key;
>                 ExprState  *array_expr = arrayKeys[j].array_expr;

Why is this loop reversed? (I could have missed some discussion about
this, I just wanted to make sure it was intentional.)

I can confirm that the patch applies to HEAD, that the configure test
correctly #defines USE_POSIX_FADVISE, that it compiles cleanly, and it
looks basically sensible to me. The nodeBitmapHeapscan.c and iterator
changes need a second look by someone who understands the code better
than I do.

The documentation patch could use a little tweaking:

> +        <productname>PostgreSQL</productname> can give hints to POSIX compatible
> +        operating systems using posix_fadvise(2) to pre-read pages that will
> +        be needed in the near future. Reading the pages into the OS kernel's
> +        disk buffer will be done asynchronically while <productname>PostgreSQL</productname>
> +        works on pages which are already in memory. This may speed up bitmap scans
> +        considerably. This setting only applies to bitmap scans.
> +        Hinting for sequential scans is also used but no GUC is needed in this case.

I would suggest something like this instead:
   <productname>PostgreSQL</productname> can use posix_fadvise(2) to   tell the operating system about pages that it
knowswill be needed   in the near future. The performance of bitmap scans is considerably   improved when the kernel
pre-readssuch pages into its disk buffers.   This variable controls how many pages are marked for pre-reading at   a
timeduring a bitmap scan.
 

But I'm not convinced that this GUC is well-advised; at least, it needs
some advice about how to determine a sensible size for the parameter
(and maybe a better name). But is it really necessary at all?

Thanks.

-- ams


Re: posix advises ...

От
Gregory Stark
Дата:
"Abhijit Menon-Sen" <ams@oryx.com> writes:

> Hi Zoltán.
>
> I was reading through your posix_fadvise patch,

Actually Zoltan's patch was based on an earlier patch from me. The sections
you're highlighting here are from my original patch.

> and I wanted to ask about this change in particular:
>
>> --- a/src/backend/executor/nodeIndexscan.c
>> +++ b/src/backend/executor/nodeIndexscan.c
>> @@ -290,7 +290,7 @@ ExecIndexEvalArrayKeys(ExprContext *econtext,
>>         /* We want to keep the arrays in per-tuple memory */
>>         oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
>>
>> -       for (j = 0; j < numArrayKeys; j++)
>> +       for (j = numArrayKeys-1; j >= 0; j--)

> Why is this loop reversed? (I could have missed some discussion about
> this, I just wanted to make sure it was intentional.)

There was some discussion about this change and in fact if you look at CVS
HEAD you'll find it already applied. It probably doesn't make a big difference
at least in most cases but it seems more sensible to increment the least
significant key elements first since that maximizes the chances of fetching
index keys from the same index pages or nearby pages. Incrementing the most
significant index keys would maximize the distance we're jumpin around in the
index tree.

> But I'm not convinced that this GUC is well-advised; at least, it needs
> some advice about how to determine a sensible size for the parameter
> (and maybe a better name). But is it really necessary at all?

I'm not sure which version of my patch Zoltan's was based on. The later
versions of mine had a GUC named effective_spindle_count which I think is
nicely abstracted away from the implementation details. We do need something
like that because we have no way to determine how wide a raid stripe Postgres
is sitting on and the optimal read-ahead depends on that.

The alternative would be to measure the bandwith with various amounts of
read-ahead. But I fear that would be quite hard on a production system with
the amount of noise from other i/o. It's hard enough on an idle machine with
synthetic benchmarks.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication
support!


Re: posix advises ...

От
Abhijit Menon-Sen
Дата:
At 2008-07-12 00:52:42 +0100, stark@enterprisedb.com wrote:
>
> There was some discussion about this change and in fact if you
> look at CVS HEAD you'll find it already applied.

Not as far as I can see.

> Incrementing the most significant index keys would maximize the
> distance we're jumpin around in the index tree.

I see. Thanks.

> The later versions of mine had a GUC named effective_spindle_count
> which I think is nicely abstracted away from the implementation
> details.

Yes, that does sound much better. (The patch I read had a
preread_pages_bitmapscan variable instead.)

-- ams


Re: posix advises ...

От
Tom Lane
Дата:
Abhijit Menon-Sen <ams@oryx.com> writes:
> At 2008-07-12 00:52:42 +0100, stark@enterprisedb.com wrote:
>> There was some discussion about this change and in fact if you
>> look at CVS HEAD you'll find it already applied.

> Not as far as I can see.

The place where it matters is in ExecIndexAdvanceArrayKeys(),
not initial setup.
        regards, tom lane


Re: posix advises ...

От
Greg Smith
Дата:
On Sat, 12 Jul 2008, Abhijit Menon-Sen wrote:

> At 2008-07-12 00:52:42 +0100, stark@enterprisedb.com wrote:
>>
>> The later versions of mine had a GUC named effective_spindle_count
>> which I think is nicely abstracted away from the implementation
>> details.
>
> Yes, that does sound much better. (The patch I read had a
> preread_pages_bitmapscan variable instead.)

This patch does need a bit of general care in a couple of areas.  The 
reviewing game plan I'm working through goes like this:

1) Update the original fadvise test program Greg Stark wrote to be a bit 
easier to use for testing general compatibility of this approach.  I want 
to collect some data from at least two Linux and Solaris systems with 
different disk setups.

2) Check out effective_spindle_count and see if it looks like a reasonable 
way to tune this feature.  If so, will probably need to merge that in to 
Zoltan's version of the patch.  May need some other cleanup in that patch 
set as well--I'm not sure that closed XLOG patch that got pushed into here 
as well is really helpful for example.

3) Generate a sequential scan test program aimed to hobble the Linux 
kernel in the way Zoltan described as motivation for his work.  I'm 
working with Jeff Davis this week to try and repurpose some of his 
syncronized scan test programs to handle this while we're both in the same 
place for a bit.

4) Generate a bitmap scan test program to check the original patch.

5) If the performance results look useful and consistant, then move toward 
cleaning up broader compatibility issues like the segfault concerns Zoltan 
mentioned.

Going to take a while to work through all that, but performance patches 
with platform-specific benefit are always painful like this.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: posix advises ...

От
Alvaro Herrera
Дата:
Greg Smith wrote:

> This patch does need a bit of general care in a couple of areas.  The  
> reviewing game plan I'm working through goes like this:

Did this review effort go anywhere?


-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: posix advises ...

От
Greg Smith
Дата:
On Sun, 31 Aug 2008, Alvaro Herrera wrote:

> Greg Smith wrote:
>
>> This patch does need a bit of general care in a couple of areas.  The
>> reviewing game plan I'm working through goes like this:
>
> Did this review effort go anywhere?

Haven't made much progress--all my spare time for work like this lately 
has been stuck fighting broken hardware and OS issues on my test systems. 
I have a lot more time set aside for PostgreSQL hacking this month, and 
I'll look at that and the checkpoint block sorting patch I've also been 
tinkering with soon and kick back an initial review of each.

But if anybody else is keen on working on either patch, let me know.  Be 
glad to assist instead of lead.  It takes fairly serious hardware to work 
on either of these usefully though.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD