Обсуждение: A micro-optimisation for walkdir()

Поиск
Список
Период
Сортировка

A micro-optimisation for walkdir()

От
Thomas Munro
Дата:
Hello hackers,

You don't need to call stat() just to find out if a dirent is a file
or directory, most of the time.  Please see attached.

Вложения

Re: A micro-optimisation for walkdir()

От
Tom Lane
Дата:
Thomas Munro <thomas.munro@gmail.com> writes:
> You don't need to call stat() just to find out if a dirent is a file
> or directory, most of the time.  Please see attached.

Hm.  If we do this, I can see wanting to apply the knowledge in more
places than walkdir().  Is it possible to extract out the nonstandard
bits into a reusable subroutine?  I'm envisioning an API more or less
like

   extern enum PGFileType identify_file_type(const char *path,
                                             const struct dirent *de,
                                             bool look_thru_symlinks);

            regards, tom lane



Re: A micro-optimisation for walkdir()

От
Juan José Santamaría Flecha
Дата:


On Wed, Sep 2, 2020 at 4:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Munro <thomas.munro@gmail.com> writes:
> You don't need to call stat() just to find out if a dirent is a file
> or directory, most of the time.  Please see attached.

Hm.  If we do this, I can see wanting to apply the knowledge in more
places than walkdir().

Win32 could also benefit from this micro-optimisation if we expanded the dirent port to include d_type. Please find attached a patch that does so.

Regards,

Juan José Santamaría Flecha
Вложения

Re: A micro-optimisation for walkdir()

От
Thomas Munro
Дата:
On Thu, Sep 3, 2020 at 3:52 AM Juan José Santamaría Flecha
<juanjo.santamaria@gmail.com> wrote:
> On Wed, Sep 2, 2020 at 4:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Thomas Munro <thomas.munro@gmail.com> writes:
>> > You don't need to call stat() just to find out if a dirent is a file
>> > or directory, most of the time.  Please see attached.
>>
>> Hm.  If we do this, I can see wanting to apply the knowledge in more
>> places than walkdir().

Good idea.  Here's a new version that defines a new function
get_dirent_type() in src/common/file_utils_febe.c and uses it for both
frontend and backend walkdir().

> Win32 could also benefit from this micro-optimisation if we expanded the dirent port to include d_type. Please find
attacheda patch that does so. 

Is GetFileAttributes() actually faster than stat()?  Oh, I see that
our pgwin32_safestat() makes extra system calls.  Huh.  Ok then.
Thanks!

Вложения

Re: A micro-optimisation for walkdir()

От
Tom Lane
Дата:
Thomas Munro <thomas.munro@gmail.com> writes:
>> On Wed, Sep 2, 2020 at 4:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Hm.  If we do this, I can see wanting to apply the knowledge in more
>>> places than walkdir().

> Good idea.  Here's a new version that defines a new function
> get_dirent_type() in src/common/file_utils_febe.c and uses it for both
> frontend and backend walkdir().

Quick thoughts on this patch:

* The API spec for get_dirent_type() needs to say that errno is
meaningful when the return value is PGFILETYPE_ERROR.  That's
something that would not be hard to break, so not documenting
the point at all doesn't seem great.  More generally, I don't
especially like having the callers know that the errno is from
stat() rather than something else.

* I don't quite like the calling code you have that covers some
return values and then has a default: case without any comment.
It's not really obvious that the default: case is expected to be
hit in non-error situations, especially when there is a separate
switch path for errors.  I can't find fault with the code as such,
but I think it'd be good to have a comment there.  Maybe along
the lines of "Ignore file types other than regular files and
directories".

Both of these concerns would abate if we had get_dirent_type()
just throw an error itself when stat() fails, thereby removing the
PGFILETYPE_ERROR result code.  I'm not 100% sold either way on
that, but it's something to think about.  Is there ever going
to be a reason for the caller to ignore an error?

            regards, tom lane



Re: A micro-optimisation for walkdir()

От
Thomas Munro
Дата:
On Thu, Sep 3, 2020 at 5:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [request for better comments]

Ack.

> Both of these concerns would abate if we had get_dirent_type()
> just throw an error itself when stat() fails, thereby removing the
> PGFILETYPE_ERROR result code.  I'm not 100% sold either way on
> that, but it's something to think about.  Is there ever going
> to be a reason for the caller to ignore an error?

Hmm.  Well I had it like that in an earlier version, but then I
couldn't figure out the right way to write code that would work in
both frontend and backend code, without writing two copies in two
translation units, or putting the whole thing in a header.  What
approach do you prefer?



Re: A micro-optimisation for walkdir()

От
Tom Lane
Дата:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Thu, Sep 3, 2020 at 5:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Both of these concerns would abate if we had get_dirent_type()
>> just throw an error itself when stat() fails, thereby removing the
>> PGFILETYPE_ERROR result code.  I'm not 100% sold either way on
>> that, but it's something to think about.  Is there ever going
>> to be a reason for the caller to ignore an error?

> Hmm.  Well I had it like that in an earlier version, but then I
> couldn't figure out the right way to write code that would work in
> both frontend and backend code, without writing two copies in two
> translation units, or putting the whole thing in a header.  What
> approach do you prefer?

Well, there are plenty of places in src/port/ where we do things like

#ifndef FRONTEND
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not get junction for \"%s\": %s",
                        path, msg)));
#else
        fprintf(stderr, _("could not get junction for \"%s\": %s\n"),
                path, msg);
#endif

I don't see a compelling reason why this function couldn't report
stat() failures similarly, especially if we're just going to have
the callers do exactly the same thing as that anyway.

            regards, tom lane



Re: A micro-optimisation for walkdir()

От
Thomas Munro
Дата:
On Fri, Sep 4, 2020 at 3:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Hmm.  Well I had it like that in an earlier version, but then I
> > couldn't figure out the right way to write code that would work in
> > both frontend and backend code, without writing two copies in two
> > translation units, or putting the whole thing in a header.  What
> > approach do you prefer?
>
> Well, there are plenty of places in src/port/ where we do things like
>
> #ifndef FRONTEND
>         ereport(ERROR,
>                 (errcode_for_file_access(),
>                  errmsg("could not get junction for \"%s\": %s",
>                         path, msg)));
> #else
>         fprintf(stderr, _("could not get junction for \"%s\": %s\n"),
>                 path, msg);
> #endif
>
> I don't see a compelling reason why this function couldn't report
> stat() failures similarly, especially if we're just going to have
> the callers do exactly the same thing as that anyway.

Ok, so the main weird thing is that you finish up having to pass in an
elevel, but that has different meanings in FE and BE code.  Note that
you still need a PGFILE_ERROR return value, because we don't log
messages at a level that exits non-locally (and that concept doesn't
even exist for FE logging).

Вложения

Re: A micro-optimisation for walkdir()

От
Alvaro Herrera
Дата:
On 2020-Sep-04, Thomas Munro wrote:

> @@ -10,6 +10,7 @@ struct dirent
>  {
>      long        d_ino;
>      unsigned short d_reclen;
> +    unsigned char d_type;
>      unsigned short d_namlen;
>      char        d_name[MAX_PATH];
>  };
> @@ -20,4 +21,26 @@ DIR           *opendir(const char *);
>  struct dirent *readdir(DIR *);
>  int            closedir(DIR *);
>  
> +/* File types for 'd_type'. */
> +enum
> +  {
> +    DT_UNKNOWN = 0,
> +# define DT_UNKNOWN        DT_UNKNOWN

Uhm ... what do these #defines do?  They look a bit funny.

Would it make sense to give this enum a name, and then use that name in
struct dirent's definition, instead of unsigned char?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: A micro-optimisation for walkdir()

От
Juan José Santamaría Flecha
Дата:
On Fri, Sep 4, 2020 at 9:37 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2020-Sep-04, Thomas Munro wrote:

> @@ -10,6 +10,7 @@ struct dirent
>  {
>       long            d_ino;
>       unsigned short d_reclen;
> +     unsigned char d_type;
>       unsigned short d_namlen;
>       char            d_name[MAX_PATH];
>  };
> @@ -20,4 +21,26 @@ DIR                   *opendir(const char *);
>  struct dirent *readdir(DIR *);
>  int                  closedir(DIR *);

> +/* File types for 'd_type'. */
> +enum
> +  {
> +     DT_UNKNOWN = 0,
> +# define DT_UNKNOWN          DT_UNKNOWN

Uhm ... what do these #defines do?  They look a bit funny.

Would it make sense to give this enum a name, and then use that name in
struct dirent's definition, instead of unsigned char?

They mimic POSIX dirent.h. I would rather stick to that.

Regards,

Juan José Santamaría Flecha   

Re: A micro-optimisation for walkdir()

От
Alvaro Herrera
Дата:
On 2020-Sep-04, Juan José Santamaría Flecha wrote:

> On Fri, Sep 4, 2020 at 9:37 PM Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
> 
> > On 2020-Sep-04, Thomas Munro wrote:

> > >
> > > +/* File types for 'd_type'. */
> > > +enum
> > > +  {
> > > +     DT_UNKNOWN = 0,
> > > +# define DT_UNKNOWN          DT_UNKNOWN
> >
> > Uhm ... what do these #defines do?  They look a bit funny.
> >
> > Would it make sense to give this enum a name, and then use that name in
> > struct dirent's definition, instead of unsigned char?
> 
> They mimic POSIX dirent.h. I would rather stick to that.

Ah ... they do?

If you remove the #define lines, what happens to your patch?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: A micro-optimisation for walkdir()

От
Juan José Santamaría Flecha
Дата:

On Fri, Sep 4, 2020 at 10:28 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2020-Sep-04, Juan José Santamaría Flecha wrote:

> On Fri, Sep 4, 2020 at 9:37 PM Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
>
> > On 2020-Sep-04, Thomas Munro wrote:

> > >
> > > +/* File types for 'd_type'. */
> > > +enum
> > > +  {
> > > +     DT_UNKNOWN = 0,
> > > +# define DT_UNKNOWN          DT_UNKNOWN
> >
> > Uhm ... what do these #defines do?  They look a bit funny.
> >
> > Would it make sense to give this enum a name, and then use that name in
> > struct dirent's definition, instead of unsigned char?
>
> They mimic POSIX dirent.h. I would rather stick to that.

Ah ... they do?

If you remove the #define lines, what happens to your patch?

If will fail to detect that the patch makes the optimisation available for WIN32:

+#if defined(DT_UNKNOWN) && defined(DT_REG) && defined(DT_DIR) && defined(DT_LNK)

Regards,

Juan José Santamaría Flecha

Re: A micro-optimisation for walkdir()

От
Alvaro Herrera
Дата:
On 2020-Sep-04, Juan José Santamaría Flecha wrote:

> If will fail to detect that the patch makes the optimisation available for
> WIN32:
> 
> +#if defined(DT_UNKNOWN) && defined(DT_REG) && defined(DT_DIR) &&
> defined(DT_LNK)

Oh, I see.  I suggest that it'd be better to change this line instead.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: A micro-optimisation for walkdir()

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2020-Sep-04, Juan José Santamaría Flecha wrote:
>> If will fail to detect that the patch makes the optimisation available for
>> WIN32:
>> 
>> +#if defined(DT_UNKNOWN) && defined(DT_REG) && defined(DT_DIR) &&
>> defined(DT_LNK)

> Oh, I see.  I suggest that it'd be better to change this line instead.

I think that it's standard to test for such symbols by seeing
if they're defined as macros ... not least because that's the *only*
way to test their existence in C.

Personally, what I'd do is lose the enum and just define the macros
with simple integer constant values.

            regards, tom lane



Re: A micro-optimisation for walkdir()

От
Andres Freund
Дата:
Hi,

On 2020-09-02 17:51:27 +0200, Juan José Santamaría Flecha wrote:
> Win32 could also benefit from this micro-optimisation if we expanded the
> dirent port to include d_type. Please find attached a patch that does
> so
.
>      }
>      strcpy(d->ret.d_name, fd.cFileName);    /* Both strings are MAX_PATH long */
>      d->ret.d_namlen = strlen(d->ret.d_name);
> +    /*
> +     * The only identifed types are: directory, regular file or symbolic link.
> +     * Errors are treated as a file type that could not be determined.
> +     */
> +    attrib = GetFileAttributes(d->ret.d_name);
> +    if (attrib == INVALID_FILE_ATTRIBUTES)
> +        d->ret.d_type = DT_UNKNOWN;
> +    else if ((attrib & FILE_ATTRIBUTE_DIRECTORY) != 0)
> +        d->ret.d_type = DT_DIR;
> +    else if ((attrib & FILE_ATTRIBUTE_REPARSE_POINT) != 0)
> +        d->ret.d_type = DT_LNK;
> +    else
> +        d->ret.d_type = DT_REG;
>  
>      return &d->ret;
>  }

Is this really an optimization? The benefit of Thomas' patch is that
that information sometimes already is there. But here you're doing a
separate lookup with GetFileAttributes()?

What am I missing?

Greetings,

Andres Freund



Re: A micro-optimisation for walkdir()

От
Alvaro Herrera
Дата:
On 2020-Sep-04, Tom Lane wrote:

> I think that it's standard to test for such symbols by seeing
> if they're defined as macros ... not least because that's the *only*
> way to test their existence in C.

I guess since what we're doing is emulating standard readdir(), that
makes sense.

> Personally, what I'd do is lose the enum and just define the macros
> with simple integer constant values.

WFM.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: A micro-optimisation for walkdir()

От
Thomas Munro
Дата:
On Sat, Sep 5, 2020 at 9:45 AM Andres Freund <andres@anarazel.de> wrote:
> On 2020-09-02 17:51:27 +0200, Juan José Santamaría Flecha wrote:
> > +     attrib = GetFileAttributes(d->ret.d_name);
>
> Is this really an optimization? The benefit of Thomas' patch is that
> that information sometimes already is there. But here you're doing a
> separate lookup with GetFileAttributes()?

Well as discussed already, our stat() emulation on Windows does
multiple syscalls, so it's a slight improvement.  However, it looks
like we might be missing a further opportunity here...  Doesn't
Windows already give us the flags we need in the dwFileAttributes
member of the WIN32_FIND_DATA object that the Find{First,Next}File()
functions populate?



Re: A micro-optimisation for walkdir()

От
Andres Freund
Дата:
Hi,

On 2020-09-05 11:15:07 +1200, Thomas Munro wrote:
> On Sat, Sep 5, 2020 at 9:45 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2020-09-02 17:51:27 +0200, Juan José Santamaría Flecha wrote:
> > > +     attrib = GetFileAttributes(d->ret.d_name);
> >
> > Is this really an optimization? The benefit of Thomas' patch is that
> > that information sometimes already is there. But here you're doing a
> > separate lookup with GetFileAttributes()?
> 
> Well as discussed already, our stat() emulation on Windows does
> multiple syscalls, so it's a slight improvement.

But the patch is patching readdir(), not just walkdir(). Not all
readdir() / ReadDir() callers necessarily do a stat() and many continue
to do a stat() after the patches. So for all of those except walkdir(),
some like RemoveOldXlogFiles() sometimes being noticable cost wise, the
patch will increase the cost on windows. No?  That's quite different
from utilizing "free" information.


> However, it looks like we might be missing a further opportunity
> here...  Doesn't Windows already give us the flags we need in the
> dwFileAttributes member of the WIN32_FIND_DATA object that the
> Find{First,Next}File() functions populate?

That'd be better...

Greetings,

Andres Freund



Re: A micro-optimisation for walkdir()

От
Juan José Santamaría Flecha
Дата:

On Sat, Sep 5, 2020 at 2:13 AM Andres Freund <andres@anarazel.de> wrote:

> However, it looks like we might be missing a further opportunity
> here...  Doesn't Windows already give us the flags we need in the
> dwFileAttributes member of the WIN32_FIND_DATA object that the
> Find{First,Next}File() functions populate?

That'd be better...

At first I did not see how to get DT_LNK directly, but it is possible without additional calls, so please find attached a version with that logic.

This version also drops the enum, defining just the macros.

Regards,

Juan José Santamaría Flecha  
Вложения

Re: A micro-optimisation for walkdir()

От
Ranier Vilela
Дата:
Hi Juan,

This is only a suggestion, if you find it appropriate.
We could use a little cut tail in get_dirent_type function.

Try to avoid add padding, when modifying or adding fields.
struct dirent
 {
  long d_ino;
  unsigned short d_reclen;
  unsigned short d_namlen;
+ unsigned char d_type;
  char d_name[MAX_PATH];
 };

Or even better if possible:
struct dirent
 {
  char d_name[MAX_PATH];
  long d_ino;
  unsigned short d_reclen;
  unsigned short d_namlen;
  unsigned char d_type;
 };

regards,
Ranier Vilela

Вложения

Re: A micro-optimisation for walkdir()

От
Thomas Munro
Дата:
On Sun, Sep 6, 2020 at 5:23 AM Juan José Santamaría Flecha
<juanjo.santamaria@gmail.com> wrote:
> On Sat, Sep 5, 2020 at 2:13 AM Andres Freund <andres@anarazel.de> wrote:
>> > However, it looks like we might be missing a further opportunity
>> > here...  Doesn't Windows already give us the flags we need in the
>> > dwFileAttributes member of the WIN32_FIND_DATA object that the
>> > Find{First,Next}File() functions populate?
>>
>> That'd be better...
>
>
> At first I did not see how to get DT_LNK directly, but it is possible without additional calls, so please find
attacheda version with that logic. 
>
> This version also drops the enum, defining just the macros.

Excellent.  I'd like to commit these soon, unless someone has a better
idea for how to name file_utils_febe.c.

I think the following is a little mysterious, but it does seem to be
what people do for this in other projects.  It is the documented way
to detect mount points, and I guess IO_REPARSE_TAG_MOUNT_POINT is
either overloaded also for junctions, or junctions are the same thing
as mount points.  It would be nice to see a Win32 documentation page
that explicitly said that.

+    /* For reparse points dwReserved0 field will contain the ReparseTag */
+    else if ((fd.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) != 0
+             && (fd.dwReserved0 == IO_REPARSE_TAG_MOUNT_POINT))
+        d->ret.d_type = DT_LNK;

Hmm, it's interesting that our existing test for a junction in
pgwin32_is_junction() only looks for FILE_ATTRIBUTE_REPARSE_POINT and
doesn't care what kind of reparse point it is.



Re: A micro-optimisation for walkdir()

От
Tom Lane
Дата:
Thomas Munro <thomas.munro@gmail.com> writes:
> Excellent.  I'd like to commit these soon, unless someone has a better
> idea for how to name file_utils_febe.c.

As long as it's in src/port or src/common, isn't it implicit that
it's probably FE/BE common code?

I think it'd make more sense to insert all this stuff into file_utils.c,
and then just "#ifdef FRONTEND" the existing code there that doesn't work
in backend.  For one thing, that gives us a saner upgrade path whenever
somebody gets ambitious enough to make that code work for the backend.

            regards, tom lane



Re: A micro-optimisation for walkdir()

От
Thomas Munro
Дата:
On Mon, Sep 7, 2020 at 10:36 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Excellent.  I'd like to commit these soon, unless someone has a better
> > idea for how to name file_utils_febe.c.
>
> As long as it's in src/port or src/common, isn't it implicit that
> it's probably FE/BE common code?
>
> I think it'd make more sense to insert all this stuff into file_utils.c,
> and then just "#ifdef FRONTEND" the existing code there that doesn't work
> in backend.  For one thing, that gives us a saner upgrade path whenever
> somebody gets ambitious enough to make that code work for the backend.

True.  Ok, I committed the Unix patch that way.  I know it works on
Linux, FreeBSD, macOS and Windows (without Juan José's patch), but
I'll have to check the build farm later to make sure HPUX, AIX and
Solaris are OK with this.  It's remotely possible that one of them
defines DT_REG etc but doesn't have d_type; I'm hoping to get away
with not adding a configure check.

I'll wait a bit longer for comments on the Windows patch.



Re: A micro-optimisation for walkdir()

От
Magnus Hagander
Дата:


On Mon, Sep 7, 2020 at 12:27 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Sun, Sep 6, 2020 at 5:23 AM Juan José Santamaría Flecha
<juanjo.santamaria@gmail.com> wrote:
> On Sat, Sep 5, 2020 at 2:13 AM Andres Freund <andres@anarazel.de> wrote:
>> > However, it looks like we might be missing a further opportunity
>> > here...  Doesn't Windows already give us the flags we need in the
>> > dwFileAttributes member of the WIN32_FIND_DATA object that the
>> > Find{First,Next}File() functions populate?
>>
>> That'd be better...
>
>
> At first I did not see how to get DT_LNK directly, but it is possible without additional calls, so please find attached a version with that logic.
>
> This version also drops the enum, defining just the macros.

Excellent.  I'd like to commit these soon, unless someone has a better
idea for how to name file_utils_febe.c.

I think the following is a little mysterious, but it does seem to be
what people do for this in other projects.  It is the documented way
to detect mount points, and I guess IO_REPARSE_TAG_MOUNT_POINT is
either overloaded also for junctions, or junctions are the same thing
as mount points.  It would be nice to see a Win32 documentation page
that explicitly said that.

The wikipedia page on it is actually fairly decent: https://en.wikipedia.org/wiki/NTFS_reparse_point. It's not the documentation of course, but it's easier to read :) The core difference is whether you mount a whole filesystem (mount point) or just a directory off something mounted elsehwere (junction).

And yes, the wikipedia page confirms that junctions also use IO_REPARSE_TAG_MOUNT_POINT.


+    /* For reparse points dwReserved0 field will contain the ReparseTag */
+    else if ((fd.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) != 0
+             && (fd.dwReserved0 == IO_REPARSE_TAG_MOUNT_POINT))
+        d->ret.d_type = DT_LNK;

Hmm, it's interesting that our existing test for a junction in
pgwin32_is_junction() only looks for FILE_ATTRIBUTE_REPARSE_POINT and
doesn't care what kind of reparse point it is.

I think that's mostly historical. When that code was written, the only two types of reparse points that existed were junctions and mount points -- which are as already noted, the same. Symbolic links, unix sockets and such things came later.

 
--

Re: A micro-optimisation for walkdir()

От
Thomas Munro
Дата:
On Mon, Sep 7, 2020 at 9:42 PM Magnus Hagander <magnus@hagander.net> wrote:
> On Mon, Sep 7, 2020 at 12:27 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> I think the following is a little mysterious, but it does seem to be
>> what people do for this in other projects.  It is the documented way
>> to detect mount points, and I guess IO_REPARSE_TAG_MOUNT_POINT is
>> either overloaded also for junctions, or junctions are the same thing
>> as mount points.  It would be nice to see a Win32 documentation page
>> that explicitly said that.
>
> The wikipedia page on it is actually fairly decent: https://en.wikipedia.org/wiki/NTFS_reparse_point. It's not the
documentationof course, but it's easier to read :) The core difference is whether you mount a whole filesystem (mount
point)or just a directory off something mounted elsehwere (junction). 
>
> And yes, the wikipedia page confirms that junctions also use IO_REPARSE_TAG_MOUNT_POINT.

Thanks for confirming.  I ran the Windows patch through pgindent,
fixed a small typo, and pushed.



Re: A micro-optimisation for walkdir()

От
Juan José Santamaría Flecha
Дата:
On Mon, Sep 7, 2020 at 1:41 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Mon, Sep 7, 2020 at 9:42 PM Magnus Hagander <magnus@hagander.net> wrote:
> On Mon, Sep 7, 2020 at 12:27 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> I think the following is a little mysterious, but it does seem to be
>> what people do for this in other projects.  It is the documented way
>> to detect mount points, and I guess IO_REPARSE_TAG_MOUNT_POINT is
>> either overloaded also for junctions, or junctions are the same thing
>> as mount points.  It would be nice to see a Win32 documentation page
>> that explicitly said that.
>
> The wikipedia page on it is actually fairly decent: https://en.wikipedia.org/wiki/NTFS_reparse_point. It's not the documentation of course, but it's easier to read :) The core difference is whether you mount a whole filesystem (mount point) or just a directory off something mounted elsehwere (junction).
>
> And yes, the wikipedia page confirms that junctions also use IO_REPARSE_TAG_MOUNT_POINT.

Thanks for confirming.  I ran the Windows patch through pgindent,
fixed a small typo, and pushed.

Great, thanks. Should we include something from this discussion as comments?

Regards,

Juan José Santamaría Flecha 

Re: A micro-optimisation for walkdir()

От
Thomas Munro
Дата:
On Tue, Sep 8, 2020 at 2:05 AM Juan José Santamaría Flecha
<juanjo.santamaria@gmail.com> wrote:
> On Mon, Sep 7, 2020 at 1:41 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Thanks for confirming.  I ran the Windows patch through pgindent,
>> fixed a small typo, and pushed.
>
> Great, thanks. Should we include something from this discussion as comments?

I dunno, it seems like this may be common knowledge for Windows people
and I was just being paranoid by asking for more info as a
non-Windowsian, but if you want to propose new comments or perhaps
just a pointer to one central place where we explain how that works, I
wouldn't be against that.

FWIW I just spotted a couple of very suspicious looking failures for
build farm animal "walleye", a "MinGW64 8.1.0" system, that say:

c:\\pgbuildfarm\\pgbuildroot\\HEAD\\pgsql.build\\src\\bin\\pg_upgrade>RMDIR
/s/q "c:\\pgbuildfarm\\pgbuildroot\\HEAD\\pgsql.build\\src\\bin\\pg_upgrade\\tmp_check\\data.old"
The directory is not empty.

Then I noticed it failed the same way 8 days ago, so I don't know
what's up with that but it looks like we didn't break it...



Re: A micro-optimisation for walkdir()

От
Tom Lane
Дата:
Thomas Munro <thomas.munro@gmail.com> writes:
> FWIW I just spotted a couple of very suspicious looking failures for
> build farm animal "walleye", a "MinGW64 8.1.0" system, that say:

walleye's been kind of unstable since the get-go, so I wouldn't put
too much faith in reports just from it.

            regards, tom lane



Re: A micro-optimisation for walkdir()

От
Thomas Munro
Дата:
On Tue, Sep 8, 2020 at 10:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > FWIW I just spotted a couple of very suspicious looking failures for
> > build farm animal "walleye", a "MinGW64 8.1.0" system, that say:
>
> walleye's been kind of unstable since the get-go, so I wouldn't put
> too much faith in reports just from it.

CC'ing animal owner.

<pokes at the logs>  I suspect that someone who knows about PostgreSQL
on Windows would recognise the above symptom, but my guess is the
Windows "indexing" service is on, or an antivirus thing, or some other
kind of automatically-open-and-sniff-every-file-on-certain-file-events
thing.  It looks like nothing of ours is even running at that moment
("waiting for server to shut down.... done"), and it's the RMDIR /s
shell command that is reporting the error.  The other low probability
error seen on this host is this one:

+ERROR:  could not stat file "pg_wal/000000010000000000000007":
Permission denied



Re: A micro-optimisation for walkdir()

От
Juan José Santamaría Flecha
Дата:

On Mon, Sep 14, 2020 at 12:42 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Tue, Sep 8, 2020 at 10:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > FWIW I just spotted a couple of very suspicious looking failures for
> > build farm animal "walleye", a "MinGW64 8.1.0" system, that say:
>
> walleye's been kind of unstable since the get-go, so I wouldn't put
> too much faith in reports just from it.

CC'ing animal owner.

<pokes at the logs>  I suspect that someone who knows about PostgreSQL
on Windows would recognise the above symptom, but my guess is the
Windows "indexing" service is on, or an antivirus thing, or some other
kind of automatically-open-and-sniff-every-file-on-certain-file-events
thing.  It looks like nothing of ours is even running at that moment
("waiting for server to shut down.... done"), and it's the RMDIR /s
shell command that is reporting the error.  The other low probability
error seen on this host is this one:

+ERROR:  could not stat file "pg_wal/000000010000000000000007":
Permission denied

This is a known problem, previous to this patch. There have been a couple of attempts to tackle it, and hopefully the shared-memory based stats collector will take care of it:


Regards,

Juan José Santamaría Flecha