Обсуждение: Patch to add hook to copydir()
Hello all, I've been working on an extension that tightly integrates postgres with underlying filesystem . I need to customize how postgres copies directories for new databases. I first looked at the ProcessUtility_hook. This would require me to copy or rewrite most of the createdb() function. This is less than ideal of course. Someone on the IRC channel suggested I could add a hook for copydir(). I implemented the hook similar to how the ProcessUtility_hook is implemented. I couldn't find any tests for any of the existing hooks. I've been looking at the regression tests, but I am not entirely sure how to proceed on that front. I tested my patch extensively against master and the REL_12_STABLE branch. All tests pass and the patch has been working great with my extension. I attached a first draft of the patch against master. --- Swen Kooij
Вложения
I just realized I completely borked the patch file. My apologies. Attached a (hopefully) correct patch file. --- Swen Kooij On Mon, Sep 2, 2019 at 9:54 PM Swen Kooij <swenkooij@gmail.com> wrote: > > Hello all, > > I've been working on an extension that tightly integrates > postgres with underlying filesystem . I need to customize > how postgres copies directories for new databases. > > I first looked at the ProcessUtility_hook. This would require > me to copy or rewrite most of the createdb() function. This is > less than ideal of course. Someone on the IRC channel > suggested I could add a hook for copydir(). > > I implemented the hook similar to how the > ProcessUtility_hook is implemented. I couldn't find any tests > for any of the existing hooks. I've been looking at the regression > tests, but I am not entirely sure how to proceed on that front. > > I tested my patch extensively against master and > the REL_12_STABLE branch. All tests pass and the patch has > been working great with my extension. > > I attached a first draft of the patch against master. > > --- > Swen Kooij
Вложения
On 2019-09-02 20:54, Swen Kooij wrote: > I've been working on an extension that tightly integrates > postgres with underlying filesystem . I need to customize > how postgres copies directories for new databases. Could you share some more details, so we can assess whether that is a sensible way to go about it, and what other hooks might be needed? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-Sep-02, Peter Eisentraut wrote: > On 2019-09-02 20:54, Swen Kooij wrote: > > I've been working on an extension that tightly integrates > > postgres with underlying filesystem . I need to customize > > how postgres copies directories for new databases. > > Could you share some more details, so we can assess whether that is a > sensible way to go about it, and what other hooks might be needed? It seems either terribly high-level, or terribly low-level, depending on how you look at it. I wonder to what extent it conflicts with the table AM work, and with the idea of allowing FDWs to have real relfilenodes. I wonder if this is just one missing piece that's needed to complete something of a layer for which other pieces are already satisfied by other hooks. As is and pending further explanation, it seems a bad idea to me. If we want pluggability here, maybe we need some new abstraction layer. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thank you both for the feedback. To give some background on what I am trying to do. Mostly experimental work right now. I've built an extension that takes advantage of the copy-on-write properties of ZFS and BTRFS. A tool transforms a data directory into a set of datasets/sub volumes. When a new database is created and the directories are copied, the extension uses the filesystem's ability to create a snapshot/clone of the source dataset/sub volume. As createdb() iterates over the template database's tablespaces and copies the directories to the new one, the copydir() gets invoked (from dbcommands.c). My extension hooks copydir() and instead snapshots the source dir and clones it into the target dir (respecting the recurse flag). If the source dir is not a ZFS data set or BTRFS sub volume, the standard implementation gets invoked and everything is business as usual. As described in the first paragraph, I've built a small tool that transforms all base/* directories in the data directory into ZFS datasets or BTRFS sub volumes. New tablespaces that get created later would have to be ZFS datasets or BTRFS sub volumes as well in order for the extension to work. As explained above, if they are not, the extension will just copy the files. I also hook ProcessUtility in order to clean up the ZFS datasets and BTRFS subvolumes _after_ a database gets deleted. Postgres just handles the deletion of the files and the extension cleans up any dangling ZFS/BTRFS resources. Is there anything that I am missing? My early experiments have been very promising but my experience with Postgres internals is limited. Any help or feedback would be much appreciated. On Mon, Sep 2, 2019 at 11:06 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > On 2019-Sep-02, Peter Eisentraut wrote: > > > On 2019-09-02 20:54, Swen Kooij wrote: > > > I've been working on an extension that tightly integrates > > > postgres with underlying filesystem . I need to customize > > > how postgres copies directories for new databases. > > > > Could you share some more details, so we can assess whether that is a > > sensible way to go about it, and what other hooks might be needed? > > It seems either terribly high-level, or terribly low-level, depending on > how you look at it. I wonder to what extent it conflicts with the table > AM work, and with the idea of allowing FDWs to have real relfilenodes. > I wonder if this is just one missing piece that's needed to complete > something of a layer for which other pieces are already satisfied by > other hooks. > > As is and pending further explanation, it seems a bad idea to me. If we > want pluggability here, maybe we need some new abstraction layer. > > -- > Álvaro Herrera https://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-09-02 22:16, Swen Kooij wrote: > Is there anything that I am missing? My early experiments have been > very promising but my experience with Postgres internals is limited. Any > help or feedback would be much appreciated. You might want to review several previous threads that were contemplating doing some reflink stuff with btrfs during database creation. Not exactly what you are doing, but related. Also, wouldn't it work if your extension just defined its own copydir() in the .so? That should then be used over the built-in one -- maybe. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I read two previous proposals for something similar. First one is from 2013 [0]. It proposed a OS and filesystem specific implementation. Which was then changed to a patch that adds a config option to specify shell commands that postgres should use for copying files and dirs. This was after commentary that such a feature is better suited for an extension. Which is what I am trying to do. What was true in 2013 (when the other patch was proposed) is still true. There's no good way at the moment to hook into how postgres copies files and directories. The second thread I found from 2018 [1] proposes filesystem specific changes (BTRFS, APFS and XFS) that operate on a file level. Really nice patch actually, but much more invasive than what I am proposing. Both patches make rather invasive and significant changes specific to the features they're trying to build. The general sentiment I got from reading those two threads is that building in support for these kind of features is hard and is probably better first done as an extension. I understand that the patch I proposed is not an ideal interface, but it gets the job done without having to built this kind of functionality into postgres. I am not proposing any filesystem or OS specific hooks/changes in postgres to make this work. I tried to override the copydir() function without any hooks and I haven't gotten it to work yet. If it does, I'd still prefer the hook. It's more predictable and probably more portable. [0] https://www.postgresql.org/message-id/511B5D11.4040507@socialserve.com [1] https://www.postgresql.org/message-id/bc9ca382-b98d-0446-f699-8c5de2307ca7%402ndquadrant.com On Tue, Sep 3, 2019 at 9:48 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2019-09-02 22:16, Swen Kooij wrote: > > Is there anything that I am missing? My early experiments have been > > very promising but my experience with Postgres internals is limited. Any > > help or feedback would be much appreciated. > > You might want to review several previous threads that were > contemplating doing some reflink stuff with btrfs during database > creation. Not exactly what you are doing, but related. > > Also, wouldn't it work if your extension just defined its own copydir() > in the .so? That should then be used over the built-in one -- maybe. > > -- > Peter Eisentraut http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Should I forget about getting such a patch in? I am up for implementing alternative solutions if the current one is considered unacceptable. As I tried to demonstrate in my last email, previous attempts also failed. On Tue, Sep 3, 2019 at 12:14 PM Swen Kooij <swenkooij@gmail.com> wrote: > > I read two previous proposals for something similar. > > First one is from 2013 [0]. It proposed a OS and filesystem specific > implementation. Which was then changed to a patch that adds a > config option to specify shell commands that postgres should use > for copying files and dirs. This was after commentary that such a > feature is better suited for an extension. Which is what I am trying to do. > > What was true in 2013 (when the other patch was proposed) is still > true. There's no good way at the moment to hook into how postgres > copies files and directories. > > The second thread I found from 2018 [1] proposes filesystem specific > changes (BTRFS, APFS and XFS) that operate on a file level. Really > nice patch actually, but much more invasive than what I am proposing. > > > Both patches make rather invasive and significant changes specific > to the features they're trying to build. The general sentiment I got > from reading those two threads is that building in support for these > kind of features is hard and is probably better first done as an extension. > > I understand that the patch I proposed is not an ideal interface, > but it gets the job done without having to built this kind of > functionality into postgres. I am not proposing any filesystem > or OS specific hooks/changes in postgres to make this work. > > I tried to override the copydir() function without any hooks > and I haven't gotten it to work yet. If it does, I'd still prefer the > hook. It's more predictable and probably more portable. > > [0] https://www.postgresql.org/message-id/511B5D11.4040507@socialserve.com > [1] https://www.postgresql.org/message-id/bc9ca382-b98d-0446-f699-8c5de2307ca7%402ndquadrant.com > > On Tue, Sep 3, 2019 at 9:48 AM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > > > On 2019-09-02 22:16, Swen Kooij wrote: > > > Is there anything that I am missing? My early experiments have been > > > very promising but my experience with Postgres internals is limited. Any > > > help or feedback would be much appreciated. > > > > You might want to review several previous threads that were > > contemplating doing some reflink stuff with btrfs during database > > creation. Not exactly what you are doing, but related. > > > > Also, wouldn't it work if your extension just defined its own copydir() > > in the .so? That should then be used over the built-in one -- maybe. > > > > -- > > Peter Eisentraut http://www.2ndQuadrant.com/ > > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services