Discussion: pg_execute_from_file review
I've just looked at pg_execute_from_file[1]. The idea here is to execute all the SQL commands in a given file. My comments:

* It applies well enough, and builds fine
* It seems to work, and I've not come up with a way to make it break
* It seems useful, and to follow the limited design discussion relevant to it
* It looks more or less like the rest of the code base, so far as I know

Various questions and/or nits:

* I'd like to see the docs slightly expanded, specifically to describe parameter replacement. I wondered for a while if I needed to set off parameters in any specific way, before reading the code and realizing they can be whatever I want.
* Does anyone think it needs representation in the test suite?
* Is it at all bad to include spi.h in genfile.c? IOW should this function live elsewhere? It seems reasonable to me to do it as written, but I thought I'd ask.
* In the snippet below, it seems best just to use palloc0():

    query_string = (char *) palloc((fsize + 1) * sizeof(char));
    memset(query_string, 0, fsize + 1);

* Shouldn't it include SPI_push() and SPI_pop()?

[1] http://archives.postgresql.org/message-id/m262wf6fnz.fsf@2ndQuadrant.fr

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com
Joshua Tolley <eggyknap@gmail.com> writes:
> I've just looked at pg_execute_from_file[1]. The idea here is to execute all
> the SQL commands in a given file. My comments:

Thanks for your review. Please find attached a revised patch where I've changed the internals of the function so that it's split in two and the opr_sanity check passes, per comments from David Wheeler and Tom Lane.

> * I'd like to see the docs slightly expanded, specifically to describe
> parameter replacement. I wondered for a while if I needed to set off
> parameters in any specific way, before reading the code and realizing they
> can be whatever I want.

My guess is that you knew that in the CREATE EXTENSION context it has been proposed to use the notation @extschema@ as a placeholder, and you were then confused. I've refrained from imposing any style with respect to what the placeholder would look like in the mechanism patch.

Do we still want the docs to spell out that nothing in particular is expected of the placeholder syntax or format?

> * Does anyone think it needs representation in the test suite?

Well, the patch will get its tests with the arrival of the main extension patch, where all contribs are installed using it.

> * Is it at all bad to include spi.h in genfile.c? IOW should this function
> live elsewhere? It seems reasonable to me to do it as written, but I thought
> I'd ask.

Well, using SPI at this place was asked for by Álvaro and Tom, so my guess is that's ok :)

> * In the snippet below, it seems best just to use palloc0():
> query_string = (char *)palloc((fsize+1)*sizeof(char));
> memset(query_string, 0, fsize+1);

Edited.

> * Shouldn't it include SPI_push() and SPI_pop()?

ENOCLUE

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Attachments
On Thu, Nov 25, 2010 at 10:24:51PM +0100, Dimitri Fontaine wrote:
> Joshua Tolley <eggyknap@gmail.com> writes:
> > I've just looked at pg_execute_from_file[1]. The idea here is to execute all
> > the SQL commands in a given file. My comments:
>
> Thanks for your review. Please find attached a revised patch where I've
> changed the internals of the function so that it's split in two and that
> the opr_sanity check passes, per comments from David Wheeler and Tom Lane.

I'll take a look ASAP.

> > * I'd like to see the docs slightly expanded, specifically to describe
> > parameter replacement. I wondered for a while if I needed to set off
> > parameters in any specific way, before reading the code and realizing they
> > can be whatever I want.
>
> My guess is that you knew that in the CREATE EXTENSION context, it has
> been proposed to use the notation @extschema@ as a placeholder, and
> you've then been confused. I've refrained from imposing any style with
> respect to what the placeholder would look like in the mechanism patch.
>
> Do we still want to detail in the docs that there's nothing expected
> about the placeholder syntax or format?

Perhaps such docs will show up with the rest of the EXTENSION work, but I'd like a brief mention somewhere.

> > * Does anyone think it needs representation in the test suite?
>
> Well the patch will get its tests with the arrival of the extension main
> patch, where all contribs are installed using it.

Works for me.

> > * Shouldn't it include SPI_push() and SPI_pop()?
>
> ENOCLUE

My guess is "yes", because that was widely hailed as a good idea when I did PL/LOLCODE. I suspect it would only matter if someone were using pg_execute_from_file within some other function, which isn't entirely unlikely.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com
Joshua Tolley <eggyknap@gmail.com> writes:
>> > * I'd like to see the docs slightly expanded, specifically to describe
>> > parameter replacement. I wondered for a while if I needed to set off
>> > parameters in any specific way, before reading the code and realizing they
>> > can be whatever I want.
>>
>> My guess is that you knew that in the CREATE EXTENSION context, it has
>> been proposed to use the notation @extschema@ as a placeholder, and
>> you've then been confused. I've refrained from imposing any style with
>> respect to what the placeholder would look like in the mechanism patch.
>>
>> Do we still want to detail in the docs that there's nothing expected
>> about the placeholder syntax or format?
>
> Perhaps such docs will show up with the rest of the EXTENSION work, but I'd
> like a brief mention somewhere.

I'm unclear about how to spell out what you'd like to see in there, so I'm not proposing a newer version of the patch.

>> > * Shouldn't it include SPI_push() and SPI_pop()?
>>
>> ENOCLUE
>
> My guess is "yes", because that was widely hailed as a good idea when I did
> PL/LOLCODE. I suspect it would only matter if someone were using
> pg_execute_from_file within some other function, which isn't entirely
> unlikely.

In fact, per the fine manual, it seems unnecessary to do so when using SPI_execute(), which is the case here:

http://www.postgresql.org/docs/9.0/interactive/spi-spi-push.html

    Note that SPI_execute and related functions automatically do the
    equivalent of SPI_push before passing control back to the SQL execution
    engine, so it is not necessary for you to worry about this when using
    those functions.

For information, Joshua and I have been talking about this case on IRC, and we agree on this conclusion.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Fri, Nov 26, 2010 at 06:24, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Thanks for your review. Please find attached a revised patch where I've
> changed the internals of the function so that it's split in two and that
> the opr_sanity check passes, per comments from David Wheeler and Tom Lane.

I have some comments and questions about pg_execute_from_file.v5.patch.

==== Source code ====
* OID=3627 is used by another function. Don't you expect 3927?

* There is a compiler warning:
  genfile.c: In function ‘pg_execute_from_file_with_placeholders’:
  genfile.c:427: warning: unused variable ‘query_string’

* I'd like to ask native speakers whether "from" is needed in the names
  "pg_execute_from_file" and "pg_execute_from_query_string".

==== Design and Implementation ====
* pg_execute_from_file() can execute any file, even ones not in $PGDATA.
  OTOH, we restrict pg_read_file() from reading such files. What will be
  our policy? Note that the contents of the file might be logged or
  returned to the client on errors.

* Do we have any reasons to keep pg_execute_from_file separate from
  pg_read_file? If we allow pg_read_file() to read files in $PGSHARE,
  pg_execute_from_file could be replaced with "EXECUTE pg_read_file()".
  (Note that pg_execute_from_file is implemented to do so even now.)

* I wish pg_execute_from_file (and pg_read_file) had an option to specify
  a character encoding for the file. Especially since SJIS is still used
  widely, but it is not a valid server encoding.

--
Itagaki Takahiro
Itagaki Takahiro <itagaki.takahiro@gmail.com> writes:
> I have some comments and questions about pg_execute_from_file.v5.patch.

Thanks for reviewing it!

> ==== Source code ====
> * OID=3627 is used by another function. Don't you expect 3927?

Yes indeed. It took me some time to understand what happened here, and it seems to be a git branch-management mistake on my part. Here's the patch as I worked on it (it's much easier to test against the full branch, that is, extension), then as it got cherry-picked into the branch I use to produce the patches:

http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=commitdiff;h=b406fe35e36e6823e18f7c3157bc330b40b130d4
http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=commitdiff;h=b04eda8f8ee05c3bb5f4d6b693c5169aa7c3b9d1

I missed including a part of the following patch in the pg_execute_from_file branch:

http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=commitdiff;h=7b3fe8384fb7636130e50f03a338f36495e15030

Sorry about that; it will be fixed in v6 — already fixed in the git branch.

> * There is a compiler warning:
> genfile.c: In function ‘pg_execute_from_file_with_placeholders’:
> genfile.c:427: warning: unused variable ‘query_string’

Fixed (in the git branches), sorry about that.

> * I'd like to ask native speakers whether "from" is needed in names
> of "pg_execute_from_file" and "pg_execute_from_query_string".

Fair enough; I'll wait for some comments before producing a v6.

> ==== Design and Implementation ====
> * pg_execute_from_file() can execute any files even if they are not
> in $PGDATA. OTOH, we restrict pg_read_file() from reading such files.
> What will be our policy? Note that the contents of the file might be
> logged or returned to the client on errors.
>
> * Do we have any reasons to have pg_execute_from_file separated from
> pg_read_file? If we allow pg_read_file() to read files in $PGSHARE,
> pg_execute_from_file could be replaced with "EXECUTE pg_read_file()".
> (Note that pg_execute_from_file is implemented to do so even now.)

pg_execute_from_file started as a very different animal, and I can look into it using pg_read_file() now, if we also change the restrictions. Also, note that before using SPI, an execute failure didn't output the whole input file.

> * I hope pg_execute_from_file (and pg_read_file) had an option
> to specify a character encoding of the file. Especially, SJIS
> is still used widely, but it is not a valid server encoding.

What we agreed on doing in the extension's main patch was to change the client_encoding before calling into pg_execute_from_file(). So I'd hope that changing that client-side just before calling pg_execute_from_file would be enough. Is it?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Mon, Nov 29, 2010 at 4:26 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: >> * I'd like to ask native speakers whether "from" is needed in names >> of "pg_execute_from_file" and "pg_execute_from_query_string". > > Fair enough, will wait for some comments before producing a v6. Yes, you need the from there. -- Robert Haas Native English Speaker
On Mon, Nov 29, 2010 at 10:27 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Nov 29, 2010 at 4:26 AM, Dimitri Fontaine > <dimitri@2ndquadrant.fr> wrote: >>> * I'd like to ask native speakers whether "from" is needed in names >>> of "pg_execute_from_file" and "pg_execute_from_query_string". >> >> Fair enough, will wait for some comments before producing a v6. > > Yes, you need the from there. Eh, wait. You definitely need from in pg_execute_from_file(). But pg_execute_from_query_string() doesn't sound quite right. What does that function do, anyway? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11/29/2010 10:30 AM, Robert Haas wrote: > On Mon, Nov 29, 2010 at 10:27 AM, Robert Haas<robertmhaas@gmail.com> wrote: >> On Mon, Nov 29, 2010 at 4:26 AM, Dimitri Fontaine >> <dimitri@2ndquadrant.fr> wrote: >>>> * I'd like to ask native speakers whether "from" is needed in names >>>> of "pg_execute_from_file" and "pg_execute_from_query_string". >>> Fair enough, will wait for some comments before producing a v6. >> Yes, you need the from there. > Eh, wait. You definitely need from in pg_execute_from_file(). But > pg_execute_from_query_string() doesn't sound quite right. What does > that function do, anyway? I'm not sure why you need either "from". It just seems like a noise word. Maybe we could use pg_execute_query_file() and pg_execute_query_string(), which would be fairly clear and nicely symmetrical. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> I'm not sure why you need either "from". It just seems like a noise
> word. Maybe we could use pg_execute_query_file() and
> pg_execute_query_string(), which would be fairly clear and nicely
> symmetrical.

I'd go with that, but I should note: only pg_execute_query_file() is to be exposed at the SQL level; the other one is just a backend facility to share code between the variants with and without placeholders.

If you wonder why there are two variants, it's because you can't have default values (pg_node_tree) in pg_proc.h, and I'm following Tom's advice.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Mon, Nov 29, 2010 at 10:37 AM, Andrew Dunstan <andrew@dunslane.net> wrote: > > > On 11/29/2010 10:30 AM, Robert Haas wrote: >> >> On Mon, Nov 29, 2010 at 10:27 AM, Robert Haas<robertmhaas@gmail.com> >> wrote: >>> >>> On Mon, Nov 29, 2010 at 4:26 AM, Dimitri Fontaine >>> <dimitri@2ndquadrant.fr> wrote: >>>>> >>>>> * I'd like to ask native speakers whether "from" is needed in names >>>>> of "pg_execute_from_file" and "pg_execute_from_query_string". >>>> >>>> Fair enough, will wait for some comments before producing a v6. >>> >>> Yes, you need the from there. >> >> Eh, wait. You definitely need from in pg_execute_from_file(). But >> pg_execute_from_query_string() doesn't sound quite right. What does >> that function do, anyway? > > I'm not sure why you need either "from". It just seems like a noise word. > Maybe we could use pg_execute_query_file() and pg_execute_query_string(), > which would be fairly clear and nicely symmetrical. Because you execute queries, not files. Or at least that's how I think about it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Andrew Dunstan <andrew@dunslane.net> writes: > I'm not sure why you need either "from". It just seems like a noise > word. Maybe we could use pg_execute_query_file() and > pg_execute_query_string(), which would be fairly clear and nicely > symmetrical. +1, but I think "query" is also a noise word here. Why not just "pg_execute_file" and "pg_execute_string"? regards, tom lane
On 11/29/2010 10:51 AM, Robert Haas wrote: >> >> I'm not sure why you need either "from". It just seems like a noise word. >> Maybe we could use pg_execute_query_file() and pg_execute_query_string(), >> which would be fairly clear and nicely symmetrical. > Because you execute queries, not files. Or at least that's how I > think about it. > Well, to me "pg_execute_query_file" says "execute the queries in this file". I'm not sure what else it could sensibly mean. But I think any of the suggestions will probably work OK. cheers andrew
On Mon, Nov 29, 2010 at 11:12:58AM -0500, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > > I'm not sure why you need either "from". It just seems like a noise > > word. Maybe we could use pg_execute_query_file() and > > pg_execute_query_string(), which would be fairly clear and nicely > > symmetrical. > > +1, but I think "query" is also a noise word here. > Why not just "pg_execute_file" and "pg_execute_string"? > > regards, tom lane While we're bikeshedding, and since I started the thread that has become this topic, +1 for Tom's naming. -- Joshua Tolley / eggyknap End Point Corporation http://www.endpoint.com
On 11/29/2010 11:12 AM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> I'm not sure why you need either "from". It just seems like a noise >> word. Maybe we could use pg_execute_query_file() and >> pg_execute_query_string(), which would be fairly clear and nicely >> symmetrical. > +1, but I think "query" is also a noise word here. > Why not just "pg_execute_file" and "pg_execute_string"? > > Well, I put that in to make it clear that the file/string is expected to contain SQL and not, say, machine code. But I agree we could possibly do without it. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 11/29/2010 11:12 AM, Tom Lane wrote: >> +1, but I think "query" is also a noise word here. >> Why not just "pg_execute_file" and "pg_execute_string"? > Well, I put that in to make it clear that the file/string is expected to > contain SQL and not, say, machine code. But I agree we could possibly do > without it. Well, if you want to make that clear, it should be "pg_execute_sql_file" etc. I still think "query" is pretty vague, if not actually counterproductive (because it's singular not plural, so someone might think the file can only contain one query). regards, tom lane
On Mon, Nov 29, 2010 at 11:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> I'm not sure why you need either "from". It just seems like a noise >> word. Maybe we could use pg_execute_query_file() and >> pg_execute_query_string(), which would be fairly clear and nicely >> symmetrical. > > +1, but I think "query" is also a noise word here. > Why not just "pg_execute_file" and "pg_execute_string"? I'd pick pg_execute_from_file() and just plain pg_execute(), myself. pg_execute_file() could be read to mean you are going to execute the file itself (i.e. it's a program). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > pg_execute_file() could be read to mean you are going to execute the > file itself (i.e. it's a program). Well, if that's what it suggests to you, I don't see how adding "from" disambiguates anything. You could be executing machine code "from" a file, too. What did you think of "pg_execute_sql_file"? regards, tom lane
On Mon, Nov 29, 2010 at 11:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> pg_execute_file() could be read to mean you are going to execute the >> file itself (i.e. it's a program). > > Well, if that's what it suggests to you, I don't see how adding "from" > disambiguates anything. You could be executing machine code "from" > a file, too. > > What did you think of "pg_execute_sql_file"? That, I like. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> I'd pick pg_execute_from_file() and just plain pg_execute(), myself.

For the record, there's only one name exposed at the SQL level. Or do you want me to expand the patch to actually include a pg_execute() version of the function, which would execute the query in PG_GETARG_TEXT_P(0)?

> On Mon, Nov 29, 2010 at 11:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>> What did you think of "pg_execute_sql_file"?
>
> That, I like.

Ok, I declare pg_execute_sql_file() the winner and will prepare a new patch later tonight; now it's commute time.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Mon, Nov 29, 2010 at 12:21 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I'd pick pg_execute_from_file() and just plain pg_execute(), myself. > > For the record there's only one name exposed at the SQL level. Or do you > want me to expand the patch to actually include a pg_execute() version > of the function, that would execute the query in PG_GETARG_TEXT_P(0)? No, not particularly. >> On Mon, Nov 29, 2010 at 11:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Robert Haas <robertmhaas@gmail.com> writes: >>> What did you think of "pg_execute_sql_file"? >> >> That, I like. > > Ok, I call pg_execute_sql_file() the winner and will prepare a new patch > later tonight, now is comute time. Sounds good. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Itagaki Takahiro <itagaki.takahiro@gmail.com> writes:
> I have some comments and questions about
> pg_execute_from_file.v5.patch.

I believe v6 fixes it all; please find it attached.

> ==== Source code ====
> * OID=3627 is used by another function. Don't you expect 3927?
>
> * There is a compiler warning:
> genfile.c: In function ‘pg_execute_from_file_with_placeholders’:
> genfile.c:427: warning: unused variable ‘query_string’

Both fixes are in, sorry again.

> * I'd like to ask native speakers whether "from" is needed in names
> of "pg_execute_from_file" and "pg_execute_from_query_string".

The function name is now pg_execute_sql_file(), per comments from Andrew, Joshua, Robert and Tom, all qualified native English speakers, among other qualities :)

> ==== Design and Implementation ====
> * pg_execute_from_file() can execute any files even if they are not
> in $PGDATA. OTOH, we restrict pg_read_file() from reading such files.
> What will be our policy? Note that the contents of the file might be
> logged or returned to the client on errors.
>
> * Do we have any reasons to have pg_execute_from_file separated from
> pg_read_file? If we allow pg_read_file() to read files in $PGSHARE,
> pg_execute_from_file could be replaced with "EXECUTE pg_read_file()".
> (Note that pg_execute_from_file is implemented to do so even now.)

Thinking some more about it, there's still a reason to keep them separate: the API isn't the same. We're not proposing to read a SQL script file chunk by chunk, nor do we want users to have to check the file size before being able to call the function.

A problem with pg_read_file() as it stands is that it returns text rather than bytea, too, and if we choose to fix that rather than adding new functions, we will want to avoid having to separate the two functions again.

> * I hope pg_execute_from_file (and pg_read_file) had an option
> to specify a character encoding of the file. Especially, SJIS
> is still used widely, but it is not a valid server encoding.

So, what about client_encoding here, again?

--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Attachments
Excerpts from Dimitri Fontaine's message of Mon Nov 29 17:03:06 -0300 2010:
> Itagaki Takahiro <itagaki.takahiro@gmail.com> writes:
> > * I hope pg_execute_from_file (and pg_read_file) had an option
> > to specify a character encoding of the file. Especially, SJIS
> > is still used widely, but it is not a valid server encoding.
>
> So, what about client_encoding here, again?

I tried this in an earlier iteration of this patch, and it works fine (albeit with a Latin1 file in a UTF8-encoded database, but presumably this is no different from any other pair of client/server encodings; recoding took place correctly during execution).

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Tue, Nov 30, 2010 at 05:03, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> I believe v6 fixes it all, please find it attached.
>
>> ==== Design and Implementation ====
>> * pg_execute_from_file() can execute any files even if they are not
>> in $PGDATA. OTOH, we restrict pg_read_file() from reading such files.
>> What will be our policy? Note that the contents of the file might be
>> logged or returned to the client on errors.
>>
>> * Do we have any reasons to have pg_execute_from_file separated from
>> pg_read_file? If we allow pg_read_file() to read files in $PGSHARE,
>> pg_execute_from_file could be replaced with "EXECUTE pg_read_file()".
>> (Note that pg_execute_from_file is implemented to do so even now.)
>
> Thinking some more about it, there's still a reason to maintain them
> separated: the API ain't the same, we're not proposing to read a sql
> script file chunk after chunk, nor do we want users to have to check for
> the file size before being able to call the function.
>
> A problem with pg_read_file() as it stands is that it's returning text
> rather than bytea, too, and if we choose to fix that rather than adding
> some new functions, we will want to avoid having to separate the two
> functions again.

I think there are two topics here:
1. Do we need to restrict the locations in which SQL files are executable?
2. Which is better, pg_execute_sql_file() or EXECUTE pg_read_file()?

There is no discussion yet for 1, but I think we need some restrictions anyway. If we want to be conservative, we would allow only files in $PGSHARE or $PGSHARE/contrib. A more aggressive approach might be something like the CREATE DIRECTORY command in Oracle Database:
http://download.oracle.com/docs/cd/E14072_01/server.112/e10592/statements_5007.htm

For 2, I'd like to split pg_execute_sql_file() into separate single-purpose functions, something like:

- FUNCTION pg_execute_sql(sql text)
- FUNCTION replace(str text, from1 text, to1 text, VARIADIC text)
- FUNCTION pg_read_binary_file(path text, offset bigint, size bigint)
  (size == -1 means whole file)

pg_read_binary_file() is probably the most important of these. pg_execute_sql_file() can then be rewritten as follows. We can also use the existing convert_from() to support encodings.

SELECT pg_execute_sql(
         replace(
           convert_from(
             pg_read_binary_file('path', 0, -1),
             'encoding'),
           'key1', 'value1', 'key2', 'value2')
       );

--
Itagaki Takahiro
On Tue, Nov 30, 2010 at 08:56, Alvaro Herrera <alvherre@commandprompt.com> wrote:
>> > * I hope pg_execute_from_file (and pg_read_file) had an option
>> > to specify a character encoding of the file. Especially, SJIS
>> > is still used widely, but it is not a valid server encoding.
>>
>> So, what about client_encoding here, again?
>
> I tried this in an earlier iteration of this patch, and it works fine
> (albeit with a Latin1 file in an UTF8 encoded database, but presumably
> this is no different from any other pair of client/server encodings;
> recoding took place correctly during execution).

client_encoding won't work at all, because read_sql_queries_from_file() uses pg_verifymbstr(); that is, it verifies the input against the *server_encoding*.

Even if we replace that with pg_verify_mbstr(client_encoding, ...) and pg_do_encoding_conversion (from client_encoding to server_encoding), it still won't work well when error messages are raised. The client expects the original client encoding, but messages would be sent in the file encoding. It would be a security hole.

--
Itagaki Takahiro
Itagaki Takahiro <itagaki.takahiro@gmail.com> writes:
> I think there are two topics here:
> 1. Do we need to restrict locations in which sql files are executable?
> 2. Which is better, pg_execute_sql_file() or EXECUTE pg_read_file() ?
>
> There are no discussion yet for 1, but I think we need some restrictions

Well, as a first level of restriction, the function is superuser-only. I understand and share your concerns, but as the main use for this function is in the extension patch, which is superuser-only too, I feel like this discussion could (should) be deferred until after commit. We would issue the next alpha with the current coding, and the one after that with more security thought put in.

Is it possible to attack it this way? (The fear being that it might be hard to reach a common decision here, and we have other patches depending on this one that will continue working fine --- that's the goal --- once the security policy lands. This one is the mechanism patch.)

> For 2, I'd like to monofunctionalize pg_execute_sql_file() into
> separated functions something like:
> - FUNCTION pg_execute_sql(sql text)
> - FUNCTION replace(str text, from1 text, to1 text, VARIADIC text)
> - FUNCTION pg_read_binary_file(path text, offset bigint, size bigint)
>   (size == -1 means whole-file)
>
> pg_read_binary_file() is the most important function, probably.
> pg_execute_sql_file() can be rewritten as follows. We can also use
> the existing convert_from() to support encodings.
>
> SELECT pg_execute_sql(
>          replace(
>            convert_from(
>              pg_read_binary_file('path', 0, -1),
>              'encoding'),
>            'key1', 'value1', 'key2', 'value2')
>        );

I think the current pg_execute_sql_file() is a better user interface, but it could be extended to support queries in a text argument too, of course. I proposed that, and Robert said he doesn't think it should happen in this very patch, if at all, if I understood correctly.

Again, I'd like to see pg_read_binary_file(), and it's easy to expose the other derivatives you're proposing here: the support code is already in my patch and is organised this way internally. Now, is there agreement that all those new SQL functions should be in the pg_execute_from_file patch? If so, I'll prepare v7 with that.

Overall, I'd like it if it were possible to separate the concerns directly relevant to the extension patch from the other legitimate ones, so that the main patch gets its share of review in this commitfest. But maybe I'm over-worried here.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Itagaki Takahiro <itagaki.takahiro@gmail.com> writes: > client_encoding won't work at all because read_sql_queries_from_file() > uses pg_verifymbstr(), that is verify the input with *server_encoding*. > > Even if we replace it with pg_verify_mbstr(client_encoding, ...) and > pg_do_encoding_conversion(from client_encoding to server_encoding), > it still won't work well when error messages are raised. The client > expects the original client encoding, but messages are sent in the > file encoding. It would be a security hole. I'll confess I'm at a loss here wrt how to solve your concerns. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Tue, Nov 30, 2010 at 18:47, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Itagaki Takahiro <itagaki.takahiro@gmail.com> writes:
>> There are no discussion yet for 1, but I think we need some restrictions
>
> Well, as a first level of restrictions, the function is superuser
> only. I understand and share your concerns, but as the main use for this
> function is in the extension's patch which is superuser only too,

I found that a superuser can read any file on the server via COPY FROM and lo_import(). So I think the restriction in pg_read_file() is inconsistent and doesn't protect the server at all.

> Again, I'd like to see pg_read_binary_file() and it's easy to expose the
> other derivatives you're proposing here: the support code is already in
> my patch and is organised this way internally. Now, is there an
> agreement that all those new SQL functions should be in the
> pg_execute_from_file patch? If so, I'll prepare v7 with that.

My suggestion is to introduce a pg_read_binary_file() function that can read any file on the server, and make CREATE EXTENSION use that function. Of course, pg_execute_[sql|from]_file() can simplify queries to some degree, but I think it has too many jobs: reading a file, (converting the encoding), replacing some text in it, and executing the SQL. If we have pg_read_binary_file(), you could implement CREATE EXTENSION as below, using SPI or nested function calls:

$sql := replace(
          convert_from(
            pg_read_binary_file($path, 0, -1),
            $encoding),
          '@extschema@', $schema);
EXECUTE $sql;

You seem to want to replace only one variable, @extschema@. So you don't need the replace(VARIADIC) function, and the patch will be a bit simpler.

--
Itagaki Takahiro
Itagaki Takahiro <itagaki.takahiro@gmail.com> writes:
> My suggestion is to introduce a pg_read_binary_file() function that can
> read any file on the server, and make CREATE EXTENSION use it.
> Of course, pg_execute_[sql|from]_file() can simplify queries

It seems like all you're missing in the current patch is an encoding option. Exposing the internal functions is simple enough and solves it, but I much prefer the current simple-case user API. Do you want me to add a variant with an extra 'encoding' parameter? In that case, CREATE EXTENSION WITH ENCODING … would just use this variant internally.

I'll send another version of the patch with the following SQL-callable functions:

- replace_placeholders(text, variadic placeholders[]) returns text
- pg_read_binary_file(text, offset, length) returns bytea
- pg_execute_sql_string(text)
- pg_execute_sql_string(text, encoding)
- pg_execute_sql_string(text, encoding, variadic placeholders[])
- pg_execute_sql_file(text)
- pg_execute_sql_file(text, encoding)
- pg_execute_sql_file(text, encoding, variadic placeholders[])

I'm not too sure about the EXECUTE syntax variant: are you talking about a plpgsql block (in which case that's supported for free) or about a new plain-SQL variant of it?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
> Itagaki Takahiro <itagaki.takahiro@gmail.com> writes:
>> My suggestion is to introduce pg_read_binary_file() function that can
>> read any files in the server, and make CREATE EXTENSION to use the
>> function. Of course, pg_execute_[sql|from]_file() can simplify queries
>
> It seems like all you're missing in the current patch is the encoding
> option in there. Exposing the internal functions is simple enough and
> solves it, but I much prefer the current simple-case user API. Do you
> want me to add a variant with an extra 'encoding' parameter?

Here's the result:

dim=# \df pg_exe*|replace_*|*binary*
                                  List of functions
   Schema   |         Name          | Result data type |    Argument data types    |  Type
------------+-----------------------+------------------+---------------------------+--------
 pg_catalog | pg_execute_sql_file   | void             | text                      | normal
 pg_catalog | pg_execute_sql_file   | void             | text, name                | normal
 pg_catalog | pg_execute_sql_file   | void             | text, name, VARIADIC text | normal
 pg_catalog | pg_execute_sql_string | void             | text                      | normal
 pg_catalog | pg_execute_sql_string | void             | text, VARIADIC text       | normal
 pg_catalog | pg_read_binary_file   | bytea            | text, bigint, bigint      | normal
 pg_catalog | replace_placeholders  | text             | text, VARIADIC text       | normal
(7 rows)

I think that answers your concerns: you can manipulate the low-level functions to control each step, or use the higher-level functions and just pass them the arguments you want. Note that the "name" arguments here are the encoding.

The documentation of the pg_execute_sql_string() function should close the gap noted by Joshua Tolley about how to format placeholder names, because it gives examples using both '@schema@' and 's' as names.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Thu, Dec 2, 2010 at 07:00, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Here's the result:
>
> dim=# \df pg_exe*|replace_*|*binary*
>                                   List of functions
>    Schema   |         Name          | Result data type |    Argument data types    |  Type
> ------------+-----------------------+------------------+---------------------------+--------
>  pg_catalog | pg_execute_sql_file   | void             | text                      | normal
>  pg_catalog | pg_execute_sql_file   | void             | text, name                | normal
>  pg_catalog | pg_execute_sql_file   | void             | text, name, VARIADIC text | normal
>  pg_catalog | pg_execute_sql_string | void             | text                      | normal
>  pg_catalog | pg_execute_sql_string | void             | text, VARIADIC text       | normal
>  pg_catalog | pg_read_binary_file   | bytea            | text, bigint, bigint      | normal
>  pg_catalog | replace_placeholders  | text             | text, VARIADIC text       | normal
> (7 rows)

Thanks, good job!

* pg_read_binary_file_internal() should return not only the contents as char * but also the length, because the file might contain 0x00. In addition, the null-termination of the contents buffer is useless.

* The 1st argument of pg_convert must be bytea rather than cstring in pg_convert_and_execute_sql_file(). I think you can fix both this bug and the above one if pg_read_binary_file_internal() returns bytea.

* pg_read_file() has stronger restrictions than pg_read_binary_file() (absolute paths are not allowed), and -1 length is not supported. We could fix pg_read_file() to behave like pg_read_binary_file().

* (It was my suggestion, but) Is it reasonable to use -1 length to read the whole file? It might fit with C, but NULL might be better for SQL.

* The doc says pg_execute_sql_string() is restricted to superusers, but it is not actually restricted. I think we don't have to restrict it.

* In the docs, the example for replace_placeholders() actually calls replace('abcdefabcdef', 'cd', 'XX', 'ef', 'YY'). BTW, I think we can call it just "replace", because the parser can handle them correctly even if we have both replace(text, text, text) and replace(text, VARIADIC text[]).
We will need only one doc for them. Note that if we call replace() with 3 args, the non-VARIADIC version is called, so there is no performance penalty.

* We might rename pg_convert_and_execute_sql_file() to pg_execute_sql_file_with_encoding() or so, to have the same prefix.
--
Itagaki Takahiro
Hi,

Please find attached the v8 version of the patch, which fixes the following:

Itagaki Takahiro <itagaki.takahiro@gmail.com> writes:
> * pg_read_binary_file_internal() should return not only the contents
> as char * but also the length, because the file might contain 0x00.
> In addition, the null-termination of the contents buffer is useless.
>
> * The 1st argument of pg_convert must be bytea rather than cstring in
> pg_convert_and_execute_sql_file(). I think you can fix both the bug
> and the above one if pg_read_binary_file_internal() returns bytea.

I've changed pg_read_binary_file_internal() to return bytea*, which is much cleaner; thanks for the suggestion!

> * pg_read_file() has stronger restrictions than pg_read_binary_file()
> (absolute paths not allowed), and -1 length is not supported.
> We could fix pg_read_file() to behave like pg_read_binary_file().

It's now using the _internal function directly, so there's only one code definition to care about here.

> * (It was my suggestion, but) Is it reasonable to use -1 length to read
> the whole file? It might fit with C, but NULL might be better for SQL.
Well, thinking about it, omitting the length parameter altogether seems like the more natural SQL-level API to me, so I've made it happen:

=# \df pg_read_*|replace|pg_exe*
                                     List of functions
   Schema   |         Name          | Result data type |       Argument data types        |  Type
------------+-----------------------+------------------+----------------------------------+--------
 pg_catalog | pg_execute_sql_file   | void             | text                             | normal
 pg_catalog | pg_execute_sql_file   | void             | text, name                       | normal
 pg_catalog | pg_execute_sql_file   | void             | text, name, VARIADIC text        | normal
 pg_catalog | pg_execute_sql_string | void             | text                             | normal
 pg_catalog | pg_execute_sql_string | void             | text, VARIADIC text              | normal
 pg_catalog | pg_read_binary_file   | bytea            | text, bigint                     | normal
 pg_catalog | pg_read_binary_file   | bytea            | text, bigint, bigint             | normal
 pg_catalog | pg_read_file          | text             | text, bigint                     | normal
 pg_catalog | pg_read_file          | text             | text, bigint, bigint             | normal
 pg_catalog | replace               | text             | text, text, text                 | normal
 pg_catalog | replace               | text             | text, text, text, VARIADIC text  | normal
(11 rows)

> * The doc says pg_execute_sql_string() is restricted for superusers,
> but is not restricted actually. I think we don't have to.

Agreed, fixed the doc.

> * In docs, the example of replace_placeholders() is
> replace('abcdefabcdef', 'cd', 'XX', 'ef', 'YY').
> BTW, I think we can call it just "replace" because parser can handle
> them correctly even if we have both replace(text, text, text) and
> replace(text, VARIADIC text[]). We will need only one doc for them.
> Note that if we call replace() with 3 args, the non-VARIADIC version
> is called. So, there is no performance penalty.

The same idea occurred to me last night when reading through the patch before sending it. It's now done in the way you can see above. The idea is not to change the existing behavior at all, so I've not changed the non-VARIADIC version of the function.
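To make the v8 API concrete, usage might look like the sketch below. This is illustrative only: these functions exist in the proposed patch, not in stock PostgreSQL, and the file path, placeholder, and schema name are invented for the example.

```sql
-- Hedged sketch against the proposed v8 API (not stock PostgreSQL).
-- Read a whole file by omitting the length argument:
SELECT pg_read_file('extension/foo.sql', 0);

-- Execute a file, converting from the given encoding and substituting
-- a hypothetical @extschema@ placeholder along the way:
SELECT pg_execute_sql_file('extension/foo.sql', 'utf8',
                           '@extschema@', 'myschema');
```

The point of the overloads is that the common case (whole file, server encoding, no placeholders) needs only the one-argument form.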
> * We might rename pg_convert_and_execute_sql_file() to
> pg_execute_sql_file_with_encoding() or so to have the same prefix.

Well, I think I prefer reading the verbs in the order that things happen in the code: it's actually convert, then execute. But again, maybe some native speaker will have a say here, or maybe your proposed name fits better in PostgreSQL. I'd leave it to the committer :)

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Thu, Dec 2, 2010 at 20:00, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Please find attached the v8 version of the patch, that fixes the following:

I fixed and cleaned up some of the code in it; v9 patch attached. Please check my modifications, and set the status to "Ready for Committer" if you find no problems. I think the documentation and code comments might need to be checked by native English speakers.

> Well, thinking about it, omitting the length parameter altogether seems
> like the more natural SQL-level API to me, so I've made it happen:

Good idea. I re-added negative-length checks in the pg_read_file functions; negative length is used internally, but is not exposed in the SQL functions.

>> BTW, I think we can call it just "replace"
> The same idea occurred to me last night when reading through the patch
> to send. It's now done in the way you can see above. The idea is not to
> change the existing behavior at all, so I've not changed the
> non-VARIADIC version of the function.

You added replace(text, text, text, VARIADIC text), but I think replace(text, VARIADIC text) also works. If we have the short form, we can use it easily from the execute functions with placeholders.

Other changes:
* Added some regression tests.
* Int64GetDatum((int64) fst.st_size) was broken.
* An error check for "could not read file" didn't work.
* Read file contents into the bytea buffer directly to avoid a memcpy.
* Fixed bad usages of text and bytea values; they are not null-terminated.
* I don't want to expose ArrayType to builtins.h, so I call replace_text_variadic() from the execute functions.
* pg_execute_sql_file(path, NULL) won't work because it's a STRICT function: it returns NULL without doing anything when at least one of the arguments is NULL.

BTW, we have many text from/to cstring conversions in the new code. It's not an item for now, but we would need to improve them if these functions are heavily used, especially replace_text_variadic().
>> * We might rename pg_convert_and_execute_sql_file() to
>> pg_execute_sql_file_with_encoding() or so to have the same prefix.
>
> Well, I think I prefer reading the verbs in the order that things are
> happening in the code, it's actually convert then execute. But again,
> maybe some Native Speaker will have a say here, or maybe your proposed
> name fits better in PostgreSQL. I'd leave it to the committer :)

Agreed. I also prefer pg_read_file_all rather than pg_read_whole_file :P
--
Itagaki Takahiro
Itagaki Takahiro <itagaki.takahiro@gmail.com> writes:
> I fixed and cleaned up some of the code in it; v9 patch attached. Please check
> my modifications, and set the status to "Ready for Committer" if you find
> no problems. I think documentation and code comments might need to be
> checked by native English speakers.

Many thanks, that version is much cleaner than the previous one. Comments below.

> You added replace(text, text, text, VARIADIC text), but I think
> replace(text, VARIADIC text) also works. If we have the short form,
> we can use it easily from execute functions with placeholders.

So we now have the following:

              List of functions
   Schema   |  Name   | Result data type | Argument data types |  Type
------------+---------+------------------+---------------------+--------
 pg_catalog | replace | text             | text, VARIADIC text | normal
 pg_catalog | replace | text             | text, text, text    | normal

My understanding is that the variadic form shadows the other one, in a way that makes it impossible to call the three-argument form from the SQL level. That's the reason I did the (text, text, text, VARIADIC text) version before, but is that true? Also, is it worthwhile to keep the non-VARIADIC version exposed at the SQL level?

The only other nitpick I seem to be able to find is that you forgot to remove the following from builtins.h:

+ extern Datum replace_placeholders(PG_FUNCTION_ARGS);

So I'll go update the commitfest to point to your version of the patch, add an entry for this "comments" email, and mark it as ready for committer.

> Other changes:

All for the best, thank you! I can't help noticing that some of them fix things we might want to backpatch. These:

> * Int64GetDatum((int64) fst.st_size) was broken.
> * An error check for "could not read file" didn't work.

That's straight from the master branch's code, IIRC.

> * Added some regression tests.
> * Read file contents into bytea buffer directly to avoid memcpy.
> * Fixed bad usages of text and bytea values
>   because they are not null-terminated.
> * I don't want to expose ArrayType to builtins.h.
>   So I call replace_text_variadic() from execute functions.
> * pg_execute_sql_file(path, NULL) won't work because it's a STRICT function.
>   It returns NULL without doing anything when at least one of the arguments is NULL.

I'm not sure I understand this last point: I already had 3 versions of it, so surely you would just call pg_execute_sql_file(path) in this case?

> BTW, we have many text from/to cstring conversions in the new code.
> It's not an item for now, but we would need to improve them
> if those functions are heavily used, especially replace_text_variadic().

That could become a concern if it actually shadows the other version.

> Agreed. I also prefer pg_read_file_all rather than pg_read_whole_file :P

Following this line of thought, maybe we should provide a third variant here, the "real" pg_read_whole_file(path); then we'd have the other existing variants, pg_read_file_to_end(path, offset) and the historic pg_read_file(path, offset, bytes_to_read).

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Fri, Dec 3, 2010 at 18:02, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
>    Schema   |  Name   | Result data type | Argument data types |  Type
> ------------+---------+------------------+---------------------+--------
>  pg_catalog | replace | text             | text, VARIADIC text | normal
>  pg_catalog | replace | text             | text, text, text    | normal
>
> My understanding is that the variadic form shadows the other one in a
> way that it's now impossible to call it from SQL level. That's the
> reason why I did the (text, text, text, VARIADIC text) version before,
> but is it true?

The VARIADIC version doesn't hide the 3-args version. I tested the behavior with printf-debugging. The planner seems to think the non-VARIADIC version is the best-matched one when 3 arguments are passed.

> Also, is it worthwhile to keep the non VARIADIC
> version exposed at SQL level?

Yes, because the non-VARIADIC version is much faster than the VARIADIC one. Of course we could optimize the performance of replace_text_variadic(), but I think a VARIADIC argument is inherently slower because it packs the arguments into an array.
--
Itagaki Takahiro
Itagaki Takahiro <itagaki.takahiro@gmail.com> writes:
> The VARIADIC version doesn't hide the 3-args version. I tested the
> behavior by printf-debug. The planner seems to think the non VARIADIC
> version is the best-matched one when 3 arguments are passed.

Nice! All that's left, then, is the commit, right? :)

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Itagaki Takahiro <itagaki.takahiro@gmail.com> writes: > On Fri, Dec 3, 2010 at 18:02, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: >> My understanding is that the variadic form shadows the other one in a >> way that it's now impossible to call it from SQL level. That's the >> reason why I did the (text, text, text, VARIADIC text) version before, >> but is it true? > The VARIADIC version doesn't hide the 3-args version. I tested the > behavior by printf-debug. The planner seems to think the non VARIADIC > version is the best-matched one when 3 arguments are passed. Why is there a variadic replace() in this patch at all? It seems just about entirely unrelated to the stated purpose of the patch, as well as being of dubious usefulness. When would it be superior to replace(replace(orig, from1, to1), from2, to2), ... The implementation doesn't appear to offer any material speed improvement over nested calls of that sort, and I'm finding it hard to visualize when it would be more useful than nested calls. The documentation is entirely inadequate as well. regards, tom lane
On Mon, Dec 6, 2010 at 08:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Why is there a variadic replace() in this patch at all? It seems just
> about entirely unrelated to the stated purpose of the patch, as well
> as being of dubious usefulness.

As I wrote in the previous mail, the most important part of the patch for CREATE EXTENSION is pg_read_binary_file(). We can rewrite not only replace(VARIADIC) but also the other functions in the patch using existing functions. However, the author wanted simple-case user APIs, and I also agreed to export each step of the complex pg_execute_sql_file(). But I have no objections to hiding some of the subroutines if there are any problems.

| $sql := replace(
|            convert_from(
|                pg_read_binary_file($path, 0),
|                $encoding),
|            '@extschema@', $schema);
| EXECUTE $sql;
--
Itagaki Takahiro
On Sun, Dec 5, 2010 at 6:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Itagaki Takahiro <itagaki.takahiro@gmail.com> writes: >> On Fri, Dec 3, 2010 at 18:02, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: >>> My understanding is that the variadic form shadows the other one in a >>> way that it's now impossible to call it from SQL level. That's the >>> reason why I did the (text, text, text, VARIADIC text) version before, >>> but is it true? > >> The VARIADIC version doesn't hide the 3-args version. I tested the >> behavior by printf-debug. The planner seems to think the non VARIADIC >> version is the best-matched one when 3 arguments are passed. > > Why is there a variadic replace() in this patch at all? It seems just > about entirely unrelated to the stated purpose of the patch, as well > as being of dubious usefulness. When would it be superior to > > replace(replace(orig, from1, to1), from2, to2), ... > > The implementation doesn't appear to offer any material speed > improvement over nested calls of that sort, and I'm finding it hard to > visualize when it would be more useful than nested calls. The > documentation is entirely inadequate as well. An iterated replacement has different semantics from a simultaneous replace - replacing N placeholders with values simultaneously means you don't need to worry about the case where one of the replacement strings contains something that looks like a placeholder. I actually think a simultaneous replacement feature would be quite handy but I make no comment on whether it belongs as part of this patch. The discussion on this patch has been rather wide-ranging, and it's not clear to me that there's really consensus on what this patch needs to - or should - do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
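Robert's distinction can be shown with a small sketch (illustrative Python, not the patch's C code): with iterated replacement, a replacement value that happens to contain a later placeholder gets rewritten again by a subsequent pass, while a single-pass simultaneous replacement leaves it alone. The placeholder names here are made up.

```python
import re

def replace_iterated(text, pairs):
    # Apply each (placeholder, value) pair one after another; later pairs
    # can rewrite text produced by earlier substitutions.
    for old, new in pairs:
        text = text.replace(old, new)
    return text

def replace_simultaneous(text, pairs):
    # Substitute every placeholder in one left-to-right scan; re.sub does
    # not rescan the replacement text it has already emitted.
    mapping = dict(pairs)
    pattern = re.compile("|".join(re.escape(old) for old, _ in pairs))
    return pattern.sub(lambda m: mapping[m.group(0)], text)

pairs = [("@a@", "@b@ was here"), ("@b@", "gone")]
print(replace_iterated("@a@ / @b@", pairs))      # gone was here / gone
print(replace_simultaneous("@a@ / @b@", pairs))  # @b@ was here / gone
```

The iterated version corrupts the first substitution because its value contains the second placeholder; this is exactly the hazard Robert describes, and (per Tom's reading) what the patch as written would do.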
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Why is there a variadic replace() in this patch at all? It seems just
> about entirely unrelated to the stated purpose of the patch, as well
> as being of dubious usefulness.

It used not to be exposed at the SQL level; it was just an internal loop in pg_execute_sql_file() when using the placeholder-enabled variant. Then Itagaki wanted me to expose the internals so that he could basically implement the logic in SQL directly. It seems we went a step too far in exposing this facility too. Agreed on removing it at the SQL level.

I assume you want me to prepare a patch; I'm not sure I'll be able to send it to the list before Thursday --- unless I get around the git network errors at pgday.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> Why is there a variadic replace() in this patch at all? It seems just >> about entirely unrelated to the stated purpose of the patch, as well >> as being of dubious usefulness. > It used not to being exposed at the SQL level, but just an internal loop > in pg_execute_sql_file() when using the placeholders enabled > variant. Then Itagaki wanted me to expose internals so that he basically > can implement the logics in SQL directly. It seems like we went a step > too far in exposing this facility too. Agreed in removing it at the SQL > level. Well, actually, my next question was going to be about removing the variadic substitution in pg_execute_string too. It's not apparent to me that that function should have a rather lame substitution mechanism hard-wired into it, when you can do the same thing with replace() in front of it. On the whole I'd prefer not to have any substitution functionality hard-wired into pg_execute_file either, though I can see the argument that it's necessary for practical use. Basically I'm concerned that replace-equivalent behavior is not going to be satisfactory over the long run: I think eventually we're going to need to think about quoting/escaping behavior. So I think it's a bad idea to expose the assumption that it'll be done that way at the SQL level. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Dec 5, 2010 at 6:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Why is there a variadic replace() in this patch at all? It seems just >> about entirely unrelated to the stated purpose of the patch, as well >> as being of dubious usefulness. When would it be superior to >> replace(replace(orig, from1, to1), from2, to2), ... > An iterated replacement has different semantics from a simultaneous > replace - replacing N placeholders with values simultaneously means > you don't need to worry about the case where one of the replacement > strings contains something that looks like a placeholder. Good point, but what the patch implements is in fact iterated replacement ... or at least it looked that way in a quick once-over. > I actually > think a simultaneous replacement feature would be quite handy but I > make no comment on whether it belongs as part of this patch. My point is that the replacement stuff really really needs to be factored out of the string-execution stuff, precisely because the desired behavior is debatable. regards, tom lane
On Dec 6, 2010, at 7:19 AM, Tom Lane wrote:
> On the whole I'd prefer not to have any substitution functionality
> hard-wired into pg_execute_file either, though I can see the argument
> that it's necessary for practical use. Basically I'm concerned that
> replace-equivalent behavior is not going to be satisfactory over the
> long run: I think eventually we're going to need to think about
> quoting/escaping behavior. So I think it's a bad idea to expose the
> assumption that it'll be done that way at the SQL level.

+1

I suspect that, for the purposes of the extensions patch, if CREATE EXTENSION could be modified to handle setting the schema itself, without requiring that the file have this magic line:

    SET search_path = @extschema@;

then there would be no need for substitutions at all.

Best,

David
On Mon, Dec 6, 2010 at 12:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sun, Dec 5, 2010 at 6:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Why is there a variadic replace() in this patch at all? It seems just >>> about entirely unrelated to the stated purpose of the patch, as well >>> as being of dubious usefulness. When would it be superior to >>> replace(replace(orig, from1, to1), from2, to2), ... > >> An iterated replacement has different semantics from a simultaneous >> replace - replacing N placeholders with values simultaneously means >> you don't need to worry about the case where one of the replacement >> strings contains something that looks like a placeholder. > > Good point, but what the patch implements is in fact iterated > replacement ... or at least it looked that way in a quick once-over. Oh. Well, -1 from me for including that. >> I actually >> think a simultaneous replacement feature would be quite handy but I >> make no comment on whether it belongs as part of this patch. > > My point is that the replacement stuff really really needs to be > factored out of the string-execution stuff, precisely because the > desired behavior is debatable. +1 for committing the uncontroversial parts separately. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
"David E. Wheeler" <david@kineticode.com> writes: > On Dec 6, 2010, at 7:19 AM, Tom Lane wrote: >> On the whole I'd prefer not to have any substitution functionality >> hard-wired into pg_execute_file either, though I can see the argument >> that it's necessary for practical use. Basically I'm concerned that >> replace-equivalent behavior is not going to be satisfactory over the >> long run: I think eventually we're going to need to think about >> quoting/escaping behavior. So I think it's a bad idea to expose the >> assumption that it'll be done that way at the SQL level. > +1 > I suspect that, for the purposes of the extensions patch, if CREATE EXTENSION could be modified to handle setting the schemaitself, without requiring that the file have this magic line: > SET search_path = @extschema@; > Then there would be no need for substitutions at all. That's an interesting idea, but I'm not sure it's wise to design around the assumption that we won't need substitutions ever. What I was thinking was that we should try to limit knowledge of the substitution behavior to the extension definition files and the implementation of CREATE EXTENSION itself. I don't agree with exposing that information at the SQL level. (On the other hand, if we *could* avoid using any explicit substitutions, it would certainly ease testing of extension files no? They'd be sourceable into psql then.) regards, tom lane
On Dec 6, 2010, at 10:43 AM, Tom Lane wrote: > That's an interesting idea, but I'm not sure it's wise to design around > the assumption that we won't need substitutions ever. What I was > thinking was that we should try to limit knowledge of the substitution > behavior to the extension definition files and the implementation of > CREATE EXTENSION itself. I don't agree with exposing that information > at the SQL level. > > (On the other hand, if we *could* avoid using any explicit > substitutions, it would certainly ease testing of extension files no? > They'd be sourceable into psql then.) Yes. And extension authors would not have to remember to include the magic line (which at any rate would break extensionsfor earlier versions of PostgreSQL). Best, dAvid
"David E. Wheeler" <david@kineticode.com> writes: > On Dec 6, 2010, at 10:43 AM, Tom Lane wrote: >> (On the other hand, if we *could* avoid using any explicit >> substitutions, it would certainly ease testing of extension files no? >> They'd be sourceable into psql then.) > Yes. And extension authors would not have to remember to include the magic line (which at any rate would break extensionsfor earlier versions of PostgreSQL). Well, I don't put any stock in the idea that it's important for existing module .sql files to be usable as-is as extension definition files. If it happens to fall out that way, fine, but we shouldn't give up anything else to get that. Letting extension files be directly sourceable in psql is probably worth a bit more, but I'm not sure how much. The argument that forgetting to include a magic source_path command would make CREATE EXTENSION behave surprisingly seems to have a good deal of merit though, certainly enough to justify having CREATE EXTENSION take care of that internally if at all possible. The real question in my mind is whether there are any other known or foreseeable cases where we would need to have substitution capability and there's not another good way to handle it. I haven't been paying real close attention to the threads about this patch --- do we have any specific use-cases in mind for substitution, besides this one? regards, tom lane
On Dec 6, 2010, at 11:12 AM, Tom Lane wrote:
> Well, I don't put any stock in the idea that it's important for existing
> module .sql files to be usable as-is as extension definition files. If
> it happens to fall out that way, fine, but we shouldn't give up anything
> else to get that.

I agree, but I don't think we have to lose anything.

> Letting extension files be directly sourceable in
> psql is probably worth a bit more, but I'm not sure how much. The
> argument that forgetting to include a magic source_path command would
> make CREATE EXTENSION behave surprisingly seems to have a good deal of
> merit though, certainly enough to justify having CREATE EXTENSION take
> care of that internally if at all possible.

Yes.

The other question I have, though, is how important it is for extensions to live in a particular schema, since there seems to be no advantage to doing so. With the current patch, I can put extension "foo" in schema "bar", but I can't put any other extension named "foo" in any other schema. It's in schema "bar" but is at the same time global. That doesn't make much sense to me.

Best,

David
"David E. Wheeler" <david@kineticode.com> writes: > The other question I have, though, is how important is it to have extensions live in a particular schema since there seemsto be no advantage to doing so. With the current patch, I can put extension "foo" in schema "bar", but I can't put anyother extension named "foo" in any other schema. It's in schema "bar" but is at the same time global. That doesn't makemuch sense to me. There's a difference between whether an extension as such is considered to belong to a schema and whether its contained objects do. We can't really avoid the fact that functions, operators, etc must be assigned to some particular schema. It seems not particularly important that extension names be schema-qualified, though --- the use-case for having two different extensions named "foo" installed simultaneously seems pretty darn small. On the other hand, if we were enforcing that all objects contained in an extension belong to the same schema, it'd make logistical sense to consider that the extension itself belongs to that schema as well. But last I heard we didn't want to enforce such a restriction. I believe what the search_path substitution is actually about is to provide a convenient shorthand for the case that all the contained objects do indeed live in one schema, and you'd like to be able to select that schema at CREATE EXTENSION time. Which seems like a useful feature for a common case. We've certainly heard multiple complaints about the fact that you can't do that easily now. BTW, I did think of a case where substitution solves a problem we don't presently have any other solution for: referring to the target schema within the definition of a contained object. As an example, you might wish to attach "SET search_path = @target_schema@" to the definition of a SQL function in an extension, to prevent search-path-related security issues in the use of the function. 
Without substitution you'll be reduced to hard-wiring the name of the target schema. regards, tom lane
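Tom's closing point can be sketched concretely. The following is a hypothetical extension-script fragment: the function name and body are invented for illustration, and @target_schema@ is the proposed placeholder syntax under discussion, not anything that exists today.

```sql
-- Hypothetical fragment of an extension install script.
-- @target_schema@ would be replaced by CREATE EXTENSION with the
-- schema chosen at install time, before the script is executed.
CREATE FUNCTION @target_schema@.add_one(i integer) RETURNS integer
    LANGUAGE sql
    SET search_path = @target_schema@  -- pins the path inside the function,
                                       -- so unqualified references cannot be
                                       -- hijacked via the caller's search_path
AS $$ SELECT i + 1 $$;
```

Without the substitution, the SET clause would have to hard-wire a schema name chosen by the author rather than by the installing user.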
On Dec 6, 2010, at 11:36 AM, Tom Lane wrote: > There's a difference between whether an extension as such is considered > to belong to a schema and whether its contained objects do. We can't > really avoid the fact that functions, operators, etc must be assigned to > some particular schema. Right, of course. > It seems not particularly important that > extension names be schema-qualified, though --- the use-case for having > two different extensions named "foo" installed simultaneously seems > pretty darn small. On the other hand, if we were enforcing that all > objects contained in an extension belong to the same schema, it'd make > logistical sense to consider that the extension itself belongs to that > schema as well. But last I heard we didn't want to enforce such a > restriction. Okay. > I believe what the search_path substitution is actually about is to > provide a convenient shorthand for the case that all the contained > objects do indeed live in one schema, and you'd like to be able to > select that schema at CREATE EXTENSION time. Which seems like a useful > feature for a common case. We've certainly heard multiple complaints > about the fact that you can't do that easily now. Yes, it *is* useful. But what happens if I have SET search_path = whatever; in my extension install script, and someone executes CREATE EXTENSION FOO WITH SCHEMA bar; Surprise! Everything is in whatever, not in bar. > BTW, I did think of a case where substitution solves a problem we don't > presently have any other solution for: referring to the target schema > within the definition of a contained object. As an example, you might > wish to attach "SET search_path = @target_schema@" to the definition of > a SQL function in an extension, to prevent search-path-related security > issues in the use of the function. Without substitution you'll be > reduced to hard-wiring the name of the target schema. You lost me. :-( David
Tom Lane <tgl@sss.pgh.pa.us> writes: > There's a difference between whether an extension as such is considered > to belong to a schema and whether its contained objects do. We can't > really avoid the fact that functions, operators, etc must be assigned to > some particular schema. It seems not particularly important that > extension names be schema-qualified, though --- the use-case for having > two different extensions named "foo" installed simultaneously seems > pretty darn small. On the other hand, if we were enforcing that all > objects contained in an extension belong to the same schema, it'd make > logistical sense to consider that the extension itself belongs to that > schema as well. But last I heard we didn't want to enforce such a > restriction. Very good summary, thank you, that's exactly the ideas I've been working with. Which ain't surprising, after all we've been talking about this for 18 months already :) So in the current patch, extensions are not schema qualified. > I believe what the search_path substitution is actually about is to > provide a convenient shorthand for the case that all the contained > objects do indeed live in one schema, and you'd like to be able to > select that schema at CREATE EXTENSION time. Which seems like a useful > feature for a common case. We've certainly heard multiple complaints > about the fact that you can't do that easily now. Exactly. It's just a useful little thing, but given that it depends on how the script is written, maybe the right interface would be a two-step process, so that either it does what you want or you get an error. Current patch: CREATE EXTENSION foo WITH SCHEMA bar; If foo's script isn't using @extschema@ or if it is using more than one schema, executing the script will not do anything like what you want --- currently that's the extension author's problem.
Other idea: CREATE EXTENSION foo; ALTER EXTENSION foo SET SCHEMA utils; CREATE EXTENSION bar; ALTER EXTENSION bar SET SCHEMA utils; ERROR: the extension "bar" has installed objects in more than one schema DETAIL: extension depends on schema "public" and "baz" HINT: use pg_extension_objects() to list bar's objects > BTW, I did think of a case where substitution solves a problem we don't > presently have any other solution for: referring to the target schema > within the definition of a contained object. As an example, you might > wish to attach "SET search_path = @target_schema@" to the definition of > a SQL function in an extension, to prevent search-path-related security > issues in the use of the function. Without substitution you'll be > reduced to hard-wiring the name of the target schema. Right. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr
On Mon, Dec 6, 2010 at 2:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > There's a difference between whether an extension as such is considered > to belong to a schema and whether its contained objects do. We can't > really avoid the fact that functions, operators, etc must be assigned to > some particular schema. It seems not particularly important that > extension names be schema-qualified, though --- the use-case for having > two different extensions named "foo" installed simultaneously seems > pretty darn small. On the other hand, if we were enforcing that all > objects contained in an extension belong to the same schema, it'd make > logistical sense to consider that the extension itself belongs to that > schema as well. But last I heard we didn't want to enforce such a > restriction. Why not? This feature seems to be pretty heavily designed around the assumption that everything's going to live in one schema, so is there any harm in making that explicit? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12/07/2010 11:13 AM, Robert Haas wrote: > On Mon, Dec 6, 2010 at 2:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> There's a difference between whether an extension as such is considered >> to belong to a schema and whether its contained objects do. We can't >> really avoid the fact that functions, operators, etc must be assigned to >> some particular schema. It seems not particularly important that >> extension names be schema-qualified, though --- the use-case for having >> two different extensions named "foo" installed simultaneously seems >> pretty darn small. On the other hand, if we were enforcing that all >> objects contained in an extension belong to the same schema, it'd make >> logistical sense to consider that the extension itself belongs to that >> schema as well. But last I heard we didn't want to enforce such a >> restriction. > Why not? This feature seems to be pretty heavily designed around the > assumption that everything's going to live in one schema, so is there > any harm in making that explicit? > In previous discussions IIRC the consensus was that we should not force that on either Extension writers or users. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 12/07/2010 11:13 AM, Robert Haas wrote: >> Why not? This feature seems to be pretty heavily designed around the >> assumption that everything's going to live in one schema, so is there >> any harm in making that explicit? > In previous discussions IIRC the consensus was that we should not force > that on either Extension writers or users. It's not very hard to imagine a complicated extension wanting to spread itself across multiple schemas --- for example, one schema for "public" functions and a separate one for internal objects might be desirable. So I'm definitely not in favor of trying to force a single-schema design on people. Although ... if the restriction did exist, one could imagine building that complex extension in two parts, foo and foo_internal. To make this work conveniently you'd need some sort of "requires" mechanism for extensions. The other problem is that foo will certainly need to know which schema foo_internal got loaded into. Anyway the main problem at the moment is that the proposed design is meant to allow "relocatable" extensions, but it doesn't behave pleasantly in the case where somebody tries to relocate a non-relocatable extension. It also strikes me that the plan appears to be to support ALTER EXTENSION SET SCHEMA to relocate an extension after the fact, but this will absolutely *not* work with the available infrastructure. Remember that example of a SQL function with a SET search_path = @extschema@ option? There's no way to fix that up, nor any other case where the schema was substituted into an object declaration. So I'm back to thinking that the textual-substitution idea is fundamentally deficient. regards, tom lane
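The "no way to fix that up" problem can be seen in the catalogs: once the script has run, the substituted schema name is stored as literal text, with no record that it came from a placeholder. A sketch, assuming an extension was installed with @extschema@ replaced by "bar" (the query is illustrative, not from the patch):

```sql
-- The per-function search_path setting lands in pg_proc.proconfig as
-- plain text; ALTER EXTENSION ... SET SCHEMA would have no reliable way
-- to find and rewrite every such occurrence.
SELECT proname, proconfig
  FROM pg_proc
 WHERE proconfig IS NOT NULL;
-- a function created with SET search_path = @extschema@ would show
-- something like: proconfig = {search_path=bar} -- hard-wired
```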
On Tue, Dec 7, 2010 at 11:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Anyway the main problem at the moment is that the proposed design is > meant to allow "relocatable" extensions, but it doesn't behave > pleasantly in the case where somebody tries to relocate a > non-relocatable extension. > > It also strikes me that the plan appears to be to support ALTER > EXTENSION SET SCHEMA to relocate an extension after the fact, but this > will absolutely *not* work with the available infrastructure. Remember > that example of a SQL function with a SET search_path = @extschema@ > option? There's no way to fix that up, nor any other case where the > schema was substituted into an object declaration. So I'm back to > thinking that the textual-substitution idea is fundamentally deficient. I think you've gotten to the heart of the matter here. Extensions need to either be schema objects, or not. If they are, let's go all the way and compel everything in the extension to live in the schema that owns it, and make the extension itself live in that schema too. You can even imagine two different users, each with their own schema, installing the same extension with different settings or something (pay no attention to the man waving his hands). On the other hand, if they're NOT schema objects, then ALTER EXTENSION .. SET SCHEMA is not well-defined and we should reject that portion of this effort. Being half-way in the middle doesn't seem like a good idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I think you've gotten to the heart of the matter here. Extensions > need to either be schema objects, or not. If they are, let's go all > the way and compel everything in the extension to live in the schema > that owns it, and make the extension itself live in that schema too. > You can even imagine two different users, each with their own schema, > installing the same extension with different settings or something > (pay no attention to the man waving his hands). On the other hand, if > they're NOT schema objects, then ALTER EXTENSION .. SET SCHEMA Is not > well-defined and we should reject that portion of this effort. Being > half-way in the middle doesn't seem like a good idea. I confess to not having paid a whole lot of attention to the threads about this feature, so maybe this point has been addressed already, but: what of the schema itself? That is, if an extension has some/all of its objects in a given schema, is that schema itself one of the objects created/owned by the extension, or not? I can imagine use-cases for both ways, but it seems like which of these models you choose is a pretty critical aspect of how you envision all this. In particular I wonder whether directly renaming an owned schema would cover the use-cases for ALTER EXTENSION .. SET SCHEMA. OTOH, this isn't going to help for what I suspect is the *real* motivating use-case, namely "I dumped all my extensions into schema public and now I've thought better of it". If you dumped an extension into public it clearly can't own that. Anyway, in a less blue-sky vein: we could fix some of these problems by having an explicit relocatable-or-not property for extensions. If it is relocatable, it's required to keep all its owned objects in the target schema, and ALTER EXTENSION .. SET SCHEMA is allowed; else not. This does nothing for the fix-the-search_path-property problem, though. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > I confess to not having paid a whole lot of attention to the threads > about this feature, so maybe this point has been addressed already, but: > what of the schema itself? That is, if an extension has some/all of its > objects in a given schema, is that schema itself one of the objects > created/owned by the extension, or not? I can imagine use-cases for > both ways, but it seems like which of these models you choose is a > pretty critical aspect of how you envision all this. In particular Well both cases are supported. If the extension's script issues the CREATE SCHEMA command, then we track the dependency, but the script is free to create its objects in whatever already-existing schema. Some extensions you want separated, and some you want all in the same schema --- that's the documentation example using "utils" here. > I wonder whether directly renaming an owned schema would cover the > use-cases for ALTER EXTENSION .. SET SCHEMA. OTOH, this isn't going > to help for what I suspect is the *real* motivating use-case, namely > "I dumped all my extensions into schema public and now I've thought > better of it". If you dumped an extension into public it clearly > can't own that. Exactly. > Anyway, in a less blue-sky vein: we could fix some of these problems by > having an explicit relocatable-or-not property for extensions. If it is > relocatable, it's required to keep all its owned objects in the target > schema, and ALTER EXTENSION .. SET SCHEMA is allowed; else not. This > does nothing for the fix-the-search_path-property problem, though. The search_path is the complex (as in AI complete) part of it, but given your idea here, we could make it so that only the non-relocatable extensions benefit from the @extschema@ placeholder. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr
On Dec 7, 2010, at 1:17 PM, Dimitri Fontaine wrote: >> Anyway, in a less blue-sky vein: we could fix some of these problems by >> having an explicit relocatable-or-not property for extensions. If it is >> relocatable, it's required to keep all its owned objects in the target >> schema, and ALTER EXTENSION .. SET SCHEMA is allowed; else not. This >> does nothing for the fix-the-search_path-property problem, though. > > The search_path is the complex (as in AI complete) part of it, but given > your idea here, we could make it so that only the non-relocatable > extensions benefit from the @extschema@ placeholder. +1 That might be an appropriate compromise. Best, David
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> Anyway, in a less blue-sky vein: we could fix some of these problems by >> having an explicit relocatable-or-not property for extensions. If it is >> relocatable, it's required to keep all its owned objects in the target >> schema, and ALTER EXTENSION .. SET SCHEMA is allowed; else not. This >> does nothing for the fix-the-search_path-property problem, though. > The search_path is the complex (as in AI complete) part of it, but given > your idea here, we could make it so that only the non-relocatable > extensions benefit from the @extschema@ placeholder. Er ... what good is that? A non-relocatable extension doesn't *need* any such substitution, because it knows perfectly well what schema it's putting its stuff into. Only the relocatable case has use for it. So you might as well drop the substitution mechanism entirely. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Er ... what good is that? A non-relocatable extension doesn't *need* > any such substitution, because it knows perfectly well what schema it's > putting its stuff into. Only the relocatable case has use for it. So > you might as well drop the substitution mechanism entirely. Funnily enough, I see it the exact opposite way. relocatable is true A relocatable extension installs all its objects into the first schema of the search_path. As an extension's script author, you have to make sure you're not schema-qualifying any object that you create. Note that contrib/ is working this way but forcing the search_path first entry into being "public". relocatable is false An extension that needs to know where some of its objects are installed is not relocatable. As the user won't be able to change the schema where the extension gets installed, he's given the possibility to specify the schema at installation time. The extension installation script is then required to use the @extschema@ placeholder as the schema to work with. That will typically mean the script's first line is: SET search_path TO @extschema@; Then you can also CREATE FUNCTION … SET search_path TO @extschema@ for security reasons. With that support in, the CREATE EXTENSION foo WITH SCHEMA bar could simply run the ALTER EXTENSION foo SET SCHEMA bar internally, so that's not a burden for the user. So that simple things are simple both for the extension's author and the users, but complex things are still possible and supported. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
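A minimal sketch of the non-relocatable convention described above; the table and function are invented for illustration, and @extschema@ is the proposed placeholder the installation script is expected to use:

```sql
-- Hypothetical install script for a non-relocatable extension.
-- First line: all subsequent CREATEs land in the user-chosen schema.
SET search_path TO @extschema@;

CREATE TABLE config (key text PRIMARY KEY, value text);

-- Pinning search_path on the function guards the lookup of "config"
-- against whatever search_path the eventual caller has set.
CREATE FUNCTION get_config(k text) RETURNS text
    LANGUAGE sql STABLE
    SET search_path TO @extschema@
AS $$ SELECT value FROM config WHERE key = k $$;
```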