Обсуждение: parallel workers and client encoding

Поиск
Список
Период
Сортировка

parallel workers and client encoding

От
Peter Eisentraut
Дата:
There appears to be a problem with how client encoding is handled in the
communication from parallel workers.  In a parallel worker, the client
encoding setting is inherited from its creating process as part of the
GUC setup.  So any plain-text stuff the parallel worker sends to its
leader is actually converted to the client encoding.  Since most data is
sent in binary format, the plain-text provision applies mainly to notice
and error messages.  At the other end, error messages are parsed using
pq_parse_errornotice(), which internally uses routines that were meant
for communication from the client, and therefore will convert everything
back from the client encoding to the server encoding.  So this whole
thing actually happens to work as long as round tripping is possible
between the involved encodings.

In cases where it isn't, it's still hard to notice the difference
because depending on whether you get a parallel plan or not, the
following happens:

not parallel: conversion error happens between server and client, client
sees an error message about that

parallel: conversion error happens between worker and leader, worker
generates an error message about that, sends it to leader, leader
forwards it to client

The client sees the same error message in both cases.

To construct a case where this makes a difference, the leader has to be
set up to catch certain errors.  Here is an example:

"""
create table test1 (a int, b text);
truncate test1;
insert into test1 values (1, 'a');

create or replace function test1() returns text language plpgsql
as $$
declare
   res text;
begin
   perform from test1 where a = test2();
   return res;
exception when division_by_zero then
   return 'boom';
end;
$$;

create or replace function test2() returns int language plpgsql
parallel safe
as $$
begin
   raise division_by_zero using message = 'Motörhead';
   return 1;
end
$$;

set force_parallel_mode to on;

select test1();
"""

With client_encoding = server_encoding, this will return a single row
'boom'.  But with, say, database encoding UTF8 and
PGCLIENTENCODING=KOI8R, it will error:

ERROR:  22P05: character with byte sequence 0xef 0xbe 0x83 in encoding
"UTF8" has no equivalent in encoding "KOI8R"
CONTEXT:  parallel worker

(Note that changing force_parallel_mode does not force replanning in
plpgsql, so if you run test1() first before setting force_parallel_mode,
then you won't get the error.)

Attached is a patch to illustrates how this could be fixed.  There might
be similar issues elsewhere.  The notification propagation in particular
could be affected.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: parallel workers and client encoding

От
Peter Eisentraut
Дата:
On 6/6/16 9:45 PM, Peter Eisentraut wrote:
> There appears to be a problem with how client encoding is handled in the
> communication from parallel workers.

I have added this to the open items for now.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: parallel workers and client encoding

От
Robert Haas
Дата:
On Mon, Jun 6, 2016 at 9:45 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> There appears to be a problem with how client encoding is handled in the
> communication from parallel workers.  In a parallel worker, the client
> encoding setting is inherited from its creating process as part of the GUC
> setup.  So any plain-text stuff the parallel worker sends to its leader is
> actually converted to the client encoding.  Since most data is sent in
> binary format, the plain-text provision applies mainly to notice and error
> messages.  At the other end, error messages are parsed using
> pq_parse_errornotice(), which internally uses routines that were meant for
> communication from the client, and therefore will convert everything back
> from the client encoding to the server encoding.  So this whole thing
> actually happens to work as long as round tripping is possible between the
> involved encodings.

Hmm.  I didn't realize that we had encodings where round-tripping
wasn't possible.  I figured that if you could convert from A to B, you
would also be able to convert from B to A.  I see that this isn't
necessarily true in theory, but I had assumed (incorrectly, it seems)
that it was true in practice.  It seems very odd to me.

> Attached is a patch to illustrates how this could be fixed.  There might be
> similar issues elsewhere.  The notification propagation in particular could
> be affected.

Making the parallel worker always use the database encoding seems like
a good approach to me, but won't the changes you made to
HandleParallelMessage() leave the expect client encoding in the wrong
state if pq_parse_errornotice throws an error?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: parallel workers and client encoding

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jun 6, 2016 at 9:45 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> ...  So this whole thing
>> actually happens to work as long as round tripping is possible between the
>> involved encodings.

> Hmm.  I didn't realize that we had encodings where round-tripping
> wasn't possible.  I figured that if you could convert from A to B, you
> would also be able to convert from B to A.

Uh, that's not the point: the real problem is when B is simply smaller
than A.  For example, server encoding utf8, client encoding latin1.
The current code results in failures if any non-latin1 data has to be
transmitted from worker to leader, even though the query might not have
ever sent that data to the client, and therefore would work fine in
non-parallel mode.
        regards, tom lane



Re: parallel workers and client encoding

От
Robert Haas
Дата:
On Thu, Jun 9, 2016 at 1:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Jun 6, 2016 at 9:45 PM, Peter Eisentraut
>> <peter.eisentraut@2ndquadrant.com> wrote:
>>> ...  So this whole thing
>>> actually happens to work as long as round tripping is possible between the
>>> involved encodings.
>
>> Hmm.  I didn't realize that we had encodings where round-tripping
>> wasn't possible.  I figured that if you could convert from A to B, you
>> would also be able to convert from B to A.
>
> Uh, that's not the point: the real problem is when B is simply smaller
> than A.  For example, server encoding utf8, client encoding latin1.
> The current code results in failures if any non-latin1 data has to be
> transmitted from worker to leader, even though the query might not have
> ever sent that data to the client, and therefore would work fine in
> non-parallel mode.

So, I don't think this is true.  First, to be clear, there's no
encoding conversion going on when tuples are sent from worker to
leader, so that case has no problem of this type at all.  This is
limited to non-tuple protocol messages: errors, notices, and possibly
notifies.

Second, if you can't convert an error or notice message (or possibly a
notify message) from the server encoding to the client coding, you are
definitely going to fail, with or without parallel query, because that
conversion has to be done at some stage anyway.  The difference is
that without parallel query, we convert server->client and that's it,
whereas with parallel query we convert server->client->server->client
if the error occurs in the worker, and so the conversion in the
reverse direction (client back to server) can introduce a failure that
couldn't occur otherwise.

That's a pretty obscure corner case, but of course it should still be fixed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: parallel workers and client encoding

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jun 9, 2016 at 1:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The current code results in failures if any non-latin1 data has to be
>> transmitted from worker to leader, even though the query might not have
>> ever sent that data to the client, and therefore would work fine in
>> non-parallel mode.

> So, I don't think this is true.  First, to be clear, there's no
> encoding conversion going on when tuples are sent from worker to
> leader, so that case has no problem of this type at all.  This is
> limited to non-tuple protocol messages: errors, notices, and possibly
> notifies.

Okay ...

> Second, if you can't convert an error or notice message (or possibly a
> notify message) from the server encoding to the client coding, you are
> definitely going to fail, with or without parallel query, because that
> conversion has to be done at some stage anyway.

Only if the message gets sent to the client, which it might never be;
for example because the query is inside a plpgsql exception block that
traps the error.

You do have a bug here; please don't argue you don't.
        regards, tom lane



Re: parallel workers and client encoding

От
Robert Haas
Дата:
On Thu, Jun 9, 2016 at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Second, if you can't convert an error or notice message (or possibly a
>> notify message) from the server encoding to the client coding, you are
>> definitely going to fail, with or without parallel query, because that
>> conversion has to be done at some stage anyway.
>
> Only if the message gets sent to the client, which it might never be;
> for example because the query is inside a plpgsql exception block that
> traps the error.

Hmm... yeah, OK, that's another case.

> You do have a bug here; please don't argue you don't.

I just said it was a bug in my previous post, so I think it is clear
that I am not arguing that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: parallel workers and client encoding

От
Tatsuo Ishii
Дата:
> There appears to be a problem with how client encoding is handled in
> the communication from parallel workers.

Ouch.

>  In a parallel worker, the
> client encoding setting is inherited from its creating process as part
> of the GUC setup.  So any plain-text stuff the parallel worker sends
> to its leader is actually converted to the client encoding.  Since
> most data is sent in binary format, the plain-text provision applies
> mainly to notice and error messages.  At the other end, error messages
> are parsed using pq_parse_errornotice(), which internally uses
> routines that were meant for communication from the client, and
> therefore will convert everything back from the client encoding to the
> server encoding.  So this whole thing actually happens to work as long
> as round tripping is possible between the involved encodings.
>
> In cases where it isn't, it's still hard to notice the difference
> because depending on whether you get a parallel plan or not, the
> following happens:
>
> not parallel: conversion error happens between server and client,
> client sees an error message about that
>
> parallel: conversion error happens between worker and leader, worker
> generates an error message about that, sends it to leader, leader
> forwards it to client
>
> The client sees the same error message in both cases.
>
> To construct a case where this makes a difference, the leader has to
> be set up to catch certain errors.  Here is an example:
>
> """
> create table test1 (a int, b text);
> truncate test1;
> insert into test1 values (1, 'a');
>
> create or replace function test1() returns text language plpgsql
> as $$
> declare
>   res text;
> begin
>   perform from test1 where a = test2();
>   return res;
> exception when division_by_zero then
>   return 'boom';
> end;
> $$;
>
> create or replace function test2() returns int language plpgsql
> parallel safe
> as $$
> begin
>   raise division_by_zero using message = 'Motörhead';
>   return 1;
> end
> $$;
>
> set force_parallel_mode to on;
>
> select test1();
> """
>
> With client_encoding = server_encoding, this will return a single row
> 'boom'.  But with, say, database encoding UTF8 and
> PGCLIENTENCODING=KOI8R, it will error:
>
> ERROR: 22P05: character with byte sequence 0xef 0xbe 0x83 in encoding
> "UTF8" has no equivalent in encoding "KOI8R"
> CONTEXT:  parallel worker
>
> (Note that changing force_parallel_mode does not force replanning in
> plpgsql, so if you run test1() first before setting
> force_parallel_mode, then you won't get the error.)
>
> Attached is a patch to illustrates how this could be fixed.  There
> might be similar issues elsewhere.  The notification propagation in
> particular could be affected.

Something like SetClientEncoding(GetDatabaseEncoding()) is a Little
bit ugly. It would be nice if we could have a switch to turn off the
automatic encoding conversion in the future, but for 9.6, I feel I'm
fine with your proposed patch.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: parallel workers and client encoding

От
Peter Eisentraut
Дата:
On 6/6/16 9:45 PM, Peter Eisentraut wrote:
> Attached is a patch to illustrates how this could be fixed.  There might
> be similar issues elsewhere.  The notification propagation in particular
> could be affected.

Tracing the code, NotificationResponse messages are converted to the 
client encoding during transmission from the worker to the leader and 
then sent on binarily to the client, so this should appear to work at 
the moment.  But it will break if we make a change like I suggested, 
namely changing the client encoding in the worker process to be the 
server encoding, because then nothing will transcode it before it 
reaches the client anymore.  So this will need a well-considered 
solution in concert with the error/notice issue.

Then again, it's not clear to me under what circumstances a parallel 
worker could or should be sending a NotificationResponse.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: parallel workers and client encoding

От
Noah Misch
Дата:
On Thu, Jun 09, 2016 at 12:04:59PM -0400, Peter Eisentraut wrote:
> On 6/6/16 9:45 PM, Peter Eisentraut wrote:
> >There appears to be a problem with how client encoding is handled in the
> >communication from parallel workers.
> 
> I have added this to the open items for now.

[Action required within 72 hours.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
efforts toward speedy resolution.  Thanks.

[1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com



Re: parallel workers and client encoding

От
Robert Haas
Дата:
On Sun, Jun 12, 2016 at 3:05 AM, Noah Misch <noah@leadboat.com> wrote:
> On Thu, Jun 09, 2016 at 12:04:59PM -0400, Peter Eisentraut wrote:
>> On 6/6/16 9:45 PM, Peter Eisentraut wrote:
>> >There appears to be a problem with how client encoding is handled in the
>> >communication from parallel workers.
>>
>> I have added this to the open items for now.
>
> [Action required within 72 hours.  This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> 9.6 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within 72 hours of this
> message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
> efforts toward speedy resolution.  Thanks.

There is no realistic way that I am going to have this fixed before
beta2.  There are currently 10 open items listed on the open items
page, of which 8 relate to my commits and 5 to parallel query in
particular.  I am not going to get 8 issues fixed in the next 4 days,
or even the next 6 days if you assume I can work through the weekend
on this (and that it would be desirable that I be slinging fixes into
the tree just before the wrap, which seems doubtful).  Furthermore, of
those issues, I judge this to be least important (except for the
documentation update, but that's pending further from Peter Geoghegan
about exactly what he thinks needs to be changed).

Therefore, my plan is to revisit this in two weeks once beta2 is out
the door, unless someone else would like to volunteer to fix it
sooner.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: parallel workers and client encoding

От
Peter Eisentraut
Дата:
On 6/10/16 2:08 PM, Peter Eisentraut wrote:
> On 6/6/16 9:45 PM, Peter Eisentraut wrote:
>> Attached is a patch to illustrates how this could be fixed.  There might
>> be similar issues elsewhere.  The notification propagation in particular
>> could be affected.
>
> Tracing the code, NotificationResponse messages are converted to the
> client encoding during transmission from the worker to the leader and
> then sent on binarily to the client, so this should appear to work at
> the moment.  But it will break if we make a change like I suggested,
> namely changing the client encoding in the worker process to be the
> server encoding, because then nothing will transcode it before it
> reaches the client anymore.  So this will need a well-considered
> solution in concert with the error/notice issue.
>
> Then again, it's not clear to me under what circumstances a parallel
> worker could or should be sending a NotificationResponse.

Modulo that last point, here is a patch that shows how I think this
could work, in combination with the patch I posted previously that sets
the "client encoding" in the parallel worker to the server encoding.

This patch disassembles the NotificationResponse message with a
temporary client encoding, and then sends it off to the real frontend
using the real client encoding.

Doing it this way also takes care of a few special cases that
NotifyMyFrontEnd() handles, such as a client with protocol version 2
that doesn't expect a payload in the message.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: parallel workers and client encoding

От
Peter Eisentraut
Дата:
On 6/9/16 7:16 PM, Tatsuo Ishii wrote:
> Something like SetClientEncoding(GetDatabaseEncoding()) is a Little
> bit ugly. It would be nice if we could have a switch to turn off the
> automatic encoding conversion in the future, but for 9.6, I feel I'm
> fine with your proposed patch.

One way to make this nicer would be to put the encoding of a string into 
the StringInfoData structure.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: parallel workers and client encoding

От
Peter Geoghegan
Дата:
On Mon, Jun 13, 2016 at 9:39 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> There is no realistic way that I am going to have this fixed before
> beta2.  There are currently 10 open items listed on the open items
> page, of which 8 relate to my commits and 5 to parallel query in
> particular.  I am not going to get 8 issues fixed in the next 4 days,
> or even the next 6 days if you assume I can work through the weekend
> on this (and that it would be desirable that I be slinging fixes into
> the tree just before the wrap, which seems doubtful).  Furthermore, of
> those issues, I judge this to be least important (except for the
> documentation update, but that's pending further from Peter Geoghegan
> about exactly what he thinks needs to be changed).

I got sidetracked on that, in part due to investigating a bug in the
9.6 external sort work. I'll have more information on that, soon.

-- 
Peter Geoghegan



Re: parallel workers and client encoding

От
Robert Haas
Дата:
On Mon, Jun 13, 2016 at 10:27 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> Modulo that last point, here is a patch that shows how I think this could
> work, in combination with the patch I posted previously that sets the
> "client encoding" in the parallel worker to the server encoding.
>
> This patch disassembles the NotificationResponse message with a temporary
> client encoding, and then sends it off to the real frontend using the real
> client encoding.
>
> Doing it this way also takes care of a few special cases that
> NotifyMyFrontEnd() handles, such as a client with protocol version 2 that
> doesn't expect a payload in the message.

How does this address the concern raised in the last sentence of
https://www.postgresql.org/message-id/CA+TgmoaAAEXmjVMB4nM=znF_5b9JhUim6q3aFrO4SpT23TiN8g@mail.gmail.com
?  It seems that if an error occurs between the two SetClientEncoding
calls, the change will persist for the rest of the session, resulting
in chaos.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: parallel workers and client encoding

От
Robert Haas
Дата:
On Mon, Jun 27, 2016 at 12:53 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jun 13, 2016 at 10:27 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> Modulo that last point, here is a patch that shows how I think this could
>> work, in combination with the patch I posted previously that sets the
>> "client encoding" in the parallel worker to the server encoding.
>>
>> This patch disassembles the NotificationResponse message with a temporary
>> client encoding, and then sends it off to the real frontend using the real
>> client encoding.
>>
>> Doing it this way also takes care of a few special cases that
>> NotifyMyFrontEnd() handles, such as a client with protocol version 2 that
>> doesn't expect a payload in the message.
>
> How does this address the concern raised in the last sentence of
> https://www.postgresql.org/message-id/CA+TgmoaAAEXmjVMB4nM=znF_5b9JhUim6q3aFrO4SpT23TiN8g@mail.gmail.com
> ?  It seems that if an error occurs between the two SetClientEncoding
> calls, the change will persist for the rest of the session, resulting
> in chaos.

Please find attached an a patch for a proposed alternative approach.
This does the following:

1. When the client_encoding GUC is changed in the worker,
SetClientEncoding() is not called.  Thus, GetClientEncoding() in the
worker always returns the database encoding, regardless of how the
client encoding is set.  This is better than your approach of calling
SetClientEncoding() during worker startup, I think, because the worker
could call a parallel-safe function with SET clause, and one of the
GUCs temporarily set could be client_encoding.  That would be stupid,
but somebody could do it.

2. A new function pq_getmsgrawstring() is added.  This is like
pq_getmsgstring() but it does no encoding conversion.
pq_parse_errornotice() is changed to use pq_getmsgrawstring() instead
of pq_getmsgstring().  Because of (1), when the leader receives an
ErrorResponse or NoticeResponse from the worker, it will not have been
subject to encoding conversion; because of this item, the leader will
not try to convert it either when initially parsing it.  So the extra
encoding round-trip is avoided.

3. The changes for NotifyResponse which you proposed are included
here, but with the modification that pq_getmsgrawstring() is used.

This seems to fix your original test case for me, and hopefully all of
the related cases also.  Review is appreciated.  The main thing about
this is that it doesn't rely on being able to make temporary changes
to global variables, thus hopefully making it robust in the face of
non-local transfer of control.

(Official status update: I'll commit this on Thursday unless anyone
reports a problem with it before then.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

Re: parallel workers and client encoding

От
Peter Eisentraut
Дата:
On 6/27/16 5:37 PM, Robert Haas wrote:
> Please find attached an a patch for a proposed alternative approach.
> This does the following:
>
> 1. When the client_encoding GUC is changed in the worker,
> SetClientEncoding() is not called.

I think this could be a problem, because then the client encoding in the 
background worker process is inherited from the postmaster, which could 
in theory be anything.  I think you need to set it at least once to the 
correct value.

> Thus, GetClientEncoding() in the
> worker always returns the database encoding, regardless of how the
> client encoding is set.  This is better than your approach of calling
> SetClientEncoding() during worker startup, I think, because the worker
> could call a parallel-safe function with SET clause, and one of the
> GUCs temporarily set could be client_encoding.  That would be stupid,
> but somebody could do it.

I think if we're worried about this, then this should be an error, but 
that's a minor concern.

I realize that we don't have a good mechanism in the GUC code to 
distinguish these two situations.

Then again, this shouldn't be so much different in concept from the 
restoring of GUC variables in the EXEC_BACKEND case.  There is special 
code in set_config_option() to bypass some of the rules in that case. 
RestoreGUCState() should be able to get the same sort of pass.

Also, set_config_option() knows something about what settings are 
allowed in a parallel worker, so I wonder if setting client_encoding 
would even work in spite of that?

> 2. A new function pq_getmsgrawstring() is added.  This is like
> pq_getmsgstring() but it does no encoding conversion.
> pq_parse_errornotice() is changed to use pq_getmsgrawstring() instead
> of pq_getmsgstring().  Because of (1), when the leader receives an
> ErrorResponse or NoticeResponse from the worker, it will not have been
> subject to encoding conversion; because of this item, the leader will
> not try to convert it either when initially parsing it.  So the extra
> encoding round-trip is avoided.

I like that.

> 3. The changes for NotifyResponse which you proposed are included
> here, but with the modification that pq_getmsgrawstring() is used.

and that.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: parallel workers and client encoding

От
Robert Haas
Дата:
On Tue, Jun 28, 2016 at 10:10 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 6/27/16 5:37 PM, Robert Haas wrote:
>> Please find attached an a patch for a proposed alternative approach.
>> This does the following:
>>
>> 1. When the client_encoding GUC is changed in the worker,
>> SetClientEncoding() is not called.
>
> I think this could be a problem, because then the client encoding in the
> background worker process is inherited from the postmaster, which could in
> theory be anything.  I think you need to set it at least once to the correct
> value.

Fixed in the attached version.

>> Thus, GetClientEncoding() in the
>> worker always returns the database encoding, regardless of how the
>> client encoding is set.  This is better than your approach of calling
>> SetClientEncoding() during worker startup, I think, because the worker
>> could call a parallel-safe function with SET clause, and one of the
>> GUCs temporarily set could be client_encoding.  That would be stupid,
>> but somebody could do it.
>
> I think if we're worried about this, then this should be an error, but
> that's a minor concern.

We can't actually throw an error at that point, because we really do
want the GUC to have the same value in the worker as it does in the
leader.  Otherwise, anything that can observe the value of an
arbitrary GUC suddenly becomes parallel-restricted, which is certainly
unwanted.

> I realize that we don't have a good mechanism in the GUC code to distinguish
> these two situations.
>
> Then again, this shouldn't be so much different in concept from the
> restoring of GUC variables in the EXEC_BACKEND case.  There is special code
> in set_config_option() to bypass some of the rules in that case.
> RestoreGUCState() should be able to get the same sort of pass.

I think we can use the existing flag InitializingParallelWorker to
handle this case.  See attached.

> Also, set_config_option() knows something about what settings are allowed in
> a parallel worker, so I wonder if setting client_encoding would even work in
> spite of that?

I think that with this change, a SET client_encoding on a supposedly
parallel-safe function will just fail to have any effect when the
function runs inside a worker.  The attached patch will make that kind
of thing fail outright when attempted inside a parallel worker.

>> 2. A new function pq_getmsgrawstring() is added.  This is like
>> pq_getmsgstring() but it does no encoding conversion.
>> pq_parse_errornotice() is changed to use pq_getmsgrawstring() instead
>> of pq_getmsgstring().  Because of (1), when the leader receives an
>> ErrorResponse or NoticeResponse from the worker, it will not have been
>> subject to encoding conversion; because of this item, the leader will
>> not try to convert it either when initially parsing it.  So the extra
>> encoding round-trip is avoided.
>
> I like that.
>
>> 3. The changes for NotifyResponse which you proposed are included
>> here, but with the modification that pq_getmsgrawstring() is used.
>
> and that.

Thanks for the review.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

Re: parallel workers and client encoding

От
Peter Eisentraut
Дата:
I'm happy with this patch.


On 6/29/16 12:41 PM, Robert Haas wrote:
> On Tue, Jun 28, 2016 at 10:10 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> On 6/27/16 5:37 PM, Robert Haas wrote:
>>> Please find attached an a patch for a proposed alternative approach.
>>> This does the following:
>>>
>>> 1. When the client_encoding GUC is changed in the worker,
>>> SetClientEncoding() is not called.
>>
>> I think this could be a problem, because then the client encoding in the
>> background worker process is inherited from the postmaster, which could in
>> theory be anything.  I think you need to set it at least once to the correct
>> value.
>
> Fixed in the attached version.
>
>>> Thus, GetClientEncoding() in the
>>> worker always returns the database encoding, regardless of how the
>>> client encoding is set.  This is better than your approach of calling
>>> SetClientEncoding() during worker startup, I think, because the worker
>>> could call a parallel-safe function with SET clause, and one of the
>>> GUCs temporarily set could be client_encoding.  That would be stupid,
>>> but somebody could do it.
>>
>> I think if we're worried about this, then this should be an error, but
>> that's a minor concern.
>
> We can't actually throw an error at that point, because we really do
> want the GUC to have the same value in the worker as it does in the
> leader.  Otherwise, anything that can observe the value of an
> arbitrary GUC suddenly becomes parallel-restricted, which is certainly
> unwanted.
>
>> I realize that we don't have a good mechanism in the GUC code to distinguish
>> these two situations.
>>
>> Then again, this shouldn't be so much different in concept from the
>> restoring of GUC variables in the EXEC_BACKEND case.  There is special code
>> in set_config_option() to bypass some of the rules in that case.
>> RestoreGUCState() should be able to get the same sort of pass.
>
> I think we can use the existing flag InitializingParallelWorker to
> handle this case.  See attached.
>
>> Also, set_config_option() knows something about what settings are allowed in
>> a parallel worker, so I wonder if setting client_encoding would even work in
>> spite of that?
>
> I think that with this change, a SET client_encoding on a supposedly
> parallel-safe function will just fail to have any effect when the
> function runs inside a worker.  The attached patch will make that kind
> of thing fail outright when attempted inside a parallel worker.
>
>>> 2. A new function pq_getmsgrawstring() is added.  This is like
>>> pq_getmsgstring() but it does no encoding conversion.
>>> pq_parse_errornotice() is changed to use pq_getmsgrawstring() instead
>>> of pq_getmsgstring().  Because of (1), when the leader receives an
>>> ErrorResponse or NoticeResponse from the worker, it will not have been
>>> subject to encoding conversion; because of this item, the leader will
>>> not try to convert it either when initially parsing it.  So the extra
>>> encoding round-trip is avoided.
>>
>> I like that.
>>
>>> 3. The changes for NotifyResponse which you proposed are included
>>> here, but with the modification that pq_getmsgrawstring() is used.
>>
>> and that.
>
> Thanks for the review.
>
>
>
>


-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: parallel workers and client encoding

От
Robert Haas
Дата:
On Wed, Jun 29, 2016 at 10:52 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> I'm happy with this patch.

Great.  I have committed it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company