Thread: Advice request : simultaneous function/data updates on many databases


Advice request : simultaneous function/data updates on many databases

From
Rory Campbell-Lange
Date:
We have many databases of the same type separated for data governance
reasons. They, however, share the same web front-end code.

Presently, replacing functions and performing data updates on the
databases in series often executes across all databases in less than a
minute. (The updates are currently done with simple sql files connecting
to each database and then loading a stub file pointing to each function
to drop and reload, and running the data update queries.)

However, for larger updates, the time when the front end code is
out-of-step with the database can cause end-user problems.

Unfortunately our schema arrangement isn't clean enough to swap out
function schemas in a transaction to sort out that part of the problem
(if in fact that would work anyway).

One solution might be to do the updates in parallel. Another thought
would be to somehow execute the code update from a text field in a table
in each database triggered with pg_cron.

Bearing in mind the possible problems of connection saturation or
massive IO spikes, I'd be grateful to learn of any thoughts on how to
negotiate this problem.

Rory



Re: Advice request : simultaneous function/data updates on many databases

From
Adrian Klaver
Date:
On 3/4/20 2:04 PM, Rory Campbell-Lange wrote:
> We have many databases of the same type separated for data governance
> reasons. They, however, share the same web front-end code.
> 
> Presently, replacing functions and performing data updates on the
> databases in series often executes across all databases in less than a
> minute. (The updates are currently done with simple sql files connecting
> to each database and then loading a stub file pointing to each function
> to drop and reload, and running the data update queries.)
> 
> However, for larger updates, the time when the front end code is
> out-of-step with the database can cause end-user problems.

So the issue is synchronization between the code in the database and the 
code outside the database?

I'm assuming the problems are changes in function signatures and return 
values?

> 
> Unfortunately our schema arrangement isn't clean enough to swap out
> function schemas in a transaction to sort out that part of the problem
> (if in fact that would work anyway).
> 
> One solution might be to do the updates in parallel. Another thought
> would be to somehow execute the code update from a text field in a table
> in each database triggered with pg_cron.
> 
> Bearing in mind the possible problems of connection saturation or
> massive IO spikes, I'd be grateful to learn of any thoughts on how to
> negotiate this problem.
> 
> Rory
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Advice request : simultaneous function/data updates on many databases

From
Rory Campbell-Lange
Date:
On 04/03/20, Adrian Klaver (adrian.klaver@aklaver.com) wrote:
> On 3/4/20 2:04 PM, Rory Campbell-Lange wrote:
> > We have many databases of the same type separated for data governance
> > reasons. They, however, share the same web front-end code.
> > 
> > Presently, replacing functions and performing data updates on the
> > databases in series often executes across all databases in less than a
> > minute. (The updates are currently done with simple sql files connecting
> > to each database and then loading a stub file pointing to each function
> > to drop and reload, and running the data update queries.)
> > 
> > However, for larger updates, the time when the front end code is
> > out-of-step with the database can cause end-user problems.
> 
> So the issue is synchronization between the code in the database and the
> code outside the database?
> 
> I'm assuming the problems are changes in function signatures and return
> values?

That is one problem; sometimes we also need to make some table
definition or data changes.

> > Unfortunately our schema arrangement isn't clean enough to swap out
> > function schemas in a transaction to sort out that part of the problem
> > (if in fact that would work anyway).
> > 
> > One solution might be to do the updates in parallel. Another thought
> > would be to somehow execute the code update from a text field in a table
> > in each database triggered with pg_cron.
> > 
> > Bearing in mind the possible problems of connection saturation or
> > massive IO spikes, I'd be grateful to learn of any thoughts on how to
> > negotiate this problem.



Re: Advice request : simultaneous function/data updates on many databases

From
Adrian Klaver
Date:
On 3/4/20 2:22 PM, Rory Campbell-Lange wrote:
> On 04/03/20, Adrian Klaver (adrian.klaver@aklaver.com) wrote:
>> On 3/4/20 2:04 PM, Rory Campbell-Lange wrote:
>>> We have many databases of the same type separated for data governance
>>> reasons. They, however, share the same web front-end code.
>>>
>>> Presently, replacing functions and performing data updates on the
>>> databases in series often executes across all databases in less than a
>>> minute. (The updates are currently done with simple sql files connecting
>>> to each database and then loading a stub file pointing to each function
>>> to drop and reload, and running the data update queries.)
>>>
>>> However, for larger updates, the time when the front end code is
>>> out-of-step with the database can cause end-user problems.
>>
>> So the issue is synchronization between the code in the database and the
>> code outside the database?
>>
>> I'm assuming the problems are changes in function signatures and return
>> values?
> 
> That is one problem; sometimes we also need to make some table
> definition or data changes.
> 

Alright, but the general issue is that the world as seen by the database 
can be different from that seen by the front end code.

So the solution is to make those world views sync, or am I missing 
something?

>>> Unfortunately our schema arrangement isn't clean enough to swap out
>>> function schemas in a transaction to sort out that part of the problem
>>> (if in fact that would work anyway).
>>>
>>> One solution might be to do the updates in parallel. Another thought
>>> would be to somehow execute the code update from a text field in a table
>>> in each database triggered with pg_cron.
>>>
>>> Bearing in mind the possible problems of connection saturation or
>>> massive IO spikes, I'd be grateful to learn of any thoughts on how to
>>> negotiate this problem.


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Advice request : simultaneous function/data updates on many databases

From
"David G. Johnston"
Date:
On Wed, Mar 4, 2020 at 3:04 PM Rory Campbell-Lange <rory@campbell-lange.net> wrote:
However, for larger updates, the time when the front end code is
out-of-step with the database can cause end-user problems.

You should try very hard to structure your database migrations so that, instead of going directly from one uniquely valid state to another, you transition through a dually valid state (i.e., don't break the old way of doing things while adding the pieces that make the new way work), and then separately remove the old components once you know all of the software upgrades have been deployed.

Add New Stuff, Leave Old Stuff Alone
Remove Old Stuff

Triggers can be installed during the transition period to facilitate the duplication of data that will result.

This should be considerably easier for non-data-impacting updates, as you can just choose different names for the new stuff and then remove the old stuff separately.
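As a sketch of that expand/contract pattern (the table, column, and trigger names here are hypothetical, not from the thread; EXECUTE FUNCTION needs PostgreSQL 11+, older releases use EXECUTE PROCEDURE):

```sql
-- Step 1 (deploy while the old code is still running): add the new
-- column alongside the old one and keep them in sync with a trigger.
ALTER TABLE customers ADD COLUMN full_name text;

CREATE OR REPLACE FUNCTION sync_full_name() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    NEW.full_name := NEW.name;   -- duplicate old column into the new one
    RETURN NEW;
END;
$$;

CREATE TRIGGER customers_sync_full_name
    BEFORE INSERT OR UPDATE ON customers
    FOR EACH ROW EXECUTE FUNCTION sync_full_name();

UPDATE customers SET full_name = name;   -- backfill existing rows

-- Step 2 (only once every front end reads full_name): remove old stuff.
-- DROP TRIGGER customers_sync_full_name ON customers;
-- ALTER TABLE customers DROP COLUMN name;
```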

David J.

Re: Advice request : simultaneous function/data updates on many databases

From
Rory Campbell-Lange
Date:
On 04/03/20, Adrian Klaver (adrian.klaver@aklaver.com) wrote:
> On 3/4/20 2:22 PM, Rory Campbell-Lange wrote:
> > On 04/03/20, Adrian Klaver (adrian.klaver@aklaver.com) wrote:
> > > On 3/4/20 2:04 PM, Rory Campbell-Lange wrote:
> > > > We have many databases of the same type separated for data governance
> > > > reasons. They, however, share the same web front-end code.
> > > > 
> > > > Presently, replacing functions and performing data updates on the
> > > > databases in series often executes across all databases in less than a
> > > > minute. (The updates are currently done with simple sql files connecting
> > > > to each database and then loading a stub file pointing to each function
> > > > to drop and reload, and running the data update queries.)
> > > > 
> > > > However, for larger updates, the time when the front end code is
> > > > out-of-step with the database can cause end-user problems.
> > > 
> > > So the issue is synchronization between the code in the database and the
> > > code outside the database?
> > > 
> > > I'm assuming the problems are changes in function signatures and return
> > > values?
> > 
> > That is one problem; sometimes we also need to make some table
> > definition or data changes.
> 
> Alright, but the general issue is that the world as seen by the database can
> be different from that seen by the front end code.
> 
> So the solution is to make those world views sync, or am I missing
> something?

Essentially we wish to reduce the window where the frontend and backend
aren't synchronised.

If we have (for example) 200 databases which each take 2 seconds to
update, a client could be on the wrong frontend code for over 6 minutes.

> > > > Unfortunately our schema arrangement isn't clean enough to swap out
> > > > function schemas in a transaction to sort out that part of the problem
> > > > (if in fact that would work anyway).
> > > > 
> > > > One solution might be to do the updates in parallel. Another thought
> > > > would be to somehow execute the code update from a text field in a table
> > > > in each database triggered with pg_cron.
> > > > 
> > > > Bearing in mind the possible problems of connection saturation or
> > > > massive IO spikes, I'd be grateful to learn of any thoughts on how to
> > > > negotiate this problem.



Re: Advice request : simultaneous function/data updates on many databases

From
Ron
Date:
On 3/4/20 4:33 PM, Rory Campbell-Lange wrote:
> On 04/03/20, Adrian Klaver (adrian.klaver@aklaver.com) wrote:
>> On 3/4/20 2:22 PM, Rory Campbell-Lange wrote:
>>> On 04/03/20, Adrian Klaver (adrian.klaver@aklaver.com) wrote:
>>>> On 3/4/20 2:04 PM, Rory Campbell-Lange wrote:
>>>>> We have many databases of the same type separated for data governance
>>>>> reasons. They, however, share the same web front-end code.
>>>>>
>>>>> Presently, replacing functions and performing data updates on the
>>>>> databases in series often executes across all databases in less than a
>>>>> minute. (The updates are currently done with simple sql files connecting
>>>>> to each database and then loading a stub file pointing to each function
>>>>> to drop and reload, and running the data update queries.)
>>>>>
>>>>> However, for larger updates, the time when the front end code is
>>>>> out-of-step with the database can cause end-user problems.
>>>> So the issue is synchronization between the code in the database and the
>>>> code outside the database?
>>>>
>>>> I'm assuming the problems are changes in function signatures and return
>>>> values?
>>> That is one problem; sometimes we also need to make some table
>>> definition or data changes.
>> Alright, but the general issue is that the world as seen by the database can
>> be different from that seen by the front end code.
>>
>> So the solution is to make those world views sync, or am I missing
>> something?
> Essentially we wish to reduce the window where the frontend and backend
> aren't synchronised.
>
> If we have (for example) 200 databases which each take 2 seconds to
> update, a client could be on the wrong frontend code for over 6 minutes.

The only solution to that is parallel updates (not all 200 at once!!) with a 
progress bar.

>>>>> Unfortunately our schema arrangement isn't clean enough to swap out
>>>>> function schemas in a transaction to sort out that part of the problem
>>>>> (if in fact that would work anyway).
>>>>>
>>>>> One solution might be to do the updates in parallel. Another thought
>>>>> would be to somehow execute the code update from a text field in a table
>>>>> in each database triggered with pg_cron.
>>>>>
>>>>> Bearing in mind the possible problems of connection saturation or
>>>>> massive IO spikes, I'd be grateful to learn of any thoughts on how to
>>>>> negotiate this problem.
>

-- 
Angular momentum makes the world go 'round.



Re: Advice request : simultaneous function/data updates on many databases

From
Guyren Howe
Date:
On Mar 4, 2020, at 14:33 , Rory Campbell-Lange <rory@campbell-lange.net> wrote:

Essentially we wish to reduce the window where the frontend and backend
aren't synchronised.

If we have (for example) 200 databases which each take 2 seconds to
update, a client could be on the wrong frontend code for over 6 minutes.

Send each of the servers a PL/PGSQL method that executes all the things in a transaction and then waits until the same clock time to commit. Then all the servers are committing at the same moment. They will still be out of synch somewhat, but this would reduce the degree.
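A minimal sketch of that idea (the function name and commit timestamp are illustrative, and it assumes the whole update can run inside one transaction):

```sql
CREATE OR REPLACE FUNCTION apply_update_then_wait(commit_at timestamptz)
RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
    -- ... perform the function replacements / data updates here ...

    -- Hold the (still uncommitted) transaction open until the agreed
    -- wall-clock moment. clock_timestamp() advances during a
    -- transaction, unlike now().
    PERFORM pg_sleep(
        GREATEST(0, extract(epoch FROM commit_at - clock_timestamp()))
    );
END;
$$;

-- On each database, e.g.:
-- BEGIN;
-- SELECT apply_update_then_wait(timestamptz '2020-03-05 02:00:00+00');
-- COMMIT;
```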

Re: Advice request : simultaneous function/data updates on many databases

From
Rory Campbell-Lange
Date:
On 04/03/20, Guyren Howe (guyren@gmail.com) wrote:
> On Mar 4, 2020, at 14:33 , Rory Campbell-Lange <rory@campbell-lange.net> wrote:
> > 
> > Essentially we wish to reduce the window where the frontend and backend
> > aren't synchronised.
> > 
> > If we have (for example) 200 databases which each take 2 seconds to
> > update, a client could be on the wrong frontend code for over 6 minutes.
> 
> Send each of the servers a PL/PGSQL method that executes all the
> things in a transaction and then waits until the same clock time to
> commit. Then all the servers are committing at the same moment. They
> will still be out of synch somewhat, but this would reduce the degree.

This is a really interesting idea. 

Any thoughts on how to wrap pl/pgsql function dropping and recreation code
within a wrapper pl/pgsql function? 



Re: Advice request : simultaneous function/data updates on many databases

From
"David G. Johnston"
Date:
On Wed, Mar 4, 2020 at 3:48 PM Rory Campbell-Lange <rory@campbell-lange.net> wrote:
Any thoughts on how to wrap pl/pgsql function dropping and recreation code
within a wrapper pl/pgsql function?

Not endorsing this but dynamic SQL works just fine (though can get hard to read).  Use format() and EXECUTE ... USING liberally.

CREATE FUNCTION perform_update() RETURNS void
LANGUAGE plpgsql
AS $outer$
DECLARE
    drop_sql text;
BEGIN

    drop_sql := $inner$ DROP FUNCTION ...; $inner$;
    EXECUTE drop_sql;

END;
$outer$;

David J.

Re: Advice request : simultaneous function/data updates on many databases

From
"David G. Johnston"
Date:
On Wed, Mar 4, 2020 at 3:55 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Wed, Mar 4, 2020 at 3:48 PM Rory Campbell-Lange <rory@campbell-lange.net> wrote:
Any thoughts on how to wrap pl/pgsql function dropping and recreation code
within a wrapper pl/pgsql function?

Not endorsing this but dynamic SQL works just fine (though can get hard to read).  Use format() and EXECUTE ... USING liberally.


Or, more readable depending upon your trust level:

INSERT INTO dynamic_codes VALUES (1, 'DROP FUNCTION ...');

CREATE FUNCTION execute_dynamic(code_id int) RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
    sql_cmd text;
BEGIN
    sql_cmd := (SELECT val FROM dynamic_codes WHERE id = code_id);
    EXECUTE sql_cmd;
END;
$$;

SELECT execute_dynamic(1);

David J.

Re: Advice request : simultaneous function/data updates on many databases

From
Adrian Klaver
Date:
On 3/4/20 2:42 PM, Guyren Howe wrote:
> On Mar 4, 2020, at 14:33 , Rory Campbell-Lange <rory@campbell-lange.net 
> <mailto:rory@campbell-lange.net>> wrote:
>>
>> Essentially we wish to reduce the window where the frontend and backend
>> aren't synchronised.
>>
>> If we have (for example) 200 databases which each take 2 seconds to
>> update, a client could be on the wrong frontend code for over 6 minutes.
> 
> Send each of the servers a PL/PGSQL method that executes all the things 
> in a transaction and then waits until the same clock time to commit. 

How do you know what the clock time will be?

> Then all the servers are committing at the same moment. They will still 
> be out of synch somewhat, but this would reduce the degree.


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Advice request : simultaneous function/data updates on many databases

From
Rory Campbell-Lange
Date:
On 04/03/20, David G. Johnston (david.g.johnston@gmail.com) wrote:
> On Wed, Mar 4, 2020 at 3:55 PM David G. Johnston <david.g.johnston@gmail.com>
> wrote:
> 
> > On Wed, Mar 4, 2020 at 3:48 PM Rory Campbell-Lange <
> > rory@campbell-lange.net> wrote:
> >
> >> Any thoughts on how to wrap pl/pgsql function dropping and recreation code
> >> within a wrapper pl/pgsql function?
> >
> >
> > Not endorsing this but dynamic SQL works just fine (though can get hard to
> > read).  Use format() and EXECUTE ... USING liberally.
> >
> >
> Or, more readable depending upon your trust level:
> 
> INSERT INTO dynamic_codes VALUES (1, 'DROP FUNCTION ...');
> 
> CREATE FUNCTION execute_dynamic(code_id int)
> AS $$
> sql_cmd := (SELECT val FROM dynamic_codes WHERE id = code_id);
> EXECUTE sql_cmd;
> $$;
> 
> SELECT execute_dynamic(1);

Thanks very much for the useful examples.

Based on your second example, we could drop and then reload an upgrade
schema with entries in dynamic_codes, then use execute_dynamic(...) as
you suggest.

Any idea on how to run execute_dynamic across many databases at roughly
the same time?

I'm just wondering if Guyren Howe's idea of having many transactions
open waiting for a clock time to commit is in fact feasible, given that
(presumably) the client would have to keep a connection open to every
database until the transactions complete.
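One possibility for fanning the call out from a single control session, sketched here on the assumption that the dblink extension is available (the connection names and connection strings are made up), is dblink's asynchronous interface:

```sql
-- Requires: CREATE EXTENSION dblink;
-- Fire the update at two databases without waiting for either.
SELECT dblink_connect('db1', 'dbname=tenant1');
SELECT dblink_connect('db2', 'dbname=tenant2');

SELECT dblink_send_query('db1', 'SELECT execute_dynamic(1)');
SELECT dblink_send_query('db2', 'SELECT execute_dynamic(1)');

-- Collect the results once both have finished.
SELECT * FROM dblink_get_result('db1') AS t(res text);
SELECT * FROM dblink_get_result('db2') AS t(res text);

SELECT dblink_disconnect('db1');
SELECT dblink_disconnect('db2');
```

This still holds one connection per database from the control session, so it bounds rather than removes the connection-saturation concern.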




Re: Advice request : simultaneous function/data updates on many databases

From
"David G. Johnston"
Date:
On Wed, Mar 4, 2020 at 4:41 PM Rory Campbell-Lange <rory@campbell-lange.net> wrote:
Any idea on how to run execute_dynamic across many databases at roughly
the same time?

I'm just wondering if Guyren Howe's idea of having many transactions
open waiting for a clock time to commit is in fact feasible due to
(presumably) having to have all the connections open to every database
from the client until the transactions complete.

Clock time synchronization is possible, so it's largely a matter of resources at that point.  If your servers are on machines where you can get a shell, having each server run psql against its own databases should be sufficient.

I'll go back to my earlier comment on a separate line of thought, which may have been missed: having two commits involved here is probably a better option.  The first commit is set up to allow both the old and new software to continue working normally.  The second commit then removes the functionality the older software versions were using, after they've been phased out.

David J.

Re: Advice request : simultaneous function/data updates on many databases

From
Rory Campbell-Lange
Date:
On 04/03/20, David G. Johnston (david.g.johnston@gmail.com) wrote:
> On Wed, Mar 4, 2020 at 4:41 PM Rory Campbell-Lange <rory@campbell-lange.net>
> wrote:
> 
> > Any idea on how to run execute_dynamic across many databases at roughly
> > the same time?
> >
> > I'm just wondering if Guyren Howe's idea of having many transactions
> > open waiting for a clock time to commit is in fact feasible due to
> > (presumably) having to have all the connections open to every database
> > from the client until the transactions complete.
> >
> 
> Clock time synchronization is possible so its largely a matter of resources
> at that point.  If your servers are on machines where you can get shell
> having the server run psql on its own databases should provide sufficient.

Yes, that is how we do it at present. We'll have to do some tests.

> I'll go back to my earlier comment, on a separate line of thought, which
> may have been missed, in that having two commits involved here is probably
> a better option.  First commit is setup to allow both the old and new
> software to continue working normally.  The second commit then removes the
> functionality the older software versions are using - after they've been
> phased out.

I did miss that point; thanks for reiterating it.

I think the issue we will have with old/new coexistence is that we would
sometimes hit the "cannot find best candidate" function signature
problem, as we often extend existing functions with new arguments that
have defaults.
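For what it's worth, that ambiguity can be reproduced with a pair of hypothetical overloads (the function name is illustrative):

```sql
CREATE FUNCTION get_total(a int) RETURNS int
    LANGUAGE sql AS 'SELECT a';

-- The "extended" version adds a defaulted argument:
CREATE FUNCTION get_total(a int, b int DEFAULT 0) RETURNS int
    LANGUAGE sql AS 'SELECT a + b';

-- Both candidates now match a single-argument call:
SELECT get_total(1);
-- ERROR:  function get_total(integer) is not unique
-- HINT:  Could not choose a best candidate function.
```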

But it is certainly something worth testing.

Thanks a lot for the pointers.




Re: Advice request : simultaneous function/data updates on many databases

From
Alban Hertroys
Date:
> On 4 Mar 2020, at 23:42, Guyren Howe <guyren@gmail.com> wrote:
>
> On Mar 4, 2020, at 14:33 , Rory Campbell-Lange <rory@campbell-lange.net> wrote:
>>
>> Essentially we wish to reduce the window where the frontend and backend
>> aren't synchronised.
>>
>> If we have (for example) 200 databases which each take 2 seconds to
>> update, a client could be on the wrong frontend code for over 6 minutes.
>
> Send each of the servers a PL/PGSQL method that executes all the things
> in a transaction and then waits until the same clock time to commit.
> Then all the servers are committing at the same moment. They will still
> be out of synch somewhat, but this would reduce the degree.


I’m wondering whether this could be done with a more generic event-based approach, where each server sends a ‘done’
event to a central machine once it’s ready to commit, and the central machine returns an ‘acknowledged’ once the last
server has sent its ‘done’ event.
The challenge there is that the ‘ack’ needs to be caught and processed within the same waiting transaction… Not sure
how to do that right now - maybe through web services, MQTT or similar.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: Advice request : simultaneous function/data updates on many databases

From
Rory Campbell-Lange
Date:
On 04/03/20, Rory Campbell-Lange (rory@campbell-lange.net) wrote:
> We have many databases of the same type separated for data governance
> reasons. They, however, share the same web front-end code.
> 
> Presently, replacing functions and performing data updates on the
> databases in series often executes across all databases in less than a
> minute. (The updates are currently done with simple sql files connecting
> to each database and then loading a stub file pointing to each function
> to drop and reload, and running the data update queries.)
> 
> However, for larger updates, the time when the front end code is
> out-of-step with the database can cause end-user problems.

For information, following the very helpful advice here, we intend to
proceed as follows, using a rolling upgrade methodology:

    for each database:
    * upgrade the functions and sql
    * on success, call out to haproxy to switch the client from the old
      web code version to the new
    * else investigate the upgrade failure

We're planning to use postgres for recording state.
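A state-recording table for that rolling process might look something like this (the schema is a guess, not from the thread):

```sql
CREATE TABLE upgrade_state (
    dbname      text PRIMARY KEY,
    status      text NOT NULL DEFAULT 'pending'
                CHECK (status IN ('pending', 'upgraded', 'switched', 'failed')),
    detail      text,
    updated_at  timestamptz NOT NULL DEFAULT now()
);

-- The driver script records each step, e.g.:
-- UPDATE upgrade_state SET status = 'upgraded', updated_at = now()
--  WHERE dbname = 'tenant42';
```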



Re: Advice request : simultaneous function/data updates on many databases

From
"Peter J. Holzer"
Date:
On 2020-03-04 14:42:01 -0800, Guyren Howe wrote:
> On Mar 4, 2020, at 14:33 , Rory Campbell-Lange <rory@campbell-lange.net> wrote:
>     Essentially we wish to reduce the window where the frontend and backend
>     aren't synchronised.
>
>     If we have (for example) 200 databases which each take 2 seconds to
>     update, a client could be on the wrong frontend code for over 6 minutes.
>
>
> Send each of the servers a PL/PGSQL method that executes all the things in a
> transaction and then waits until the same clock time to commit.

Last time I looked, some DDL commands (especially "drop table") took an
exclusive lock on the affected table. So you may want to keep
transactions which execute such commands very short to prevent them from
blocking other transactions for a noticeable amount of time.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
