Discussion: PGSQL 9.3 - Materialized View - multithreading
Hello,
My question is about multiprocessing and materialized views.
http://www.postgresql.org/docs/9.3/static/sql-creatematerializedview.html
I (will) have something like 3600 materialized views, and I would like to know how to refresh them in a multithreaded way
(understand: 8 CPU cores -> 8 refresh processes at the same time).
Thanks a lot,
On 4 April 2014 17:29, Nicolas Paris <niparisco@gmail.com> wrote:
> My question is about multiprocess and materialized View. [...]
> I (will) have something like 3600 materialised views, and I would like to
> know the way to refresh them in a multithread way

The only thing that immediately comes to mind would be running a rather hacky DO function in 4 separate sessions:

    DO $$
    DECLARE
        session CONSTANT BIGINT := 0;
        rec RECORD;
    BEGIN
        FOR rec IN SELECT quote_ident(nspname) || '.' || quote_ident(relname) AS mv
                   FROM pg_class c
                   INNER JOIN pg_namespace n ON c.relnamespace = n.oid
                   WHERE relkind = 'm' AND c.oid::bigint % 8 = session
        LOOP
            RAISE NOTICE 'Refreshing materialized view: %', rec.mv;
            EXECUTE 'REFRESH MATERIALIZED VIEW ' || rec.mv || ';';
        END LOOP;
    END$$ LANGUAGE plpgsql;

Where you would set session to 0 for the first session, 1 for the next, 2 for the next, and so on until you reach 7 for the last. These would each be run in a separate parallel session, although someone may come up with a better solution.

-- Thom
Thanks,

"The only thing that immediately comes to mind would be running a rather hacky DO function in 4 separate sessions:"

You mean 8 sessions, I guess.

8 separate sessions? Do you have any idea how to manage sessions? Is it possible to create separate sessions internally? Do I have to make 8 external connections to the database to get 8 processes? It would be great if I could manage the sessions internally, in PL/pgSQL for example.

On 04/04/2014 18:54, Thom Brown wrote:
> [...]
On 4 April 2014 20:49, PARIS Nicolas <niparisco@gmail.com> wrote:
> "The only thing that immediately comes to mind would be running a
> rather hacky DO function in 4 separate sessions:"
> You mean 8 sessions I guess.

Yes, typo.

> Do I have to make 8 external connections to the database, to get 8 processes?

Well, you can't have multiple sessions per connection, so yes, you'd need to issue each of them in separate connections.

I can't think of a more convenient way of doing it, but the solution I've proposed isn't particularly elegant anyway.

-- Thom
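For what it's worth, the "8 separate connections" idea can be scripted from the shell: a small launcher that pipes the DO block, with a different session number each time, into 8 parallel connections. This is only a sketch — the function name is made up, and the `psql mydb` connection command is a placeholder; the dry run at the end uses `cat` so the script can be tried without a database.

```shell
#!/bin/sh
# Sketch: run the hacky DO block in N parallel connections, one remainder
# class per connection, so the matviews are split N ways by oid.
# launch_refresh_sessions N CMD...: pipe the SQL for sessions 0..N-1 into
# CMD (one CMD process per session, all in parallel), then wait.
launch_refresh_sessions() {
  n=$1; shift
  session=0
  while [ "$session" -lt "$n" ]; do
    "$@" <<EOF &
DO \$\$
DECLARE
    rec RECORD;
BEGIN
    FOR rec IN SELECT quote_ident(nspname) || '.' || quote_ident(relname) AS mv
               FROM pg_class c
               INNER JOIN pg_namespace ns ON c.relnamespace = ns.oid
               WHERE relkind = 'm' AND c.oid::bigint % $n = $session
    LOOP
        RAISE NOTICE 'Refreshing materialized view: %', rec.mv;
        EXECUTE 'REFRESH MATERIALIZED VIEW ' || rec.mv || ';';
    END LOOP;
END\$\$ LANGUAGE plpgsql;
EOF
    session=$((session + 1))
  done
  wait  # return only once every session has finished
}

# Real use (hypothetical database name):  launch_refresh_sessions 8 psql mydb
# Dry run -- print the SQL each of the 8 sessions would receive:
launch_refresh_sessions 8 cat
```

Each backgrounded connection picks up the views whose oid falls in its remainder class, so no two sessions ever refresh the same view.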
Ok, thanks.

And what about triggers? Won't 8 triggers based on the same event be multithreaded?

On 04/04/2014 21:57, Thom Brown wrote:
> [...]
On 4 April 2014 21:07, PARIS Nicolas <niparisco@gmail.com> wrote:
> And what about triggers. 8 triggers based on the same event won't be
> multithreaded ?

I'm not clear on how triggers come into this. You can't have triggers on materialized views, and they don't fire triggers on tables or views that they are based on.

-- Thom
This postgres documentation:
http://www.postgresql.org/docs/9.3/static/ecpg-connect.html
says it is actually possible to manage connections in a C stored procedure.

I may be wrong...

On 04/04/2014 22:14, Thom Brown wrote:
> [...]
On Fri, Apr 04, 2014 at 10:26:22PM +0200, PARIS Nicolas wrote:
> [...]

Hi,

I do not know if it can be used in this fashion, but could pl/proxy be used by defining a cluster to be the same server and using a partitioned remote call? Someone with pl/proxy experience may have more information.

Regards,
Ken
On 4 April 2014 21:26, PARIS Nicolas <niparisco@gmail.com> wrote:
> [...]

That page doesn't refer to triggers at all, so I'm still not sure what you mean.

-- Thom
Right, it doesn't refer to triggers; it seems to be a kind of compiled C/SQL mix (= external).

To conclude:
- pl/proxy: appears difficult, and not designed for this
- pgAgent (supposed to run jobs in a multithreaded way)
- bash (xargs does the job)
- external scripts (R, Python, Perl...)

So I will test pgAgent and report back.

Thanks

On 06/04/2014 21:07, Thom Brown wrote:
> [...]
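For the record, the bash/xargs option from the list above can be sketched like this. The command file is filled with harmless echoes so the sketch runs without a database; in real use each line would be something like `echo "REFRESH MATERIALIZED VIEW mv_0001;" | psql mydb` (hypothetical view and database names).

```shell
#!/bin/sh
# Sketch: keep 8 refresh commands in flight at a time with xargs -P.
cmds=$(mktemp)
for i in $(seq 1 16); do
  # Stand-in for: echo "REFRESH MATERIALIZED VIEW mv_$i;" | psql mydb
  echo "echo refreshed mv_$i"
done > "$cmds"

# -P 8: run at most 8 commands at once; -I {}: execute each line as a command.
xargs -P 8 -I {} sh -c '{}' < "$cmds"
```

As soon as one command finishes, xargs starts the next, so the 8 slots stay busy regardless of how uneven the refresh times are.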
On 04 Apr 2014, at 18:29, Nicolas Paris <niparisco@gmail.com> wrote:
> [...]

Hi Nick,

Out-of-DB solution:

1. Produce a text file which contains the 3600 refresh commands you want to run in parallel. You can do that with select and format() if you don't have a list already.

2. I'm going to simulate your 3600 'refresh' commands here with some select and sleep statements that finish at unknown times. (In bash):

    for i in {1..3600} ; do echo "echo \"select pg_sleep(1+random()::int*10); select $i\" | psql mydb" ; done > 3600commands

3. Install GNU Parallel and type:

    parallel < 3600commands

4. Parallel will automatically work out the appropriate number of cores/threads for your CPUs, or you can control it manually with -j. It will also give you a live progress report if you use --progress. E.g. this command balances 8 jobs at a time, prints a dynamic progress report and dumps stdout to /dev/null:

    parallel -j 8 --progress < 3600commands > /dev/null

5. If you want to make debugging easier, use the parameter --tag to tag output for each command.

Of course it would be much more elegant if someone implemented something like GNU Parallel inside postgres or psql... :-)

Hope this helps & have a nice day,

Graeme.
Hello,
Thanks for this clear explanation!
Then I have a sub-question:
Suppose I have 3600 materialized views, say 600 mat views from each of 6 main tables (A,B,C,D,E,F are repeated 600 times with some differences).
Is it faster to:
1) parallel refresh 600 times A, then 600 times B, etc.
OR
2) parallel refresh 600 times A,B,C,D,E,F?

I guess 1) is faster because there are 600 accesses to the same table loaded in memory? But does parallel access to the same table imply concurrency problems and bad performance?
Thanks
Nicolas PARIS
2014-04-07 12:29 GMT+02:00 Graeme B. Bell <grb@skogoglandskap.no>:
> [...]
Hi again Nick.

Glad it helped.

Generally, I would expect that doing all the A's first, then all the B's, and so on, would be fastest, since you can re-use the data from cache.

Concurrency when reading isn't generally a problem. Lots of things can read at the same time and it will be nice and fast. It's concurrent writes or concurrent read/write of the same data item that causes problems with locking. That shouldn't be happening here, judging by your description.

If possible, try to make sure nothing is modifying those source tables A/B/C/D/E/F when you are doing your view refresh.

Graeme.

On 07 Apr 2014, at 14:49, Nicolas Paris <niparisco@gmail.com> wrote:
> [...]
Excellent.
Maybe a last sub-question:
Those 3600 mat views do have indexes.
I guess I will get better performance by dropping the indexes first, then refreshing, then re-creating the indexes.
Are there other ways to improve performance (like mat view storage parameters)? This routine will run at night and needs to finish quickly.
Thanks
Nicolas PARIS
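The drop-then-recreate idea can be scripted, since `pg_indexes.indexdef` stores each index's complete CREATE INDEX statement and `pg_matviews` (9.3+) lists the materialized views. This is only a sketch: the file names and the `mydb` database are made up, and the block just prints the generator queries so it can run without a database.

```shell
#!/bin/sh
# Sketch: build the SQL that saves index definitions and drops them before
# a big refresh run. pg_indexes gives the CREATE INDEX statements back
# verbatim, so the recreate step is cheap to script.

SAVE_SQL="SELECT indexdef || ';'
FROM pg_indexes i
WHERE EXISTS (SELECT 1 FROM pg_matviews m
              WHERE m.schemaname = i.schemaname
                AND m.matviewname = i.tablename);"

DROP_SQL="SELECT format('DROP INDEX %I.%I;', schemaname, indexname)
FROM pg_indexes i
WHERE EXISTS (SELECT 1 FROM pg_matviews m
              WHERE m.schemaname = i.schemaname
                AND m.matviewname = i.tablename);"

# Against a live database ("mydb" is a placeholder):
#   echo "$SAVE_SQL" | psql mydb -At > recreate_indexes.sql
#   echo "$DROP_SQL" | psql mydb -At | psql mydb   # drop the indexes
#   ...run the parallel refresh...
#   psql mydb -f recreate_indexes.sql              # put them back

# Dry run: print the generator queries.
echo "$SAVE_SQL"
echo "$DROP_SQL"
```

The recreate step could itself be fed through GNU Parallel, since independent CREATE INDEX statements on different views can run concurrently.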
2014-04-07 14:59 GMT+02:00 Graeme B. Bell <grb@skogoglandskap.no>:
> [...]
- http://wiki.postgresql.org/wiki/Performance_Optimization
- Run it on the most powerful machine you can find.
- Get some more memory.
- Get a big (512GB-1TB) SSD drive.
- Avoid recalculating the same things over and over. If your views have many similar elements, then calculate those first into a partial result, then build the final views from the partial result.
- Make sure your source tables are fully indexed and have good statistics.
- Run all the views once with \timing and keep track of how long they took. Fix the slow ones.

G

On 07 Apr 2014, at 15:56, Nicolas Paris <niparisco@gmail.com> wrote:
> [...]
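Graeme's last tip (time every view once and fix the slow ones) can also be done outside psql. The commands here are simulated with sleeps so the sketch runs anywhere; in real use each line of the file would be a refresh command piped into psql, as in the earlier examples.

```shell
#!/bin/sh
# Sketch: run each command from a file, print its wall-clock time, and
# sort slowest-first so the views worth fixing are at the top.
cmdfile=$(mktemp)
# Simulated commands; in real use: echo "REFRESH MATERIALIZED VIEW ...;" | psql mydb
printf '%s\n' 'sleep 1' 'sleep 0' 'sleep 0' > "$cmdfile"

while IFS= read -r cmd; do
  start=$(date +%s)
  sh -c "$cmd"
  # Print elapsed seconds, then the command itself.
  printf '%s\t%s\n' "$(( $(date +%s) - start ))" "$cmd"
done < "$cmdfile" | sort -rn
```

Second-level resolution is coarse, but for a nightly batch of 3600 refreshes it is usually enough to rank the offenders.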