Обсуждение: Re: [ADMIN] Best system for a data warehouse application

Поиск
Список
Период
Сортировка

Re: [ADMIN] Best system for a data warehouse application

От
"Lazaro Garcia"
Дата:

You could use pg_upgrade utility.

 

Regards.

 

De: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] En nombre de John Scalia
Enviado el: martes, 21 de febrero de 2017 06:58 p. m.
Para: Yuri Paes Leme
CC: pgsql-admin@postgresql.org
Asunto: Re: [ADMIN] Best system for a data warehouse application

 

Good to know, We haven't upgraded to 9.6 yet, however, but I'll suggest that for this upgrade.

 

On Tue, Feb 21, 2017 at 12:46 PM, Yuri Paes Leme <yuripl@gmail.com> wrote:

So, IMHO, if you can use the 9.6.x version, more cores, more power

 

On Tue, 21 Feb 2017 at 14:39 John Scalia <jayknowsunix@gmail.com> wrote:

Hi all,

One of my system admins has approached me about replacing our production data warehouse system. I believe I know the answer, but would just like more opinions. He's giving me two options so far:

1) CPU = Intel e5-2637 3.5GHz 4 core with hyperthreading

vs.

2) CPU = Intel e5-2650v4 2.2GHz 14 core with hyperthreading

The rest of the specs are identical with 512Gb RAM and more than 2 Tb SSD drives. Our data warehouse is just shy of 2Tb now. I suspect the the first option would be more optimal as I don't think PostgreSQL can make effective use of a lot of processor cores. So fewer cores would be better optimized with the O/S. Can anyone weigh in on whether my assumptions are correct?
--

Jay

 

[ADMIN] Best system for a data warehouse application

От
John Scalia
Дата:
Hi all,

One of my system admins has approached me about replacing our production data warehouse system. I believe I know the answer, but would just like more opinions. He's giving me two options so far:

1) CPU = Intel e5-2637 3.5GHz 4 core with hyperthreading

vs.

2) CPU = Intel e5-2650v4 2.2GHz 14 core with hyperthreading

The rest of the specs are identical with 512Gb RAM and more than 2 Tb SSD drives. Our data warehouse is just shy of 2Tb now. I suspect the the first option would be more optimal as I don't think PostgreSQL can make effective use of a lot of processor cores. So fewer cores would be better optimized with the O/S. Can anyone weigh in on whether my assumptions are correct?
--
Jay

Re: [ADMIN] Best system for a data warehouse application

От
Yuri Paes Leme
Дата:
So, IMHO, if you can use the 9.6.x version, more cores, more power

On Tue, 21 Feb 2017 at 14:39 John Scalia <jayknowsunix@gmail.com> wrote:
Hi all,

One of my system admins has approached me about replacing our production data warehouse system. I believe I know the answer, but would just like more opinions. He's giving me two options so far:

1) CPU = Intel e5-2637 3.5GHz 4 core with hyperthreading

vs.

2) CPU = Intel e5-2650v4 2.2GHz 14 core with hyperthreading

The rest of the specs are identical with 512Gb RAM and more than 2 Tb SSD drives. Our data warehouse is just shy of 2Tb now. I suspect the the first option would be more optimal as I don't think PostgreSQL can make effective use of a lot of processor cores. So fewer cores would be better optimized with the O/S. Can anyone weigh in on whether my assumptions are correct?
--
Jay

Re: [ADMIN] Best system for a data warehouse application

От
John Scalia
Дата:
Good to know, We haven't upgraded to 9.6 yet, however, but I'll suggest that for this upgrade.

On Tue, Feb 21, 2017 at 12:46 PM, Yuri Paes Leme <yuripl@gmail.com> wrote:
So, IMHO, if you can use the 9.6.x version, more cores, more power

On Tue, 21 Feb 2017 at 14:39 John Scalia <jayknowsunix@gmail.com> wrote:
Hi all,

One of my system admins has approached me about replacing our production data warehouse system. I believe I know the answer, but would just like more opinions. He's giving me two options so far:

1) CPU = Intel e5-2637 3.5GHz 4 core with hyperthreading

vs.

2) CPU = Intel e5-2650v4 2.2GHz 14 core with hyperthreading

The rest of the specs are identical with 512Gb RAM and more than 2 Tb SSD drives. Our data warehouse is just shy of 2Tb now. I suspect the the first option would be more optimal as I don't think PostgreSQL can make effective use of a lot of processor cores. So fewer cores would be better optimized with the O/S. Can anyone weigh in on whether my assumptions are correct?
--
Jay

Re: [ADMIN] Best system for a data warehouse application

От
Steven Chang
Дата:
Hello,

   One of the 9.6 new features is "Parallel Query".
   Therefore , in my view more cores do favor your dataware house system.
   My concern is the share buffer usage.  
   By my experience in 9.2 , 8.3 and 8.4 and ever consulting EDB technical support,
   do no set share_buffer over 8GB, the more the setting, the more memory you are wasting.
   Postgres mostly count on file system cache for I/O performance.
   However, multiprocess access the same data do need share memory to co-work.
   If share buffer ages out, postgres still page in data from OS cache. 
   Though memory access is much faster than DISK I/O, it still triggers some overhead.
   Greg also stats the same situation in his own book , high performance 9. 
   pg_prewarm and pgfincore do give some more control on memory management.
   But I still concern the postgres share memory management when taking care of large amount of data.
   
   Can anyone in postgres kernel development team gives comments or advices ??

Best Regards,
Steven


2017-02-21 21:02 GMT+08:00 Lazaro Garcia <lazaro3487@gmail.com>:

You could use pg_upgrade utility.

 

Regards.

 

De: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] En nombre de John Scalia
Enviado el: martes, 21 de febrero de 2017 06:58 p. m.
Para: Yuri Paes Leme
CC: pgsql-admin@postgresql.org
Asunto: Re: [ADMIN] Best system for a data warehouse application

 

Good to know, We haven't upgraded to 9.6 yet, however, but I'll suggest that for this upgrade.

 

On Tue, Feb 21, 2017 at 12:46 PM, Yuri Paes Leme <yuripl@gmail.com> wrote:

So, IMHO, if you can use the 9.6.x version, more cores, more power

 

On Tue, 21 Feb 2017 at 14:39 John Scalia <jayknowsunix@gmail.com> wrote:

Hi all,

One of my system admins has approached me about replacing our production data warehouse system. I believe I know the answer, but would just like more opinions. He's giving me two options so far:

1) CPU = Intel e5-2637 3.5GHz 4 core with hyperthreading

vs.

2) CPU = Intel e5-2650v4 2.2GHz 14 core with hyperthreading

The rest of the specs are identical with 512Gb RAM and more than 2 Tb SSD drives. Our data warehouse is just shy of 2Tb now. I suspect the the first option would be more optimal as I don't think PostgreSQL can make effective use of a lot of processor cores. So fewer cores would be better optimized with the O/S. Can anyone weigh in on whether my assumptions are correct?
--

Jay

 


Re: [ADMIN] Best system for a data warehouse application

От
Fernando Hevia
Дата:
Option 1 is best if you will handle but a few connections to postgres (~<=4), since the higher cpu clock will prevail in this scenario.
But if you expect having more than a few simultaneous connections go with option 2 without a doubt.
Check how many simultaneous connections your DW system typically uses and go from there. Although more cores would in general be the correct choice, keep in mind that common DW in-database computations will greatly profit from the extra cpu cycles of the 3.5 GHz cpu.

Cheers.


El 21 feb. 2017 2:39 PM, "John Scalia" <jayknowsunix@gmail.com> escribió:
Hi all,

One of my system admins has approached me about replacing our production data warehouse system. I believe I know the answer, but would just like more opinions. He's giving me two options so far:

1) CPU = Intel e5-2637 3.5GHz 4 core with hyperthreading

vs.

2) CPU = Intel e5-2650v4 2.2GHz 14 core with hyperthreading

The rest of the specs are identical with 512Gb RAM and more than 2 Tb SSD drives. Our data warehouse is just shy of 2Tb now. I suspect the the first option would be more optimal as I don't think PostgreSQL can make effective use of a lot of processor cores. So fewer cores would be better optimized with the O/S. Can anyone weigh in on whether my assumptions are correct?
--
Jay

Re: [ADMIN] Best system for a data warehouse application

От
Jaime Soler
Дата:
Well I prefer 1 choice, but first of choose a cpu model, I would check how many concurrent connections use your DW system. Also if you prefer a many-cores cpu, please check some previous errors related with cache line conetention in large numa system:
https://www.enterprisedb.com/blog/solving-cache-line-contention-large-numa-systems

2017-02-22 3:15 GMT+01:00 Fernando Hevia <fhevia@gmail.com>:
Option 1 is best if you will handle but a few connections to postgres (~<=4), since the higher cpu clock will prevail in this scenario.
But if you expect having more than a few simultaneous connections go with option 2 without a doubt.
Check how many simultaneous connections your DW system typically uses and go from there. Although more cores would in general be the correct choice, keep in mind that common DW in-database computations will greatly profit from the extra cpu cycles of the 3.5 GHz cpu.

Cheers.


El 21 feb. 2017 2:39 PM, "John Scalia" <jayknowsunix@gmail.com> escribió:
Hi all,

One of my system admins has approached me about replacing our production data warehouse system. I believe I know the answer, but would just like more opinions. He's giving me two options so far:

1) CPU = Intel e5-2637 3.5GHz 4 core with hyperthreading

vs.

2) CPU = Intel e5-2650v4 2.2GHz 14 core with hyperthreading

The rest of the specs are identical with 512Gb RAM and more than 2 Tb SSD drives. Our data warehouse is just shy of 2Tb now. I suspect the the first option would be more optimal as I don't think PostgreSQL can make effective use of a lot of processor cores. So fewer cores would be better optimized with the O/S. Can anyone weigh in on whether my assumptions are correct?
--
Jay

Re: Best system for a data warehouse application

От
Steven Chang
Дата:
hello Jaim,

        Regarding cache-line-contention issue , I think you can refer this page
        Once I resized my shared_buffer from 15GB to 45GB,  AP team reported performance issue after 2 days ,
        and we found swap in/out activities increased vastly
        At that time, the only thing I can do was returning to the original shared_buffer setting -- 15GB, 
        and performance recovered to normal.
        I consulted EDB for memory issue, because their slides introduced at most 8GB for shared_buffer.
        And  I got positive response from them at that time.
       Now I think it maybe an numa issue.
       I will try the same evualtion size of  data buffer size as Oracle's SGA in my future projects.
       Or anybody can share his/her experiences with large shared_buffer size with postgres db.

Regards,
Steven

2017-02-22 22:23 GMT+08:00 Jaime Soler <jaime.soler@gmail.com>:
Well I prefer 1 choice, but first of choose a cpu model, I would check how many concurrent connections use your DW system. Also if you prefer a many-cores cpu, please check some previous errors related with cache line conetention in large numa system:
https://www.enterprisedb.com/blog/solving-cache-line-contention-large-numa-systems

2017-02-22 3:15 GMT+01:00 Fernando Hevia <fhevia@gmail.com>:
Option 1 is best if you will handle but a few connections to postgres (~<=4), since the higher cpu clock will prevail in this scenario.
But if you expect having more than a few simultaneous connections go with option 2 without a doubt.
Check how many simultaneous connections your DW system typically uses and go from there. Although more cores would in general be the correct choice, keep in mind that common DW in-database computations will greatly profit from the extra cpu cycles of the 3.5 GHz cpu.

Cheers.


El 21 feb. 2017 2:39 PM, "John Scalia" <jayknowsunix@gmail.com> escribió:
Hi all,

One of my system admins has approached me about replacing our production data warehouse system. I believe I know the answer, but would just like more opinions. He's giving me two options so far:

1) CPU = Intel e5-2637 3.5GHz 4 core with hyperthreading

vs.

2) CPU = Intel e5-2650v4 2.2GHz 14 core with hyperthreading

The rest of the specs are identical with 512Gb RAM and more than 2 Tb SSD drives. Our data warehouse is just shy of 2Tb now. I suspect the the first option would be more optimal as I don't think PostgreSQL can make effective use of a lot of processor cores. So fewer cores would be better optimized with the O/S. Can anyone weigh in on whether my assumptions are correct?
--
Jay