Discussion: Experience with large number of tables in single PostgreSQL instance
Hello all,

We are exploring possible strategies for deploying PostgreSQL with an application that will store a fairly large amount of data (the current implementation stores around 345 GB, and up to 10 times more data is expected to end up in the database). Not to go into too much detail, the key fact is that the data sets needed for manipulation at any given point in time (let's call them operational data sets) will each be around 1/10,000th of the whole data set. There can be up to 100 concurrent operations on different data sets at the same time. Operational data sets are strictly defined in terms of what they contain, but they could contain any piece of the complete data set, so we have two obvious strategies to go with:

1) We can go with different instances of the PostgreSQL service, let's say (for pure theory) 10 of them on the same HA cluster setup. Every instance would hold, say, 1/10th of that big record set, and around 3,000 database tables (this apparently shouldn't be any problem for PostgreSQL, see below).

2) We can go with a single instance of the PostgreSQL service, which would then contain 30,000 database tables.

So, consider this a purely theoretical discussion: does anyone here have experience running a PostgreSQL service with a large number of database tables (a couple of thousand up to a couple of tens of thousands)? This entry in the wiki indicates PostgreSQL in general doesn't have a problem with "thousands of tables"; how about tens of thousands?

http://postgres.cz/wiki/PostgreSQL_SQL_Tricks#Faster_table.27s_list

Best regards,
V.

--
Pozdrav/Greetings,
Vedran Krivokuća
On Wed, Apr 10, 2013 at 9:07 AM, Vedran Krivokuca <vkrivokuca@gmail.com> wrote:
> 1) we can go with different instances of PostgreSQL service, let's say
> (for pure theory) 10 of them on the same HA cluster setup. Every
> instance would hold let's say 1/10th of that big recordset, and around
> 3.000 database tables (this apparently shouldn't be of any problem to
> PostgreSQL, see below).

I am talking shit here, obviously, excuse me. We would go with, for example, 10 or 100 databases in the same PostgreSQL instance, not separate service instances. Buzzwording got the better of me. :) The rest of my initial e-mail still applies; I will cherish any input of experience on this subject.

V.

--
Pozdrav/Greetings,
Vedran Krivokuća
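For concreteness, the two layouts under discussion look roughly like this in SQL (a minimal sketch; all names are hypothetical):

-- strategy 1: several databases in one instance, each holding a shard
create database dataset_shard_01;

-- strategy 2: one database, one schema per operational data set
create schema dataset_0001;
create table dataset_0001.records (
    id      bigserial primary key,
    payload text
);

Either way, the instance-wide totals (tables, files on disk) end up similar; the difference is mostly in how connections, backups, and the per-database system catalogs are partitioned.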
Re: Experience with large number of tables in single PostgreSQL instance
From
Vasilis Ventirozos
Date: 10/04/2013 08:24
On Wed, Apr 10, 2013 at 10:10 AM, Vedran Krivokuca <vkrivokuca@gmail.com> wrote:
> We would go with, for example, 10 or 100 databases in the same
> PostgreSQL instance. The rest of my initial e-mail still applies,
> I will cherish any input of experience on this subject.
I think that a limitation would be on the OS side (filesystem and kernel), because each table
is a file (or more) under a directory (per database). I don't have experience with how
modern Linux systems cope with that, but I think it would be easy to test.
Database-wise you shouldn't have a problem.
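As a quick way to see the file-per-table layout Vasilis describes, PostgreSQL will tell you where a relation lives on disk (a minimal sketch; the table name is hypothetical):

-- path of the table's main file, relative to the data directory
select pg_relation_filepath('my_table');  -- e.g. base/16384/16397

-- how many relations (tables + indexes) will want files of their own
select count(*) from pg_class where relkind in ('r', 'i');

With 30,000 tables plus their indexes, the data directory can easily contain hundreds of thousands of files, which is where filesystem behaviour starts to matter.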
On 10/04/2013 08:24, Vasilis Ventirozos wrote:
> I think that a limitation would be on the OS (filesystem and kernel)
> because each table is a file (or more) under a directory (per database).

I'd agree, certainly in my experience.

You need to ensure that OS parameters such as the maximum number of open files (fs.file-max, if we're talking Linux) are set appropriately, bearing in mind that each backend will hold an open file descriptor for each underlying data file it touches in the databases it's connected to.

Dale
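From inside the database, the relevant limits can be checked directly (a minimal sketch, assuming a stock PostgreSQL configuration):

-- per-backend cap on simultaneously open file descriptors (default 1000)
show max_files_per_process;

-- a rough upper bound on total demand is max_connections * max_files_per_process
show max_connections;

fs.file-max then needs headroom above that product, plus whatever else runs on the host.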
On Wed, Apr 10, 2013 at 10:43 AM, Dale Betts <dale.betts@hssnet.com> wrote:
> I'd agree, certainly in my experiences.
>
> You need to ensure OS parameters such as the max open files (fs.file-max
> if we're talking Linux) is set appropriately.

OK, to reply to myself and other interested parties (further insights are welcome!), I've been digging today for information on this subject and found:

http://postgresql.1045698.n5.nabble.com/GENERAL-Maximum-number-of-tables-per-database-and-slowness-td1853836.html

Basically it says: any number of tables is acceptable (well within the range of a couple of tens of thousands we might end up with), but bear in mind that pg_catalog can grow seriously with such a number of tables. A huge pg_catalog tends to slow down the query planner. That thread is from 2005, but I doubt those facts have changed substantially with time. I'll probably do some testing, but I can hardly replicate the production volume of data.

It seems that, due to the predictable nature of the queries we run against this system, we'll go with multiple databases, each containing up to a couple of thousand tables at most. Still weighing the pros and cons of each approach.

--
Pozdrav/Greetings,
Vedran Krivokuća
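To put a number on that catalog growth, the on-disk size of pg_catalog can be measured directly in any database (a minimal sketch using standard catalog views):

-- total size of the system catalogs in the current database
select pg_size_pretty(sum(pg_total_relation_size(c.oid)))
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where n.nspname = 'pg_catalog';

-- the rows that multiply with table count
select count(*) from pg_class;
select count(*) from pg_attribute;

Comparing these figures before and after creating a few thousand empty test tables gives a rough feel for the planner's catalog working set, even without production data volumes.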
Re: Experience with large number of tables in single PostgreSQL instance
From
Prashanth Ranjalkar
Date:
Every table or index is stored as one or more OS files, so the maximum number of open files needs to be set appropriately to support a larger table count. PostgreSQL itself places no practical limit on the number of tables you can create; performance is the main concern, as the system catalogs become overburdened.
Thanks & Regards,
Prashanth Ranjalkar
Database Consultant & Architect
Skype: prashanth.ranjalkar
On Wed, Apr 10, 2013 at 2:13 PM, Dale Betts <dale.betts@hssnet.com> wrote:
> I'd agree, certainly in my experiences.
>
> You need to ensure OS parameters such as the max open files (fs.file-max
> if we're talking Linux) is set appropriately.
> > 2) we can go with single instance of PostgreSQL service which would
> > then contain 30.000 database tables.
> >
> > So, consider this purely theoretical discussion - does anyone here have
> > experience of running PostgreSQL service with large number of database
> > tables (couple of thousands up to couple of tens of thousands)?

Hello,

We are handling data for several customers, each of them having its own schema. This allows us to easily bulk-move data to other servers when space gets scarce. The data are comparable to data warehouse data and are used by a reporting application. A majority of the tables are seldom or never used, which may moderate possible issues with a very large pg_catalog. Overall, we are impressed by how well Postgres handles this :-)

On the other hand, we've spent a *LOT* of work on table and query design, and we do have some problems with parser performance: we log all durations greater than 1 second; 1 % of these are "parse" statements, and about 90 % of those long parse statements are below 10 seconds.

Here are the figures for our server hosting the most tables. I expect them to be somewhat above the maximum that we should allow.

DB size: 3500 GB
pg_catalog size: 2 GB

select count(*) from pg_tables;     120'884
select count(*) from pg_indexes;    219'082
select count(*) from pg_attribute;  2'779'199

Server: 48 GB RAM & 12 cores

regards,

Marc Mamin
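For readers curious about the layout Marc describes, here is a minimal sketch of the schema-per-customer pattern (all names are hypothetical, not Marc's actual schema):

-- one schema per customer keeps each customer's tables self-contained
create schema customer_0042;
create table customer_0042.report_facts (
    id          bigserial primary key,
    reported_at timestamptz not null,
    value       numeric
);

-- logging statements slower than 1 second, as Marc's team does:
-- in postgresql.conf: log_min_duration_statement = 1000

Bulk-moving one customer to another server then reduces to a per-schema dump, e.g. pg_dump --schema=customer_0042 sourcedb | psql -h otherhost targetdb.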