Discussion: max_files_per_process limit
Hi,
We have a database of about 62000 tables (about 2000 tablespaces) with an index on each table. Postgresql version is 8.1.
Although after the initial inserts into about 32000 tables the subsequent inserts are considerably fast, subsequent inserts into more than 32000 tables are very slow.
This seems to be due to the datatype (integer) of the max_files_per_process option in the postgresql.conf file, which is used to set the maximum number of open file descriptors.
Is there anything we could do about this max_files_per_process limit or any other way to speed up inserts to all these tables?
Any suggestions are welcome.
Kind regards,
Dilek Küçük
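[For reference: max_files_per_process defaults to 1000 and can only be changed at server start. The value below is purely illustrative, and the effective ceiling remains the kernel's per-process open-file limit, as Tom Lane notes downthread.]

```
# postgresql.conf -- illustrative value, not a recommendation
# max_files_per_process caps how many files a single backend keeps open;
# the kernel's per-process limit (ulimit -n) still applies on top of it.
max_files_per_process = 2000    # default is 1000; takes effect at server start
```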
"=?ISO-8859-1?Q?Dilek_K=FC=E7=FCk?=" <dilekkucuk@gmail.com> writes: > We have a database of about 62000 tables (about 2000 tablespaces) with an > index on each table. Postgresql version is 8.1. You should probably rethink that schema. A lot of similar tables can be folded into one table with an extra key column. Also, where did you get the idea that 2000 tablespaces would be a good thing? There's really no point in more than one per spindle or filesystem. > Although after the initial inserts to about 32000 tables the subsequent > inserts are considerable fast, subsequent inserts to more than 32000 tables > are very slow. This has probably got more to do with inefficiencies of your filesystem than anything else --- did you pick one that scales well to lots of files per directory? > This seems to be due to the datatype (integer) of max_files_per_process > option in the postgres.conf file which is used to set the maximum number of > open file descriptors. It's not so much the datatype of max_files_per_process as the datatype of kernel file descriptors that's the limitation ... regards, tom lane
On Monday 10 November 2008 16:18:37 Dilek Küçük wrote:
> Hi,
>
> We have a database of about 62000 tables (about 2000 tablespaces) with an
> index on each table. Postgresql version is 8.1.
>

So you have about 62000 distinct schemata in your db?
Imagine that the average enterprise has about 200 tables max,
and an average-sized country has about 300 such companies,
including the public sector; with 62000 tables you could blindly
model the whole activity of an entire country.
Is this some kind of replicated data?
What's the story?
Just curious.
Actually we had 31 distinct tables, but this amounted to tens of billions of records (streaming data from 2000 sites) per table a year, so we horizontally partitioned each table into 2000 tables. This allowed us to discard one of the indexes we had created and freed us from periodic CLUSTER operations, which turned out to be infeasible for a system with tight querying constraints in terms of time.
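[The per-site partitioning described above would, on PostgreSQL 8.1, typically be built with table inheritance plus constraint exclusion. A minimal sketch with hypothetical table and column names:]

```sql
-- Parent table holding the common definition (names are illustrative).
CREATE TABLE measurement (
    site_id integer NOT NULL,
    ts      timestamptz NOT NULL,
    value   double precision
);

-- One child table per site; the CHECK constraint lets the planner
-- skip irrelevant children when constraint_exclusion is enabled.
CREATE TABLE measurement_site_1 (
    CHECK (site_id = 1)
) INHERITS (measurement);

CREATE INDEX measurement_site_1_ts_idx ON measurement_site_1 (ts);

-- Enable partition pruning for queries against the parent table.
SET constraint_exclusion = on;
```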
> Although after the initial inserts into about 32000 tables the subsequent
> inserts are considerably fast, subsequent inserts into more than 32000 tables
> are very slow.
>
> This seems to be due to the datatype (integer) of max_files_per_process
> option in the postgres.conf file which is used to set the maximum number of
> open file descriptors.
> Is there anything we could do about this max_files_per_process limit or any
> other way to speed up inserts to all these tables?
>
> Any suggestions are welcome.
>
> Kind regards,
> Dilek Küçük
>
Achilleas Mantzios
"Dilek Küçük" <dilekkucuk@gmail.com> writes:You should probably rethink that schema. A lot of similar tables can be
> We have a database of about 62000 tables (about 2000 tablespaces) with an
> index on each table. Postgresql version is 8.1.
folded into one table with an extra key column. Also, where did you get
the idea that 2000 tablespaces would be a good thing? There's really no
point in more than one per spindle or filesystem.This has probably got more to do with inefficiencies of your filesystem
> Although after the initial inserts to about 32000 tables the subsequent
> inserts are considerable fast, subsequent inserts to more than 32000 tables
> are very slow.
than anything else --- did you pick one that scales well to lots of
files per directory?
It's not so much the datatype of max_files_per_process as the datatype
> This seems to be due to the datatype (integer) of max_files_per_process
> option in the postgres.conf file which is used to set the maximum number of
> open file descriptors.
of kernel file descriptors that's the limitation ...
regards, tom lane
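[Folding the per-site tables back into one table with an extra key column, as suggested above, might look roughly like this; the table, column, and index names are hypothetical:]

```sql
-- A single table replacing N per-site tables;
-- site_id is the extra key column that identifies the origin.
CREATE TABLE measurement (
    site_id integer NOT NULL,
    ts      timestamptz NOT NULL,
    value   double precision
);

-- A composite index keeps per-site, time-range queries selective
-- without needing one index per site.
CREATE INDEX measurement_site_ts_idx ON measurement (site_id, ts);

-- Per-site queries then simply filter on the key column:
SELECT *
FROM measurement
WHERE site_id = 42
  AND ts >= now() - interval '1 day';
```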
On Tue, Nov 11, 2008 at 5:10 AM, Dilek Küçük <dilekkucuk@gmail.com> wrote:
>
> On Mon, Nov 10, 2008 at 4:51 PM, Achilleas Mantzios
> <achill@matrix.gatewaynet.com> wrote:
>>
>> On Monday 10 November 2008 16:18:37 Dilek Küçük wrote:
>> > Hi,
>> >
>> > We have a database of about 62000 tables (about 2000 tablespaces) with
>> > an index on each table. Postgresql version is 8.1.
>> >
>>
>> So you have about 62000 distinct schemata in your db?
>> Imagine that the average enterprise has about 200 tables max,
>> and an average-sized country has about 300 such companies,
>> including the public sector; with 62000 tables you could blindly
>> model the whole activity of an entire country.
>>
>> Is this some kind of replicated data?
>> What's the story?
>
> Actually we had 31 distinct tables, but this amounted to tens of billions of
> records (streaming data from 2000 sites) per table a year, so we
> horizontally partitioned each table into 2000 tables. This allowed us to
> discard one of the indexes we had created and freed us from periodic
> CLUSTER operations, which turned out to be infeasible for a system with
> tight querying constraints in terms of time.

Any chance of combining less used tables back together to reduce the
number of them? I'd also look at using more schemas and fewer
tablespaces. Just a thought.
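[On the schemas-versus-tablespaces point: a schema is purely a namespace inside one database, so grouping per-site tables under schemas adds no files, directories, or mount points, unlike tablespaces. A hypothetical sketch:]

```sql
-- One schema per site groups the tables without any filesystem cost.
CREATE SCHEMA site_1;

CREATE TABLE site_1.measurement (
    ts    timestamptz NOT NULL,
    value double precision
);

-- Queries address the table through its schema-qualified name:
SELECT * FROM site_1.measurement WHERE ts >= now() - interval '1 day';
```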