Discussion: max_files_per_process limit

max_files_per_process limit

From: "Dilek Küçük"
Date:

Hi,

We have a database of about 62000 tables (about 2000 tablespaces) with an index on each table. Postgresql version is 8.1.

Although subsequent inserts into the first 32000 or so tables are considerably fast after the initial inserts, inserts into more than 32000 tables are very slow.

This seems to be due to the datatype (integer) of the max_files_per_process option in the postgresql.conf file, which is used to set the maximum number of open file descriptors per server process.
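
(A minimal sketch of where this setting lives, assuming PostgreSQL 8.1; the value shown is purely illustrative and must stay below the kernel's per-process descriptor limit:)

    -- Show the current per-backend limit on open data files:
    SHOW max_files_per_process;

    -- The setting can only be changed in postgresql.conf, followed by a
    -- server restart, e.g. (illustrative value, not a recommendation):
    --   max_files_per_process = 2000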

Is there anything we could do about this max_files_per_process limit or any other way to speed up inserts to all these tables?

Any suggestions are welcome.

Kind regards,
Dilek Küçük

Re: max_files_per_process limit

From: Achilleas Mantzios
Date:
On Monday 10 November 2008 16:18:37 Dilek Küçük wrote:
> Hi,
>
> We have a database of about 62000 tables (about 2000 tablespaces) with an
> index on each table. Postgresql version is 8.1.
>

So you have about 62000 distinct schemata in your db?
Imagine that the average enterprise has about 200 tables at most,
and an average-sized country has about 300 such companies, public
sector included; with 62000 tables you could blindly model
... the whole activity of an entire country.

Is this some kind of replicated data?
What's the story?
Just curious.

> Although subsequent inserts into the first 32000 or so tables are
> considerably fast after the initial inserts, inserts into more than 32000
> tables are very slow.
>
> This seems to be due to the datatype (integer) of the max_files_per_process
> option in the postgresql.conf file, which is used to set the maximum number
> of open file descriptors per server process.
> Is there anything we could do about this max_files_per_process limit or any
> other way to speed up inserts to all these tables?
>
> Any suggestions are welcome.
>
> Kind regards,
> Dilek Küçük
>



--
Achilleas Mantzios

Re: max_files_per_process limit

From: Tom Lane
Date:
"=?ISO-8859-1?Q?Dilek_K=FC=E7=FCk?=" <dilekkucuk@gmail.com> writes:
> We have a database of about 62000 tables (about 2000 tablespaces) with an
> index on each table. Postgresql version is 8.1.

You should probably rethink that schema.  A lot of similar tables can be
folded into one table with an extra key column.  Also, where did you get
the idea that 2000 tablespaces would be a good thing?  There's really no
point in more than one per spindle or filesystem.
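
(A minimal sketch of the consolidation being suggested here, with invented table and column names:)

    -- Instead of one table per site (e.g. readings_0001 ... readings_2000),
    -- fold them into a single table with the site as an extra key column:
    CREATE TABLE readings (
        site_id     integer          NOT NULL,  -- the extra key column
        recorded_at timestamp        NOT NULL,
        value       double precision
    );

    -- One composite index then takes the place of thousands of per-table indexes:
    CREATE INDEX readings_site_time_idx ON readings (site_id, recorded_at);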

> Although subsequent inserts into the first 32000 or so tables are
> considerably fast after the initial inserts, inserts into more than 32000
> tables are very slow.

This has probably got more to do with inefficiencies of your filesystem
than anything else --- did you pick one that scales well to lots of
files per directory?

> This seems to be due to the datatype (integer) of the max_files_per_process
> option in the postgresql.conf file, which is used to set the maximum number
> of open file descriptors per server process.

It's not so much the datatype of max_files_per_process as the datatype
of kernel file descriptors that's the limitation ...

            regards, tom lane

Re: max_files_per_process limit

From: "Dilek Küçük"
Date:

On Mon, Nov 10, 2008 at 4:51 PM, Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
On Monday 10 November 2008 16:18:37 Dilek Küçük wrote:
> Hi,
>
> We have a database of about 62000 tables (about 2000 tablespaces) with an
> index on each table. Postgresql version is 8.1.
>

So you have about 62000 distinct schemata in your db?
Imagine that the average enterprise has about 200 tables at most,
and an average-sized country has about 300 such companies, public
sector included; with 62000 tables you could blindly model
... the whole activity of an entire country.

Is this some kind of replicated data?
What's the story?
Just curious.
 

Actually we had 31 distinct tables, but this amounted to tens of billions of records (streaming data from 2000 sites) per table per year, so we horizontally partitioned each table into 2000 tables. This allowed us to drop one of the indexes we had created and freed us from the periodic cluster operations, which had turned out to be infeasible for a system with tight time constraints on querying.
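
(A rough sketch of how such per-site horizontal partitioning is typically expressed in 8.1, using table inheritance with CHECK constraints; the names and layout are assumptions, not details of the actual setup:)

    -- Parent table; normally kept empty and queried as the "logical" table.
    CREATE TABLE measurements (
        site_id     integer          NOT NULL,
        recorded_at timestamp        NOT NULL,
        value       double precision
    );

    -- One child table per site, usually generated by a script; site 1 shown here.
    CREATE TABLE measurements_0001 (
        CHECK (site_id = 1)
    ) INHERITS (measurements);

    CREATE INDEX measurements_0001_time_idx ON measurements_0001 (recorded_at);

    -- With constraint_exclusion enabled, queries against the parent that
    -- filter on site_id skip the non-matching children:
    SET constraint_exclusion = on;
    SELECT * FROM measurements
    WHERE site_id = 1 AND recorded_at >= '2008-11-01';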

> Although subsequent inserts into the first 32000 or so tables are
> considerably fast after the initial inserts, inserts into more than 32000
> tables are very slow.
>
> This seems to be due to the datatype (integer) of the max_files_per_process
> option in the postgresql.conf file, which is used to set the maximum number
> of open file descriptors per server process.
> Is there anything we could do about this max_files_per_process limit or any
> other way to speed up inserts to all these tables?
>
> Any suggestions are welcome.
>
> Kind regards,
> Dilek Küçük
>



--
Achilleas Mantzios

Re: max_files_per_process limit

From: "Dilek Küçük"
Date:

On Mon, Nov 10, 2008 at 5:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Dilek Küçük" <dilekkucuk@gmail.com> writes:
> We have a database of about 62000 tables (about 2000 tablespaces) with an
> index on each table. Postgresql version is 8.1.

You should probably rethink that schema.  A lot of similar tables can be
folded into one table with an extra key column.  Also, where did you get
the idea that 2000 tablespaces would be a good thing?  There's really no
point in more than one per spindle or filesystem.

> Although subsequent inserts into the first 32000 or so tables are
> considerably fast after the initial inserts, inserts into more than 32000
> tables are very slow.

This has probably got more to do with inefficiencies of your filesystem
than anything else --- did you pick one that scales well to lots of
files per directory?
 
The database runs on FreeBSD 6.3 with the UFS file system. The server has 32 GB of RAM, two quad-core Intel Xeon 2.66 GHz processors, and about 11 TB of RAID 5 storage.
 


> This seems to be due to the datatype (integer) of the max_files_per_process
> option in the postgresql.conf file, which is used to set the maximum number
> of open file descriptors per server process.

It's not so much the datatype of max_files_per_process as the datatype
of kernel file descriptors that's the limitation ...
 
We do not get any system messages related to the kernel file descriptor limit (such as "file: table is full"), but we will revisit both the database schema (tablespaces etc.) and the system kernel variables.
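
(A crude way to gauge how many on-disk relations are in play, which is what competes for the per-process open-file budget; this counts only the relations themselves and ignores the extra segment files of tables larger than 1 GB:)

    -- Tables ('r'), indexes ('i') and TOAST tables ('t') each have their own
    -- file(s), so their counts approximate the number of files backends may open.
    SELECT relkind, count(*)
    FROM pg_class
    WHERE relkind IN ('r', 'i', 't')
    GROUP BY relkind;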
 
Thanks,
Dilek Küçük

Re: max_files_per_process limit

From: "Scott Marlowe"
Date:
On Tue, Nov 11, 2008 at 5:10 AM, Dilek Küçük <dilekkucuk@gmail.com> wrote:
>
> On Mon, Nov 10, 2008 at 4:51 PM, Achilleas Mantzios
> <achill@matrix.gatewaynet.com> wrote:
>>
>> On Monday 10 November 2008 16:18:37 Dilek Küçük wrote:
>> > Hi,
>> >
>> > We have a database of about 62000 tables (about 2000 tablespaces) with
>> > an
>> > index on each table. Postgresql version is 8.1.
>> >
>>
>> So you have about 62000 distinct schemata in your db?
>> Imagine that the average enterprise has about 200 tables at most,
>> and an average-sized country has about 300 such companies, public
>> sector included; with 62000 tables you could blindly model
>> ... the whole activity of an entire country.
>>
>> Is this some kind of replicated data?
>> What's the story?
>
> Actually we had 31 distinct tables, but this amounted to tens of billions of
> records (streaming data from 2000 sites) per table per year, so we
> horizontally partitioned each table into 2000 tables. This allowed us to
> drop one of the indexes we had created and freed us from the periodic
> cluster operations, which had turned out to be infeasible for a system with
> tight time constraints on querying.

Any chance of combining the less-used tables back together to reduce their
number?  I'd also look at using more schemas and fewer tablespaces.  Just a
thought.
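
(A minimal sketch of the "more schemas, fewer tablespaces" idea; all names are invented for illustration:)

    -- Group per-site objects into schemas rather than spreading them across
    -- thousands of tablespaces:
    CREATE SCHEMA site_0001;

    CREATE TABLE site_0001.measurements (
        recorded_at timestamp        NOT NULL,
        value       double precision
    );

    -- Keep only a handful of tablespaces, roughly one per filesystem or array,
    -- and assign them explicitly only where physical placement matters:
    --   CREATE TABLESPACE bulk_storage LOCATION '/raid5/pgdata';
    --   ALTER TABLE site_0001.measurements SET TABLESPACE bulk_storage;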