Discussion: [rfc] overhauling pgstat.stat
Hi, I'm considering overhauling pgstat.stat, and would like to know how many people are interested in this topic. As you may know, this file can be hundreds of MB in size, because pgstat.stat holds all access statistics in each database, and it needs to read/write an entire pgstat.stat frequently. As a result, pgstat.stat often generates massive I/O operations, particularly when having a large number of tables in the database. To support multi-tenancy or just a large number of tables (up to 10k tables in a single database), I think pgstat.stat needs to be overhauled. I think using heap and btree in pgstat.stat would be preferred, to reduce read/write and to allow updating access statistics for specific tables in the pgstat.stat file. Is this good for us? Any comments or suggestions? Regards, -- Satoshi Nagayasu <snaga@uptime.jp> Uptime Technologies, LLC. http://www.uptime.jp
On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote: > Hi, > > I'm considering overhauling pgstat.stat, and would like to know how many > people are interested in this topic. > > As you may know, this file can be hundreds of MB in size, because > pgstat.stat holds all access statistics in each database, and it needs > to read/write an entire pgstat.stat frequently. > > As a result, pgstat.stat often generates massive I/O operations, > particularly when having a large number of tables in the database. > > To support multi-tenancy or just a large number of tables (up to 10k > tables in a single database), I think pgstat.stat needs to be overhauled. > > I think using heap and btree in pgstat.stat would be preferred to reduce > read/write and to allow updating access statistics for specific tables > in the pgstat.stat file. > > Is this good for us? Hi, Nice thought. I/O reduction in pgstat can be really helpful. I am trying to think of our aim here. Would we be looking to split pgstat per table, so that the I/O write happens for only a portion of pgstat? Or reduce the I/O in general? If the latter, how would using a BTree help us? I would rather go for a range tree or something. But again, I may be completely wrong. Please elaborate a bit more on the solution we are trying to achieve. It seems really interesting. Regards, Atri -- Regards, Atri l'apprenant
Satoshi Nagayasu wrote: > As you may know, this file can be hundreds of MB in size, because > pgstat.stat holds all access statistics in each database, and it needs > to read/write an entire pgstat.stat frequently. > > As a result, pgstat.stat often generates massive I/O operations, > particularly when having a large number of tables in the database. We already changed it: commit 187492b6c2e8cafc5b39063ca3b67846e8155d24 Author: Alvaro Herrera <alvherre@alvh.no-ip.org> Date: Mon Feb 18 17:56:08 2013 -0300 Split pgstat file in smaller pieces We now write one file per database and one global file, instead of having the whole thing in a single huge file. This reduces the I/O that must be done when partial data is required -- which is all the time, because each process only needs information on its own database anyway. Also, the autovacuum launcher does not need data about tables and functions in each database; having the global stats for all DBs is enough. Catalog version bumped because we have a new subdir under PGDATA. Author: Tomas Vondra. Some rework by Álvaro Testing by Jeff Janes Other discussion by Heikki Linnakangas, Tom Lane. I don't oppose further tweaking, of course, but I wonder if you are considering these changes. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi, (2013/09/04 13:07), Alvaro Herrera wrote: > Satoshi Nagayasu wrote: > >> As you may know, this file can be hundreds of MB in size, because >> pgstat.stat holds all access statistics in each database, and it needs >> to read/write an entire pgstat.stat frequently. >> >> As a result, pgstat.stat often generates massive I/O operations, >> particularly when having a large number of tables in the database. > > We already changed it: > > commit 187492b6c2e8cafc5b39063ca3b67846e8155d24 > Author: Alvaro Herrera <alvherre@alvh.no-ip.org> > Date: Mon Feb 18 17:56:08 2013 -0300 > > Split pgstat file in smaller pieces Thanks for the comments. I forgot to mention that. Yes, we have already split the single pgstat.stat file into several pieces. However, we still need to read/write a large amount of statistics data when we have a large number of tables in a single database, or multiple databases being accessed. Right? I think the issue here is that it is necessary to write/read statistics data even when it has not actually changed. So, I'm wondering how we can minimize read/write operations on these statistics data files using heap and btree. Regards, > > commit 187492b6c2e8cafc5b39063ca3b67846e8155d24 > Author: Alvaro Herrera <alvherre@alvh.no-ip.org> > Date: Mon Feb 18 17:56:08 2013 -0300 > > Split pgstat file in smaller pieces > > We now write one file per database and one global file, instead of > having the whole thing in a single huge file. This reduces the I/O that > must be done when partial data is required -- which is all the time, > because each process only needs information on its own database anyway. > Also, the autovacuum launcher does not need data about tables and > functions in each database; having the global stats for all DBs is > enough. > > Catalog version bumped because we have a new subdir under PGDATA. > > Author: Tomas Vondra. Some rework by Álvaro > Testing by Jeff Janes > Other discussion by Heikki Linnakangas, Tom Lane. 
> > > I don't oppose further tweaking, of course, but I wonder if you are > considering these changes. > -- Satoshi Nagayasu <snaga@uptime.jp> Uptime Technologies, LLC. http://www.uptime.jp
Hi, (2013/09/04 12:52), Atri Sharma wrote: > On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote: >> Hi, >> >> I'm considering overhauling pgstat.stat, and would like to know how many >> people are interested in this topic. >> >> As you may know, this file can be hundreds of MB in size, because >> pgstat.stat holds all access statistics in each database, and it needs >> to read/write an entire pgstat.stat frequently. >> >> As a result, pgstat.stat often generates massive I/O operations, >> particularly when having a large number of tables in the database. >> >> To support multi-tenancy or just a large number of tables (up to 10k >> tables in a single database), I think pgstat.stat needs to be overhauled. >> >> I think using heap and btree in pgstat.stat would be preferred to reduce >> read/write and to allow updating access statistics for specific tables >> in the pgstat.stat file. >> >> Is this good for us? > > Hi, > > Nice thought. I/O reduction in pgstat can be really helpful. > > I am trying to think of our aim here. Would we be looking to split > pgstat per table, so that the I/O write happens for only a portion of > pgstat? Or reduce the I/O in general? I prefer the latter. Under the current implementation, DBAs need to split a single database into many smaller databases, taking the access locality of the tables into account. That's difficult, and the locality could change in the future. And splitting the statistics data into many files (per table, for example) would cause another performance issue when collecting/showing statistics at once. Just my guess though. So, I'm looking for a new way to reduce I/O for the statistics data in general. Regards, > > If the latter, how would using a BTree help us? I would rather go for a > range tree or something. But again, I may be completely wrong. > > Please elaborate a bit more on the solution we are trying to > achieve. It seems really interesting. 
> > Regards, > > Atri > > -- Satoshi Nagayasu <snaga@uptime.jp> Uptime Technologies, LLC. http://www.uptime.jp
Sent from my iPad On 04-Sep-2013, at 10:54, Satoshi Nagayasu <snaga@uptime.jp> wrote: > Hi, > > (2013/09/04 12:52), Atri Sharma wrote: >> On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote: >>> Hi, >>> >>> I'm considering overhauling pgstat.stat, and would like to know how many >>> people are interested in this topic. >>> >>> As you may know, this file could be handreds of MB in size, because >>> pgstat.stat holds all access statistics in each database, and it needs >>> to read/write an entire pgstat.stat frequently. >>> >>> As a result, pgstat.stat often generates massive I/O operation, >>> particularly when having a large number of tables in the database. >>> >>> To support multi-tenancy or just a large number of tables (up to 10k >>> tables in single database), I think pgstat.stat needs to be overhauled. >>> >>> I think using heap and btree in pgstat.stat would be preferred to reduce >>> read/write and to allow updating access statistics for specific tables >>> in pgstat.stat file. >>> >>> Is this good for us? >> >> Hi, >> >> Nice thought. I/O reduction in pgstat can be really helpful. >> >> I am trying to think of our aim here. Would we be looking to split >> pgstat per table, so that the I/O write happens for only a portion of >> pgstat? Or reduce the I/O in general? > > I prefer the latter. > > Under the current implementation, DBA need to split single database > into many smaller databases with considering access locality of the > tables. It's difficult and could be change in future. > > And splitting the statistics data into many files (per table, > for example) would cause another performance issue when > collecting/showing statistics at once. Just my guess though. > > So, I'm looking for a new way to reduce I/O for the statistics data > in general. > > Regards, > >> >> If the later, how would using BTree help us? I would rather go for a >> range tree or something. But again, I may be completely wrong. 
>> >> Please elaborate a bit more on the solution we are trying to >> achieve. It seems really interesting. >> >> Regards, >> >> Atri >> >> > > Right, thanks. How would using heap and BTree help here? Are we looking at a priority queue which supports the main storage system of the stats? Regards, Atri
(2013/09/04 15:23), Atri Sharma wrote: > > > Sent from my iPad > > On 04-Sep-2013, at 10:54, Satoshi Nagayasu <snaga@uptime.jp> wrote: > >> Hi, >> >> (2013/09/04 12:52), Atri Sharma wrote: >>> On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote: >>>> Hi, >>>> >>>> I'm considering overhauling pgstat.stat, and would like to know how many >>>> people are interested in this topic. >>>> >>>> As you may know, this file could be handreds of MB in size, because >>>> pgstat.stat holds all access statistics in each database, and it needs >>>> to read/write an entire pgstat.stat frequently. >>>> >>>> As a result, pgstat.stat often generates massive I/O operation, >>>> particularly when having a large number of tables in the database. >>>> >>>> To support multi-tenancy or just a large number of tables (up to 10k >>>> tables in single database), I think pgstat.stat needs to be overhauled. >>>> >>>> I think using heap and btree in pgstat.stat would be preferred to reduce >>>> read/write and to allow updating access statistics for specific tables >>>> in pgstat.stat file. >>>> >>>> Is this good for us? >>> >>> Hi, >>> >>> Nice thought. I/O reduction in pgstat can be really helpful. >>> >>> I am trying to think of our aim here. Would we be looking to split >>> pgstat per table, so that the I/O write happens for only a portion of >>> pgstat? Or reduce the I/O in general? >> >> I prefer the latter. >> >> Under the current implementation, DBA need to split single database >> into many smaller databases with considering access locality of the >> tables. It's difficult and could be change in future. >> >> And splitting the statistics data into many files (per table, >> for example) would cause another performance issue when >> collecting/showing statistics at once. Just my guess though. >> >> So, I'm looking for a new way to reduce I/O for the statistics data >> in general. >> >> Regards, >> >>> >>> If the later, how would using BTree help us? 
I would rather go for a >>> range tree or something. But again, I may be completely wrong. >>> >>> Please elaborate a bit more on the solution we are trying to >>> achieve. It seems really interesting. >>> >>> Regards, >>> >>> Atri >>> >>> >> >> > > Right, thanks. > > How would using heap and BTree help here? Are we looking at a priority queue which supports the main storage system of the stats? For example, when you read only a single block from your table, then you need to write all values in your database statistics next. It often generates a large amount of I/O. However, if random access is allowed in the statistics, you can update only a single record for the specific table which you read. It would be less than 100 bytes for each table. I have no idea how a priority queue could work here so far. However, if the statistics were overhauled, PostgreSQL would be able to host a much larger number of customers more easily. Regards, -- Satoshi Nagayasu <snaga@uptime.jp> Uptime Technologies, LLC. http://www.uptime.jp
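The contrast Satoshi describes - rewriting the whole statistics file versus overwriting one small record in place - can be sketched roughly as follows. The struct layout, field names, and file handling are purely illustrative assumptions, not the actual pgstat.stat on-disk format:

```c
#include <stdio.h>

/* Hypothetical fixed-size per-table stats entry (~100 bytes);
 * this is NOT the real pgstat.stat record layout. */
typedef struct StatsEntry
{
    unsigned int tableoid;          /* which table this entry describes */
    long long    n_tuples_fetched;
    long long    n_blocks_read;
    char         padding[76];       /* pad to a fixed record size */
} StatsEntry;

/* Overwrite a single entry in place: seek to its slot and rewrite
 * ~100 bytes, instead of rewriting the entire statistics file. */
int
update_entry_in_place(const char *path, long slot, const StatsEntry *entry)
{
    FILE *f = fopen(path, "r+b");

    if (f == NULL)
        return -1;
    if (fseek(f, slot * (long) sizeof(StatsEntry), SEEK_SET) != 0 ||
        fwrite(entry, sizeof(StatsEntry), 1, f) != 1)
    {
        fclose(f);
        return -1;
    }
    fclose(f);
    return 0;
}
```

With a fixed record size, updating the stats for one table costs a seek plus a write of under a hundred bytes, regardless of how many other tables the database holds.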
Satoshi, * Satoshi Nagayasu (snaga@uptime.jp) wrote: > (2013/09/04 13:07), Alvaro Herrera wrote: > >We already changed it: > > > > commit 187492b6c2e8cafc5b39063ca3b67846e8155d24 > > Author: Alvaro Herrera <alvherre@alvh.no-ip.org> > > Date: Mon Feb 18 17:56:08 2013 -0300 > > > > Split pgstat file in smaller pieces > > Thanks for the comments. I forgot to mention that. > > Yes, we have already split single pgstat.stat file into > several pieces. > > However, we still need to read/write large amount of statistics > data when we have a large number of tables in single database > or multiple databases being accessed. Right? Would simply also splitting per tablespace help? > I think the issue here is that it is necessary to write/read > statistics data even it's not actually changed. > > So, I'm wondering how we can minimize read/write operations > on these statistics data files with using heap and btree. It does sound like an interesting idea to use heap/btree instead but I wonder about the effort involved, particularly around coordinating access. We wouldn't want to end up introducing additional contention points by doing this.. Thanks, Stephen
Sent from my iPad On 04-Sep-2013, at 15:39, Satoshi Nagayasu <snaga@uptime.jp> wrote: > (2013/09/04 15:23), Atri Sharma wrote: >> >> >> Sent from my iPad >> >> On 04-Sep-2013, at 10:54, Satoshi Nagayasu <snaga@uptime.jp> wrote: >> >>> Hi, >>> >>> (2013/09/04 12:52), Atri Sharma wrote: >>>> On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote: >>>>> Hi, >>>>> >>>>> I'm considering overhauling pgstat.stat, and would like to know how many >>>>> people are interested in this topic. >>>>> >>>>> As you may know, this file could be handreds of MB in size, because >>>>> pgstat.stat holds all access statistics in each database, and it needs >>>>> to read/write an entire pgstat.stat frequently. >>>>> >>>>> As a result, pgstat.stat often generates massive I/O operation, >>>>> particularly when having a large number of tables in the database. >>>>> >>>>> To support multi-tenancy or just a large number of tables (up to 10k >>>>> tables in single database), I think pgstat.stat needs to be overhauled. >>>>> >>>>> I think using heap and btree in pgstat.stat would be preferred to reduce >>>>> read/write and to allow updating access statistics for specific tables >>>>> in pgstat.stat file. >>>>> >>>>> Is this good for us? >>>> >>>> Hi, >>>> >>>> Nice thought. I/O reduction in pgstat can be really helpful. >>>> >>>> I am trying to think of our aim here. Would we be looking to split >>>> pgstat per table, so that the I/O write happens for only a portion of >>>> pgstat? Or reduce the I/O in general? >>> >>> I prefer the latter. >>> >>> Under the current implementation, DBA need to split single database >>> into many smaller databases with considering access locality of the >>> tables. It's difficult and could be change in future. >>> >>> And splitting the statistics data into many files (per table, >>> for example) would cause another performance issue when >>> collecting/showing statistics at once. Just my guess though. 
>>> >>> So, I'm looking for a new way to reduce I/O for the statistics data >>> in general. >>> >>> Regards, >>> >>>> >>>> If the latter, how would using a BTree help us? I would rather go for a >>>> range tree or something. But again, I may be completely wrong. >>>> >>>> Please elaborate a bit more on the solution we are trying to >>>> achieve. It seems really interesting. >>>> >>>> Regards, >>>> >>>> Atri >> >> Right, thanks. >> >> How would using heap and BTree help here? Are we looking at a priority queue which supports the main storage system of the stats? > > For example, when you read only a single block from your table, > then you need to write all values in your database statistics next. > It often generates a large amount of I/O. > > However, if random access is allowed in the statistics, you can > update only a single record for the specific table which you read. > It would be less than 100 bytes for each table. > > I have no idea how a priority queue could work here so far. > However, if the statistics were overhauled, PostgreSQL would be able > to host a much larger number of customers Ah, now I get it. Thanks a ton for the detailed explanation. Yes, a BTree will sufficiently isolate per-table stats here and allow for random access. Another thing I can think of is having a write-back cache which could probably be used as a buffer before the actual stats write. I am just musing here though. Regards, Atri
2013/9/4 Atri Sharma <atri.jiit@gmail.com>
Sent from my iPad
On 04-Sep-2013, at 15:39, Satoshi Nagayasu <snaga@uptime.jp> wrote:
> (2013/09/04 15:23), Atri Sharma wrote:
>>
>>
>> Sent from my iPad
>>
>> On 04-Sep-2013, at 10:54, Satoshi Nagayasu <snaga@uptime.jp> wrote:
>>
>>> Hi,
>>>
>>> (2013/09/04 12:52), Atri Sharma wrote:
>>>> On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
>>>>> Hi,
>>>>>
>>>>> I'm considering overhauling pgstat.stat, and would like to know how many
>>>>> people are interested in this topic.
>>>>>
>>>>> As you may know, this file can be hundreds of MB in size, because
>>>>> pgstat.stat holds all access statistics in each database, and it needs
>>>>> to read/write an entire pgstat.stat frequently.
>>>>>
>>>>> As a result, pgstat.stat often generates massive I/O operation,
>>>>> particularly when having a large number of tables in the database.
>>>>>
>>>>> To support multi-tenancy or just a large number of tables (up to 10k
>>>>> tables in single database), I think pgstat.stat needs to be overhauled.
>>>>>
>>>>> I think using heap and btree in pgstat.stat would be preferred to reduce
>>>>> read/write and to allow updating access statistics for specific tables
>>>>> in pgstat.stat file.
>>>>>
>>>>> Is this good for us?
>>>>
>>>> Hi,
>>>>
>>>> Nice thought. I/O reduction in pgstat can be really helpful.
>>>>
>>>> I am trying to think of our aim here. Would we be looking to split
>>>> pgstat per table, so that the I/O write happens for only a portion of
>>>> pgstat? Or reduce the I/O in general?
>>>
>>> I prefer the latter.
>>>
>>> Under the current implementation, DBA need to split single database
>>> into many smaller databases with considering access locality of the
>>> tables. It's difficult and could be change in future.
>>>
>>> And splitting the statistics data into many files (per table,
>>> for example) would cause another performance issue when
>>> collecting/showing statistics at once. Just my guess though.
>>>
>>> So, I'm looking for a new way to reduce I/O for the statistics data
>>> in general.
>>>
>>> Regards,
>>>
>>>>
>>>> If the latter, how would using a BTree help us? I would rather go for a
>>>> range tree or something. But again, I may be completely wrong.
>>>>
>>>> Please elaborate a bit more on the solution we are trying to
>>>> achieve. It seems really interesting.
>>>>
>>>> Regards,
>>>>
>>>> Atri
>>
>> Right, thanks.
>>
>> How would using heap and BTree help here? Are we looking at a priority queue which supports the main storage system of the stats?
>
> For example, when you read only a single block from your table,
> then you need to write all values in your database statistics next.
> It often generates large amount of i/o operation.
>
> However, if random access is allowed in the statistics, you can
> update only a single record for the specific table which you read.
> It would be less than 100 bytes for each table.
>
> I have no idea about how a priority queue can work here so far.
> However, if the statistics is overhauled, PostgreSQL would be able
> to host a much larger number of customers
Ah, now I get it. Thanks a ton for the detailed explanation.
Yes, a BTree will sufficiently isolate per table stats here and allow for random access.
Another thing I can think of is having a write back cache which could probably be used for a buffer before the actual stats write. I am just musing here though.
we very successfully use a tmpfs volume for the pgstat files (we use a backport of the multiple-statfile change from 9.3 to 9.1)
Regards
Pavel
Regards,
Atri
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 4.9.2013 07:24, Satoshi Nagayasu wrote: > Hi, > > (2013/09/04 12:52), Atri Sharma wrote: >> On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <snaga@uptime.jp> >> wrote: >>> Hi, >>> >>> I'm considering overhauling pgstat.stat, and would like to know >>> how many people are interested in this topic. >>> >>> As you may know, this file could be handreds of MB in size, >>> because pgstat.stat holds all access statistics in each database, >>> and it needs to read/write an entire pgstat.stat frequently. >>> >>> As a result, pgstat.stat often generates massive I/O operation, >>> particularly when having a large number of tables in the >>> database. >>> >>> To support multi-tenancy or just a large number of tables (up to >>> 10k tables in single database), I think pgstat.stat needs to be >>> overhauled. >>> >>> I think using heap and btree in pgstat.stat would be preferred to >>> reduce read/write and to allow updating access statistics for >>> specific tables in pgstat.stat file. >>> >>> Is this good for us? >> >> Hi, >> >> Nice thought. I/O reduction in pgstat can be really helpful. >> >> I am trying to think of our aim here. Would we be looking to split >> pgstat per table, so that the I/O write happens for only a portion >> of pgstat? Or reduce the I/O in general? > > I prefer the latter. > > Under the current implementation, DBA need to split single database > into many smaller databases with considering access locality of the > tables. It's difficult and could be change in future. > > And splitting the statistics data into many files (per table, for > example) would cause another performance issue when > collecting/showing statistics at once. Just my guess though. > > So, I'm looking for a new way to reduce I/O for the statistics data > in general. Hi, as one of the authors of the 9.3 patch (per database stats), I planned to work on this a bit more in the 9.4 cycle. So a few comments / ideas. I'm not entirely sure splitting the stats per table would be that bad. 
After all, we already have per-relation data files, so either the users already have serious problems (and this won't make it noticeably worse) or it will work fine. But I'm not saying it's the right choice either. My idea was to keep the per-database stats, but allow some sort of "random" access - updating / deleting the records in place, adding records etc. The simplest way I could think of was adding a simple "index" - a mapping of OID to position in the stat file, i.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or something like that. This would make it quite simple to access an existing record: (1) get the position from the index, (2) read sizeof(Entry) bytes from the file, (3) if it's an update, just overwrite the bytes; for a delete, set an is_deleted flag (which needs to be added to all entries); or read all the records (just read the whole file, as today). regards Tomas
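Tomas's (oid, offset) mapping could look something like this minimal sketch. The struct and the lookup are assumptions about what such an index might contain, not a worked-out on-disk format:

```c
#include <stddef.h>

/* Hypothetical index entry: maps a table OID to the byte offset of its
 * record in the per-database stat file (the oid.stat.index idea). */
typedef struct StatIndexEntry
{
    unsigned int oid;
    long         offset;
} StatIndexEntry;

/*
 * Binary-search an index sorted by OID; returns the file offset of the
 * record, or -1 if the OID has no record yet.
 */
long
stat_index_lookup(const StatIndexEntry *idx, size_t n, unsigned int oid)
{
    size_t lo = 0;
    size_t hi = n;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (idx[mid].oid < oid)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo < n && idx[lo].oid == oid)
        return idx[lo].offset;
    return -1;
}
```

With the offset in hand, step (2) becomes a single read of sizeof(Entry) bytes at that position and step (3) a single overwrite; a delete only flips the is_deleted flag instead of compacting the file.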
On 4.9.2013 14:43, Pavel Stehule wrote: > > > > 2013/9/4 Atri Sharma <atri.jiit@gmail.com <mailto:atri.jiit@gmail.com>> > > > > Sent from my iPad > > On 04-Sep-2013, at 15:39, Satoshi Nagayasu <snaga@uptime.jp > <mailto:snaga@uptime.jp>> wrote: > > > (2013/09/04 15:23), Atri Sharma wrote: > >> > >> > >> Sent from my iPad > >> > >> On 04-Sep-2013, at 10:54, Satoshi Nagayasu <snaga@uptime.jp > <mailto:snaga@uptime.jp>> wrote: > >> > >>> Hi, > >>> > >>> (2013/09/04 12:52), Atri Sharma wrote: > >>>> On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu > <snaga@uptime.jp <mailto:snaga@uptime.jp>> wrote: > >>>>> Hi, > >>>>> > >>>>> I'm considering overhauling pgstat.stat, and would like to > know how many > >>>>> people are interested in this topic. > >>>>> > >>>>> As you may know, this file could be handreds of MB in size, > because > >>>>> pgstat.stat holds all access statistics in each database, and > it needs > >>>>> to read/write an entire pgstat.stat frequently. > >>>>> > >>>>> As a result, pgstat.stat often generates massive I/O operation, > >>>>> particularly when having a large number of tables in the database. > >>>>> > >>>>> To support multi-tenancy or just a large number of tables (up > to 10k > >>>>> tables in single database), I think pgstat.stat needs to be > overhauled. > >>>>> > >>>>> I think using heap and btree in pgstat.stat would be preferred > to reduce > >>>>> read/write and to allow updating access statistics for > specific tables > >>>>> in pgstat.stat file. > >>>>> > >>>>> Is this good for us? > >>>> > >>>> Hi, > >>>> > >>>> Nice thought. I/O reduction in pgstat can be really helpful. > >>>> > >>>> I am trying to think of our aim here. Would we be looking to split > >>>> pgstat per table, so that the I/O write happens for only a > portion of > >>>> pgstat? Or reduce the I/O in general? > >>> > >>> I prefer the latter. 
> >>> > >>> Under the current implementation, DBA need to split single database > >>> into many smaller databases with considering access locality of the > >>> tables. It's difficult and could be change in future. > >>> > >>> And splitting the statistics data into many files (per table, > >>> for example) would cause another performance issue when > >>> collecting/showing statistics at once. Just my guess though. > >>> > >>> So, I'm looking for a new way to reduce I/O for the statistics data > >>> in general. > >>> > >>> Regards, > >>> > >>>> > >>>> If the later, how would using BTree help us? I would rather go > for a > >>>> range tree or something. But again, I may be completely wrong. > >>>> > >>>> Please elaborate a bit more on the solution we are trying to > >>>> achieve.It seems really interesting. > >>>> > >>>> Regards, > >>>> > >>>> Atri > >> > >> Right,thanks. > >> > >> How would using heap and BTree help here? Are we looking at a > priority queue which supports the main storage system of the stats? > > > > For example, when you read only a single block from your table, > > then you need to write all values in your database statistics next. > > It often generates large amount of i/o operation. > > > > However, if random access is allowed in the statistics, you can > > update only as single record for the specific table which you read. > > It would be less than 100 bytes for each table. > > > > I have no idea about how a priority queue can work here so far. > > However, if the statistics is overhauled, PostgreSQL would be able > > to host a much larger number of customers > > > Ah, now I get it. Thanks a ton for the detailed explanation. > > Yes, a BTree will sufficiently isolate per table stats here and > allow for random access. > > Another thing I can think of is having a write back cache which > could probably be used for a buffer before the actual stats write. I > am just musing here though. 
> > > we very successfully use a tmpfs volume for pgstat files (use a backport > of multiple statfiles from 9.3 to 9.1 It works quite well as long as you have the objects (tables, indexes, functions) spread across multiple databases. Once you have one database with a very large number of objects, tmpfs is not as effective. It's going to help with stats I/O, but it's not going to help with high CPU usage (you're reading and parsing the stat files over and over) and every rewrite creates a copy of the file. So if you have 400MB stats, you will need 800MB tmpfs + some slack (say, 200MB). That means you'll use ~1GB tmpfs although 400MB would be just fine. And this 600MB won't be used for page cache etc. OTOH, it's true that if you have that many objects, 600MB of RAM is not going to help you anyway. Tomas
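The sizing arithmetic above (a rewrite temporarily needs a second full-size copy of the stats file, plus some slack) can be written down explicitly; the 400MB/200MB figures are just the example from Tomas's message:

```c
/* tmpfs must hold the old stats file plus the full-size temporary copy
 * written during a rewrite, plus some slack for growth.
 * E.g. 400MB stats + 200MB slack -> ~1GB tmpfs. */
long
tmpfs_needed_mb(long stats_mb, long slack_mb)
{
    return 2 * stats_mb + slack_mb;
}
```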
>> we very successfully use a tmpfs volume for pgstat files (use a backport
>> of multiple statfiles from 9.3 to 9.1)
>
> It works quite well as long as you have the objects (tables, indexes,
> functions) spread across multiple databases. Once you have one database
> with a very large number of objects, tmpfs is not as effective.
>
> It's going to help with stats I/O, but it's not going to help with high
> CPU usage (you're reading and parsing the stat files over and over) and
> every rewrite creates a copy of the file. So if you have 400MB stats,
> you will need 800MB tmpfs + some slack (say, 200MB). That means you'll
> use ~1GB tmpfs although 400MB would be just fine. And this 600MB won't
> be used for page cache etc.
>
> OTOH, it's true that if you have that many objects, 600MB of RAM is not
> going to help you anyway.
>
> Tomas

And just an idea - can we use a database for storing these files? It could
use unlogged tables. A second idea - have one bgworker act as a persistent
in-memory key-value store and hold the data in memory with some
optimizations - using anti-caching and similar in-memory database features.

Pavel
On 4.9.2013 14:13, Stephen Frost wrote: > * Satoshi Nagayasu (snaga@uptime.jp) wrote: > >> Yes, we have already split single pgstat.stat file into several >> pieces. >> >> However, we still need to read/write a large amount of statistics >> data when we have a large number of tables in a single database or >> multiple databases being accessed. Right? > > Would simply also splitting per tablespace help? I don't think that's a good solution. Forcing the users to use tablespaces just to minimize the stats overhead is not that far from forcing them to use multiple databases. Tablespaces have disadvantages too, so I'd suggest looking for a solution meeting the "just works" criterion. >> I think the issue here is that it is necessary to write/read >> statistics data even if it's not actually changed. >> >> So, I'm wondering how we can minimize read/write operations on >> these statistics data files by using heap and btree. > > It does sound like an interesting idea to use heap/btree instead but > I wonder about the effort involved, particularly around coordinating > access. We wouldn't want to end up introducing additional contention > points by doing this.. Yes, this is definitely an important thing to keep in mind. Maintaining the "index" (no matter what kind of index will be used) will cause some contention, but I believe it will be manageable with a bit of effort. Tomas
Tomas Vondra wrote: > My idea was to keep the per-database stats, but allow some sort of > "random" access - updating / deleting the records in place, adding > records etc. The simplest way I could think of was adding a simple > "index" - a mapping of OID to position in the stat file. > > I.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or > something like that. This would make it quite simple to access existing > record > > 1: get position from the index > 2: read sizeof(Entry) from the file > 3: if it's update, just overwrite the bytes, for delete set isdeleted > flag (needs to be added to all entries) > > or reading all the records (just read the whole file as today). Sounds reasonable. However, I think the index should be a real index, i.e. have a tree structure that can be walked down, not just a plain array. If you have a 400 MB stat file, then you must have about 4 million tables, and you will not want to scan such a large array every time you want to find an entry. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 4.9.2013 20:59, Alvaro Herrera wrote: > Tomas Vondra wrote: > >> My idea was to keep the per-database stats, but allow some sort of >> "random" access - updating / deleting the records in place, adding >> records etc. The simplest way I could think of was adding a simple >> "index" - a mapping of OID to position in the stat file. >> >> I.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or >> something like that. This would make it quite simple to access existing >> record >> >> 1: get position from the index >> 2: read sizeof(Entry) from the file >> 3: if it's update, just overwrite the bytes, for delete set isdeleted >> flag (needs to be added to all entries) >> >> or reading all the records (just read the whole file as today). > > Sounds reasonable. However, I think the index should be a real index, > i.e. have a tree structure that can be walked down, not just a plain > array. If you have a 400 MB stat file, then you must have about 4 > million tables, and you will not want to scan such a large array every > time you want to find an entry. I was thinking about a sorted array, so a bisection would be a simple and fast way to search. New items could be added to another small unsorted array (say, 1000 elements) and this would be extended and resorted only when this small one gets full. Tomas
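The scheme Tomas describes - a sorted (oid, offset) array searched by bisection, plus a small unsorted overflow array that is merged and resorted only when it fills up - might look roughly like this. A Python sketch of the data structure only, under assumed sizes; the real thing would live in C in the stats collector:

```python
import bisect

class StatIndex:
    """Maps a table OID to its byte offset in the stat file.

    Lookups bisect the sorted main arrays; new entries go into a small
    unsorted overflow list that is merged in only when it gets full.
    """
    OVERFLOW_LIMIT = 1000  # assumed size of the small unsorted array

    def __init__(self):
        self.oids = []      # sorted OIDs
        self.offsets = []   # file offsets, parallel to self.oids
        self.overflow = []  # recent (oid, offset) pairs, unsorted

    def insert(self, oid, offset):
        self.overflow.append((oid, offset))
        if len(self.overflow) >= self.OVERFLOW_LIMIT:
            self._merge()

    def _merge(self):
        """Fold the overflow list into the sorted arrays and resort."""
        pairs = sorted(list(zip(self.oids, self.offsets)) + self.overflow)
        self.oids = [o for o, _ in pairs]
        self.offsets = [off for _, off in pairs]
        self.overflow = []

    def lookup(self, oid):
        # Check the small overflow list first (linear, but bounded);
        # newest entry wins.
        for o, off in reversed(self.overflow):
            if o == oid:
                return off
        i = bisect.bisect_left(self.oids, oid)
        if i < len(self.oids) and self.oids[i] == oid:
            return self.offsets[i]
        return None
```

A lookup is O(log n) plus at most OVERFLOW_LIMIT comparisons, and the expensive resort is amortized over a thousand inserts - which is the trade-off against a real tree index being discussed here.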
(2013/09/05 3:50), Pavel Stehule wrote: > > we very successfully use a tmpfs volume for pgstat files (use a > backport > > of multiple statfiles from 9.3 to 9.1) > > It works quite well as long as you have the objects (tables, indexes, > functions) spread across multiple databases. Once you have one database > with a very large number of objects, tmpfs is not as effective. > > It's going to help with stats I/O, but it's not going to help with high > CPU usage (you're reading and parsing the stat files over and over) and > every rewrite creates a copy of the file. So if you have 400MB stats, > you will need 800MB tmpfs + some slack (say, 200MB). That means you'll > use ~1GB tmpfs although 400MB would be just fine. And this 600MB won't > be used for page cache etc. > > OTOH, it's true that if you have that many objects, 600MB of RAM is not > going to help you anyway. > > And just an idea - can we use a database for storing these files? It could > use unlogged tables. A second idea - have one bgworker act as a persistent > in-memory key-value store and hold the data in memory with some > optimizations - using anti-caching and similar in-memory database features. Yeah, I'm interested in this idea too. If the stat collector has a dedicated connection to the backend in order to store statistics into dedicated tables, we can easily take advantage of index (btree, or hash?) and heap storage. Is this worth trying? Regards, -- Satoshi Nagayasu <snaga@uptime.jp> Uptime Technologies, LLC. http://www.uptime.jp
(2013/09/05 3:59), Alvaro Herrera wrote: > Tomas Vondra wrote: > >> My idea was to keep the per-database stats, but allow some sort of >> "random" access - updating / deleting the records in place, adding >> records etc. The simplest way I could think of was adding a simple >> "index" - a mapping of OID to position in the stat file. >> >> I.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or >> something like that. This would make it quite simple to access existing >> record >> >> 1: get position from the index >> 2: read sizeof(Entry) from the file >> 3: if it's update, just overwrite the bytes, for delete set isdeleted >> flag (needs to be added to all entries) >> >> or reading all the records (just read the whole file as today). > > Sounds reasonable. However, I think the index should be a real index, > i.e. have a tree structure that can be walked down, not just a plain > array. If you have a 400 MB stat file, then you must have about 4 > million tables, and you will not want to scan such a large array every > time you want to find an entry. I thought an array structure at first. But, for now, I think we should have a real index for the statistics data because we already have several index storages, and it will allow us to minimize read/write operations. BTW, what kind of index would be preferred for this purpose? btree or hash? If we use btree, do we need "range scan" thing on the statistics tables? I have no idea so far. Regards, -- Satoshi Nagayasu <snaga@uptime.jp> Uptime Technologies, LLC. http://www.uptime.jp
Sent from my iPad On 05-Sep-2013, at 8:58, Satoshi Nagayasu <snaga@uptime.jp> wrote: > (2013/09/05 3:59), Alvaro Herrera wrote: >> Tomas Vondra wrote: >> >>> My idea was to keep the per-database stats, but allow some sort of >>> "random" access - updating / deleting the records in place, adding >>> records etc. The simplest way I could think of was adding a simple >>> "index" - a mapping of OID to position in the stat file. >>> >>> I.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or >>> something like that. This would make it quite simple to access existing >>> record >>> >>> 1: get position from the index >>> 2: read sizeof(Entry) from the file >>> 3: if it's update, just overwrite the bytes, for delete set isdeleted >>> flag (needs to be added to all entries) >>> >>> or reading all the records (just read the whole file as today). >> >> Sounds reasonable. However, I think the index should be a real index, >> i.e. have a tree structure that can be walked down, not just a plain >> array. If you have a 400 MB stat file, then you must have about 4 >> million tables, and you will not want to scan such a large array every >> time you want to find an entry. > > I thought an array structure at first. > > But, for now, I think we should have a real index for the > statistics data because we already have several index storages, > and it will allow us to minimize read/write operations. > > BTW, what kind of index would be preferred for this purpose? > btree or hash? > > If we use btree, do we need "range scan" thing on the statistics > tables? I have no idea so far. > The thing I am interested in is range scan. That is the reason I wish to explore range tree usage here, maybe as a secondary index. Regards, Atri
Satoshi Nagayasu wrote: > But, for now, I think we should have a real index for the > statistics data because we already have several index storages, > and it will allow us to minimize read/write operations. > > BTW, what kind of index would be preferred for this purpose? > btree or hash? I find it hard to get excited about using the AM interface for this purpose. To me it makes a lot more sense to have separate, much simpler code. We don't need any transactionality, user defined types, user defined operators, or anything like that. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Sep 5, 2013 at 10:59 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Satoshi Nagayasu wrote: > >> But, for now, I think we should have a real index for the >> statistics data because we already have several index storages, >> and it will allow us to minimize read/write operations. >> >> BTW, what kind of index would be preferred for this purpose? >> btree or hash? > > I find it hard to get excited about using the AM interface for this > purpose. To me it makes a lot more sense to have separate, much > simpler code. We don't need any transactionality, user defined types, > user defined operators, or anything like that. +1. But, would not rewriting a lot of existing functionalities potentially lead to points of contention and/or much more effort? Regards, Atri -- Regards, Atri l'apprenant
On 5.9.2013 09:36, Atri Sharma wrote: > On Thu, Sep 5, 2013 at 10:59 AM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: >> Satoshi Nagayasu wrote: >> >>> But, for now, I think we should have a real index for the >>> statistics data because we already have several index storages, >>> and it will allow us to minimize read/write operations. >>> >>> BTW, what kind of index would be preferred for this purpose? >>> btree or hash? >> >> I find it hard to get excited about using the AM interface for >> this purpose. To me it makes a lot more sense to have separate, >> much simpler code. We don't need any transactionality, user >> defined types, user defined operators, or anything like that. > > +1. > > But, would not rewriting a lot of existing functionalities > potentially lead to points of contention and/or much more effort? Don't forget the stats are written only by the postmaster, all the regular backends only read it (and eventually send updates back). But yes, contention might be a problem, because there will have to be some kind of locking that is not needed now when the postmaster is writing a fresh copy into a new file. But I think we need to implement something and then measure this. Because even with the contention, the overall performance might actually improve. I'd vote to try a simple approach first - adding some simple array 'index' allowing fast access to particular records etc. At least that was my plan. But feel free to implement something more advanced (e.g. a BTree storage) and we can compare the results. Tomas
On 5.9.2013 07:29, Alvaro Herrera wrote: > Satoshi Nagayasu wrote: > >> But, for now, I think we should have a real index for the >> statistics data because we already have several index storages, and >> it will allow us to minimize read/write operations. >> >> BTW, what kind of index would be preferred for this purpose? btree >> or hash? > > I find it hard to get excited about using the AM interface for this > purpose. To me it makes a lot more sense to have separate, much > simpler code. We don't need any transactionality, user defined > types, user defined operators, or anything like that. +1 to these concerns. And I think using regular tables might actually cause more harm than benefits. For example, let's say we have a large database with many objects (which is the aim of this thread) with high activity - sessions accessing objects, i.e. updating many "rows" in the stats tables. Now, the stats table is likely to bloat thanks to the MVCC copy-on-update. Not a good thing, and it might easily happen that the price for maintenance of the table will be much higher than what we saved. There are probably other similar scenarios. Tomas
Sent from my iPad On 08-Sep-2013, at 4:27, Tomas Vondra <tv@fuzzy.cz> wrote: > On 5.9.2013 09:36, Atri Sharma wrote: >> On Thu, Sep 5, 2013 at 10:59 AM, Alvaro Herrera >> <alvherre@2ndquadrant.com> wrote: >>> Satoshi Nagayasu wrote: >>> >>>> But, for now, I think we should have a real index for the >>>> statistics data because we already have several index storages, >>>> and it will allow us to minimize read/write operations. >>>> >>>> BTW, what kind of index would be preferred for this purpose? >>>> btree or hash? >>> >>> I find it hard to get excited about using the AM interface for >>> this purpose. To me it makes a lot more sense to have separate, >>> much simpler code. We don't need any transactionality, user >>> defined types, user defined operators, or anything like that. >> >> +1. >> >> But, would not rewriting a lot of existing functionalities >> potentially lead to points of contention and/or much more effort? > > Don't forget the stats are written only by the postmaster, all the > regular backends only read it (and eventually send updates back). But > yes, contention might be a problem, because there will have to be some > kind of locking that is not needed now when the postmaster is writing a > fresh copy into a new file. > > But I think we need to implement something and then measure this. > Because even with the contention, the overall performance might > actually improve. > > I'd vote to try a simple approach first - adding some simple array > 'index' allowing fast access to particular records etc. At least that > was my plan. But feel free to implement something more advanced (e.g. a > BTree storage) and we can compare the results. > > +1 on the simple implementation and measure approach. My focus here is to identify what kind of queries we expect to be serving from the stats. I think someone mentioned range queries upthread. I feel we should be looking at range trees as a secondary index, if not the primary storage for pgstat. Regards, Atri
Sent from my iPad On 08-Sep-2013, at 4:27, Tomas Vondra <tv@fuzzy.cz> wrote: > On 5.9.2013 09:36, Atri Sharma wrote: >> On Thu, Sep 5, 2013 at 10:59 AM, Alvaro Herrera >> <alvherre@2ndquadrant.com> wrote: >>> Satoshi Nagayasu wrote: >>> >>>> But, for now, I think we should have a real index for the >>>> statistics data because we already have several index storages, >>>> and it will allow us to minimize read/write operations. >>>> >>>> BTW, what kind of index would be preferred for this purpose? >>>> btree or hash? >>> >>> I find it hard to get excited about using the AM interface for >>> this purpose. To me it makes a lot more sense to have separate, >>> much simpler code. We don't need any transactionality, user >>> defined types, user defined operators, or anything like that. >> >> +1. >> >> But, would not rewriting a lot of existing functionalities >> potentially lead to points of contention and/or much more effort? > > Don't forget the stats are written only by the postmaster, all the > regular backends only read it (and eventually send updates back). But > yes, contention might be a problem, because there will have to be some > kind of locking that is not needed now when the postmaster is writing a > fresh copy into a new file. > > But I think we need to implement something and then measure this. > Because even with the contention, the overall performance might > actually improve. > > I'd vote to try a simple approach first - adding some simple array > 'index' allowing fast access to particular records etc. At least that > was my plan. But feel free to implement something more advanced (e.g. a > BTree storage) and we can compare the results. Another thing I would want to explore is if we could somehow prioritise the more frequently accessed records to reduce their access times even more. I am thinking on the lines of self organising lists. I am not sure if and how it would be possible to implement this idea of self organising in a BTree or any other tree though. 
Regards, Atri
On Tue, Sep 3, 2013 at 10:09 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote: > Hi, > > > (2013/09/04 13:07), Alvaro Herrera wrote: >> >> Satoshi Nagayasu wrote: >> >>> As you may know, this file could be hundreds of MB in size, because >>> pgstat.stat holds all access statistics in each database, and it needs >>> to read/write an entire pgstat.stat frequently. >>> >>> As a result, pgstat.stat often generates massive I/O operation, >>> particularly when having a large number of tables in the database. >> >> >> We already changed it: > >> >> commit 187492b6c2e8cafc5b39063ca3b67846e8155d24 >> Author: Alvaro Herrera <alvherre@alvh.no-ip.org> >> Date: Mon Feb 18 17:56:08 2013 -0300 >> >> Split pgstat file in smaller pieces > > Thanks for the comments. I forgot to mention that. > > Yes, we have already split single pgstat.stat file into > several pieces. > > However, we still need to read/write a large amount of statistics > data when we have a large number of tables in a single database > or multiple databases being accessed. Right? Do you have a test case for measuring this? I vaguely remember from when I was testing the split patch, that I thought that after that improvement the load that was left was so low that there was little point in optimizing it further. Cheers, Jeff
On 8.9.2013 23:04, Jeff Janes wrote: > On Tue, Sep 3, 2013 at 10:09 PM, Satoshi Nagayasu <snaga@uptime.jp> > wrote: >> Hi, >> >> >> (2013/09/04 13:07), Alvaro Herrera wrote: >>> >>> Satoshi Nagayasu wrote: >>> >>>> As you may know, this file could be handreds of MB in size, >>>> because pgstat.stat holds all access statistics in each >>>> database, and it needs to read/write an entire pgstat.stat >>>> frequently. >>>> >>>> As a result, pgstat.stat often generates massive I/O operation, >>>> particularly when having a large number of tables in the >>>> database. >>> >>> >>> We already changed it: >> >>> >>> commit 187492b6c2e8cafc5b39063ca3b67846e8155d24 Author: Alvaro >>> Herrera <alvherre@alvh.no-ip.org> Date: Mon Feb 18 17:56:08 >>> 2013 -0300 >>> >>> Split pgstat file in smaller pieces >> >> Thanks for the comments. I forgot to mention that. >> >> Yes, we have already split single pgstat.stat file into several >> pieces. >> >> However, we still need to read/write large amount of statistics >> data when we have a large number of tables in single database or >> multiple databases being accessed. Right? > > Do you have a test case for measuring this? I vaguely remember from > when I was testing the split patch, that I thought that after that > improvement the load that was left was so low that there was little > point in optimizing it further. This is actually a pretty good point. Creating a synthetic test case is quite simple - just create 1.000.000 tables in a single database, but I'm wondering if it's actually realistic. Do we have a real-world example where the current "one stat file per db" is not enough? The reason why I worked on the split patch is that our application is slightly crazy and creates a lot of tables (+ indexes) on the fly, and as we have up to a thousand databases on each host, we often ended up with a huge stat file. 
Splitting the stat file improved that considerably, although that's partially because we have the stats on a tmpfs, so I/O is not a problem, and the CPU overhead is negligible thanks to splitting the stats per database. But AFAIK there are operating systems where creating a filesystem in RAM is not that simple - e.g. Windows. In such cases even a moderate number of objects may be a significant issue I/O-wise. But then again, I can't really think of a reasonable system creating that many objects in a single database (except for e.g. a shared database using schemas instead of databases). Tomas
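The synthetic test case mentioned above - a single database stuffed with a huge number of tables - is easy to script. A sketch that just emits the DDL for psql to run; the table names, column layout, and count are arbitrary choices for illustration:

```python
def make_ddl(n_tables, prefix="stats_test"):
    """Yield CREATE TABLE statements for a synthetic many-table database."""
    for i in range(n_tables):
        yield ("CREATE TABLE %s_%06d (id serial PRIMARY KEY, "
               "payload text);" % (prefix, i))

# Usage idea: write the statements to a file and feed them to psql, e.g.
#   python gen_tables.py > tables.sql && psql testdb -f tables.sql
# For the scenario in this thread, n_tables would be on the order of
# 100k-1M; touching each table afterwards forces stats entries for all
# of them.
sample = list(make_ddl(3))
```

Note that merely creating the tables is not enough to grow pgstat.stat - each table needs at least some access activity before the collector tracks it.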
(2013/09/09 8:19), Tomas Vondra wrote: > On 8.9.2013 23:04, Jeff Janes wrote: >> On Tue, Sep 3, 2013 at 10:09 PM, Satoshi Nagayasu <snaga@uptime.jp> >> wrote: >>> Hi, >>> >>> >>> (2013/09/04 13:07), Alvaro Herrera wrote: >>>> >>>> Satoshi Nagayasu wrote: >>>> >>>>> As you may know, this file could be hundreds of MB in size, >>>>> because pgstat.stat holds all access statistics in each >>>>> database, and it needs to read/write an entire pgstat.stat >>>>> frequently. >>>>> >>>>> As a result, pgstat.stat often generates massive I/O operation, >>>>> particularly when having a large number of tables in the >>>>> database. >>>> >>>> >>>> We already changed it: >>> >>>> >>>> commit 187492b6c2e8cafc5b39063ca3b67846e8155d24 Author: Alvaro >>>> Herrera <alvherre@alvh.no-ip.org> Date: Mon Feb 18 17:56:08 >>>> 2013 -0300 >>>> >>>> Split pgstat file in smaller pieces >>> >>> Thanks for the comments. I forgot to mention that. >>> >>> Yes, we have already split single pgstat.stat file into several >>> pieces. >>> >>> However, we still need to read/write a large amount of statistics >>> data when we have a large number of tables in a single database or >>> multiple databases being accessed. Right? >> >> Do you have a test case for measuring this? I vaguely remember from >> when I was testing the split patch, that I thought that after that >> improvement the load that was left was so low that there was little >> point in optimizing it further. > > This is actually a pretty good point. Creating a synthetic test case is > quite simple - just create 1.000.000 tables in a single database, but > I'm wondering if it's actually realistic. Do we have a real-world > example where the current "one stat file per db" is not enough? I have several assumptions for that. - Single shared database contains thousands of customers. - Each customer has hundreds of tables and indexes. - Customers are separated by schemas (namespaces) in a single database. 
- Application server uses connection pooling for performance reasons. - Workload (locality in the table access) cannot be predicted. Looks reasonable? > The reason why I worked on the split patch is that our application is > slightly crazy and creates a lot of tables (+ indexes) on the fly, and > as we have up to a thousand databases on each host, we often ended up > with a huge stat file. > > Splitting the stat file improved that considerably, although that's > partially because we have the stats on a tmpfs, so I/O is not a problem, > and the CPU overhead is negligible thanks to splitting the stats per > database. I agree that splitting a single large database into several pieces, like thousands of tiny databases, could be an option in some cases. However, what I intend here is eliminating those limitations on database design. In fact, when considering connection pooling, splitting a database is not a good idea, because AFAIK, many connection poolers manage connections per database. So, I'd like to support 100k tables in a single database. Any comments? Regards, > But AFAIK there are operating systems where creating a filesystem in RAM > is not that simple - e.g. Windows. In such cases even a moderate number > of objects may be a significant issue I/O-wise. But then again, I can't > really think of a reasonable system creating that many objects in a > single database (except for e.g. a shared database using schemas instead > of databases). > > Tomas > > -- Satoshi Nagayasu <snaga@uptime.jp> Uptime Technologies, LLC. http://www.uptime.jp
On Sat, Sep 7, 2013 at 6:57 PM, Tomas Vondra <tv@fuzzy.cz> wrote: > Don't forget the stats are written only by the postmaster, all the > regular backends only read it (and eventually send updates back). The postmaster, or the stats collector? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 9 September 2013, 18:50, Robert Haas wrote: > On Sat, Sep 7, 2013 at 6:57 PM, Tomas Vondra <tv@fuzzy.cz> wrote: >> Don't forget the stats are written only by the postmaster, all the >> regular backends only read it (and eventually send updates back). > > The postmaster, or the stats collector? Stats collector, of course. I meant to point out that the write activity comes from a single dedicated process, which may not be that obvious, and I somehow managed to name the incorrect one. Tomas