Re: Proposal for disk quota feature

Поиск
Список
Период
Сортировка
От Hubert Zhang
Тема Re: Proposal for disk quota feature
Дата
Msg-id CAB0yrenncDYBD624HSDKhKqZYTwpXPetjy7WKFRuDLYb1YoB3g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Proposal for disk quota feature  (Pavel Stehule <pavel.stehule@gmail.com>)
Ответы Re: Proposal for disk quota feature  (Pavel Stehule <pavel.stehule@gmail.com>)
Список pgsql-hackers
But it looks like redundant to current GUC configuration and limits
what do you mean by current GUC configuration? Is that the general block number limit in your patch? If yes, the difference between GUC and pg_diskquota catalog is that pg_diskquota will store different quota limit for the different role, schema or table instead of a single GUC value.

On Sat, Sep 22, 2018 at 11:17 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:


pá 21. 9. 2018 v 16:21 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:
just fast reaction - why QUOTA object?
Isn't ALTER SET enough?
Some like
ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..
New DDL commans looks like too hard hammer .

It's an option. Prefer to consider quota setting store together:
CREATE DISK QUOTA way is more nature to store quota setting in a separate pg_diskquota catalog
While ALTER SET way is more close to store quota setting in pg_class, pg_role, pg_namespace. etc in an integrated way.
(Note that here I mean nature/close is not must, ALTER SET could also store in pg_diskquota and vice versa.)

I have not a problem with new special table for storing this information. But it looks like redundant to current GUC configuration and limits. Can be messy do some work with ALTER ROLE, and some work via CREATE QUOTE.

Regards

Pavel


Here are some differences I can think of:
1 pg_role is a global catalog, not per database level. It's harder to tracker the user's disk usage in the whole clusters(considering 1000+ databases).  So the semantic of  CREATE DISK QUOTA ON USER is limited: it only tracks the user's disk usage inside the current database.
2 using separate pg_diskquota could add more field except for quota limit without adding too many fields in pg_class, e.g. red zone to give the user a warning or the current disk usage of the db objects.

On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:


pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:

Hi all,

We redesign disk quota feature based on the comments from Pavel Stehule and Chapman Flack. Here are the new design.

Overview

Basically,  disk quota feature is used to support multi-tenancy environment, different level of database objects could be set a quota limit to avoid over use of disk space. A common case could be as follows: DBA could enable disk quota on a specified database list. DBA could set disk quota limit for tables/schemas/roles in these databases. Separate disk quota worker process will monitor the disk usage for these objects and detect the objects which exceed their quota limit. Queries loading data into these “out of disk quota” tables/schemas/roles will be cancelled.

We are currently working at init implementation stage. We would like to propose our idea firstly and get feedbacks from community to do quick iteration.

SQL Syntax (How to use disk quota)

1 Specify the databases with disk quota enabled in GUC “diskquota_databases” in postgresql.conf and restart the database.

2 DBA could set disk quota limit for table/schema/role.

CREATE DISK QUOTA tablea1 ON TABLE a1 with (quota = ‘1MB’);

CREATE DISK QUOTA roleu1 ON USER u1 with (quota = ‘1GB’);

CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota = ‘3MB’);


just fast reaction - why QUOTA object?

Isn't ALTER SET enough?

Some like

ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..

New DDL commans looks like too hard hammer .



3 Simulate a schema out of quota limit case: suppose table a1 and table a2 are both under schema s1.

INSERT INTO a1 SELECT generate_series(1,1000);

INSERT INTO a2 SELECT generate_series(1,3000000);

SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);

ERROR:  schema's disk space quota exceeded

DROP TABLE a2;

SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);

INSERT 0 1000

Architecture

Disk quota has the following components.

1. Quota Setting Store is where the disk quota setting to be stored and accessed. We plan to use catalog table pg_diskquota to store these information. pg_diskquota is like:

CATALOG(pg_diskquota,6122,DiskQuotaRelationId)

{

NameData quotaname; /* diskquota name */       

int16 quotatype; /* diskquota type name */

Oid quotatargetoid; /* diskquota target db object oid*/

int32 quotalimit; /* diskquota size limit in MB*/

int32 quotaredzone; /* diskquota redzone in MB*/

} FormData_pg_diskquota;

2. Quota Change Detector is the monitor of size change of database objects. We plan to use stat collector to detect the ‘active’ table list at initial stage. But stat collector has some limitation on finding the active table which is in a running transaction. Details see TODO section.

3. Quota Size Checker is where to calculate the size and compare with quota limit for database objects. According to Pavel’s comment, autovacuum launcher and worker process could be a good reference to disk quota. So we plan to use a disk quota launcher daemon process and several disk quota worker process to finish this work. Launcher process is responsible for starting worker process based on a user defined database list from GUC. Worker process will connect to its target database and monitor the disk usage for objects in this database. In init stage of worker process, it will call calculate_total_relation_size() to calculate the size for each user table. After init stage, worker process will refresh the disk model every N seconds. Refreshing will only recalculate the size of tables in ‘active’ table list, which is generated by Quata Change Detector to minimize the cost.

4. Quota Enforcement Operator is where to check for the quota limitation at postgres backend side. We will firstly implement it in ExecCheckRTPerms() as pre-running enforcement. It will check the disk quota of tables being inserted or updated, and report error if table’s or table’s schema’s or table’s owner’s quota limit is exceeded. As a native feature, we plan to add more checkpoint to do running query enforcement. For example, if a disk quota lefts 10MB quota, a query could insert 1GB data. This query could be allowed in pre-running enforcement check, but will be cancelled in running query enforcement check. Therefore, it can improve the accurate of disk quota usage. To achieve this, we plan to add a checkpoint in lower API such as smgr_extened. Hence, the Quota Enforcement Operator will check the disk quota usage when smgr_extened is called. If the quota is over limited, current query will be cancelled.

Highlight

1. Native feature.

Support native Create/Drop Disk Quota SQL statement.

New catalog table pg_diskquota to store disk quota setting.

2. Auto DML/DDL detection.

Table create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,  Schema create/drop and Role create/drop will be detected by disk quota automatically.

3. Low cost disk quota checker.

Worker process of disk quota need to refresh the disk usage model every N seconds. Since recalculate the file size using stat() system call is expensive for a large number of files, we use an ‘active’ table list to reduce the real work at each iteration. A basic experiment on our init stage implementation on database with 20K tables shows that the refresh cost is 1% cpu usage and will be finished within 50ms.


Todo/Limitation

Before we propose our patch, we plan to enhance it with the following ideas:

1. Setting database list with disk quota enabled dynamically without restart database. Since we have the disk quota launcher process, it could detect the new ‘diskquota_databases’ list and start/stop the corresponding disk quota worker process.

2. Enforcement when query is running. Considering the case when there is 10MB quota left, but next query will insert 10GB data. Current enforcement design will allow this query to be executed. This is limited by the ‘active’ table detection is generated by stat collector. Postgres backend will only send table stat information to collector only when the transaction ends. We need a new way to detect the ‘active’ table even when this table is being modified inside a running transaction.

3. Monitor unlimited number of databases. Current we set the max number of disk quota worker process to be 10 to reduce the affection normal workload. But how about  if we want to monitor the disk quota of more than 10 databases? Our solution is to let disk quota launcher to manage a queue of database need to be monitored. And disk quota worker process consuming the queue and refresh the disk usage/quota for this database. After some periods, worker will return the database to the queue, and fetch the top database from queue to process. The period determine the delay of detecting disk quota change. To implement this feature, we need to support a subprocess of postmaster to rebind to another database instead of the database binded in InitPostgres().

4. Support active table detection on vacuum full and vacuum analyze. Currently vacuum full and vacuum analyze are not tracked by stat collector.

Thanks to Heikki, Pavel Stehule,Chapman Flack for the former comments on disk quota feature. Any comments on how to improve disk quota feature are appreciated.


On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also use smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following objectives:
1 set/alter disk quota setting on different database objects, e.g. user, database, schema etc. not only a general GUC, but we could set separate quota limit for a specific objects. 
2 enforcement operator should work at two positions: before query is running and when query is running. The latter one's implementation maybe similar to your patch.

The patch was just example. The resource quotes should be more complex - per partition, table, schema, database, user - so GUC are possible, but not very user friendly.

Our case is specific, but not too much. The servers are used for multidimensional analyses - and some tables can grow too fast (COPY, INSERT SELECT). We need to solve limits immediately. The implementation is simple, so I did it. Same implementation on database level, or schema level needs some more locks, so it will not be too effective. The resource management can be complex very complex, and I expect so it will be hard work.

Regards

Pavel


On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Chapman.
@Pavel,  could you please explain more about your second suggestion "implement some quotas on storage level?"

See attached patch - it is very simple - and good enough for our purposes.

Regards

Pavel

 
We will not keep the long-lived processes attach to all databases(just like you mentioned servers with thousands of databases)  
And you are right, we could share ideas with autovacuum process, fork worker processes in need. 
"autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples. These checks use the statistics collection facility"
diskquota process is similar to autovacuum at caring about insert, but the difference is that it also care about vucuum full, truncate and drop. While update and delete may not be interested since no file change happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which is long-lived.
2 Diskquota launcher process is responsible for creating diskquota worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each database.
4 Initialization stage, Diskquota launcher process creates diskquota worker process for all the databases(traverse like autovacuum). Worker process calculates disk usage of db objects and their diskquota setting. If any db object exceeds its quota limit, put them into the blacklist in the shared memory, which will later be used by enforcement operator. Worker process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker process for the database with a large number of insert, copy, truncate, drop etc. or create disk quota statement. Worker process updates the file size for db objects containing the result relation, and compare with the diskquota setting. Again, if exceeds quota limit, put them into blacklist, remove from blacklist vice versa. Worker process exits when works are done and a GUC could control the frequency of worker process restart to a specific database. As you know, this GUC also controls the delay when we do enforcement.
6 Enforcement. When postgres backend executes queries, check the blacklist in shared memory to determine whether the query is allowed(before execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use background worker to start diskquota launcher process and use RegisterDynamicBackgroundWorker() to fork child diskquota worker processes by the launcher process as suggested by @Chapman. Diskquota setting could be stored in user table in a separate schema for each database(Schema and table created by create extension statement) just like what Heikki has done in pg_quota project. But in this case, we need to create extension for each database before diskquota worker process can be set up for that database.

Any comments on the above design and which is preferred, native feature or extension as the POC? 


-- Hubert



On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:
On 08/30/2018 09:57 AM, Hubert Zhang wrote:

> 2 Keep one worker process for each database. But using a parent/global
> quota worker process to manage the lifecycle of database level worker
> processes. It could handle the newly created database(avoid restart
> database) and save resource when a database is not used. But this needs to
> change worker process to be hierarchical. Postmaster becomes the grandfather
>  of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be. On servers with large set of databases, large set of tables it can identify too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel



-Chap





--
Thanks

Hubert Zhang




--
Thanks

Hubert Zhang




--
Thanks

Hubert Zhang


--
Thanks

Hubert Zhang


--
Thanks

Hubert Zhang

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Changing the setting of wal_sender_timeout per standby
Следующее
От: Edmund Horner
Дата:
Сообщение: Re: PATCH: psql tab completion for SELECT