Proposal for disk quota feature

From: Hubert Zhang
Subject: Proposal for disk quota feature
Msg-id: CAB0yre=w_E2c=N6oz1t=hxtC85r0QrEBeFFrxJZw1Jw7bp2uWg@mail.gmail.com
Replies: Re: Proposal for disk quota feature
List: pgsql-hackers
Hi all,
We want to introduce a disk quota feature into Postgres.

Why disk quota
In a multi-tenant environment, there is a requirement to limit how much disk space a database/schema/table may be written with, or how much a user may consume, for different organizations.
Meanwhile, other databases such as Oracle, Teradata, and DB2 already support a disk quota feature.

Heikki has already implemented a disk quota feature in Postgres as an extension, pg_quota. We plan to enhance the disk quota feature based on Heikki's implementation.

Scope

The scope of the disk quota feature is to limit the disk usage of database/schema/table objects and of users.

Here "table" means heap tables, AO tables, indexes, partition tables, and associated relations (TOAST tables, visibility maps, large objects, etc.). A schema's disk quota is the disk quota of all the tables in that schema. A database's disk quota is the disk quota of all the tables in that database.

A user's quota is the total size of all the tables owned by that user.

Out of scope: note that spill files, xlogs, clogs, and server logs are not counted toward database-object-level disk quotas at this stage.



Design
We propose a disk quota feature with the following components:

1. Quota Setting Store is where the disk quota settings are stored and accessed. A DBA or object owner uses SQL queries to configure the disk quota for each database object.


2. Quota Change Detector monitors size changes of database objects in Postgres. It writes change information to shared memory to notify the Quota Size Checker. The Change Detector could be implemented as hooks in smgr_extend/smgr_unlink/smgr_truncate, which fire when the size of a heap table changes. The hooks write to shared memory in batches to reduce the impact on OLTP performance.

3. Quota Size Checker is implemented as a worker process. It maintains the current disk usage for each database object and user, and compares them with the settings in the Quota Setting Store. If it detects that disk usage has crossed the quota redzone (in either direction), it notifies the Quota Enforcement Operator.


4. Quota Enforcement Operator has two roles: one is to check disk quota permission before a query is executed (the QueryBeforeRun check); the other is to cancel running queries when they reach the disk quota limit dynamically (the QueryRunning check). The Quota Enforcement Operator uses shared memory to store the enforcement information to guarantee a quick check.



To build the right proof of concept, we would like to receive feedback from the community on the following aspects:
Q1. User interface: when setting a disk quota,
    Option 1 is to use insert into quota.config(user1, 10G)
    Option 2 is to use a UDF: select set_quota("role", "user1", 10G)
    Option 3 is to use native SQL syntax: create disk quota on ROLE user1 10G, or create disk quota on SCHEMA s1 25G;
Q2. Quota Setting Store: use a user table or the catalog?
    Option 1 is to create a schema called quota in each database and write quota settings into a quota.config table that only the DBA can modify. This corresponds to Q1's option 1.
    Option 2 is to store quota settings in the catalog. For schemas and tables, write them to a database-level catalog. For databases or users, write them to either a database-level or the global catalog.

I personally prefer Q1's option 3 and Q2's option 2, since they make disk quota more like a native feature. We could ship the quota worker process implementation as an extension for now, but in the long term disk quota is a fundamental feature of a database and should be native, just as in other databases. So storing the quota configuration in the catalog and supplying native syntax is better.

Open Problem
We plan to implement the Quota Size Checker as a worker process. The worker process needs to connect to a database to build the disk usage map and quota map (e.g. a user/schema's disk usage in a given database). But one worker process can only bind to one database (InitPostgres(dbname)) at the initialization stage. This means we need a separate worker process for each database. The solution to this problem is not straightforward; here are some ideas:
1. Make the worker process able to retrieve and cache information from all databases. As Tom Lane pointed out, this requires flushing all database-specific state, such as the relcache and syscache.
2. Keep one worker process for each database, but use a parent/global quota worker process to manage the lifecycle of the database-level worker processes. This can handle newly created databases (avoiding a database restart) and saves resources when a database is not in use. But it requires making worker processes hierarchical: the postmaster becomes the grandparent of the database-level worker processes in this case.

Any better ideas on it?



--
Thanks

Hubert Zhang
