Discussion: RFC: PostgreSQL Storage I/O Transformation Hooks
RFC: PostgreSQL Storage I/O Transformation Hooks
Infrastructure for a Technical Protocol Between RDBMS Core and Data Security Experts
Author: Henson Choi assam258@gmail.com
Date: 2025-12-28
PostgreSQL Version: master (Development)
1. Summary & Motivation
This RFC proposes the introduction of minimal hooks into the PostgreSQL storage layer and the addition of a Transformation ID field to the PageHeader.
A Diplomatic Protocol Between Expert Groups
The core motivation of this proposal is “Separation of Concerns and Mutual Respect.”
Historically, discussions around Transparent Data Encryption (TDE) have often felt like putting security experts on trial in a foreign court—specifically, the “Court of RDBMS.” It is time to treat them not as defendants to be judged by database-specific rules, but as an equal neighboring community with their own specialized sovereignty.
The issue has never been a failure of technology, but rather a misplacement of the focal point. While previous discussions were mired in the technicalities of “how to hardcode encryption into the core,” this proposal shifts the debate toward an architectural solution: “what interface the core should provide to external experts.”
- RDBMS Experts provide a trusted pipeline responsible for data I/O paths and consistency.
- Security Experts take responsibility for the specialized domain of encryption algorithms and key management.
This hook system functions as a Technical Protocol—a high-level agreement that allows these two expert groups to exchange data securely without encroaching on each other’s territory.
2. Design Principles
- Delegation of Authority: The core remains independent of specific encryption standards, providing a “free territory” where security experts can respond to an ever-changing security landscape.
- Diplomatic Convention: The Transformation ID acts as a communication protocol between the engine and the extension. The engine uses this ID to identify the state of the data and hands over control to the appropriate expert (the extension).
- Minimal Interference: Overhead is kept near zero when hooks are not in use, ensuring the native performance of the PostgreSQL engine.
3. Proposal Specifications
3.1 The Interface (Hook Points)
We allow intervention by security experts through five contact points along the I/O path:
- Read/Write Hooks: mdread_post, mdwrite_pre, mdextend_pre (Transformation of the data area)
- WAL Hooks: xlog_insert_pre, xlog_decode_pre (Transformation of transaction logs)
3.2 The Protocol Identifier (PageHeader Transformation ID)
We allocate 5 bits of pd_flags to define the “Security State” of a page. This serves as a Status Message sent by the security expert to the engine, utilized for key versioning and as a migration marker.
4. Reference Implementation: contrib/test_tde
A Standard Code of Conduct for Security Experts
This reference implementation exists not as a commercial product, but to define the Standards of the Diplomatic Protocol that encryption/decryption experts must follow when entering the PostgreSQL domain.
- Deterministic IV Derivation: Demonstrates how to achieve cryptographic safety by trusting unique values provided by the engine (e.g., LSN).
- Critical Section Safety: Defines memory management regulations that security logic must follow within “Critical Sections” to maintain system stability.
- Hook Chaining: Demonstrates a cooperative structure that allows peaceful coexistence with other expert tools (e.g., compression, auditing).
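To make the "Deterministic IV Derivation" item concrete, here is a minimal sketch of how a 16-byte IV could be built from engine-provided values. It is not taken from the test_tde patch: the layout (8-byte LSN, 4-byte block number, 4 zero bytes left for the CTR block counter) and the names derive_iv and put_be are assumptions for illustration only.

```c
#include <stdint.h>
#include <string.h>

#define IV_LEN 16

/* Hypothetical helper: store the low nbytes of v in big-endian order. */
static void
put_be(uint8_t *dst, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--)
    {
        dst[i] = (uint8_t) (v & 0xFF);
        v >>= 8;
    }
}

/*
 * Hypothetical IV layout (an assumption, not the patch's format):
 *   bytes 0..7   page LSN
 *   bytes 8..11  block number
 *   bytes 12..15 zero, left free for the CTR block counter
 */
void
derive_iv(uint64_t lsn, uint32_t blkno, uint8_t iv[IV_LEN])
{
    memset(iv, 0, IV_LEN);
    put_be(iv, lsn, 8);
    put_be(iv + 8, blkno, 4);
}
```

The same inputs always give the same IV, so nothing extra has to be stored per page. The scheme is only safe if the (LSN, block number) pair never repeats for different page contents under the same key, since CTR mode breaks down when an IV is reused.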
5. Scope
- In-Scope: Backend hook infrastructure, Transformation ID field, and reference code demonstrating diplomatic protocol compliance.
- Out-of-Scope: Specific Key Management Systems (KMS), selection of specific cryptographic algorithms, and integration with external tools.
This proposal represents a strategic diplomatic choice: rather than the PostgreSQL core assuming all security responsibilities, it grants security experts a sovereign territory through extensions where they can perform at their best.
Hello,
Following up on the RFC, I am submitting the initial patch set for the proposed infrastructure. These patches introduce a minimal hook-based protocol to allow extensions to handle data transformation, such as TDE, while keeping the PostgreSQL core independent of specific cryptographic implementations.
Implementation Details:
Hook Points in Storage I/O Path
The patch introduces five strategic hook points:
- mdread_post_hook: Called after blocks are read from disk. The extension can reverse-transform data in place.
- mdwrite_pre_hook & mdextend_pre_hook: Called before writing or extending blocks. These hooks return a pointer to transformed buffers.
- xlog_insert_pre_hook & xlog_decode_pre_hook: Handle transformation for WAL records during insertion and replay.
Data Integrity and Checksum Protocol
To ensure robust error detection, the hooks follow a specific verification protocol:
- On Write: The extension transforms the page, sets the Transform ID, then recalculates the checksum on the transformed data.
- On Read: The extension verifies the on-disk checksum of the transformed data first. After reverse-transformation, it clears the Transform ID and recalculates the checksum for the plaintext data. This ensures corruption is detected regardless of the transformation state.
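The ordering of the write and read steps above can be sketched with a toy model. Everything here is an assumption for illustration: ToyPage is not PostgreSQL's page layout, toy_checksum is an FNV-1a hash rather than pg_checksum_page, and the XOR "cipher" only stands in for a real transformation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAYLOAD_LEN 32

/* Toy page: NOT PostgreSQL's PageHeaderData layout. */
typedef struct ToyPage
{
    uint32_t checksum;
    uint32_t transform_id;      /* 0 means untransformed */
    uint8_t  payload[PAYLOAD_LEN];
} ToyPage;

/* Toy checksum (32-bit FNV-1a) over everything after the checksum field. */
static uint32_t
toy_checksum(const ToyPage *p)
{
    const uint8_t *bytes = (const uint8_t *) p;
    uint32_t h = 2166136261u;

    for (size_t i = offsetof(ToyPage, transform_id); i < sizeof(ToyPage); i++)
    {
        h ^= bytes[i];
        h *= 16777619u;
    }
    return h;
}

/* Placeholder transform: XOR is its own inverse. Not a real cipher. */
static void
toy_xor(ToyPage *p, uint8_t key)
{
    for (size_t i = 0; i < PAYLOAD_LEN; i++)
        p->payload[i] ^= key;
}

/* Write path: transform, set the ID, then checksum the transformed bytes. */
void
toy_write(ToyPage *p, uint8_t key, uint32_t id)
{
    toy_xor(p, key);
    p->transform_id = id;
    p->checksum = toy_checksum(p);
}

/* Read path: verify first, then reverse, clear the ID, and re-checksum. */
bool
toy_read(ToyPage *p, uint8_t key)
{
    if (p->checksum != toy_checksum(p))
        return false;           /* corruption caught before any decryption */

    if (p->transform_id != 0)
    {
        toy_xor(p, key);
        p->transform_id = 0;
    }
    p->checksum = toy_checksum(p);
    return true;
}
```

Because the on-disk checksum covers the transformed bytes, a damaged page is rejected before the reverse transformation ever runs, and the in-memory page still ends up with a checksum that matches its plaintext.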
WAL Safety via XLR_BLOCK_ID_TRANSFORMED (251)
For WAL records, I have introduced a specific block ID (251) to mark transformed data. If the decryption extension is not loaded, the WAL reader will encounter this unknown block ID and fail fast, preventing the system from incorrectly interpreting encrypted data as valid WAL records.
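As a rough illustration of the fail-fast behaviour, the sketch below classifies a block ID the way a decoder might. The values 252 to 255 and XLR_MAX_BLOCK_ID are the existing special IDs from xlogrecord.h. The enum, the function classify_block_id, and the decode_hook_loaded flag are hypothetical and do not come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Existing values from PostgreSQL's xlogrecord.h. */
#define XLR_MAX_BLOCK_ID            32
#define XLR_BLOCK_ID_DATA_SHORT     255
#define XLR_BLOCK_ID_DATA_LONG      254
#define XLR_BLOCK_ID_ORIGIN         253
#define XLR_BLOCK_ID_TOPLEVEL_XID   252

/* Value proposed in this thread. */
#define XLR_BLOCK_ID_TRANSFORMED    251

/* Hypothetical classification used only in this sketch. */
typedef enum
{
    BLOCK_OK,                   /* ordinary block reference or known special ID */
    BLOCK_NEEDS_DECODE_HOOK,    /* transformed data, extension is present */
    BLOCK_INVALID               /* unknown ID: the reader must stop here */
} BlockIdClass;

BlockIdClass
classify_block_id(uint8_t id, bool decode_hook_loaded)
{
    if (id <= XLR_MAX_BLOCK_ID || id >= XLR_BLOCK_ID_TOPLEVEL_XID)
        return BLOCK_OK;
    if (id == XLR_BLOCK_ID_TRANSFORMED && decode_hook_loaded)
        return BLOCK_NEEDS_DECODE_HOOK;
    return BLOCK_INVALID;
}
```

Without the extension, ID 251 lands in the same "invalid" bucket as any other unassigned ID, which is what lets a plain WAL reader reject the record instead of parsing ciphertext.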
PageHeader Transform ID (5-bit)
I have allocated bits 3-7 of pd_flags in the PageHeader for a Transform ID. This allows the engine and extensions to identify the transformation state of a page (e.g., key versioning or algorithm type) without attempting decryption. It ensures backward compatibility: pages with Transform ID 0 are treated as standard untransformed pages.
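For readers who want to see the bit layout, here is a small sketch of accessors for a 5-bit field in bits 3-7 of pd_flags. The three PD_* flag values are the existing ones from bufpage.h. The names PD_TRANSFORM_SHIFT, PD_TRANSFORM_MASK, page_get_transform_id and page_set_transform_id are hypothetical, and the patch may name or structure these differently.

```c
#include <assert.h>
#include <stdint.h>

/* Existing pd_flags bits from PostgreSQL's bufpage.h (bits 0-2). */
#define PD_HAS_FREE_LINES   0x0001
#define PD_PAGE_FULL        0x0002
#define PD_ALL_VISIBLE      0x0004

/* Hypothetical names for the proposed field in bits 3-7. */
#define PD_TRANSFORM_SHIFT  3
#define PD_TRANSFORM_MASK   0x00F8      /* 5 bits, so IDs range over 0..31 */

/* Extract the Transform ID; 0 means a standard untransformed page. */
static inline uint8_t
page_get_transform_id(uint16_t pd_flags)
{
    return (uint8_t) ((pd_flags & PD_TRANSFORM_MASK) >> PD_TRANSFORM_SHIFT);
}

/* Return pd_flags with the Transform ID replaced, other bits untouched. */
static inline uint16_t
page_set_transform_id(uint16_t pd_flags, uint8_t id)
{
    assert(id <= 31);
    return (uint16_t) ((pd_flags & ~PD_TRANSFORM_MASK) |
                       ((uint16_t) id << PD_TRANSFORM_SHIFT));
}
```

Five bits leave 31 non-zero IDs. Core page validation currently rejects any pd_flags bit outside PD_VALID_FLAG_BITS (0x0007), so the patch presumably has to widen that mask for these bits to survive verification.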
Memory and Critical Section Safety
As demonstrated in the contrib/test_tde reference implementation, cipher contexts are pre-allocated in _PG_init to avoid memory allocation during critical sections. For WAL transformation, MemoryContextAllowInCriticalSection() is used to allow buffer reallocation within critical sections; if OOM occurs during buffer growth, it results in a controlled PANIC.
Performance Considerations
When hooks are not set (default), the overhead is limited to a single NULL pointer comparison per I/O operation. This is architecturally consistent with existing PostgreSQL hooks and is designed to have a negligible impact on performance.
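The NULL-check cost and the chaining convention can be shown with the usual PostgreSQL hook pattern. This is a standalone sketch, not code from the patch: the hook signature, core_after_read, my_mdread_post and my_extension_init are hypothetical stand-ins, and only the hook name mdread_post_hook comes from this thread.

```c
#include <stddef.h>

/* Hypothetical signature; the patch's real hook type may differ. */
typedef void (*mdread_post_hook_type) (void *page, unsigned int blkno);

/* Core side: the hook variable is NULL unless an extension installs one. */
mdread_post_hook_type mdread_post_hook = NULL;

/* Core call site: with no hook set, the only cost is this comparison. */
void
core_after_read(void *page, unsigned int blkno)
{
    if (mdread_post_hook != NULL)
        mdread_post_hook(page, blkno);
}

/* Extension side: remember whichever hook was installed before this one. */
static mdread_post_hook_type prev_mdread_post_hook = NULL;
static int my_hook_calls = 0;

static void
my_mdread_post(void *page, unsigned int blkno)
{
    /* Chain to the previously installed hook, if any. */
    if (prev_mdread_post_hook != NULL)
        prev_mdread_post_hook(page, blkno);

    /* A real extension would reverse-transform the page here. */
    (void) page;
    (void) blkno;
    my_hook_calls++;
}

/* Stands in for the extension's _PG_init(). */
void
my_extension_init(void)
{
    prev_mdread_post_hook = mdread_post_hook;
    mdread_post_hook = my_mdread_post;
}
```

When several transforming extensions are chained, the order in which each one calls the previous hook matters: the read path has to undo the transformations in the reverse of the order the write path applied them.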
Attached Patches:
v20251228-0001-Add-Storage-I-O-Transform-Hooks-for-PostgreSQL.patch: Core infrastructure.
v20251228-0002-Add-test_tde-extension-for-TDE-testing.patch: Reference implementation using AES-256-CTR.
I look forward to your comments and feedback.
Regards,
Henson Choi
Updated patches with meson build support:
v2:
- Added meson.build for test_tde extension
- Added test_tde to contrib/meson.build
Regards,
Henson Choi
I wonder whether, instead of supporting a lot of extra hooks, it would be better to provide an extensible SMGR API:
https://www.postgresql.org/message-id/flat/CAPP%3DHha_wV1MV9yR70QZ5pk5dtNP%2BbOyBiFxPmrMKqnQeKMAwQ%40mail.gmail.com#ab0da3412525c7501ea17f3d4c602bbf
It seems to be a much more straightforward, convenient, and flexible mechanism than adding hooks, and it can be used for many purposes besides transparent encryption.
Hello!
I am glad to see that there are multiple TDE extension proposals being worked on. For context, I am one of the developers working on the pg_tde[1] extension, as well as on the extensible SMGR proposal that Konstantin already linked.
This patch/proposal contains two distinct parts of encryption/extensibility: WAL and buffer manager/table data. Based on earlier discussions, the opinions on adding extension points to these two are quite different, and because of that I'm not sure if bundling them together is helpful.
It also appears to be missing some extension points that would be required for a more complete encryption solution, such as encrypting temporary files or system tables, or handling command-line utilities like pg_waldump. Do you have ideas or patches in mind for those areas as well?
I have the same question as Konstantin: why did you choose custom hooks for the buffer manager instead of the already existing smgr interface / extensibility patch? While that patch is not part of the core (but I hope it will be), it is already used by multiple companies, as it supports other use cases, not only encryption. We plan to focus more on that thread early next year, and we would appreciate any feedback or suggestions that could make it better for others.
I also noticed that you added additional flags to the page header. Initially we were thinking about something like this, but decided that the fork files are better for any encryption (or other storage-related) extra data. These few bits try to be generic, while also being restrictive because of the limited amount of data. (And that data is specifically per page; if I want something per file or per page range, I still need a custom solution.)
Regarding the WAL encryption part, we took a completely different approach, similar to how we handle normal table data (page-based). I will need to think more about this before I can provide meaningful feedback on that part of the patch.
One initial question, however, is whether you have run detailed benchmarks with different workloads. That seems to be the trickiest part there, since most of the code runs in a critical section. (Not the "unused"/"empty hook" path, but the overhead caused by a real encryption plugin using this hook in practice.)
[1]: https://github.com/percona/pg_tde
Hi,
Here is v3 of the Storage I/O Transform Hooks patch.
Changes from v2:
- Fix -Wincompatible-pointer-types error in bufmgr.c by casting
&bufdata to (void **) for mdread_post_hook call
v2 changes were:
- Add meson.build test configuration for test_tde extension
--
Best regards,
Sungkyun Park
Updated patches with meson build support:
v2:
- Added meson.build for test_tde extension
- Added test_tde to contrib/meson.build
Regards,
Henson Choi2025년 12월 28일 (일) PM 6:47, Henson Choi <assam258@gmail.com>님이 작성:Hello,
Following up on the RFC, I am submitting the initial patch set for the proposed infrastructure. These patches introduce a minimal hook-based protocol to allow extensions to handle data transformation, such as TDE, while keeping the PostgreSQL core independent of specific cryptographic implementations.
Implementation Details:
Hook Points in Storage I/O Path
The patch introduces five strategic hook points:
mdread_post_hook: Called after blocks are read from disk. The extension can reverse-transform data in place.
mdwrite_pre_hook & mdextend_pre_hook: Called before writing or extending blocks. These hooks return a pointer to transformed buffers.
xlog_insert_pre_hook & xlog_decode_pre_hook: Handle transformation for WAL records during insertion and replay.
Data Integrity and Checksum Protocol
To ensure robust error detection, the hooks follow a specific verification protocol:
On Write: The extension transforms the page, sets the Transform ID, then recalculates the checksum on the transformed data.
On Read: The extension verifies the on-disk checksum of the transformed data first. After reverse-transformation, it clears the Transform ID and recalculates the checksum for the plaintext data. This ensures corruption is detected regardless of the transformation state.
WAL Safety via XLR_BLOCK_ID_TRANSFORMED (251)
For WAL records, I have introduced a specific block ID (251) to mark transformed data. If the decryption extension is not loaded, the WAL reader will encounter this unknown block ID and fail-fast, preventing the system from incorrectly interpreting encrypted data as valid WAL records.
PageHeader Transform ID (5-bit)
I have allocated bits 3-7 of pd_flags in the PageHeader for a Transform ID. This allows the engine and extensions to identify the transformation state of a page (e.g., key versioning or algorithm type) without attempting decryption. It ensures backward compatibility: pages with Transform ID 0 are treated as standard untransformed pages.
Memory and Critical Section Safety
As demonstrated in the contrib/test_tde reference implementation, cipher contexts are pre-allocated in _PG_init to avoid memory allocation during critical sections. For WAL transformation, MemoryContextAllowInCriticalSection() is used to allow buffer reallocation within critical sections; if OOM occurs during buffer growth, it results in a controlled PANIC.
Performance Considerations
When hooks are not set (default), the overhead is limited to a single NULL pointer comparison per I/O operation. This is architecturally consistent with existing PostgreSQL hooks and is designed to have a negligible impact on performance.
Attached Patches:
v20251228-0001-Add-Storage-I-O-Transform-Hooks-for-PostgreSQL.patch: Core infrastructure.
v20251228-0002-Add-test_tde-extension-for-TDE-testing.patch: Reference implementation using AES-256-CTR.
I look forward to your comments and feedback.
Regards,
Henson Choi2025년 12월 28일 (일) PM 4:49, Henson Choi <assam258@gmail.com>님이 작성:RFC: PostgreSQL Storage I/O Transformation Hooks
Infrastructure for a Technical Protocol Between RDBMS Core and Data Security Experts
Author: Henson Choi assam258@gmail.com
Date: 2025-12-28
PostgreSQL Version: master (Development)
1. Summary & Motivation
This RFC proposes the introduction of minimal hooks into the PostgreSQL storage layer and the addition of a Transformation ID field to the
PageHeader.A Diplomatic Protocol Between Expert Groups
The core motivation of this proposal is “Separation of Concerns and Mutual Respect.”
Historically, discussions around Transparent Data Encryption (TDE) have often felt like putting security experts on trial in a foreign court—specifically, the “Court of RDBMS.” It is time to treat them not as defendants to be judged by database-specific rules, but as an equal neighboring community with their own specialized sovereignty.
The issue has never been a failure of technology, but rather a misplacement of the focal point. While previous discussions were mired in the technicalities of “how to hardcode encryption into the core,” this proposal shifts the debate toward an architectural solution: “what interface the core should provide to external experts.”
- RDBMS Experts provide a trusted pipeline responsible for data I/O paths and consistency.
- Security Experts take responsibility for the specialized domain of encryption algorithms and key management.
This hook system functions as a Technical Protocol—a high-level agreement that allows these two expert groups to exchange data securely without encroaching on each other’s territory.
2. Design Principles
- Delegation of Authority: The core remains independent of specific encryption standards, providing a “free territory” where security experts can respond to an ever-changing security landscape.
- Diplomatic Convention: The Transformation ID acts as a communication protocol between the engine and the extension. The engine uses this ID to identify the state of the data and hands over control to the appropriate expert (the extension).
- Minimal Interference: Overhead is kept near zero when hooks are not in use, ensuring the native performance of the PostgreSQL engine.
3. Proposal Specifications
3.1 The Interface (Hook Points)
We allow intervention by security experts through five contact points along the I/O path:
- Read/Write Hooks:
mdread_post,mdwrite_pre,mdextend_pre(Transformation of the data area)- WAL Hooks:
xlog_insert_pre,xlog_decode_pre(Transformation of transaction logs)3.2 The Protocol Identifier (PageHeader Transformation ID)
We allocate 5 bits of
pd_flagsto define the “Security State” of a page. This serves as a Status Message sent by the security expert to the engine, utilized for key versioning and as a migration marker.4. Reference Implementation:
contrib/test_tdeA Standard Code of Conduct for Security Experts
This reference implementation exists not as a commercial product, but to define the Standards of the Diplomatic Protocol that encryption/decryption experts must follow when entering the PostgreSQL domain.
- Deterministic IV Derivation: Demonstrates how to achieve cryptographic safety by trusting unique values provided by the engine (e.g., LSN).
- Critical Section Safety: Defines memory management regulations that security logic must follow within “Critical Sections” to maintain system stability.
- Hook Chaining: Demonstrates a cooperative structure that allows peaceful coexistence with other expert tools (e.g., compression, auditing).
5. Scope
- In-Scope: Backend hook infrastructure, Transformation ID field, and reference code demonstrating diplomatic protocol compliance.
- Out-of-Scope: Specific Key Management Systems (KMS), selection of specific cryptographic algorithms, and integration with external tools.
This proposal represents a strategic diplomatic choice: rather than the PostgreSQL core assuming all security responsibilities, it grants security experts a sovereign territory through extensions where they can perform at their best.
Hi Konstantin,
I have great respect for the work being done on the extensible SMGR API.
It is a powerful tool for use cases that require replacing the entire
storage layer (like Neon's architecture).
However, I believe we should distinguish between Storage Management
(where/how data is stored) and Data Transformation (what the data looks
like). I see a strong case for both approaches to coexist for the
following practical reasons:
1. Separation of Concerns and Safety
Is it reasonable to ask cryptography experts to clone the entire SMGR
implementation and maintain code they don't fully understand just to
insert encryption logic? If an extension developer clones md.c to add
encryption, they become responsible for the fundamental integrity of
PostgreSQL's file I/O. Any bug in their cloned storage logic could lead
to data loss unrelated to encryption itself.
2. The Maintenance Debt of "Cloning"
When md.c receives critical security patches or bug fixes in the core,
every TDE extension maintainer would need to manually backport those
changes to their specific SMGR implementation. This creates a fragmented
ecosystem where security extensions might actually introduce storage
vulnerabilities by running outdated cloned logic.
3. Minimalist Integration
The hook approach allows crypto experts to focus strictly on transform()
and reverse_transform(). The complex storage orchestration remains with
the PostgreSQL core where it is most rigorously tested. This is a cleaner
separation of responsibilities: the core provides the trusted pipeline,
and the extension provides the specialized transformation.
Conclusion:
I believe these hooks provide a "low-barrier, high-safety" path for data
transformation that the SMGR API—by its very nature of being a full
replacement—cannot easily provide. Let's provide the SMGR for those who
want to reinvent the storage, and hooks for those who simply want to
secure the data.
Best regards,
Henson Choi
On 28/12/2025 9:49 AM, Henson Choi wrote:
> RFC: PostgreSQL Storage I/O Transformation Hooks
I wonder whether, instead of supporting a lot of extra hooks, it would be better to provide an extensible SMGR API:
https://www.postgresql.org/message-id/flat/CAPP%3DHha_wV1MV9yR70QZ5pk5dtNP%2BbOyBiFxPmrMKqnQeKMAwQ%40mail.gmail.com#ab0da3412525c7501ea17f3d4c602bbf
It seems to be a much more straightforward, convenient, and flexible mechanism than adding hooks, and it can be used for many purposes other than transparent encryption.
> Is it reasonable to ask cryptography experts to clone the entire SMGR
> implementation and maintain code they don't fully understand just to
> insert encryption logic?
You don't have to clone the md.c logic with the recent smgr extension patch, it does the same thing your patch does: it lets you hook into it while still keeping the original md.c implementation. The difference is that it doesn't add additional hooks to the API, instead it makes all of the existing smgr/md.c functions hooks.
This also means that it lets different extensions work together in a more generic way. For example an extension that wants to retrieve data files from cloud storage when needed (prepending the original md.c logic), and an encryption extension that wants to decrypt data after loading it (appending to the original md.c logic) can both work together while keeping the original logic in place.
Or if it's about mdwritev, in this patch you added a new mdwrite_pre_hook - but it is executed at a specific point during mdwrite. In the generic smgr patch, mdwritev itself (or smgr_writev more specifically) is a hook, you can change it, and then call the previous implementation (typically mdwritev) when you want it, either before or after your custom code.
(the latest submitted version of the smgr patch doesn't use typical postgres-style hooks, but that's one of the things we probably should change. The intention is the same)
There's no maintenance fee of cloning, because neither extension cloned the original md.c logic, both extended it.
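The "save the previous implementation and call it before or after your own code" pattern described in this message can be illustrated with plain function pointers. The sketch below is self-contained and does not use the API of the smgr patch or of PostgreSQL: writev_fn, base_writev, current_writev and the two toy extensions are hypothetical names, and the XOR stands in for a real cipher.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature standing in for an smgr write entry point. */
typedef void (*writev_fn)(char *page, size_t len);

static char trace[64];          /* records the order in which layers run */

/* Stand-in for the original md.c write path. */
static void base_writev(char *page, size_t len)
{
    (void) page;
    (void) len;
    strcat(trace, "md;");
}

/* The replaceable entry point; extensions wrap whatever is installed. */
static writev_fn current_writev = base_writev;

/* Extension 1: transforms the page, then calls the previous layer. */
static writev_fn enc_prev;
static void enc_writev(char *page, size_t len)
{
    for (size_t i = 0; i < len; i++)
        page[i] ^= 0x5A;        /* toy transform, not real encryption */
    strcat(trace, "enc;");
    enc_prev(page, len);
}
static void enc_install(void)
{
    enc_prev = current_writev;
    current_writev = enc_writev;
}

/* Extension 2: calls the previous layer first, then does its own work. */
static writev_fn audit_prev;
static void audit_writev(char *page, size_t len)
{
    audit_prev(page, len);
    strcat(trace, "audit;");
}
static void audit_install(void)
{
    audit_prev = current_writev;
    current_writev = audit_writev;
}
```

After enc_install() and audit_install(), a call through current_writev runs the encryption layer, then the original write path, then the audit layer, and neither extension contains a copy of the base logic.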
Hi Zsolt,
Thank you for your detailed questions. I'll address each point:
1. Bundling WAL and Buffer Manager
WAL and heap pages are simply different representations of the same
underlying data. Protecting only one side would be cryptographically
incomplete; an attacker could bypass encryption by reading the
unprotected side. Therefore, they must be treated as a single atomic
unit of protection.
2. Scope: Temporary Files, System Tables, and Frontend Tools
I intentionally kept the scope focused. Past TDE proposals often stalled
because they tried to solve everything at once, becoming too large to
review. I prefer a "divide-and-conquer" approach:
- Temporary files: Out of scope for this initial infrastructure proposal.
- System tables: While they cannot be encrypted during bootstrap (since
extensions aren't loaded), they can be transformed page-by-page during
normal operation.
- Frontend tools (pg_waldump, etc.): I am aware of this and have modified
versions. Currently, there is no standard mechanism for frontend hooks,
making this a broader challenge. For production, extensions could ship
their own modified frontend tools temporarily. Long-term, we may need
initdb-time configurations, fixed for the lifetime of the cluster, to
unify backend/frontend hook behavior.
3. Why Hooks Instead of SMGR
Please see my response to Konstantin in this thread regarding maintenance
debt and the "Separation of Concerns" between storage management and data
transformation.
4. Page Header Flags vs. Fork Files
My primary concern with using fork files for encryption metadata is crash
recovery. If a fork file and the actual data page become inconsistent
(e.g., during a crash), recovery becomes problematic because fork files
are not typically protected by WAL.
Storing the Transform ID in the header flags ensures that the metadata
travels with the page. This is essential for incremental key rotation,
where pages are gradually re-encrypted with newer keys over time. The
oldest key's pages are force-rotated, allowing continuous key rotation
without service interruption. I plan to propose a separate RFC for this
"gradual rotation" mechanism.
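As a rough illustration of how a 5-bit Transform ID could travel in pd_flags and drive gradual rotation, consider the sketch below. The value of PD_VALID_FLAG_BITS matches the three flag bits core uses today, but the position of the ID (bits 3 to 7), the convention that ID 0 means "not transformed", and all function names are assumptions for this example; the RFC text here does not specify them.

```c
#include <assert.h>
#include <stdint.h>

#define PD_VALID_FLAG_BITS  0x0007              /* flag bits used by core today */
#define PD_TRANSFORM_SHIFT  3                   /* assumed position of the ID */
#define PD_TRANSFORM_MASK   (0x001F << PD_TRANSFORM_SHIFT)

/* Store a 5-bit transform ID without disturbing the existing flag bits. */
static inline uint16_t page_set_transform_id(uint16_t pd_flags, unsigned id)
{
    assert(id <= 0x1F);
    return (uint16_t) ((pd_flags & ~PD_TRANSFORM_MASK) |
                       (id << PD_TRANSFORM_SHIFT));
}

/* Read the transform ID back out of the flags word. */
static inline unsigned page_get_transform_id(uint16_t pd_flags)
{
    return (pd_flags & PD_TRANSFORM_MASK) >> PD_TRANSFORM_SHIFT;
}

/*
 * A page is a rotation candidate when it carries a non-zero ID that
 * differs from the current key's ID (ID 0 = untransformed, by assumption).
 */
static inline int page_needs_rotation(uint16_t pd_flags, unsigned current_id)
{
    unsigned id = page_get_transform_id(pd_flags);

    return id != 0 && id != current_id;
}
```

With only 5 bits, at most 31 non-zero IDs can be live at once under this convention, which is one reason the oldest key's pages would have to be force-rotated before an ID is reused.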
5. Benchmarks and Critical Section Overhead
Transformation happens inside the critical section but before acquiring
the WAL lock. On consumer-grade SSDs, the encryption latency is largely
masked by I/O wait times with negligible performance impact. On
high-performance storage (production SSDs, Apple Silicon, etc.), the
reduced I/O wait exposes the encryption overhead, which is visible but
modest. Detailed benchmarks require company approval - I will follow up
later.
Best regards,
Henson Choi
Hello!
I am glad to see that there are multiple TDE extension proposals being
worked on. For context, I am one of the developers working on the
pg_tde[1] extension, as well as on the extensible SMGR proposal that
Konstantin already linked.
This patch/proposal contains two distinct parts of
encryption/extensibility, WAL and buffer manager/table data. Based on
earlier discussions, opinions on adding extension points to these
two are quite different, and because of that I'm not sure if bundling
them together is helpful.
It also appears to be missing some extension points that would be
required for a more complete encryption solution, such as encrypting
temporary files or system tables, or handling command-line utilities
like pg_waldump. Do you have ideas or patches in mind for those areas
as well?
I have the same question as Konstantin, why did you choose custom
hooks for the buffer manager instead of the already existing smgr
interface / extensibility patch? While that patch is not part of the
core (but I hope it will be), it is already used by multiple companies
as it supports other use cases, not only encryption. We plan to focus
more on that thread early next year, we would appreciate any
feedback/suggestions that could make it better for others.
I also noticed that you added additional flags to the page header.
Initially we were thinking about something like this, but decided that
the fork files are better for any encryption (or other storage
related) extra data. These few bits try to be generic, while also
restrictive because of the limited amount of data. (and that data is
specifically per page, if I want something per file or per page range,
I still need a custom solution)
Regarding the WAL encryption part, we took a completely different
approach, similar to how we handle normal table data (page-based). I
will need to think more about this before I can provide meaningful
feedback on that part of the patch. One initial question, however, is
whether you have run detailed benchmarks with different workloads.
That seems to be the trickiest part there, since most of the code runs
in a critical section. (Not the "unused"/"empty hook" path, but the
overhead caused by a real encryption plugin using this hook in
practice)
[1]: https://github.com/percona/pg_tde
I understand the decorator pattern, and yes, it can work for some cases.
But decorators can only intercept at the beginning and end of functions.
Looking at the actual hook locations in md.c:
- mdextend_pre_hook: after error checks, before file open → Decorator possible
- mdwrite_pre_hook: after assertions, before I/O loop → Decorator possible
- mdread_post_hook: inside the segment loop → Decorator NOT possible
The mdreadv() function, introduced in PostgreSQL 17 as part of the
vectored I/O API, processes multiple blocks in a loop that respects
segment boundaries. The decryption hook must be called inside this loop,
after each segment's FileReadV() completes. A decorator wrapping mdreadv()
from the outside cannot access this internal loop timing.
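The point about loop timing can be made concrete with a simplified model of a vectored read that is split at segment boundaries. This is not the md.c code: sketch_readv, read_post_hook, the tiny BLOCKS_PER_SEGMENT value and the counting hook are hypothetical, and the actual file read is elided.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCKS_PER_SEGMENT 4    /* tiny value chosen for illustration */

/* Hypothetical hook type: called once per completed segment read. */
typedef void (*read_post_hook_type)(unsigned first_block, unsigned nblocks);
static read_post_hook_type read_post_hook = NULL;

static unsigned hook_calls;     /* how many times the hook fired */
static unsigned blocks_seen;    /* total blocks handed to the hook */

/* Example hook: a real extension would decrypt the blocks here. */
static void count_hook(unsigned first_block, unsigned nblocks)
{
    (void) first_block;
    hook_calls++;
    blocks_seen += nblocks;
}

/* Simplified stand-in for a vectored read that respects segment limits. */
static void sketch_readv(unsigned blocknum, unsigned nblocks)
{
    while (nblocks > 0)
    {
        unsigned left = BLOCKS_PER_SEGMENT - (blocknum % BLOCKS_PER_SEGMENT);
        unsigned this_read = nblocks < left ? nblocks : left;

        /* ... the file read for this segment would be issued here ... */

        /* A NULL check is the only cost when no extension is loaded. */
        if (read_post_hook != NULL)
            read_post_hook(blocknum, this_read);

        blocknum += this_read;
        nblocks -= this_read;
    }
}
```

A wrapper placed around sketch_readv() would only see the request as a whole, while the hook inside the loop sees each per-segment read as it completes.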
With the SMGR decorator approach, the extension developer must:
- Track upstream md.c changes
- Replicate the internal loop logic to find the right decryption point
With hooks, the extension developer only needs to:
- Implement encrypt() and decrypt()
Regarding encryption+compression: that's a valid use case for SMGR,
but our primary concern is different. In South Korea, government
regulations require the use of nationally-approved cryptographic
algorithms (such as ARIA, SEED). This means organizations often cannot
adopt foreign TDE solutions, regardless of their technical merit.
We need a simple, stable hook interface that allows local security
experts to integrate these required algorithms - experts who understand
cryptography but not PostgreSQL storage internals.
If both approaches can coexist, why not provide hooks for the simple
case and SMGR for the complex case?
Best regards,
Henson Choi
On 28/12/2025 4:53 PM, Henson Choi wrote:
> Subject: Re: RFC: PostgreSQL Storage I/O Transformation Hooks
>
> Hi Konstantin,
>
> I have great respect for the work being done on the extensible SMGR API.
> It is a powerful tool for use cases that require replacing the entire
> storage layer (like Neon's architecture).
>
> However, I believe we should distinguish between Storage Management
> (where/how data is stored) and Data Transformation (what the data looks
> like). I see a strong case for both approaches to coexist for the
> following practical reasons:
>
> 1. Separation of Concerns and Safety
>
> Is it reasonable to ask cryptography experts to clone the entire SMGR
> implementation and maintain code they don't fully understand just to
> insert encryption logic? If an extension developer clones md.c to add
> encryption, they become responsible for the fundamental integrity of
> PostgreSQL's file I/O. Any bug in their cloned storage logic could lead
> to data loss unrelated to encryption itself.
>
> 2. The Maintenance Debt of "Cloning"
>
> When md.c receives critical security patches or bug fixes in the core,
> every TDE extension maintainer would need to manually backport those
> changes to their specific SMGR implementation. This creates a fragmented
> ecosystem where security extensions might actually introduce storage
> vulnerabilities by running outdated cloned logic.
>
> 3. Minimalist Integration
>
> The hook approach allows crypto experts to focus strictly on transform()
> and reverse_transform(). The complex storage orchestration remains with
> the PostgreSQL core where it is most rigorously tested. This is a cleaner
> separation of responsibilities: the core provides the trusted pipeline,
> and the extension provides the specialized transformation.
>
> Conclusion:
>
> I believe these hooks provide a "low-barrier, high-safety" path for data
> transformation that the SMGR API—by its very nature of being a full
> replacement—cannot easily provide. Let's provide the SMGR for those who
> want to reinvent the storage, and hooks for those who simply want to
> secure the data.
>
> Best regards,
> Henson Choi
I do not think that a custom SMGR API contradicts the idea of Data
Transformation.
Do you know about the decorator pattern?
If you want to implement, e.g., data encryption, you definitely do not
need to write your storage manager from scratch.
Obviously you can (and should) use the standard storage manager (md.c) for
actually performing I/O.
But your storage manager can perform some extra action before or after
I/O, for example encrypt data before write and decrypt it after read.
So any pre/post/instead hooks can be easily implemented using a custom SMGR.
The opposite, unfortunately, is not possible. You cannot, for example,
implement encryption+compression using hooks.
But you can easily do it using a custom SMGR: this is how the compressed file
system (CFS) was implemented in PgPro.
On 28/12/2025 5:51 PM, Henson Choi wrote:
> Hi Konstantin,
>
> I understand the decorator pattern, and yes, it can work for some cases.
> But decorators can only intercept at the beginning and end of functions.
>
> Looking at the actual hook locations in md.c:
>
> - mdextend_pre_hook: after error checks, before file open → Decorator possible
> - mdwrite_pre_hook: after assertions, before I/O loop → Decorator possible
> - mdread_post_hook: inside the segment loop → Decorator NOT possible
>
> The mdreadv() function, introduced in PostgreSQL 17 as part of the
> vectored I/O API, processes multiple blocks in a loop that respects
> segment boundaries. The decryption hook must be called inside this loop,
> after each segment's FileReadV() completes. A decorator wrapping mdreadv()
> from the outside cannot access this internal loop timing.
>
> With the SMGR decorator approach, the extension developer must:
> - Track upstream md.c changes
> - Replicate the internal loop logic to find the right decryption point
>
> With hooks, the extension developer only needs to:
> - Implement encrypt() and decrypt()
>
> Regarding encryption+compression: that's a valid use case for SMGR,
> but our primary concern is different. In South Korea, government
> regulations require the use of nationally-approved cryptographic
> algorithms (such as ARIA, SEED). This means organizations often cannot
> adopt foreign TDE solutions, regardless of their technical merit.
>
> We need a simple, stable hook interface that allows local security
> experts to integrate these required algorithms - experts who understand
> cryptography but not PostgreSQL storage internals.
>
> If both approaches can coexist, why not provide hooks for the simple
> case and SMGR for the complex case?
>
> Best regards,
> Henson Choi
Hi Henson,
Thank you for the explanations.
I personally do not like hooks; I consider them a kind of crutch needed to work around problems with existing APIs :) But they are quite popular in Postgres and really do make it extensible. The task of transparent data encryption is very important for Postgres (if for some reason it cannot be done at the file system level). If we need to add more hooks to make it possible in Postgres, then another dozen hooks may be acceptable...
I have not investigated it in detail; maybe you are right about implementing transparent encryption with the decorator approach and a custom SMGR.
Frankly speaking, I am quite upset about how AIO was added to PG18. It introduces a hierarchy orthogonal to SMGR and causes tight dependencies between these two modules, which makes extending either of them problematic, if possible at all (e.g. if I want to add my own storage manager and make AIO use it to access the file system, rather than calling pread/pwrite directly). I am not sure that AIO could not have been added through the SMGR hierarchy (certainly by extending this interface), but that is a separate story unrelated to the topic of this discussion.
So I can assume that the current coupling of AIO with SMGR makes it impossible to plug in transparent encryption other than by adding these hooks. I am still not quite sure that the proposed set of hooks is absolutely necessary and sufficient...
> - mdread_post_hook: inside the segment loop → Decorator NOT possible
> The mdreadv() function, introduced in PostgreSQL 17 as part of the
> vectored I/O API, processes multiple blocks in a loop that respects
> segment boundaries. The decryption hook must be called inside this loop,
> after each segment's FileReadV() completes. A decorator wrapping mdreadv()
> from the outside cannot access this internal loop timing.
It is possible - or rather, we plan to propose a different patch for that. There are already some discussions about extendibility of AIO, which is currently quite minimal, and this is another point for that.
If you look into the AIO sources, it already uses an array of callbacks, and there's only a small missing piece there - making it possible for extensions to add entries to that array. With that patch, it is possible to decorate smgr_startreadv, add your own callback, and then call the original mdstartreadv function. Since aio callbacks are executed in the opposite order, this will work out exactly as needed, as the AIO handler will first call the md completion handler, then yours.
My logic here is similar to the previous argument: this AIO extensibility for startreadv is also needed for other uses of the smgr extension, most likely for everyone who uses the current patch. It shouldn't be specific to encryption.
> With the SMGR decorator approach, the extension developer must:
> - Track upstream md.c changes
> - Replicate the internal loop logic to find the right decryption point
> With hooks, the extension developer only needs to:
> - Implement encrypt() and decrypt()
> We need a simple, stable hook interface that allows local security
> experts to integrate these required algorithms - experts who understand
> cryptography but not PostgreSQL storage internals.
Extension developers still have to understand the multiprocess nature of postgres (with AIO you also have to remember that it is possible for the completion to happen in a different process, possibly in a worker process), or its unusual memory management patterns, critical sections, and so on. You most likely also have to deal with shared memory caches, locks, and so on.
(And as I said above, you don't have to replicate/track md.c, we only need a good, generic extension point usable for many extensions)
> In South Korea, government
> regulations require the use of nationally-approved cryptographic
> algorithms (such as ARIA, SEED). This means organizations often cannot
> adopt foreign TDE solutions, regardless of their technical merit.
Have you considered contributing to existing solutions? Adding support to multiple algorithms to an existing library is easier than developing your own from scratch.
> WAL and heap pages are simply different representations of the same
> underlying data. Protecting only one side would be cryptographically
> incomplete; an attacker could bypass encryption by reading the
> unprotected side. Therefore, they must be treated as a single atomic
> unit of protection.
From a security point of view, I agree. From a practical one, it's a bit more complicated. As you mentioned South Korean regulations, we also have regulations in the European Union, and you can conform to the current regulations by only encrypting your data files (at least that's what I heard, I'm not a lawyer). So from a practical point of view, for us, even getting support for table encryption hooks into the core would be a success.
> My primary concern with using fork files for encryption metadata is crash
> recovery. If a fork file and the actual data page become inconsistent
> (e.g., during a crash), recovery becomes problematic because fork files
> are not typically protected by WAL.
Custom WAL records about encryption events (key rotation/change/etc) should solve this problem?
> I plan to propose a separate RFC for this
> "gradual rotation" mechanism.
Would this gradual rotation mechanism be useful for anything else other than encryption extensions? While I also had the same idea, I don't see how it would be useful for anything else, so I didn't plan to submit any patches related to this. This is something that can be easily implemented as a background worker in a tde extension, and doesn't really require core support.
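The callback ordering argument made earlier in this message (register your own completion callback, then call the original start function, and rely on callbacks running in the opposite order) can be modeled in a few lines. This is a toy model, not PostgreSQL's AIO interface: io_handle, io_register, io_complete and the sketch_* functions are hypothetical, and the extension point that would let an extension add such a callback is exactly the piece said to be missing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CALLBACKS 4

typedef void (*complete_cb)(char *page, size_t len);

/* Toy I/O handle: remembers completion callbacks in registration order. */
typedef struct
{
    complete_cb cbs[MAX_CALLBACKS];
    int         ncbs;
} io_handle;

static char order[32];          /* records which callback ran when */

static void io_register(io_handle *ioh, complete_cb cb)
{
    assert(ioh->ncbs < MAX_CALLBACKS);
    ioh->cbs[ioh->ncbs++] = cb;
}

/* Completion runs callbacks in reverse registration order. */
static void io_complete(io_handle *ioh, char *page, size_t len)
{
    for (int i = ioh->ncbs; i > 0; i--)
        ioh->cbs[i - 1](page, len);
}

/* Stand-in for the md-level completion handler. */
static void md_complete(char *page, size_t len)
{
    (void) page;
    (void) len;
    strcat(order, "md;");
}

/* Extension's completion handler: toy XOR in place of real decryption. */
static void decrypt_complete(char *page, size_t len)
{
    for (size_t i = 0; i < len; i++)
        page[i] ^= 0x5A;
    strcat(order, "decrypt;");
}

/* Stand-in for the original start function: registers md's callback. */
static void sketch_md_startreadv(io_handle *ioh)
{
    io_register(ioh, md_complete);
}

/* Decorator: registers its own callback first, then calls the original. */
static void sketch_tde_startreadv(io_handle *ioh)
{
    io_register(ioh, decrypt_complete);
    sketch_md_startreadv(ioh);
}
```

Because the decorator registers first and completion walks the array backwards, the md-level handler runs before the extension's decryption step in this model.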
On 28/12/2025 5:25 PM, Henson Choi wrote:
> Subject: Re: RFC: PostgreSQL Storage I/O Transformation Hooks
>
> Hi Zsolt,
>
> Thank you for your detailed questions. I'll address each point:
>
> 1. Bundling WAL and Buffer Manager
>
> WAL and heap pages are simply different representations of the same
> underlying data. Protecting only one side would be cryptographically
> incomplete; an attacker could bypass encryption by reading the
> unprotected side. Therefore, they must be treated as a single atomic
> unit of protection.
I am not an expert in cryptography; in fact, I am a complete novice in this area.
But I have one concern about the proposed WAL encryption (record-level encryption).
The content of some WAL records can be almost completely predicted (they contain no user data, just some Postgres internal data which can be easily reconstructed).
I wonder whether this fact could significantly simplify the task of cracking the cipher.
Maybe it is safer to use page-level encryption for WAL as well?
On 12/28/25 08:49, Henson Choi wrote:
>
> 3. Proposal Specifications
>
>
> 3.1 The Interface (Hook Points)
>
> We allow intervention by security experts through five contact points
> along the I/O path:
>
> * *Read/Write Hooks:* |mdread_post|, |mdwrite_pre|, |mdextend_pre|
> (Transformation of the data area)
> * *WAL Hooks:* |xlog_insert_pre|, |xlog_decode_pre| (Transformation of
> transaction logs)
>
>
> 3.2 The Protocol Identifier (PageHeader Transformation ID)
>
> We allocate 5 bits of |pd_flags| to define the “Security State” of a
> page. This serves as a *Status Message* sent by the security expert to
> the engine, utilized for key versioning and as a migration marker.
>
Isn't this rather problematic? This seems to be meant to be extensible, which means there can be multiple extensions setting the hooks. Which we generally allow, and the custom is to call the previous hook. What happens if there are multiple extensions implementing the hook? Would that be allowed or prohibited in this case? Maybe it doesn't make sense, but then why wouldn't it be possible?
FWIW I find it very unlikely we'd allow reserving pd_flags bits for an extension. These bits are meant to be used by core, there's very limited number of such bits.
In general, I'm somewhat skeptical of the claim a collection of hooks is "low-barrier, high-safety". It seems pretty fragile to me, and I can envision a lot of maintenance difficulties in the future. Not just for the extension developers, but for the project too - adding a bunch of random hooks is not free for us, we'll need to keep it working in future releases, etc.
Perhaps the current SMGR code is not extensible/flexible enough, but then we need to improve that. I'd imagine a simple SMGR doing the encryption, but federating most of the work to a "full" SMGR. But I haven't thought about that too much.
regards
--
Tomas Vondra
Hi Zsolt,
Thank you for the detailed technical feedback. Let me address each point.
1. AIO Extensibility and SMGR Approach
I think the SMGR extensibility approach is equally valid. In fact, when I realized in PG18 that buffer page reads are split between md.c (mdreadv) and bufmgr.c (buffer_readv_complete_one), I felt some discomfort about where to place the decryption hook. "Does this really belong in both places?" was my first thought.
The SMGR approach could provide a cleaner, more unified integration point for data transformation.
The main difference is timing and current availability:
- The hook approach is working today and can be used immediately
- Your SMGR extensibility work provides a more comprehensive long-term solution
I don't see these as competing proposals. Both approaches are valid and serve different needs. The hook infrastructure can serve as an interim solution for organizations that need TDE now, while the community develops the more comprehensive SMGR extensibility.
In the long term, if SMGR extensibility provides better integration points, extensions could migrate to that approach.
2. Understanding PostgreSQL Internals
You're absolutely right that extension developers need to understand multiprocess architecture, memory management, critical sections, and so on.
This is precisely why test_tde exists as a reference implementation. It documents the "dance steps" with the core - showing where memory must be pre-allocated, how to handle critical sections safely, when AIO completion might happen in a different process, and so on.
The goal isn't to hide PostgreSQL's complexity, but to provide a working example that shows cryptography experts exactly where and how to integrate their algorithms within PostgreSQL's constraints.
3. Contributing to Existing Solutions vs Korean Regulations
I appreciate the suggestion about contributing to existing solutions. I personally prefer the OpenSSL Provider approach for algorithm extensibility.
However, the reality is more complex.
Cryptography experts often have their own libraries developed over decades. While it might look like "just encryption code" to me, I don't have the authority to force them to adopt specific frameworks.
ARIA and SEED are already implemented in OpenSSL. However, Korean law requires certified implementations. Specifically, companies must use nationally-certified builds and provide the hash codes of those specific library binaries to regulators. You cannot simply use the OpenSSL version, even if the algorithm is identical.
This is why we need an extension mechanism rather than hardcoding specific libraries into core. Different jurisdictions have different certification requirements.
4. WAL vs Data File Encryption
You mentioned that EU regulations might be satisfied by encrypting only data files. That's a valid practical consideration.
In Korea, regulations require the introduction of approved cryptographic algorithms, but in practice most systems run AES due to lack of CPU acceleration for ARIA/SEED. It's largely a legal compliance checkbox.
Regarding what to protect (WAL vs heap vs both), there's flexibility depending on the organization and jurisdiction. The hook approach allows extensions to choose - you can implement only the buffer hooks if that satisfies your requirements, or add WAL hooks if needed.
5. Fork Files vs Page Header for Metadata
You asked whether custom WAL records about encryption events could solve the crash recovery problem with fork files.
That's a reasonable approach for SMGR-based solutions where you control the storage layer. However, with the hook approach, we don't have the ability to inject custom WAL records for encryption events.
Currently, in a replication environment, the reference implementation requires the same key to be configured in the settings on both primary and replicas (shared key model). For future KMS integration, I'm considering mechanisms to propagate keys to replicas through external channels rather than WAL.
The page header approach was chosen because it keeps the encryption state self-contained within each page, avoiding the need for separate metadata synchronization.
6. Gradual Rotation Mechanism
I agree with you - I don't think core support is necessary for gradual rotation either.
I mentioned it in my earlier email response only as a potential reference implementation concept to guide encryption developers. It's something that can and should be implemented in the extension's background worker, not in core.
Summary
I see the hook approach and SMGR extensibility as equally valid, addressing different timelines and use cases:
- Hooks: Available now, lighter-weight, sufficient for compliance-driven TDE
- SMGR extensibility: More comprehensive, cleaner architecture, better long-term solution
Both should coexist. Organizations can use hooks today while SMGR extensibility matures, then migrate if the SMGR approach better fits their needs.
I'm very interested in your experience with pg_tde and the SMGR extensibility work. If there are specific design considerations from that work that would inform these hooks, I'd appreciate your input.
Best regards,
Henson
> - mdread_post_hook: inside the segment loop → Decorator NOT possible
> The mdreadv() function, introduced in PostgreSQL 17 as part of the
> vectored I/O API, processes multiple blocks in a loop that respects
> segment boundaries. The decryption hook must be called inside this loop,
> after each segment's FileReadV() completes. A decorator wrapping mdreadv()
> from the outside cannot access this internal loop timing.
It is possible - or rather, we plan to propose a different patch for
that. There are already some discussions about extensibility of AIO,
which is currently quite minimal, and this is another point for that.
If you look into the AIO sources, it already uses an array of
callbacks, and there's only a small missing piece there - making it
possible for extensions to add entries to that array. With that patch,
it is possible to decorate smgr_startreadv, add your own callback, and
then call the original mdstartreadv function. Since aio callbacks are
executed in the opposite order, this will work out exactly as needed,
as the AIO handler will first call the md completion handler, then
yours.
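The ordering argument above can be illustrated with a small Python model. This is not PostgreSQL code: `IoHandle`, `md_start_readv`, and `tde_start_readv` are hypothetical stand-ins, and the only behaviour taken from the message is that completion callbacks run in the reverse order of registration.

```python
# Toy model of the callback ordering described above; not PostgreSQL code.
# Every name here is a hypothetical stand-in.


class IoHandle:
    """Collects completion callbacks and runs them in reverse order."""

    def __init__(self):
        self.callbacks = []

    def register(self, callback):
        self.callbacks.append(callback)

    def complete(self, trace):
        # The last registered callback runs first.
        for callback in reversed(self.callbacks):
            trace = callback(trace)
        return trace


def md_complete(trace):
    # Stand-in for the md-level completion handler.
    return trace + ["md_complete"]


def tde_complete(trace):
    # Stand-in for the extension's decryption step.
    return trace + ["tde_decrypt"]


def md_start_readv(handle):
    # The "original" start function registers the md handler.
    handle.register(md_complete)


def tde_start_readv(handle):
    # Decorator: add our own callback first, then call the original.
    handle.register(tde_complete)
    md_start_readv(handle)


def run_read():
    handle = IoHandle()
    tde_start_readv(handle)
    return handle.complete([])
```

Under these assumptions, `run_read()` records the md handler before the extension's step, which is the order the decorator approach relies on.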
My logic here is similar to the previous argument: this AIO
extensibility for startreadv is also needed for other uses of the smgr
extension, most likely for everyone who uses the current patch. It
shouldn't be specific to encryption.
> With the SMGR decorator approach, the extension developer must:
> - Track upstream md.c changes
> - Replicate the internal loop logic to find the right decryption point
> With hooks, the extension developer only needs to:
> - Implement encrypt() and decrypt()
> We need a simple, stable hook interface that allows local security
> experts to integrate these required algorithms - experts who understand
> cryptography but not PostgreSQL storage internals.
Extension developers still have to understand the multiprocess nature
of postgres (with AIO you also have to remember that it is possible
for the completion to happen in a different process, possibly in a
worker process), or its unusual memory management patterns, critical
sections, and so on. You most likely also have to deal with shared
memory caches, locks, and so on.
(And as I said above, you don't have to replicate/track md.c, we only
need a good, generic extension point usable for many extensions)
> In South Korea, government
> regulations require the use of nationally-approved cryptographic
> algorithms (such as ARIA, SEED). This means organizations often cannot
> adopt foreign TDE solutions, regardless of their technical merit.
Have you considered contributing to existing solutions? Adding support
for multiple algorithms to an existing library is easier than
developing your own from scratch.
> WAL and heap pages are simply different representations of the same
> underlying data. Protecting only one side would be cryptographically
> incomplete; an attacker could bypass encryption by reading the
> unprotected side. Therefore, they must be treated as a single atomic
> unit of protection.
From a security point of view, I agree. From a practical one, it's a
bit more complicated. As you mentioned South Korean regulations, we
also have regulations in the European Union, and you can conform to
the current regulations by only encrypting your data files (at least
that's what I heard, I'm not a lawyer).
So from a practical point of view, for us, even getting support for
table encryption hooks into the core would be a success.
> My primary concern with using fork files for encryption metadata is crash
> recovery. If a fork file and the actual data page become inconsistent
> (e.g., during a crash), recovery becomes problematic because fork files
> are not typically protected by WAL.
Custom WAL records about encryption events (key rotation/change/etc)
should solve this problem?
> I plan to propose a separate RFC for this
> "gradual rotation" mechanism.
Would this gradual rotation mechanism be useful for anything else
other than encryption extensions? While I also had the same idea, I
don't see how it would be useful for anything else, so I didn't plan
to submit any patches related to this. This is something that can be
easily implemented as a background worker in a tde extension, and
doesn't really require core support.
Hi Tomas,
Thank you for this critical feedback. Your concerns go to the heart of the proposal's viability, and I appreciate your directness.
1. Multiple Extensions and Hook Chaining
You're right to question this. To be honest, I have significant doubts about allowing multiple transformation extensions simultaneously.
The Transform ID coordination problem is real: without a registry or protocol between extensions, they cannot cooperate safely. Hook chaining for read/write operations might work (extension A encrypts, extension B compresses), but the Transform ID field creates conflicts.
Perhaps I should be more direct: transformation hook chaining is not realistically possible with the current design. TDE extensions would need exclusive use of these hooks. This is a fundamental limitation I should have stated clearly in the RFC.
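A toy Python model of the conflict, assuming the usual "remember and call the previous hook" convention. The names `install` and `write_page` are hypothetical and the page is reduced to a dict. The data passes through both transforms, but the single ID field records only the last writer.

```python
# Toy model: two chained write hooks sharing one Transform ID field.
# Hypothetical names; this is not the patch's API.

write_hook = None  # global hook pointer, None when no extension is loaded


def install(transform_id, transform):
    """Install a hook, remembering and calling the previous one first."""
    global write_hook
    prev_hook = write_hook

    def hook(page):
        if prev_hook is not None:
            page = prev_hook(page)
        # Each extension stamps its own ID, overwriting any earlier one.
        return {"id": transform_id, "data": transform(page["data"])}

    write_hook = hook


def write_page(data):
    page = {"id": 0, "data": data}
    if write_hook is not None:
        page = write_hook(page)
    return page
```

With two extensions installed (say IDs 3 and 7), the written page carries both transformations but only ID 7, so a reader cannot tell from the header that the first transform was applied.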
2. pd_flags Reservation - I Hope You'll Consider This
I understand your concern about reserving pd_flags bits for extensions. However, I'd like to ask you to consider the reasoning behind this choice.
The 5-bit Transform ID serves a critical purpose: it allows the core to identify the page's transformation state without attempting decryption. This is important for:
- Error reporting: "This page is encrypted with transform ID 5, but no extension is loaded to handle it"
- Migration safety: Distinguishing between untransformed pages (ID=0) and transformed pages during gradual encryption
- Crash recovery: The core can detect transformation state inconsistencies
That said, I recognize pd_flags is precious and limited. Let me propose an alternative approach that might better align with core principles:
Instead of extension-specific Transform IDs, what if we allow extensions to reserve space at pd_upper (similar to how special space works at pd_special)?
The core could manage a small flag (2-3 bits) indicating "N bytes at pd_upper are reserved for transformation metadata". By encoding N as multiples of 2 or 4 bytes, we maximize the flag's efficiency:
- 2 bits encoding 4-byte multiples: 0-12 bytes (sufficient for most cases)
- 3 bits encoding 4-byte multiples: 0-28 bytes (covers all reasonable needs)
- 3 bits encoding 2-byte multiples: 0-14 bytes (finer granularity)
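The three encodings can be checked with a few lines of Python. The function names are hypothetical; only the bit widths and byte multiples come from the list above.

```python
# Arithmetic sketch for the proposed "reserved bytes" flag.
# Function names are hypothetical; the widths and units come from the list.


def reserved_sizes(flag_bits, unit_bytes):
    """All byte counts representable by a flag of flag_bits bits."""
    return [value * unit_bytes for value in range(2 ** flag_bits)]


def encode_reserved(n_bytes, flag_bits, unit_bytes):
    """Return the flag value for n_bytes, or raise if not representable."""
    if n_bytes % unit_bytes != 0:
        raise ValueError("size must be a multiple of the unit")
    value = n_bytes // unit_bytes
    if not 0 <= value < 2 ** flag_bits:
        raise ValueError("size does not fit in the flag")
    return value


def decode_reserved(flag_value, unit_bytes):
    """Return the number of reserved bytes for a stored flag value."""
    return flag_value * unit_bytes
```

One consequence worth noting: a 16-byte value (for example a full IV or authentication tag) fits only the 3-bit, 4-byte-multiple encoding, since the other two top out at 12 and 14 bytes.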
This approach uses minimal pd_flags bits while providing substantial metadata space. It would:
- Keep the flag in core control (not extension-specific)
- Allow extensions to store IV, authentication tags, key version, etc. in a standardized location
- Be self-describing (the flag tells you how much space is reserved)
- Generalize beyond encryption (compression, checksums, etc. could use it)
In our internal implementation, we actually add opaque bytes to PageHeader for encryption metadata. This pd_upper approach could formalize that pattern for extensions.
I believe some form of page-level metadata for transformations is necessary. Would either approach (Transform ID or pd_upper reservation) be acceptable with the right design, or do you see fundamental issues with page-level transformation metadata itself?
3. Maintenance Burden and Test Coverage
I deeply appreciate this concern. Having worked across various DBMS implementations, I've seen solution vendors ship without comprehensive regression testing - but never a database vendor. DBMS maintenance is extraordinarily difficult, and storage errors are catastrophic.
This is precisely why test_tde exists as a reference implementation. But you've identified the real issue: we need much stronger test coverage for the hooks themselves.
The test cases should:
- Detect when core changes break hook contracts
- Verify hook behavior under all I/O paths (sync, async, error cases)
- Validate critical section safety
- Test interaction with checksums, crash recovery, replication
I agree the current test coverage is insufficient for core inclusion. Would expanding the test suite to cover these scenarios address your maintenance concerns, or do you see fundamental fragility beyond what testing can solve?
4. Hooks vs Transform Layer - Pragmatic Timeline
You suggested improving SMGR extensibility rather than adding hooks. I think you're architecturally right about the long-term direction.
However, I want to be pragmatic about timelines:
The hook and pd_flags approach, despite its limitations, can deliver working TDE in the shortest time. Organizations facing regulatory deadlines need something that works now, not in 2-3 years.
That said, your feedback has sparked a better idea: what if we think of this not as "SMGR extension" or "hooks" but as a pluggable Transform Layer that SMGR and WAL subsystems delegate to?
Conceptually:
Application Layer
|
Buffer Manager
|
+------------------+
| Transform Layer | <-- Encryption, etc.
+------------------+
|
SMGR / WAL
|
File I/O
This is architecturally cleaner than scattered hooks, and more focused than full SMGR extensibility. The Transform Layer would:
- Provide a unified interface for data transformation
- Work across backend, frontend tools, and replication
- Handle metadata management in a standardized way
- Support encryption, compression, or other transformations
I think this deserves its own discussion thread rather than conflating it with the current hook proposal. Would you be interested in starting a separate conversation about designing a Transform Layer interface for PostgreSQL?
In the meantime, the hook approach could serve organizations with immediate needs, and extensions could migrate to the Transform Layer once it's stabilized.
5. Frontend Tool Access
Both SMGR and hook approaches face a shared limitation: frontend tools (pg_checksums, pg_basebackup, etc.) that read files directly.
I previously suggested allowing initdb to specify a shared library that both backend and frontend can load for transformation. But as I reconsider this, it feels like it converges toward the Transform Layer idea: a well-defined interface that any PostgreSQL component can use.
This might be the real architectural question: not "hooks vs SMGR" but "how should PostgreSQL provide transformation points that work across backend, frontend, and replication boundaries?"
Summary
Your feedback has clarified three important points:
1. The current hook design has real limitations (multiple extension conflicts, pd_flags concerns)
2. Test coverage needs to be much more comprehensive
3. A cleaner abstraction might be needed long-term
I propose a dual approach:
Short-term: Move forward with the hook proposal for organizations with immediate regulatory needs. I commit to:
- Stating clearly that hook chaining is not supported
- Significantly expanding test coverage
- Treating this as a pragmatic solution with known limitations
Long-term: I'd like to start a separate discussion about a Transform Layer abstraction - a unified interface that could handle data transformation across backend, frontend tools, and replication. This would be architecturally cleaner than scattered hooks, and could eventually supersede this approach.
Would you be willing to review a Transform Layer proposal in a separate thread? I think it addresses the architectural concerns you've raised, while the hook approach serves immediate practical needs.
Best regards,
Henson
On 12/28/25 08:49, Henson Choi wrote:
>
> 3. Proposal Specifications
>
>
> 3.1 The Interface (Hook Points)
>
> We allow intervention by security experts through five contact points
> along the I/O path:
>
> * *Read/Write Hooks:* |mdread_post|, |mdwrite_pre|, |mdextend_pre|
> (Transformation of the data area)
> * *WAL Hooks:* |xlog_insert_pre|, |xlog_decode_pre| (Transformation of
> transaction logs)
>
>
> 3.2 The Protocol Identifier (PageHeader Transformation ID)
>
> We allocate 5 bits of |pd_flags| to define the “Security State” of a
> page. This serves as a *Status Message* sent by the security expert to
> the engine, utilized for key versioning and as a migration marker.
>
Isn't this rather problematic?
This seems to be meant to be extensible, which means there can be
multiple extensions setting the hooks. Which we generally allow, and the
custom is to call the previous hook.
What happens if there are multiple extensions implementing the hook?
Would that be allowed or prohibited in this case? Maybe it doesn't make
sense, but then why wouldn't it be possible?
FWIW I find it very unlikely we'd allow reserving pd_flags bits for an
extension. These bits are meant to be used by core, there's very limited
number of such bits.
In general, I'm somewhat skeptical of the claim a collection of hooks is
"low-barrier, high-safety". It seems pretty fragile to me, and I can
envision a lot of maintenance difficulties in the future. Not just for
the extension developers, but for the project too - adding a bunch of
random hooks is not free for us, we'll need to keep it working in future
releases, etc.
Perhaps the current SMGR code is not extensible/flexible enough, but
then we need to improve that. I'd imagine a simple SMGR doing the
encryption, but federating most of the work to a "full" SMGR. But I
haven't thought about that too much.
regards
--
Tomas Vondra
This is the fourth version of the Storage I/O Transformation Hooks patch series for implementing Transparent Data Encryption (TDE) in PostgreSQL.
Changes in v4:
This version fixes cross-platform compatibility issues found in CI testing that caused failures on BSD and Windows:
- Fixed BSD regression test warning about tablespace naming conventions (renamed to "regress_tde_tblspc")
- Fixed Windows test failures caused by platform-specific shell commands (mkdir -p)
- Replaced filesystem-based tablespace tests with allow_in_place_tablespaces approach for cross-platform compatibility
The core hook infrastructure (patch 0001) and reference TDE implementation (patch 0002) remain unchanged from v3. Patch 0003 contains only the test compatibility fixes.
Patch series:
0001: Core hook infrastructure for I/O transformation
0002: Reference TDE implementation using AES-256-CTR
0003: Cross-platform test fixes for BSD and Windows
Testing:
The test_tde extension demonstrates:
- Page-level encryption/decryption with AES-256-CTR
- IV derivation using LSN, block number, and relation file number
- Tablespace-level encryption configuration
- WAL encryption support
These fixes resolve the BSD and Windows test failures.
Best regards,
Hi,
Here is v3 of the Storage I/O Transform Hooks patch.
Changes from v2:
- Fix -Wincompatible-pointer-types error in bufmgr.c by casting
&bufdata to (void **) for mdread_post_hook call
v2 changes were:
- Add meson.build test configuration for test_tde extension
--
Best regards,
Sungkyun Park
On Sun, Dec 28, 2025 at 7:44 PM, Henson Choi <assam258@gmail.com> wrote:
Updated patches with meson build support:
v2:
- Added meson.build for test_tde extension
- Added test_tde to contrib/meson.build
Regards,
Henson Choi
On Sun, Dec 28, 2025 at 6:47 PM, Henson Choi <assam258@gmail.com> wrote:
Hello,
Following up on the RFC, I am submitting the initial patch set for the proposed infrastructure. These patches introduce a minimal hook-based protocol to allow extensions to handle data transformation, such as TDE, while keeping the PostgreSQL core independent of specific cryptographic implementations.
Implementation Details:
Hook Points in Storage I/O Path
The patch introduces five strategic hook points:
mdread_post_hook: Called after blocks are read from disk. The extension can reverse-transform data in place.
mdwrite_pre_hook & mdextend_pre_hook: Called before writing or extending blocks. These hooks return a pointer to transformed buffers.
xlog_insert_pre_hook & xlog_decode_pre_hook: Handle transformation for WAL records during insertion and replay.
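The call-site pattern for the storage hooks can be modelled in Python as follows. The real hooks are C function pointers. The simplified signatures, the `hooks` registry, and the XOR transform are assumptions made for illustration, not the patch's actual interface.

```python
# Python model of the hook call sites; signatures are simplified assumptions.

hooks = {"mdread_post": None, "mdwrite_pre": None}  # unset by default


def md_write(disk, blocknum, buffer):
    hook = hooks["mdwrite_pre"]
    # The pre-write hook returns the buffer that should reach disk.
    to_write = hook(blocknum, buffer) if hook is not None else buffer
    disk[blocknum] = bytes(to_write)


def md_read(disk, blocknum):
    buffer = bytearray(disk[blocknum])
    hook = hooks["mdread_post"]
    if hook is not None:
        hook(blocknum, buffer)  # reverse-transforms the buffer in place
    return buffer


def xor_write(blocknum, buffer):
    # Toy transform: builds a new buffer and leaves the caller's untouched.
    return bytes(b ^ 0x5A for b in buffer)


def xor_read(blocknum, buffer):
    # Toy reverse transform, applied in place.
    for i in range(len(buffer)):
        buffer[i] ^= 0x5A
```

With both entries left as `None`, reads and writes pass through unchanged. Setting `hooks["mdwrite_pre"] = xor_write` and `hooks["mdread_post"] = xor_read` stores transformed bytes while callers still see plaintext.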
Data Integrity and Checksum Protocol
To ensure robust error detection, the hooks follow a specific verification protocol:
On Write: The extension transforms the page, sets the Transform ID, then recalculates the checksum on the transformed data.
On Read: The extension verifies the on-disk checksum of the transformed data first. After reverse-transformation, it clears the Transform ID and recalculates the checksum for the plaintext data. This ensures corruption is detected regardless of the transformation state.
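A minimal Python sketch of this write/read sequence, under loud assumptions: CRC-32 stands in for PostgreSQL's page checksum, XOR stands in for encryption, the page is a plain dict, and `TRANSFORM_ID`, `on_write`, and `on_read` are hypothetical names.

```python
import zlib

TRANSFORM_ID = 1  # hypothetical non-zero ID chosen by the extension


def checksum(transform_id, body):
    # CRC-32 is a stand-in for PostgreSQL's page checksum.
    return zlib.crc32(bytes([transform_id]) + body)


def xor(body):
    # Toy reversible transform standing in for encryption.
    return bytes(b ^ 0xA5 for b in body)


def on_write(plain_body):
    body = xor(plain_body)                       # 1. transform the page
    tid = TRANSFORM_ID                           # 2. set the Transform ID
    return {"id": tid, "body": body,
            "checksum": checksum(tid, body)}     # 3. checksum transformed data


def on_read(page):
    # 1. verify the on-disk checksum of the transformed data first
    if checksum(page["id"], page["body"]) != page["checksum"]:
        raise ValueError("checksum mismatch on transformed page")
    body = xor(page["body"])                     # 2. reverse-transform
    return {"id": 0, "body": body,
            "checksum": checksum(0, body)}       # 3. clear ID, re-checksum
```

In this model a flipped byte in the stored body is rejected before any reverse transformation is attempted, which is the property the protocol is aiming for.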
WAL Safety via XLR_BLOCK_ID_TRANSFORMED (251)
For WAL records, I have introduced a specific block ID (251) to mark transformed data. If the decryption extension is not loaded, the WAL reader will encounter this unknown block ID and fail-fast, preventing the system from incorrectly interpreting encrypted data as valid WAL records.
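The fail-fast behaviour can be sketched as a block-ID dispatch. The values 252 to 255 and the maximum block reference ID of 32 match PostgreSQL's `xlogrecord.h`; 251 is the value proposed here. How the patch actually wires this into the WAL reader is not shown in this message, so `classify_block_id` and its `decode_hook` parameter are assumptions.

```python
# Block IDs 0..32 are ordinary block references; 252-255 match the special
# IDs in PostgreSQL's xlogrecord.h; 251 is the value proposed by the patch.
XLR_MAX_BLOCK_ID = 32
XLR_BLOCK_ID_DATA_SHORT = 255
XLR_BLOCK_ID_DATA_LONG = 254
XLR_BLOCK_ID_ORIGIN = 253
XLR_BLOCK_ID_TOPLEVEL_XID = 252
XLR_BLOCK_ID_TRANSFORMED = 251

KNOWN_SPECIAL = {
    XLR_BLOCK_ID_DATA_SHORT,
    XLR_BLOCK_ID_DATA_LONG,
    XLR_BLOCK_ID_ORIGIN,
    XLR_BLOCK_ID_TOPLEVEL_XID,
}


def classify_block_id(block_id, decode_hook=None):
    """Hypothetical dispatch: reject ID 251 unless a decode hook is loaded."""
    if 0 <= block_id <= XLR_MAX_BLOCK_ID:
        return "block_ref"
    if block_id in KNOWN_SPECIAL:
        return "special"
    if block_id == XLR_BLOCK_ID_TRANSFORMED and decode_hook is not None:
        return "transformed"
    # Without the extension, 251 is just another invalid ID: fail fast.
    raise ValueError(f"invalid block_id {block_id}")
```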
PageHeader Transform ID (5-bit)
I have allocated bits 3-7 of pd_flags in the PageHeader for a Transform ID. This allows the engine and extensions to identify the transformation state of a page (e.g., key versioning or algorithm type) without attempting decryption. It ensures backward compatibility: pages with Transform ID 0 are treated as standard untransformed pages.
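For illustration, the bit layout can be written out as follows. The `PD_*` values match PostgreSQL's `bufpage.h`, where core currently uses bits 0-2 of the 16-bit `pd_flags`. The `TRANSFORM_ID_*` names and the two helper functions are hypothetical, not taken from the patch.

```python
# Existing core flags (values as in bufpage.h).
PD_HAS_FREE_LINES = 0x0001
PD_PAGE_FULL = 0x0002
PD_ALL_VISIBLE = 0x0004
PD_VALID_FLAG_BITS = 0x0007  # bits 0-2

# Hypothetical names for the proposed 5-bit field in bits 3-7.
TRANSFORM_ID_SHIFT = 3
TRANSFORM_ID_MASK = 0x1F << TRANSFORM_ID_SHIFT  # 0x00F8


def get_transform_id(pd_flags):
    """Extract the 5-bit Transform ID; 0 means an untransformed page."""
    return (pd_flags & TRANSFORM_ID_MASK) >> TRANSFORM_ID_SHIFT


def set_transform_id(pd_flags, transform_id):
    """Return pd_flags with the Transform ID replaced, other bits preserved."""
    if not 0 <= transform_id <= 31:
        raise ValueError("Transform ID must fit in 5 bits")
    cleared = pd_flags & ~TRANSFORM_ID_MASK & 0xFFFF  # pd_flags is 16 bits
    return cleared | (transform_id << TRANSFORM_ID_SHIFT)
```

A page that only has `PD_ALL_VISIBLE` set and is given Transform ID 5 ends up with `pd_flags` equal to `0x002C`. Any existing page has zero in bits 3-7 and therefore reads back as ID 0.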
Memory and Critical Section Safety
As demonstrated in the contrib/test_tde reference implementation, cipher contexts are pre-allocated in _PG_init to avoid memory allocation during critical sections. For WAL transformation, MemoryContextAllowInCriticalSection() is used to allow buffer reallocation within critical sections; if OOM occurs during buffer growth, it results in a controlled PANIC.
Performance Considerations
When hooks are not set (default), the overhead is limited to a single NULL pointer comparison per I/O operation. This is architecturally consistent with existing PostgreSQL hooks and is designed to have a negligible impact on performance.
Attached Patches:
v20251228-0001-Add-Storage-I-O-Transform-Hooks-for-PostgreSQL.patch: Core infrastructure.
v20251228-0002-Add-test_tde-extension-for-TDE-testing.patch: Reference implementation using AES-256-CTR.
I look forward to your comments and feedback.
Regards,
Henson Choi
On Sun, Dec 28, 2025 at 4:49 PM, Henson Choi <assam258@gmail.com> wrote:
RFC: PostgreSQL Storage I/O Transformation Hooks
Infrastructure for a Technical Protocol Between RDBMS Core and Data Security Experts
Author: Henson Choi assam258@gmail.com
Date: 2025-12-28
PostgreSQL Version: master (Development)
1. Summary & Motivation
This RFC proposes the introduction of minimal hooks into the PostgreSQL storage layer and the addition of a Transformation ID field to the PageHeader.
A Diplomatic Protocol Between Expert Groups
The core motivation of this proposal is “Separation of Concerns and Mutual Respect.”
Historically, discussions around Transparent Data Encryption (TDE) have often felt like putting security experts on trial in a foreign court—specifically, the “Court of RDBMS.” It is time to treat them not as defendants to be judged by database-specific rules, but as an equal neighboring community with their own specialized sovereignty.
The issue has never been a failure of technology, but rather a misplacement of the focal point. While previous discussions were mired in the technicalities of “how to hardcode encryption into the core,” this proposal shifts the debate toward an architectural solution: “what interface the core should provide to external experts.”
- RDBMS Experts provide a trusted pipeline responsible for data I/O paths and consistency.
- Security Experts take responsibility for the specialized domain of encryption algorithms and key management.
This hook system functions as a Technical Protocol—a high-level agreement that allows these two expert groups to exchange data securely without encroaching on each other’s territory.
2. Design Principles
- Delegation of Authority: The core remains independent of specific encryption standards, providing a “free territory” where security experts can respond to an ever-changing security landscape.
- Diplomatic Convention: The Transformation ID acts as a communication protocol between the engine and the extension. The engine uses this ID to identify the state of the data and hands over control to the appropriate expert (the extension).
- Minimal Interference: Overhead is kept near zero when hooks are not in use, ensuring the native performance of the PostgreSQL engine.
3. Proposal Specifications
3.1 The Interface (Hook Points)
We allow intervention by security experts through five contact points along the I/O path:
- Read/Write Hooks: mdread_post, mdwrite_pre, mdextend_pre (Transformation of the data area)
- WAL Hooks: xlog_insert_pre, xlog_decode_pre (Transformation of transaction logs)
3.2 The Protocol Identifier (PageHeader Transformation ID)
We allocate 5 bits of pd_flags to define the “Security State” of a page. This serves as a Status Message sent by the security expert to the engine, utilized for key versioning and as a migration marker.
4. Reference Implementation: contrib/test_tde
A Standard Code of Conduct for Security Experts
This reference implementation exists not as a commercial product, but to define the Standards of the Diplomatic Protocol that encryption/decryption experts must follow when entering the PostgreSQL domain.
- Deterministic IV Derivation: Demonstrates how to achieve cryptographic safety by trusting unique values provided by the engine (e.g., LSN).
- Critical Section Safety: Defines memory management regulations that security logic must follow within “Critical Sections” to maintain system stability.
- Hook Chaining: Demonstrates a cooperative structure that allows peaceful coexistence with other expert tools (e.g., compression, auditing).
5. Scope
- In-Scope: Backend hook infrastructure, Transformation ID field, and reference code demonstrating diplomatic protocol compliance.
- Out-of-Scope: Specific Key Management Systems (KMS), selection of specific cryptographic algorithms, and integration with external tools.
This proposal represents a strategic diplomatic choice: rather than the PostgreSQL core assuming all security responsibilities, it grants security experts a sovereign territory through extensions where they can perform at their best.
> Content of some WAL records can be almost completely predicted (it
> contains no user data,
> just some Postgres internal data which can be easily reconstructed).
> I wonder if this fact can significantly simplify the task of cracking the cipher?
AES is designed to resist known-plaintext attacks, so this isn't an issue as long as the code doesn't reuse the same IV twice. The example code uses a random IV for each WAL record, so that's unlikely.
This is quite a nice solution to keep the encryption of WAL as parallel as possible. The downside is that it increases the size of WAL a bit, uses MemoryContextAllowInCriticalSection, and is definitely slower during recovery than full page decryption. On the other hand, per-page WAL encryption can cause performance issues with some workloads that write huge amounts of WAL with many parallel clients. Both have pros and cons.
One thing that seems tricky is WAL key rotation. The example code ignores this, which is fine for a demo, but real extensions should be able to handle it. We can't simply write a WAL record about changing the WAL key, because without holding the write lock things could get written out of order. The only safe solution I see is to also add the ID of the WAL key to the additional WAL record data, increasing the record size even more.
> The main difference is timing and current availability:
>
> - The hook approach is working today and can be used immediately
> - Your SMGR extensibility work provides a more comprehensive long-term solution
I disagree with this. The SMGR patch has been available since 2023/PG16 as a patch, and it is already used by at least 3 companies I know of (Neon, Nile, Percona), and probably also by others I don't know of. It is available immediately.
Compared to that, this proposal is something new, and more limited.
The actual advantage of this proposal is that it includes WAL, but I still think the two should be separate discussions.
> Regarding what to protect (WAL vs heap vs both), there's flexibility depending on the organization and jurisdiction. The hook approach allows extensions to choose - you can implement only the buffer hooks if that satisfies your requirements, or add WAL hooks if needed.
My concern is that these are two separate discussions about two extensibility points, with different concerns raised by different people. One part shouldn't stall the other, as for some, even getting half of it into the core for PG19 would be useful.
> You're absolutely right that extension developers need to understand multiprocess architecture, memory management, critical sections, and so on.
> This is precisely why test_tde exists as a reference implementation.
The reference implementation ignores the tricky steps, like key rotation, caching, configuration, providing a user interface, etc., which all require knowledge of postgres internals.
> ARIA and SEED are already implemented in OpenSSL. However, Korean law requires certified implementations. Specifically, companies must use nationally-certified builds and provide the hash codes of those specific library binaries to regulators. You cannot simply use the OpenSSL version, even if the algorithm is identical.
That could still be solved by introducing an abstraction layer in the encryption code of a TDE extension :) Encryption is only a small part of an extension; the other parts (user interface, rotation, key storage integrations, etc.) are a much bigger part. It is still questionable to reimplement everything because of an encryption library difference. But I see your point, that is a bit more difficult.
> That's a reasonable approach for SMGR-based solutions where you control the storage layer. However, with the hook approach, we don't have the ability to inject custom WAL records for encryption events.
> Currently, in a replication environment, the reference implementation requires the same key to be configured in the settings on both primary and replicas (shared key model). For future KMS integration, I'm considering mechanisms to propagate keys to replicas through external channels rather than WAL.
I originally wrote a long answer about how I don't think this is related to where the hooks are, and then I realized that the problem is probably completely different - and this also shows why adding a few bits to the pages is not a good generic solution for all extensions.
Our extension uses a 2-level key architecture, as used by most database servers (there's a master key, and it encodes separate internal keys, one for each database file). The proposed sample code in your patch uses a single key, with the IV encoding the database file. That means you want to encode which key is used for each page instead of for each file.
So we approach how we map data/pages to keys completely differently. But I don't think the page header addition is a good solution, because it is specific to your implementation, not for encryption solutions in general.
(Also, I just noticed that you forgot about timelineid in derive_iv, you probably want to include that somehow)
Thank you for the detailed feedback.
## SMGR Patch
You're right - I shouldn't argue about SMGR without actually reviewing
the patch. Let me step back from the SMGR discussion for now and focus
this proposal on WAL hooks only.
Could you point me to where I can access the SMGR extensibility patch?
I'd like to review it properly before any further discussion.
For SMGR, I'm also thinking about a different approach that could cover
Bootstrap and Frontend processes as well - but that's a separate
conversation after I understand the current SMGR proposal better.
## Reference Implementation Scope
As mentioned in my earlier messages, the reference implementation
(test_tde) intentionally doesn't cover key rotation and other
production concerns. Its purpose is to demonstrate the
hook API, not to be a production-ready TDE solution.
## Encryption Library Abstraction
I agree in principle that an abstraction layer would be ideal.
Personally, I prefer developing with OpenSSL and getting an OpenSSL
Provider certified at the company level.
However, our CTO (who comes from a cryptography company background)
insists on using their long-maintained proprietary encryption library.
It's a complex C++ implementation that cannot follow critical section
constraints at all. :)
## 2-Level Key Architecture
Our production implementation also uses a 2-level key architecture
(master key → separate smgr/wal keys). The reference implementation
uses a simplified single-key approach just for demonstration purposes.
I've been considering further key granularity (e.g., per-tablespace
or per-database keys), but there are unresolved challenges:
- Key distribution to replicas
- Some DDL operations that complete by simple file copying
Until these are solved, we're keeping the smgr key at a coarser
granularity.
I'm also exploring TPM integration for auto-login master key
protection. How does pg_tde handle master key storage and auto-login
scenarios?
## Timeline ID in IV
Good catch - I hadn't considered that. Including timeline ID would
make the IV more robust. Thank you for sharing this insight.
Best regards,
Henson
> The main difference is timing and current availability:
>
> - The hook approach is working today and can be used immediately
> - Your SMGR extensibility work provides a more comprehensive
>   long-term solution
I disagree with this. The SMGR patch has been available as a patch
since 2023/PG16, and it is already used by at least 3 companies I know
of (Neon, Nile, Percona), and probably also by others I don't know of.
It is available immediately.
Compared to that, this proposal is something new, and more limited.
The actual advantage of this proposal is that it includes WAL, but I
still think the two should be separate discussions.
> Regarding what to protect (WAL vs heap vs both), there's flexibility depending on the organization and jurisdiction. The hook approach allows extensions to choose - you can implement only the buffer hooks if that satisfies your requirements, or add WAL hooks if needed.
My concern is that these are two separate discussions about two
extensibility points, with different concerns raised by different
people. One part shouldn't stall the other, as for some, even getting
half of it into the core for PG19 would be useful.
> You're absolutely right that extension developers need to understand multiprocess architecture, memory management, critical sections, and so on.
> This is precisely why test_tde exists as a reference implementation.
The reference implementation ignores the tricky steps, like key
rotation, caching, configuration, providing a user interface, etc,
which all require knowledge of postgres internals.
> ARIA and SEED are already implemented in OpenSSL. However, Korean law requires certified implementations. Specifically, companies must use nationally-certified builds and provide the hash codes of those specific library binaries to regulators. You cannot simply use the OpenSSL version, even if the algorithm is identical.
That could still be solved by introducing an abstraction layer in the
encryption code of a TDE extension :) Encryption is only a small part
of an extension; the other parts (user interface, rotation, key
storage integrations, etc.) are a much bigger part. It is still
questionable to reimplement everything because of an encryption
library difference. But I see your point, that is a bit more
difficult.
> That's a reasonable approach for SMGR-based solutions where you control the storage layer. However, with the hook approach, we don't have the ability to inject custom WAL records for encryption events.
> Currently, in a replication environment, the reference implementation requires the same key to be configured in the settings on both primary and replicas (shared key model). For future KMS integration, I'm considering mechanisms to propagate keys to replicas through external channels rather than WAL.
I originally wrote a long answer about how I don't think this is
related to where the hooks are, and then I realized that the problem
is probably completely different - and this also shows why adding a
few bits to the pages is not a good generic solution for all
extensions.
Our extension uses a 2-level key architecture, as used by most
database servers (there's a master key, and it encodes separate
internal keys, one for each database file). The proposed sample code
in your patch uses a single key, with the IV encoding the database
file. That means you want to encode which key is used for each page
instead of for each file.
So we approach how we map data/pages to keys completely differently.
But I don't think the page header addition is a good solution, because
it is specific to your implementation, not for encryption solutions in
general.
(Also, I just noticed that you forgot about timelineid in derive_iv,
you probably want to include that somehow)
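To make the timeline ID remark concrete, here is a minimal sketch of an IV layout that includes it. The actual `derive_iv` from the patch is not shown in this thread, so the function name `example_derive_iv`, the 16-byte length, and the field layout below are all assumptions made for illustration. The usual reason for including the timeline is that, after a promotion or point-in-time recovery, a new timeline can write different data at a position the old timeline already used, so an IV built from the position alone could repeat under the same key.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_IV_LEN 16   /* assumed IV size, e.g. for an AES CTR-style mode */

/* Store a 32-bit value in big-endian byte order. */
static void
put_be32(uint8_t *dst, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (uint8_t) (v >> (24 - 8 * i));
}

/* Store a 64-bit value in big-endian byte order. */
static void
put_be64(uint8_t *dst, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t) (v >> (56 - 8 * i));
}

/*
 * Hypothetical IV layout (not the patch's derive_iv):
 *   bytes 0..3   timeline ID
 *   bytes 4..11  64-bit position (for example an LSN)
 *   bytes 12..15 zero, left free for a per-block counter
 */
static void
example_derive_iv(uint8_t iv[EXAMPLE_IV_LEN],
                  uint32_t timeline_id, uint64_t position)
{
    assert(iv != NULL);
    memset(iv, 0, EXAMPLE_IV_LEN);
    put_be32(iv, timeline_id);
    put_be64(iv + 4, position);
}
```

With this layout, the same position on timeline 1 and timeline 2 yields two different IVs, while identical inputs always yield the same IV.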
Please don't top-post. We generally prefer to reply in-line, which makes
it easier to follow the discussion. With top-posting I have to hunt for
what you are responding to.
On 12/29/25 03:35, Henson Choi wrote:
> Subject: Re: RFC: PostgreSQL Storage I/O Transformation Hooks
>
> Hi Tomas,
>
> Thank you for this critical feedback. Your concerns go to the heart of
> the proposal's viability, and I appreciate your directness.
>
>
> 1. Multiple Extensions and Hook Chaining
>
> You're right to question this. To be honest, I have significant doubts
> about allowing multiple transformation extensions simultaneously.
>
> The Transform ID coordination problem is real: without a registry or
> protocol between extensions, they cannot cooperate safely. Hook chaining
> for read/write operations might work (extension A encrypts, extension B
> compresses), but the Transform ID field creates conflicts.
>
> Perhaps I should be more direct: transformation hook chaining is not
> realistically possible with the current design. TDE extensions would
> need exclusive use of these hooks. This is a fundamental limitation I
> should have stated clearly in the RFC.
>
Isn't that just another argument against using hooks? Chaining is what
hooks do, and there's no protection against a hook being set by multiple
extensions.
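To illustrate the point about chaining, here is a minimal, self-contained sketch of the usual save-and-call-previous hook pattern. The hook type `page_write_hook_type`, the global `page_write_hook`, and both "extensions" are hypothetical names invented for this illustration; they are not taken from the patch or from PostgreSQL itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook type: transforms a page buffer in place. */
typedef void (*page_write_hook_type) (uint8_t *page, size_t len);

/* The "core" exposes a single global pointer; NULL means no hook. */
static page_write_hook_type page_write_hook = NULL;

/* Extension A: remembers whatever hook was installed before it. */
static page_write_hook_type ext_a_prev_hook = NULL;

static void
ext_a_write(uint8_t *page, size_t len)
{
    if (ext_a_prev_hook)
        ext_a_prev_hook(page, len);     /* chain to the earlier hook */
    for (size_t i = 0; i < len; i++)
        page[i] ^= 0x5A;                /* stand-in for "encrypt" */
}

static void
ext_a_init(void)
{
    ext_a_prev_hook = page_write_hook;
    page_write_hook = ext_a_write;
}

/* Extension B: same pattern, loaded independently of A. */
static page_write_hook_type ext_b_prev_hook = NULL;

static void
ext_b_write(uint8_t *page, size_t len)
{
    if (ext_b_prev_hook)
        ext_b_prev_hook(page, len);
    for (size_t i = 0; i < len; i++)
        page[i] = (uint8_t) (page[i] + 1);  /* stand-in for a second transform */
}

static void
ext_b_init(void)
{
    ext_b_prev_hook = page_write_hook;
    page_write_hook = ext_b_write;
}

/* Core call site: it only sees one pointer, not the chain behind it. */
static void
core_write_page(uint8_t *page, size_t len)
{
    assert(page != NULL);
    if (page_write_hook)
        page_write_hook(page, len);
}
```

If `ext_a_init()` and then `ext_b_init()` run, a single `core_write_page()` call applies both transforms, and nothing at the core call site can tell how many hooks sit on the chain or prevent a second one from being installed.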
from the hook approach to study the SMGR extensibility work first.
The chaining limitation you pointed out is fundamental - if TDE requires
exclusive access, then hooks are the wrong mechanism. I should have reviewed
existing SMGR extensibility efforts before proposing hooks.
>
> 2. pd_flags Reservation - I Hope You'll Consider This
>
> I understand your concern about reserving pd_flags bits for extensions.
> However, I'd like to ask you to consider the reasoning behind this choice.
>
> The 5-bit Transform ID serves a critical purpose: it allows the core to
> identify the page's transformation state without attempting decryption.
> This is important for:
>
> - Error reporting: "This page is encrypted with transform ID 5, but no
> extension is loaded to handle it"
> - Migration safety: Distinguishing between untransformed pages (ID=0)
> and transformed pages during gradual encryption
> - Crash recovery: The core can detect transformation state inconsistencies
>
> That said, I recognize pd_flags is precious and limited. Let me propose
> an alternative approach that might better align with core principles:
>
The information may be crucial, but pd_flags is simply not meant to be
used by extensions to store custom data.
> Instead of extension-specific Transform IDs, what if we allow extensions
> to reserve space at pd_upper (similar to how special space works at
> pd_special)?
>
> The core could manage a small flag (2-3 bits) indicating "N bytes at
> pd_upper are reserved for transformation metadata". By encoding N as
> multiples of 2 or 4 bytes, we maximize the flag's efficiency:
>
> - 2 bits encoding 4-byte multiples: 0-12 bytes (sufficient for most cases)
> - 3 bits encoding 4-byte multiples: 0-28 bytes (covers all reasonable needs)
> - 3 bits encoding 2-byte multiples: 0-14 bytes (finer granularity)
>
> This approach uses minimal pd_flags bits while providing substantial
> metadata space. It would:
>
> - Keep the flag in core control (not extension-specific)
> - Allow extensions to store IV, authentication tags, key version, etc.
> in a standardized location
> - Be self-describing (the flag tells you how much space is reserved)
> - Generalize beyond encryption (compression, checksums, etc. could use it)
>
> In our internal implementation, we actually add opaque bytes to
> PageHeader for encryption metadata. This pd_upper approach could
> formalize that pattern for extensions.
>
> I believe some form of page-level metadata for transformations is
> necessary. Would either approach (Transform ID or pd_upper reservation)
> be acceptable with the right design, or do you see fundamental issues
> with page-level transformation metadata itself?
>
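The arithmetic behind the quoted flag proposal can be sketched as follows. This only illustrates the "code times unit" encoding described above; the helper names `max_code`, `reserved_bytes`, and `encode_reserved` are hypothetical and do not correspond to anything in PostgreSQL or in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Largest code that fits in `bits` flag bits. */
static inline uint32_t
max_code(unsigned bits)
{
    assert(bits > 0 && bits < 16);
    return (1u << bits) - 1u;
}

/* Decode: number of reserved bytes is code * unit. */
static inline uint32_t
reserved_bytes(uint32_t code, unsigned bits, uint32_t unit)
{
    assert(code <= max_code(bits));
    return code * unit;
}

/*
 * Encode a requested size into a flag code.
 * Returns -1 if the size is not a multiple of the unit
 * or does not fit in the available bits.
 */
static inline int
encode_reserved(uint32_t nbytes, unsigned bits, uint32_t unit)
{
    if (unit == 0 || nbytes % unit != 0)
        return -1;
    if (nbytes / unit > max_code(bits))
        return -1;
    return (int) (nbytes / unit);
}
```

The three variants in the quoted list correspond to `reserved_bytes(3, 2, 4)`, `reserved_bytes(7, 3, 4)`, and `reserved_bytes(7, 3, 2)`, giving maxima of 12, 28, and 14 bytes. A 16-byte value, for instance, cannot be expressed by the 2-bit variant, so `encode_reserved(16, 2, 4)` returns -1.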
AFAICS this is pretty much exactly what this patch aimed to do (also to
allow implementing TDE):
https://commitfest.postgresql.org/patch/3986/
Clearly, it's not as simple as it may seem, otherwise the patch would
not be WIP for 3 years.
Thank you - this is exactly what I needed to see. Combined with Zsolt's pointer to
>
> 3. Maintenance Burden and Test Coverage
>
> I deeply appreciate this concern. Having worked across various DBMS
> implementations, I've seen solution vendors ship without comprehensive
> regression testing - but never a database vendor. DBMS maintenance is
> extraordinarily difficult, and storage errors are catastrophic.
>
> This is precisely why test_tde exists as a reference implementation. But
> you've identified the real issue: we need much stronger test coverage
> for the hooks themselves.
>
> The test cases should:
> - Detect when core changes break hook contracts
> - Verify hook behavior under all I/O paths (sync, async, error cases)
> - Validate critical section safety
> - Test interaction with checksums, crash recovery, replication
>
> I agree the current test coverage is insufficient for core inclusion.
> Would expanding the test suite to cover these scenarios address your
> maintenance concerns, or do you see fundamental fragility beyond what
> testing can solve?
>
I wasn't talking about test coverage. My point is we'd have to keep this
working forever, even if we choose to change how the SMGR works. Which
is not entirely theoretical.
core, they become an API contract that limits PostgreSQL's ability to
refactor SMGR.
This is exactly why SMGR extensibility is the right approach - it makes
the extension points explicit and architectural, rather than scattering
>
> 4. Hooks vs Transform Layer - Pragmatic Timeline
>
> You suggested improving SMGR extensibility rather than adding hooks. I
> think you're architecturally right about the long-term direction.
>
> However, I want to be pragmatic about timelines:
>
> The hook and pd_flags approach, despite its limitations, can deliver
> working TDE in the shortest time. Organizations facing regulatory
> deadlines need something that works now, not in 2-3 years.
>
Others may see it differently, but my opinion is using pd_flags is a
dead end.
I realize users may wish for a solution "soon", but we're not going to
accept a flawed approach because of that. Exchanging short-term benefit
for long-term pain does not seem like a good trade-off.
> That said, your feedback has sparked a better idea: what if we think of
> this not as "SMGR extension" or "hooks" but as a pluggable Transform
> Layer that SMGR and WAL subsystems delegate to?
>
> Conceptually:
>
> Application Layer
> |
> Buffer Manager
> |
> +------------------+
> | Transform Layer | <-- Encryption, etc.
> +------------------+
> |
> SMGR / WAL
> |
> File I/O
>
> This is architecturally cleaner than scattered hooks, and more focused
> than full SMGR extensibility. The Transform Layer would:
>
> - Provide a unified interface for data transformation
> - Work across backend, frontend tools, and replication
> - Handle metadata management in a standardized way
> - Support encryption, compression, or other transformations
>
> I think this deserves its own discussion thread rather than conflating
> it with the current hook proposal. Would you be interested in starting a
> separate conversation about designing a Transform Layer interface for
> PostgreSQL?
>
Maybe. But I'm not convinced it'd be great to have many parallel threads
discussing approaches for the same ultimate end goal.
I do wonder where bootstrap and frontend tool encryption should be
discussed - whether that belongs in the 3986 discussion or elsewhere -
but I should study that patch thoroughly first before raising the
question.
> In the meantime, the hook approach could serve organizations with
> immediate needs, and extensions could migrate to the Transform Layer
> once it's stabilized.
>
It's not like there are no alternatives, though. We have FDE/LUKS,
application-level encryption, etc. Now there's also pg_tde.
FWIW the hypothetical migration would be far from trivial.
>
> 5. Frontend Tool Access
>
> Both SMGR and hook approaches face a shared limitation: frontend tools
> (pg_checksums, pg_basebackup, etc.) that read files directly.
>
I'm not a TDE expert, but I don't see why tools like pg_basebackup would
need to be aware of this at all. A basebackup is just a filesystem copy.
mentioned was actually specific to our implementation (key storage
under PGDATA with symlinks), not a general TDE concern.
However, tools like pg_checksums that directly read buffer pages,
or tools that read WAL pages, do present a broader question: SMGR
extensibility handles backend I/O, but these frontend tools operate
outside that architecture.
This makes me wonder if a more comprehensive layer might be needed
to cover both backend (SMGR) and frontend tools. But I should study
the existing SMGR work first to see how this is currently addressed.
> I previously suggested allowing initdb to specify a shared library that
> both backend and frontend can load for transformation. But as I
> reconsider this, it feels like it converges toward the Transform Layer
> idea: a well-defined interface that any PostgreSQL component can use.
>
> This might be the real architectural question: not "hooks vs SMGR" but
> "how should PostgreSQL provide transformation points that work across
> backend, frontend, and replication boundaries?"
>
Maybe. I was not proposing a new "transformation" layer, though. My
suggestion was entirely within the current SMGR architecture.
Understood.
Though I wonder if WAL encryption should be part of the same
discussion, or separate. SMGR handles pages, but WAL has different
characteristics.
regards
--
Tomas Vondra
> Could you point me to where I can access the SMGR extensibility patch?
> I'd like to review it properly before any further discussion.
There's the hackers discussion thread [1], the PG18 branch of our fork
[2] (it has fewer than 20 commits on top of PG 18.1, and the smgr/aio
changes are easy to find), and of course you can look at how pg_tde
uses it [3].
But please note that none of them is 100% up to date. The hackers
thread is for PG17 (no AIO part yet). And we also had some in-person
discussions about the patch during PgConf.Eu, which are not yet
reflected even in our fork. We plan to update the mailing list thread
in January.
> Though I wonder if WAL encryption should be part of the same
> discussion, or separate. SMGR handles pages, but WAL has different
> characteristics.
I think we should keep it separate; the SMGR question is much simpler
than WAL.
> Do you think this is a reasonable direction? Or would you prefer a
> different approach?
I have no preferred approach for WAL yet. Our solution in pg_tde has
some good and bad points, and the approach you used here similarly has
some good and bad.
The main reason why we kept delaying opening a "let's add WAL hooks"
discussion on the mailing list is that we weren't confident enough
in our current approach. Is it good for a fork? Definitely. Is it good
enough for getting it accepted into the core? Probably not.
Personally I tried to come up with an approach that could be useful
for something other than TDE, including some proof of concept
implementation of that something (for example WAL compression, or
enabling an extension to split WAL into separate streams for each
database). But that's not easy to do, I haven't spent too much time on
it so far, and maybe it's not even necessary; maybe simpler is better
in this case.
Starting a discussion about it is definitely a good idea, but maybe
the focus should be on debating/trying out different approaches
instead of proposing specific solutions to be included in pg?
From this point it is great that your implementation is different,
because we can talk about pros/cons, maybe figure out something even
better?
[1]: https://www.postgresql.org/message-id/flat/CAEze2WgMySu2suO_TLvFyGY3URa4mAx22WeoEicnK%3DPCNWEMrA%40mail.gmail.com
[2]: https://github.com/percona/postgres/commits/PSP_REL_18_STABLE/
[3]: https://github.com/percona/pg_tde/blob/main/src/smgr/pg_tde_smgr.c