Re: PostgreSQL on S3-backed Block Storage with Near-Local Performance
From | Pierre Barre |
---|---|
Subject | Re: PostgreSQL on S3-backed Block Storage with Near-Local Performance |
Date | |
Msg-id | 8188513c-e089-4273-b2be-16dd0a5a0a80@app.fastmail.com |
In reply to | Re: PostgreSQL on S3-backed Block Storage with Near-Local Performance (Seref Arikan <serefarikan@gmail.com>) |
Responses |
Re: PostgreSQL on S3-backed Block Storage with Near-Local Performance
Re: PostgreSQL on S3-backed Block Storage with Near-Local Performance Re: PostgreSQL on S3-backed Block Storage with Near-Local Performance |
List | pgsql-general |
Hi Seref,
For the benchmarks, I used Hetzner's cloud service with the following setup:
- A Hetzner S3 bucket in the FSN1 region
- A virtual machine of type ccx63 (48 vCPU, 192 GB memory)
- 3 ZeroFS NBD devices (same S3 bucket)
- A striped ZFS pool across the 3 devices
- 200 GB ZFS L2ARC
- Postgres with memory settings sized for this machine, plus synchronous_commit = off, wal_init_zero = off and wal_recycle = off.
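As a sketch, the three non-default settings named in the list above would look like this in postgresql.conf. The memory-related settings are not given in this message, so they are left out here rather than guessed.

```
# postgresql.conf fragment (sketch): only the settings named above
synchronous_commit = off   # do not wait for the WAL flush at commit; a crash can lose the most recent commits
wal_init_zero = off        # do not zero-fill new WAL files (intended for copy-on-write filesystems such as ZFS)
wal_recycle = off          # create new WAL files instead of renaming old ones for reuse
```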
Best,
Pierre
On Fri, Jul 18, 2025, at 12:42, Seref Arikan wrote:
Sorry, this was meant to go to the whole group:

Very interesting! Great work. Can you clarify how exactly you're running postgres in your tests? A specific AWS service? What's the test infrastructure that sits above the file system?

On Thu, Jul 17, 2025 at 11:59 PM Pierre Barre <pierre@barre.sh> wrote:

Hi everyone,

I wanted to share a project I've been working on that enables PostgreSQL to run on S3 storage while maintaining performance comparable to local NVMe. The approach uses block-level access rather than trying to map filesystem operations to S3 objects.

ZeroFS: https://github.com/Barre/ZeroFS

# The Architecture

ZeroFS provides NBD (Network Block Device) servers that expose S3 storage as raw block devices. PostgreSQL runs unmodified on ZFS pools built on these block devices:

PostgreSQL -> ZFS -> NBD -> ZeroFS -> S3

By providing block-level access and leveraging ZFS's caching capabilities (L2ARC), we can achieve microsecond latencies despite the underlying storage being in S3.

## Performance Results

Here are pgbench results from PostgreSQL running on this setup:

### Read/Write Workload

```
postgres@ubuntu-16gb-fsn1-1:/root$ pgbench -c 50 -j 15 -t 100000 example
pgbench (16.9 (Ubuntu 16.9-0ubuntu0.24.04.1))
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 50
number of threads: 15
maximum number of tries: 1
number of transactions per client: 100000
number of transactions actually processed: 5000000/5000000
number of failed transactions: 0 (0.000%)
latency average = 0.943 ms
initial connection time = 48.043 ms
tps = 53041.006947 (without initial connection time)
```

### Read-Only Workload

```
postgres@ubuntu-16gb-fsn1-1:/root$ pgbench -c 50 -j 15 -t 100000 -S example
pgbench (16.9 (Ubuntu 16.9-0ubuntu0.24.04.1))
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 50
query mode: simple
number of clients: 50
number of threads: 15
maximum number of tries: 1
number of transactions per client: 100000
number of transactions actually processed: 5000000/5000000
number of failed transactions: 0 (0.000%)
latency average = 0.121 ms
initial connection time = 53.358 ms
tps = 413436.248089 (without initial connection time)
```

These numbers are with 50 concurrent clients and the actual data stored in S3. Hot data is served from ZFS L2ARC and ZeroFS's memory caches, while cold data comes from S3.

## How It Works

1. ZeroFS exposes NBD devices (e.g., /dev/nbd0) that PostgreSQL/ZFS can use like any other block device
2. Multiple cache layers hide S3 latency:
   a. ZFS ARC/L2ARC for frequently accessed blocks
   b. ZeroFS memory cache for metadata and hot data
   c. Optional local disk cache
3. All data is encrypted (ChaCha20-Poly1305) before hitting S3
4. Files are split into 128KB chunks for insertion into ZeroFS's LSM-tree

## Geo-Distributed PostgreSQL

Since each region can run its own ZeroFS instance, you can create geographically distributed PostgreSQL setups.

Example architectures:

Architecture 1:

                     PostgreSQL Client
                             |
                             | SQL queries
                             |
                      +--------------+
                      |   PG Proxy   |
                      |  (HAProxy/   |
                      |  PgBouncer)  |
                      +--------------+
                         /        \
                        /          \
            Synchronous            Synchronous
            Replication            Replication
                  /                      \
                 /                        \
      +---------------+             +---------------+
      | PostgreSQL 1  |             | PostgreSQL 2  |
      | (Primary)     |◄-----------►| (Standby)     |
      +---------------+             +---------------+
              |                             |
              |    POSIX filesystem ops     |
              |                             |
      +---------------+             +---------------+
      |  ZFS Pool 1   |             |  ZFS Pool 2   |
      | (3-way mirror)|             | (3-way mirror)|
      +---------------+             +---------------+
          /   |   \                     /   |   \
      /       |       \             /       |       \
NBD:10809 NBD:10810 NBD:10811 NBD:10812 NBD:10813 NBD:10814
    |         |         |         |         |         |
+--------++--------++--------++--------++--------++--------+
|ZeroFS 1||ZeroFS 2||ZeroFS 3||ZeroFS 4||ZeroFS 5||ZeroFS 6|
+--------++--------++--------++--------++--------++--------+
    |         |         |         |         |         |
    |         |         |         |         |         |
S3-Region1 S3-Region2 S3-Region3 S3-Region4 S3-Region5 S3-Region6
(us-east)  (eu-west)  (ap-south) (us-west)  (eu-north) (ap-east)

Architecture 2:

PostgreSQL Primary (Region 1) ←→ PostgreSQL Standby (Region 2)
                \                    /
                 \                  /
                  Same ZFS Pool (NBD)
                           |
                    6 Global ZeroFS
                           |
                       S3 Regions

The main advantages I see are:

1. Dramatic cost reduction for large datasets
2. Simplified geo-distribution
3. Effectively unlimited storage capacity
4. Built-in encryption and compression

Looking forward to your feedback and questions!

Best,
Pierre

P.S. The full project includes a custom NFS filesystem too.
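The pgbench figures quoted above can be cross-checked with a little arithmetic: when every client is kept busy, throughput is roughly the number of clients divided by the average latency. The sketch below does that check in Python. The function and variable names are illustrative and not part of pgbench or ZeroFS; the numbers are copied from the two runs in the quoted message.

```python
# Cross-check of the quoted pgbench runs: tps should be close to
# clients / average latency when all clients are kept busy.

CLIENTS = 50  # from "number of clients: 50" in both runs

# Figures copied from the pgbench output quoted in the message.
RUNS = {
    "read/write": {"latency_ms": 0.943, "reported_tps": 53041.006947},
    "read-only": {"latency_ms": 0.121, "reported_tps": 413436.248089},
}


def estimated_tps(clients: int, latency_ms: float) -> float:
    """Estimate throughput from client count and average latency in ms."""
    return clients / (latency_ms / 1000.0)


def relative_error(estimate: float, reported: float) -> float:
    """Relative difference between the estimate and the reported value."""
    return abs(estimate - reported) / reported


if __name__ == "__main__":
    for name, run in RUNS.items():
        est = estimated_tps(CLIENTS, run["latency_ms"])
        err = relative_error(est, run["reported_tps"])
        print(
            f"{name}: estimated {est:.0f} tps, "
            f"reported {run['reported_tps']:.0f} tps, difference {err:.2%}"
        )
```

Both estimates land within a fraction of a percent of the reported tps, which is what one would expect given that pgbench rounds the latency to three decimal places and excludes the initial connection time from the tps figure.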