PostgreSQL on S3-backed Block Storage with Near-Local Performance
From | Pierre Barre |
---|---|
Subject | PostgreSQL on S3-backed Block Storage with Near-Local Performance |
Msg-id | a9fe5ddb-9685-4139-bc1f-88161a7a4da3@app.fastmail.com |
List | pgsql-general |
Hi everyone,

I wanted to share a project I've been working on that enables PostgreSQL to run on S3 storage while maintaining performance comparable to local NVMe. The approach uses block-level access rather than trying to map filesystem operations to S3 objects.

ZeroFS: https://github.com/Barre/ZeroFS

# The Architecture

ZeroFS provides NBD (Network Block Device) servers that expose S3 storage as raw block devices. PostgreSQL runs unmodified on ZFS pools built on these block devices:

PostgreSQL -> ZFS -> NBD -> ZeroFS -> S3

By providing block-level access and leveraging ZFS's caching capabilities (L2ARC), we can achieve microsecond latencies despite the underlying storage being in S3.

## Performance Results

Here are pgbench results from PostgreSQL running on this setup:

### Read/Write Workload

```
postgres@ubuntu-16gb-fsn1-1:/root$ pgbench -c 50 -j 15 -t 100000 example
pgbench (16.9 (Ubuntu 16.9-0ubuntu0.24.04.1))
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 50
number of threads: 15
maximum number of tries: 1
number of transactions per client: 100000
number of transactions actually processed: 5000000/5000000
number of failed transactions: 0 (0.000%)
latency average = 0.943 ms
initial connection time = 48.043 ms
tps = 53041.006947 (without initial connection time)
```

### Read-Only Workload

```
postgres@ubuntu-16gb-fsn1-1:/root$ pgbench -c 50 -j 15 -t 100000 -S example
pgbench (16.9 (Ubuntu 16.9-0ubuntu0.24.04.1))
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 50
query mode: simple
number of clients: 50
number of threads: 15
maximum number of tries: 1
number of transactions per client: 100000
number of transactions actually processed: 5000000/5000000
number of failed transactions: 0 (0.000%)
latency average = 0.121 ms
initial connection time = 53.358 ms
tps = 413436.248089 (without initial connection time)
```

These numbers are with 50 concurrent clients and the actual data stored in S3. Hot data is served from ZFS L2ARC and ZeroFS's memory caches, while cold data comes from S3.

## How It Works

1. ZeroFS exposes NBD devices (e.g., /dev/nbd0) that PostgreSQL/ZFS can use like any other block device (a minimal setup sketch follows this list)
2. Multiple cache layers hide S3 latency:
   a. ZFS ARC/L2ARC for frequently accessed blocks
   b. ZeroFS memory cache for metadata and hot data
   c. Optional local disk cache
3. All data is encrypted (ChaCha20-Poly1305) before hitting S3
4. Files are split into 128KB chunks for insertion into ZeroFS's LSM-tree
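To make the stack concrete, here is a minimal sketch of wiring it up on Ubuntu. The `zerofs` invocation and its flags are hypothetical placeholders (the real CLI and S3 credential handling are documented in the ZeroFS README); `nbd-client`, `zpool`, `zfs`, and the PostgreSQL paths are standard tooling, and the 8k recordsize is simply a common match for PostgreSQL's 8 kB pages, not a requirement:

```
# Hypothetical ZeroFS launch -- flags are illustrative only; see the README
# for the actual CLI and S3 credential configuration.
zerofs --bucket my-pg-bucket --nbd-port 10809 &

# Attach the NBD export as a local block device (standard nbd package).
nbd-client 127.0.0.1 10809 /dev/nbd0

# Build a ZFS pool on the device, and add a local NVMe partition as L2ARC
# so hot blocks are served locally instead of from S3.
zpool create pgpool /dev/nbd0
zpool add pgpool cache /dev/nvme0n1p2

# A dataset tuned for PostgreSQL, mounted as the data directory.
zfs create -o recordsize=8k -o compression=lz4 \
    -o mountpoint=/var/lib/postgresql/16/main pgpool/pgdata
chown postgres:postgres /var/lib/postgresql/16/main

# PostgreSQL itself needs no modifications.
sudo -u postgres /usr/lib/postgresql/16/bin/initdb -D /var/lib/postgresql/16/main
sudo -u postgres /usr/lib/postgresql/16/bin/pg_ctl -D /var/lib/postgresql/16/main -l /tmp/pg.log start

# Initialize pgbench at scale factor 50, matching the results above.
sudo -u postgres createdb example
sudo -u postgres pgbench -i -s 50 example
```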
## Geo-Distributed PostgreSQL

Since each region can run its own ZeroFS instance, you can create geographically distributed PostgreSQL setups.

Example architectures:

Architecture 1:

```
                        PostgreSQL Client
                               |
                               | SQL queries
                               |
                        +--------------+
                        |   PG Proxy   |
                        |  (HAProxy/   |
                        |  PgBouncer)  |
                        +--------------+
                           /        \
                          /          \
                 Synchronous      Synchronous
                 Replication      Replication
                        /              \
                       /                \
       +---------------+        +---------------+
       | PostgreSQL 1  |        | PostgreSQL 2  |
       |   (Primary)   |◄------►|   (Standby)   |
       +---------------+        +---------------+
               |                        |
               |  POSIX filesystem ops  |
               |                        |
       +---------------+        +---------------+
       |  ZFS Pool 1   |        |  ZFS Pool 2   |
       | (3-way mirror)|        | (3-way mirror)|
       +---------------+        +---------------+
          /    |    \              /    |    \
         /     |     \            /     |     \
NBD:10809 NBD:10810 NBD:10811  NBD:10812 NBD:10813 NBD:10814
    |         |         |          |         |         |
+--------++--------++--------+ +--------++--------++--------+
|ZeroFS 1||ZeroFS 2||ZeroFS 3| |ZeroFS 4||ZeroFS 5||ZeroFS 6|
+--------++--------++--------+ +--------++--------++--------+
    |         |         |          |         |         |
S3-Region1 S3-Region2 S3-Region3 S3-Region4 S3-Region5 S3-Region6
 (us-east)  (eu-west) (ap-south)  (us-west) (eu-north)  (ap-east)
```

Architecture 2:

```
PostgreSQL Primary (Region 1) ←→ PostgreSQL Standby (Region 2)
              \                          /
               \                        /
                 Same ZFS Pool (NBD)
                         |
                 6 Global ZeroFS
                         |
                    S3 Regions
```
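To make Architecture 1's storage layer concrete, here is a sketch of building one of the 3-way mirrored pools. Each NBD device is assumed to be served by a ZeroFS instance pointed at a different S3 region, on the ports shown in the diagram (the ZeroFS processes themselves would be started as sketched earlier):

```
# Attach three NBD exports, each backed by a different S3 region.
nbd-client 127.0.0.1 10809 /dev/nbd0   # S3-Region1 (us-east)
nbd-client 127.0.0.1 10810 /dev/nbd1   # S3-Region2 (eu-west)
nbd-client 127.0.0.1 10811 /dev/nbd2   # S3-Region3 (ap-south)

# A single mirror vdev across all three devices yields a 3-way mirror:
# every block is written to all three regions, and reads can be spread
# across the replicas.
zpool create pgpool mirror /dev/nbd0 /dev/nbd1 /dev/nbd2

# Verify the layout.
zpool status pgpool
```

With this layout, a synchronous write completes only once all three mirror members have acknowledged it, so the pool's contents survive the loss of any two regions.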
The main advantages I see are:

1. Dramatic cost reduction for large datasets
2. Simplified geo-distribution
3. Effectively unlimited storage capacity
4. Built-in encryption and compression

Looking forward to your feedback and questions!

Best,
Pierre

P.S. The full project includes a custom NFS filesystem too.