Re: Performance implications of partitioning by UUIDv7 range in PostgreSQL v18
| От | Jonathan Reis |
|---|---|
| Тема | Re: Performance implications of partitioning by UUIDv7 range in PostgreSQL v18 |
| Дата | |
| Msg-id | CAE_7N376toqc-RBpER=p4goYvjb57xinEoCGHJ2Y70aDQJwn5w@mail.gmail.com обсуждение исходный текст |
| Ответ на | Re: Performance implications of partitioning by UUIDv7 range in PostgreSQL v18 (David Rowley <dgrowleyml@gmail.com>) |
| Ответы |
Re: Performance implications of partitioning by UUIDv7 range in PostgreSQL v18
|
| Список | pgsql-performance |
Thank you all for your input on this. Here is a summary of what I have learned from you all.
Approach 1: partition on uuid_extract_timestamp(id)
Pros: No need for custom function to convert from timestamptz to uuidv7
Partitions are human-readable
Can use pg_partman
Cons: Cannot have a primary key on id
Cannot use foreign keys
Approach 2: partition on id
Pros: Can have a primary key on id
Can have foreign key constraints
Cons Require custom function to convert from timestamptz to uuidv7
Partitions ranges must be decoded to be human-readable
Can't use pg_partman (this is true?)
From a cursory glance, they both seem to perform the same with partition pruning and lookups. So, I can't decide which approach is better. I am planning on doing some load testing within our product to see if either performs better.
Approach 1: partition on uuid_extract_timestamp(id)
Pros: No need for custom function to convert from timestamptz to uuidv7
Partitions are human-readable
Can use pg_partman
Cons: Cannot have a primary key on id
Cannot use foreign keys
Approach 2: partition on id
Pros: Can have a primary key on id
Can have foreign key constraints
Cons Require custom function to convert from timestamptz to uuidv7
Partitions ranges must be decoded to be human-readable
Can't use pg_partman (this is true?)
From a cursory glance, they both seem to perform the same with partition pruning and lookups. So, I can't decide which approach is better. I am planning on doing some load testing within our product to see if either performs better.
The main thing I don't like about Approach 2 is the need for a custom uuidv7_floor function to calculate a uuidv7 from a timestamptz. It would be nice to have this baked in.
Approach 1: (thank you Greg for this)
create table message (
id uuid
-- ... plus other columns
) partition by range (uuid_extract_timestamp(id));
create table message_2025_10_22 partition of message for values from ('2025-10-22') to ('2025-10-23');
create table message_2025_10_23 partition of message for values from ('2025-10-23') to ('2025-10-24');
create table message_2025_10_24 partition of message for values from ('2025-10-24') to ('2025-10-25');
create index m_2025_10_22_id on message_2025_10_22 (uuid_extract_timestamp(id));
create index m_2025_10_23_id on message_2025_10_23 (uuid_extract_timestamp(id));
create index m_2025_10_24_id on message_2025_10_24 (uuid_extract_timestamp(id));
-- Today:
insert into message select uuidv7() from generate_series(1, 111_000);
-- Yesterday:
insert into message select uuidv7('-1 day') from generate_series(1, 222_000);
-- Tomorrow:
insert into message select uuidv7('+1 day') from generate_series(1, 333_000);
set random_page_cost = 1.1; -- SSD rulez
vacuum analyze message;
select count(id) from only message;
select count(id) from message_2025_10_22;
select count(id) from message_2025_10_23;
select count(id) from message_2025_10_24;
explain select * from message where uuid_extract_timestamp(id) = '2025-10-23 10:23:45';
explain select * from message where uuid_extract_timestamp(id)
between '2025-10-23 23:00:00' and '2025-10-24 03:00:00';
Approach 2: (thank you Greg for this, too)
create table message2 (
id uuid PRIMARY KEY
-- ... plus other columns
) partition by range (id);
create table message_2025_10_22 partition of message2 for values from (uuidv7_floor('2025-10-22')) to (uuidv7_floor('2025-10-23'));
create table message_2025_10_23 partition of message2 for values from (uuidv7_floor('2025-10-23')) to (uuidv7_floor('2025-10-24'));
create table message_2025_10_24 partition of message2 for values from (uuidv7_floor('2025-10-24')) to (uuidv7_floor('2025-10-25'));
--Primary key handles this
--create index m_2025_10_22_id on message_2025_10_22 (id);
--create index m_2025_10_23_id on message_2025_10_23 (id);
--create index m_2025_10_24_id on message_2025_10_24 (id);
-- Today:
insert into message2 select uuidv7() from generate_series(1, 111_000);
-- Yesterday:
insert into message2 select uuidv7('-1 day') from generate_series(1, 222_000);
-- Tomorrow:
insert into message2 select uuidv7('+1 day') from generate_series(1, 333_000);
set random_page_cost = 1.1; -- SSD rulez
vacuum analyze message2;
explain select * from message2 where id
between uuidv7_floor('2025-10-23 23:00:00') and uuidv7_floor('2025-10-23 23:59:59');
-- is there a standard function for this?
CREATE OR REPLACE FUNCTION uuidv7_floor(ts timestamptz)
RETURNS uuid
LANGUAGE sql
IMMUTABLE
AS $$
WITH ms AS (
SELECT floor(extract(epoch FROM ts) * 1000)::bigint AS ms
),
h AS (
SELECT lpad(to_hex(ms), 12, '0') AS h FROM ms
)
SELECT (
substr(h.h,1,8) || '-' ||
substr(h.h,9,4) || '-' ||
'7000' || '-' || -- version 7 + rand_a all zero
'8000' || '-' || -- variant '10' + rest zero
'000000000000' -- zero node
)::uuid
FROM h;
$$;
On Thu, Oct 23, 2025 at 3:54 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Fri, 24 Oct 2025 at 09:38, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> I recommend that you create a primary key on each partition rather than having one
> on the partitioned table.
It might be worth mentioning that doing that would forego having the
ability to reference the partitioned table in a foreign key
constraint.
David
В списке pgsql-performance по дате отправления: