Обсуждение: pg12 - partition by column that might have null values
Table "public.object_revision"
Column | Type | Collation | Nullable | Default
-------------+-----------------------------+-----------+----------+-----------------------------------------------
id | integer | | not null | nextval('mariel_dates_test_id_seq'::regclass)
end_time | timestamp without time zone | | |
object_hash | text | | |
Indexes:
"id_pk" PRIMARY KEY, btree (id)
ERROR: column "end_time" contains null values
Mariel Cherkassky wrote:
> In PG12 I'm trying to create partitions by range on a date column
> that might be null (indicates it is the most recent version of the
> object). My PK has to include the partition column, therefore I'm
> getting an error that I cant create a primary key with the specific
> column because it has null values.
>
> For example :
> \d object_revision
> Table
> "public.object_revision"
> Column | Type | Collation | Nullable |
> Default
> -------------+-----------------------------+-----------+----------+
> -----------------------------------------------
> id | integer | | not null |
> nextval('mariel_dates_test_id_seq'::regclass)
> end_time | timestamp without time zone | | |
> object_hash | text | | |
> Indexes:
> "id_pk" PRIMARY KEY, btree (id)
>
> Lets say that the same object (object_hash) can have many revisions,
> the end_time is the time it was last updated. I'm trying to create
> this table as a range partition on the end_time. However, when I try
> to add the pk I'm getting an error :
> ALTER TABLE object_revision ADD CONSTRAINT
> object_revision_id_end_time PRIMARY KEY (id,end_time);
> ERROR: column "end_time" contains null values
>
> does someone familiar with a workaround ? I know that in postgresql
> as part of the primary key definition unique and not null constraints
> are enforced on each column and not on both of them. However, this
> might be problematic with pg12 partitions..
Can "end_time" ever be modified?
If yes, it is a bad choice for a partitioning column.
If the NULL in "end_time" is meant to signify "no end", use the
value "infinity" instead of NULL. Then you can define the column
as NOT NULL.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
From: Mariel Cherkassky <mariel.cherkassky@gmail.com>
Sent: Wednesday, October 02, 2019 12:37 AM
Whenever I have a new revision of that object, I update the end_time of the latest revision to be now() and I add a new record of that object with end_date null.
The null value is used to find most recent revisions of objects..
Thanks for the suggestion of infinity ! I'll try it.
My partitioning table design model always uses a partitioning column that is 100% static since that guarantees that rows are not constantly moving between partitions (with index update overhead etc). In this scenario I’d use a “StartTime” column to anchor the row in a partition. The relatively few rows with a null EndTime don’t need the power of partitioning, just an index to find them.
Mike Sofen
>but the start time doesnt indicates that the object is the most recent, it just indicates when the object was added to your table…
>on each partition I'll create range partition on the end_date so that I can search for revisions faster.
I believe you are confusing data storage with query optimization. Rarely would there be more updated rows than aged/stable rows…in the normal system, having even 3% of the data in churn (updateable) state would be unusual and your description of the data dynamics on this table said that a row updated once, gets the end_date set and then a new row is created.
To me, that says, put an index on end_date so you can find/query them quickly, and create partitions on a static date so the rows (and indexes) aren’t always being updated.
Mike Sofen