Discussion: Query planner instability on large append-only table with heavy same-day inserts
Query planner instability on large append-only table with heavy same-day inserts
From
Giriraj Sharma
Date:
Environment
PostgreSQL 14 / 15 tested.
RDS / Aurora Postgres in production.
Primary key index (effective_date, idempotency_id).
Autovacuum enabled.
SQLC (Go) for query generation.
________________________________
Context
We are running into query planner instability on an append-only table that records daily commit information.
The workload is heavily skewed toward the current day’s data, and query performance deteriorates periodically until we run ANALYZE.
________________________________
Table Definition
CREATE TABLE sample_table (
    idempotency_id transactions.string NOT NULL,
    effective_date date NOT NULL,
    created_at timestamptz DEFAULT now() NOT NULL,
    CONSTRAINT commits_v3_pk PRIMARY KEY (effective_date, idempotency_id)
);
________________________________
Query Pattern
All our lookups are parameterized (SQLC, Go):
SELECT idempotency_id
FROM sample_table
WHERE effective_date = @effective_date
  AND idempotency_id IN (
      SELECT unnest(@idempotency_ids::transactions.string[])
  );
Typical characteristics:
@effective_date almost always = current date.
@idempotency_ids varies between 2–50 items off-peak and 800–1000 items at peak hours.
Query is latency-sensitive (expected 5–10 ms).
________________________________
Workload Characteristics
~10–12 million inserts per weekday, almost entirely for the current date.
Practically append-only — no updates or deletes.
No weekend inserts.
Occasional inserts for past or future dates (late or early trades).
Retention: ~3 years of data (~1000 days × 10 M = 10+ billion rows).
PostgreSQL 14+.
________________________________
Observed Behavior
Immediately after ANALYZE, this query uses an Index Only Scan on the primary key (effective_date, idempotency_id) with stable latency around 5–10 ms.
After several days (5–7 typically), the query planner flips to a Sequential Scan, and latency jumps to 2–30 seconds.
Running ANALYZE transactions.commits_v3; restores performance instantly.
We currently run manual ANALYZE twice a day via pg_cron, which helps but doesn’t fully guarantee stability (especially in test environments where insert patterns are more random).
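The pg_cron setup is essentially the following (job name and times shown here are illustrative, not the exact production values):

-- Illustrative pg_cron schedule: run ANALYZE twice a day
SELECT cron.schedule(
    'analyze-commits-v3',            -- illustrative job name
    '0 6,18 * * *',                  -- example times; twice a day
    $$ANALYZE transactions.commits_v3$$
);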
________________________________
What We’ve Tried
Manual and scheduled ANALYZE runs (twice a day, each taking ~30 s) → improve performance but aren't sustainable long-term.
Verified query execution plans before and after ANALYZE — planner switches from Index Only Scan to Seq Scan as statistics become stale.
Confirmed the table’s autovacuum is running (the last run was 15 days ago), but its frequency isn’t sufficient to keep stats current during high-insert periods. Could this be an issue at all, given that we already run ANALYZE at least twice a day?
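(We check the vacuum/analyze timestamps with the standard statistics view; included here for completeness:)

SELECT relname,
       last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze,
       n_live_tup, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE schemaname = 'transactions'
  AND relname = 'commits_v3';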
_______________________________
Problem Summary
The planner’s row-count estimates for effective_date and idempotency_id become inaccurate as we continuously append to “today’s” date.
The result is plan instability (index scan ↔ sequential scan) until statistics are refreshed.
We’re looking for a solution that keeps plans stable without manual ANALYZE as data volume scales.
________________________________
Questions for the Community
1. Partitioning
Would daily range partitioning by effective_date (≈ 1000 partitions for 3 years) be the right long-term approach here?
Given that inserts and queries almost always target today’s partition, will partition pruning and per-partition statistics fully eliminate the stale-statistics problem?
Are there known performance implications of maintaining ~1000 daily partitions at this scale (10 M/day)?
We occasionally receive backdated or future-dated inserts; can such out-of-range values be handled efficiently (e.g., by creating partitions on the fly)? A sketch of the layout we have in mind follows below.
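For concreteness (a sketch only; the parent name, partition names, and dates are illustrative):

-- Sketch: daily range partitioning on effective_date
CREATE TABLE transactions.commits_v3_part (
    idempotency_id transactions.string NOT NULL,
    effective_date date NOT NULL,
    created_at timestamptz DEFAULT now() NOT NULL,
    CONSTRAINT commits_v3_part_pk PRIMARY KEY (effective_date, idempotency_id)
) PARTITION BY RANGE (effective_date);

-- One pre-created partition per day
CREATE TABLE transactions.commits_v3_p20251028
    PARTITION OF transactions.commits_v3_part
    FOR VALUES FROM ('2025-10-28') TO ('2025-10-29');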
2. Autovacuum / Analyze Tuning
If we stay with a single table, what are practical per-table autovacuum settings to ensure frequent ANALYZE even as total row count grows into billions?
Would it make sense to use:
ALTER TABLE transactions.commits_v3
SET (
    autovacuum_analyze_scale_factor = 0,
    autovacuum_analyze_threshold = 50000,
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_vacuum_threshold = 10000
);
to decouple analyze frequency from table size?
Should we also experiment with raising the statistics target for effective_date from 100 to 1000 using:
ALTER TABLE transactions.commits_v3
ALTER COLUMN effective_date SET STATISTICS 1000;
to improve the planner’s histogram accuracy for the date distribution?
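Relatedly, after a burst of inserts we look at the column statistics to see whether "today" shows up in the MCV list or histogram at all (a standard pg_stats query, included for completeness):

SELECT n_distinct,
       most_common_vals,
       most_common_freqs,
       histogram_bounds
FROM pg_stats
WHERE schemaname = 'transactions'
  AND tablename = 'commits_v3'
  AND attname = 'effective_date';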
3. Best Practices
Are there best practices or proven patterns for append-only, time-series–like workloads that insert heavily into one day and read from the same day?
Is there a known best way to make Postgres’s planner more resilient to temporary statistic drift for parameterized queries like ours?
________________________________
Goal
We’d like a “set it and forget it” architecture — either through partitioning or robust autovacuum tuning — where:
The planner always chooses the index scan for same-day queries.
We no longer need the manual ANALYZE runs currently scheduled via pg_cron.
Query latency remains in the 5–10 ms range, even as total data volume grows into billions of rows.
________________________________
Any recommendations, benchmark references, or production-proven strategies for this workload would be highly appreciated.
--
Giriraj Sharma
about.me/girirajsharma
Re: Query planner instability on large append-only table with heavy same-day inserts
From
Laurenz Albe
Date:
On Tue, 2025-10-28 at 12:06 +0530, Giriraj Sharma wrote:
> We are running into query planner instability on an append-only table that records daily commit information.
> The workload is heavily skewed toward the current day’s data, and query performance deteriorates periodically until we run ANALYZE.

The best approach is to run autoanalyze often enough, either by reducing autovacuum_analyze_scale_factor for the table or (as you suggested) by setting the parameter to 0 and using only autovacuum_analyze_threshold.

There is no way to force a certain plan in PostgreSQL, unless you are using the pg_hint_plan extension.

Yours,
Laurenz Albe
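P.S. As a concrete illustration of the pg_hint_plan option (a sketch only; the hint name follows that extension's scan-method hints, and the index name is taken from the table definition earlier in the thread):

-- Requires the pg_hint_plan extension; core PostgreSQL ignores plain comments
/*+ IndexOnlyScan(commits_v3 commits_v3_pk) */
SELECT idempotency_id
FROM transactions.commits_v3
WHERE effective_date = @effective_date
  AND idempotency_id IN (
      SELECT unnest(@idempotency_ids::transactions.string[])
  );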
Re: Query planner instability on large append-only table with heavy same-day inserts
From
Greg Sabino Mullane
Date:
On Tue, Oct 28, 2025 at 2:36 AM Giriraj Sharma <giriraj.sharma27@gmail.com> wrote:
> Would daily range partitioning by effective_date (≈ 1000 partitions for 3 years) be the right long-term approach here?
If you are querying based on dates, yes. Does not need to be daily, could do monthly for example. A retention policy of 3 years is a great candidate for partitioning.
> Given that inserts and queries almost always target today’s partition, will partition pruning and per-partition statistics fully eliminate the stale-statistics problem? Are there known performance implications of maintaining ~1000 daily partitions at this scale (10 M/day)? We occasionally receive backdated or future-dated inserts — can such out-of-range values be handled efficiently (e.g., by creating partitions on the fly)?
It will help, don't know about eliminate. I would not do daily unless it is really needed, that's a lot of partitions. Monthly to start with. You cannot create partitions on the fly, but you can have them go to a default partition and sort them out later. Or pre-create a bunch of partitions.
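A rough sketch of both ideas, assuming a range-partitioned parent like the one sketched earlier in the thread (all object names here are illustrative):

-- Catch-all for backdated / future-dated rows; sort them out later if needed.
-- Note: with a default partition in place, creating a new partition scans the
-- default for rows that would conflict with the new range.
CREATE TABLE transactions.commits_v3_default
    PARTITION OF transactions.commits_v3_part DEFAULT;

-- Pre-create the next few monthly partitions ahead of time
DO $$
DECLARE
    d date;
BEGIN
    FOR d IN SELECT generate_series(date '2025-11-01', date '2026-02-01', interval '1 month')::date LOOP
        EXECUTE format(
            'CREATE TABLE IF NOT EXISTS transactions.%I PARTITION OF transactions.commits_v3_part
               FOR VALUES FROM (%L) TO (%L)',
            'commits_v3_' || to_char(d, 'YYYY_MM'),
            d, (d + interval '1 month')::date);
    END LOOP;
END
$$;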
> If we stay with a single table, what are practical per-table autovacuum settings to ensure frequent ANALYZE even as total row count grows into billions?
Turn off autovacuum for that table. Use cron to run vacuum and analyze hourly (or some frequency that ensures good plans). If analyze is taking too long, it can be done per-column as well, although that won't help much if your sample table is representative. But if you have a lot of other columns with stable values, you could analyze those less often. Measure and see.
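Sketched out (per-column ANALYZE is standard syntax; the cron job itself would reuse the scheduling pattern shown earlier in the thread):

-- Turn autovacuum off for just this table; maintenance then runs only via cron
ALTER TABLE transactions.commits_v3 SET (autovacuum_enabled = off);

-- The hourly cron job would then run something like:
--   VACUUM (ANALYZE) transactions.commits_v3;

-- If a full ANALYZE is too slow, limit it to the columns the planner relies on
ANALYZE transactions.commits_v3 (effective_date, idempotency_id);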
> Should we also experiment with raising the statistics target for effective_date from 100 to 1000 using:
I don't think that will matter if your analyze is already giving you index-only scans.
> Are there best practices or proven patterns for append-only, time-series–like workloads that insert heavily into one day and read from the same day?
Partitioning. Ensure fillfactor is 100%. Minimal indexes.
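For the fillfactor point, the per-table setting is a one-liner (100 is already the default for heap tables, so this mainly guards against it having been lowered):

ALTER TABLE transactions.commits_v3 SET (fillfactor = 100);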
> Is there a known best way to make Postgres’s planner more resilient to temporary statistic drift for parameterized queries like ours?
Not really, other than giving Postgres updated stats via frequent analyze.
> PostgreSQL 14 / 15 tested.
Moving to a newer version is always helpful too.
Cheers,
Greg
--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support