Helping the planner choose a sequential scan when it improves performance

Hi all,

I recently started at a new firm and have been trying to help the team grok certain planner behavior. A stripped-down example of the sort of join we do in the database looks like this, wherein we join two tables that have about 1 million rows:

-- VACUUM (FULL, VERBOSE, ANALYZE), run the query twice first, then...
EXPLAIN(ANALYZE, VERBOSE, COSTS, SETTINGS, BUFFERS, WAL, TIMING, SUMMARY)
SELECT
    ci.conversation_uuid,
    ci.item_uuid,
    ci.tenant_id,
    it.item_subject,
    it.item_body_start
FROM
    conversation_item AS ci
INNER JOIN item_text AS it ON it.item_uuid = ci.item_uuid;

-- The necessary DDL that creates these tables and indexes is attached. I've commented out some extra stuff that isn't directly related to the above query.
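
-- In case the attachment is inconvenient, the relevant parts reduce to roughly the
-- sketch below. It is an illustration only: column types and the hash index key are
-- inferred from the query and plan, and the attached DDL is the authoritative version.

CREATE TABLE conversation_item (
    conversation_uuid uuid   NOT NULL,
    item_uuid         uuid   NOT NULL,
    tenant_id         bigint NOT NULL
);

CREATE TABLE item_text (
    item_uuid       uuid NOT NULL,
    item_subject    text,
    item_body_start text
);

-- The hash index probed in Plan A; assumed here to be on item_uuid.
CREATE INDEX conversation_item_item_hash_index
    ON conversation_item USING hash (item_uuid);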

Depending on config, we get different results in terms of performance (EXPLAIN output attached):

PLAN A (default config, effective cache size just shy of 15GB): 3.829 seconds. A nested loop is used to probe the hash index `conversation_item_item_hash_index` for each row of item_text. Although the cost of probing once is low, a fair amount of time passes because the operation is repeated ~1.3 million times.

PLAN B (enable_indexscan off, effective cache same as before): 3.254 seconds (~15% speedup, sometimes 30%). Both tables are scanned sequentially and conversation_item is hashed before results are combined with a hash join.

PLAN C (random_page_cost = 8.0 instead of the default 4.0, effective cache same as before): 2.959 seconds (~23% speedup, sometimes 38%). Same overall plan as PLAN B, with some differences in buffers and I/O. I'll note we had to go to 8.0 before we saw a change in planner behavior; 5.0, 6.0, and 7.0 were too low to make a difference.
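
For reference, Plans B and C correspond to rerunning the same EXPLAIN after something like the following (session-level SETs shown for illustration):

-- Plan B: steer the planner away from the index scan for this session.
SET enable_indexscan = off;
-- (rerun the EXPLAIN ... SELECT from above)

-- Plan C: leave index scans enabled but make random I/O look more expensive.
RESET enable_indexscan;
SET random_page_cost = 8.0;
-- (rerun the EXPLAIN ... SELECT from above)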

Environment:

Postgres 15.2
Amazon RDS — db.m6g.2xlarge


Questions:
  1. In Plan A, what factors are causing the planner to select a substantially slower plan despite having recent stats about the number of rows?
  2. Is there a substantial difference between the on-the-fly hash built in Plans B and C and the hash index used in Plan A? Can I assume they are essentially the same? Perhaps there are differences in how they're applied?
  3. Is it common to see random_page_cost set as high as 8.0? We would of course need to investigate whether the net impact on other queries is positive or negative before adopting this as a general setting, but is it a proposal we should actually consider? (A sketch of how we might scope such a change follows this list.)
  4. Maybe we are barking up the wrong tree with the previous questions. Are there other configuration parameters we should consider first to improve performance in situations like the one illustrated?
  5. Are there other problems with our schema, query, or plans shown here? Other approaches (or tools/analyses) we should consider?
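
Regarding question 3: if we did go down that path, my assumption is we would first scope the change narrowly rather than cluster-wide, along these lines (database and role names are placeholders):

-- Limit the setting to one database or role while we measure the impact on other queries.
ALTER DATABASE app_db SET random_page_cost = 8.0;        -- app_db is a placeholder
ALTER ROLE reporting_role SET random_page_cost = 8.0;    -- reporting_role is a placeholder
-- A cluster-wide change on RDS would instead go through the DB parameter group.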