Slow recursive CTE query questions, with row estimate and n_distinct issues

Поиск
Список
Период
Сортировка
От Christopher Baines
Тема Slow recursive CTE query questions, with row estimate and n_distinct issues
Дата
Msg-id 87sg7q9e7k.fsf@cbaines.net
обсуждение исходный текст
Ответы Re: Slow recursive CTE query questions, with row estimate and n_distinct issues  (Michael Lewis <mlewis@entrata.com>)
Re: Slow recursive CTE query questions, with row estimate and n_distinct issues  (Greg Spiegelberg <gspiegelberg@gmail.com>)
Список pgsql-performance
Hi,

I have a performance problem with a query. I've uploaded it, along with
the EXPLAIN ANALYZE output here [1].

1: https://explain.depesz.com/s/w5vP

I think the query is taking longer than I'd like, as PostgreSQL is not
generating a great plan, in particular the estimated rows in parts of
the plan are way off from the actual number of rows produced, and I
think PostgreSQL might pick a better approach if it made more reasonable
estimates.

Looking at the row numbers on [1], I think things start to go wrong at
row 11. The first part of the recursive CTE is estimated to generate
22,122 rows, and this isn't far off the 15,670 rows it generates. The
WorkTable scan expects to generate 10x that though, at 221,220. Maybe
that's reasonable, I'm unsure?

Things seem to continue getting out of hand from this point. The nested
loop on line 10 generates 232x less rows than expected, and this results
in an overall overestimate of 4,317x.

My theory as to why the row estimates is way off is that the planner is
using the n_distinct counts for the derivation_inputs table, and they
are way off [2]. I've set the stats target to the highest value for this
table, so I'm unsure what I can do to improve these estimates?

I've included relevant numbers for the two important tables below
[2][3], as well as descriptions of the tables [4][5].

Does anyone know if my theory as to why the row estimates in the plan is
right, or have any ideas as how I can make the query more performant?

Thanks,

Chris


2:
derivation_inputs:
  COUNT(*):  285422539
  reltuples: 285422528

  derivation_id:
    COUNT(DISTINCT): 7508610
    n_distinct:      4336644 (~57% of the true value)

  derivation_output_id:
    COUNT(DISTINCT): 5539406
    n_distinct:      473762 (~8% of the true value)

derivation_outputs:
  COUNT(*):  8206019
  reltuples: 8205873

3:
guix_data_service=> SELECT attname, n_distinct::integer FROM pg_stats WHERE tablename = 'derivation_outputs';
           attname            | n_distinct
------------------------------+------------
 derivation_id                |         -1
 name                         |         54
 derivation_output_details_id |     372225
 id                           |         -1
(4 rows)


4:
guix_data_service@[local]:5432/patches_guix_data_service> \d+ derivation_inputs
                               Table "guix_data_service.derivation_inputs"
┌──────────────────────┬─────────┬───────────┬──────────┬─────────┬─────────┬──────────────┬─────────────┐
│        Column        │  Type   │ Collation │ Nullable │ Default │ Storage │ Stats target │ Description │
├──────────────────────┼─────────┼───────────┼──────────┼─────────┼─────────┼──────────────┼─────────────┤
│ derivation_id        │ integer │           │ not null │         │ plain   │ 10000        │             │
│ derivation_output_id │ integer │           │ not null │         │ plain   │ 10000        │             │
└──────────────────────┴─────────┴───────────┴──────────┴─────────┴─────────┴──────────────┴─────────────┘
Indexes:
    "derivation_inputs_pkey" PRIMARY KEY, btree (derivation_id, derivation_output_id) WITH (fillfactor='100')
    "derivation_inputs_derivation_output_id_idx" btree (derivation_output_id) WITH (fillfactor='100')
Foreign-key constraints:
    "derivation_id_fk" FOREIGN KEY (derivation_id) REFERENCES derivations(id)
    "derivation_output_id_fk" FOREIGN KEY (derivation_output_id) REFERENCES derivation_outputs(id)
Access method: heap
Options: autovacuum_vacuum_scale_factor=0.01, autovacuum_analyze_scale_factor=0.01


5:
guix_data_service@[local]:5432/patches_guix_data_service> \d+ derivation_outputs
                                                   Table "guix_data_service.derivation_outputs"

┌──────────────────────────────┬───────────────────┬───────────┬──────────┬──────────────────────────────┬──────────┬──────────────┬─────────────┐
│            Column            │       Type        │ Collation │ Nullable │           Default            │ Storage  │
Statstarget │ Description │ 

├──────────────────────────────┼───────────────────┼───────────┼──────────┼──────────────────────────────┼──────────┼──────────────┼─────────────┤
│ derivation_id                │ integer           │           │ not null │                              │ plain    │
          │             │ 
│ name                         │ character varying │           │ not null │                              │ extended │
          │             │ 
│ derivation_output_details_id │ integer           │           │ not null │                              │ plain    │
          │             │ 
│ id                           │ integer           │           │ not null │ generated always as identity │ plain    │
          │             │ 

└──────────────────────────────┴───────────────────┴───────────┴──────────┴──────────────────────────────┴──────────┴──────────────┴─────────────┘
Indexes:
    "derivation_outputs_pkey" PRIMARY KEY, btree (derivation_id, name)
    "derivation_outputs_derivation_id_idx" btree (derivation_id)
    "derivation_outputs_derivation_output_details_id_idx" btree (derivation_output_details_id)
    "derivation_outputs_unique_id" UNIQUE CONSTRAINT, btree (id)
Foreign-key constraints:
    "derivation_outputs_derivation_id_fk" FOREIGN KEY (derivation_id) REFERENCES derivations(id)
    "derivation_outputs_derivation_output_details_id_fk" FOREIGN KEY (derivation_output_details_id) REFERENCES
derivation_output_details(id)
Referenced by:
    TABLE "derivation_inputs" CONSTRAINT "derivation_output_id_fk" FOREIGN KEY (derivation_output_id) REFERENCES
derivation_outputs(id)
Access method: heap
Options: autovacuum_vacuum_scale_factor=0.01, autovacuum_analyze_scale_factor=0.01

Вложения

В списке pgsql-performance по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: Conundrum with scaling out of bottleneck with hot standby, PgPool-II, etc.
Следующее
От: Michael Lewis
Дата:
Сообщение: Re: Slow recursive CTE query questions, with row estimate and n_distinct issues