Re: Query Performance / Planner estimate off
From | Mats Olsen |
---|---|
Subject | Re: Query Performance / Planner estimate off |
Date | |
Msg-id | 7f586824-0d53-0f84-15f1-f460dbc414b5@duneanalytics.com |
In reply to | Re: Query Performance / Planner estimate off (Justin Pryzby <pryzby@telsasoft.com>) |
List | pgsql-performance |
On 10/22/20 8:37 AM, Justin Pryzby wrote:
> On Wed, Oct 21, 2020 at 04:42:02PM +0200, Mats Olsen wrote:
>> On 10/21/20 2:38 PM, Sebastian Dressler wrote:
>>>> On 20. Oct 2020, at 11:37, Mats Julian Olsen <mats@duneanalytics.com>
>>>>
>>>> [...]
>>>>
>>>> 1) Vanilla plan (16 min): https://explain.depesz.com/s/NvDR
>>>> 2) enable_nestloop=off (4 min): https://explain.depesz.com/s/buKK
>>>> 3) enable_nestloop=off; enable_seqscan=off (2 min): https://explain.depesz.com/s/0WXx
>>>>
>>>> How can I get Postgres not to loop over 12M rows?
>>>
>>> I looked at the plans and your config and there are some thoughts I'm
>>> having:
>>>
>>> - The row estimate is off, as you possibly noticed. This can possibly be
>>>   solved by raising `default_statistics_target` to e.g. 2500 (we typically
>>>   use that) and running ANALYZE.
>>
>> I've `set default_statistics_target=2500` and run ANALYZE on both tables
>> involved; unfortunately the plan is the same. The columns we use for joining
>> here are hashes, and we expect very few duplicates in the tables. Hence I
>> think extended statistics (storing most common values and histogram bounds)
>> aren't useful for this kind of data. Would you say the same thing?
>
> In Postgres, extended statistics means "MV stats objects", not MCV+histogram,
> which are "simple statistics", like ndistinct.
>
> Your indexes maybe aren't ideal for this query, as mentioned.
> The indexes that do exist might also be inefficient, due to being unclustered,
> or bloated, or due to multiple columns.

This table is append-only, i.e. no updates.
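For reference, the two kinds of statistics Justin distinguishes above could be set up along these lines (a sketch only; the table and column names are assumed from the thread, and the statistics kind shown is just one option):

```sql
-- Simple statistics: raise the per-column target on the join column and
-- re-collect, rather than changing default_statistics_target globally.
ALTER TABLE transactions ALTER COLUMN hash SET STATISTICS 2500;
ANALYZE transactions;

-- Extended ("MV") statistics are separate objects, e.g. cross-column
-- ndistinct counts, which can help join-cardinality estimates:
CREATE STATISTICS transactions_hash_block_stats (ndistinct)
    ON hash, block_number FROM transactions;
ANALYZE transactions;  -- required to populate the new statistics object
```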
The partitions are clustered on a btree index on block_time:

    "transactions_p500000_block_time_idx" btree (block_time) CLUSTER

> These look redundant (which doesn't matter for this query):
>
> Partition key: RANGE (block_number)
> Indexes:
>     "transactions_block_number_btree" btree (block_number DESC)
>     "transactions_block_number_hash_key" UNIQUE CONSTRAINT, btree (block_number, hash)
>     "transactions_block_number_time" btree (hash, block_number)
>
> Maybe that would be an index just on "hash", which might help here.
>
> Possibly you'd want to try to use a BRIN index on timestamp (or maybe
> block_number?).

Yeah, this could be a good idea, but the size of this table doesn't let me add
any indexes while it's online. I'll revisit these the next time we redeploy
the database.

> Maybe you'd want to VACUUM the table to allow index-only scans on the hash
> columns?
>
> Maybe you'd want to check if reindexing reduces the index size? We don't know
> if the table gets lots of UPDATE/DELETE or if any of the columns have high
> logical vs physical "correlation".
> https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram
>
> Have you ANALYZEd the partitioned parent recently?
> This isn't handled by autovacuum.

As mentioned above, there aren't any updates or deletes to this table. Both
tables have been ANALYZEd. I ran that query and the output is here:
https://gist.github.com/mewwts/86ef43ff82120e104a654cd7fbb5ec06. I ran it for
the two specific columns and all partitions of the transactions table, and for
all columns on "Pair_evt_Mint". Do these values tell you anything?
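On the "can't add indexes while it's online" point: a BRIN index is typically tiny, and on a partitioned table it can be built without blocking writes by indexing each partition with CONCURRENTLY and attaching the results to a parent index. A sketch, assuming the table and partition names from the thread:

```sql
-- Create an invalid parent index on the partitioned table only (does not
-- recurse into partitions, so it is cheap and non-blocking):
CREATE INDEX transactions_block_time_brin
    ON ONLY transactions USING brin (block_time);

-- Build the index on each partition without blocking writes, then attach it;
-- repeat for every partition (partition name here is hypothetical):
CREATE INDEX CONCURRENTLY transactions_p500000_block_time_brin
    ON transactions_p500000 USING brin (block_time);
ALTER INDEX transactions_block_time_brin
    ATTACH PARTITION transactions_p500000_block_time_brin;

-- Since autovacuum does not ANALYZE the partitioned parent, collect
-- parent-level statistics manually:
ANALYZE transactions;
```

Once every partition's index is attached, the parent index becomes valid automatically.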