Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
От | Andrei Lepikhov |
---|---|
Тема | Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment |
Дата | |
Msg-id | 11c709e1-9b17-4c00-9ead-93609a09b9db@gmail.com обсуждение исходный текст |
Ответ на | Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment (David Rowley <dgrowleyml@gmail.com>) |
Ответы |
Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
|
Список | pgsql-hackers |
On 3/23/25 22:16, David Rowley wrote: > On Fri, 21 Mar 2025 at 22:02, Andrei Lepikhov <lepihov@gmail.com> wrote: > Can you explain why "Estimated Capacity" and "Estimated Distinct > Lookup Keys" don't answer that? If there are more distinct lookup > keys than there is capacity to store them, then some will be evicted. I wouldn't say these parameters don't answer. I try to debate how usable they are. To be more practical, let me demonstrate the example: EXPLAIN (COSTS OFF) SELECT * FROM t1,t2 WHERE t1.x=t2.x; Nested Loop (cost=0.44..7312.65 rows=211330 width=33) -> Seq Scan on t1 (cost=0.00..492.00 rows=30000 width=22) -> Memoize (cost=0.44..3.82 rows=7 width=11) Cache Key: t1.x Cache Mode: logical Estimated Capacity: 1001 Estimated Distinct Lookup Keys: 1001 -> Index Scan using t2_x_idx2 on t2 (cost=0.43..3.81 rows=7) Index Cond: (x = t1.x) At first, I began to look for documentation because it was unclear what both new parameters specifically meant. Okay, there was no documentation but trivial code, and after a short discovery, I realised the meaning. The first fact I see from this EXPLAIN is that Postgres estimates it has enough memory to fit all the entries. Okay, but what does it give me? I may just increase work_mem and provide the query with more memory if needed. My main concern is how frequently this cache is planned to be used. Doing some mental effort, I found the line "rows=30000." Calculating a bit more, I may suppose it means that we have a 95% chance to reuse the cache. Okay, I got it. Now, see: 1. I needed to discover the meaning of the new parameters because they were different from the earlier "hit" and "miss." 2. I need to find a common JOIN for keys of this node. Imagine a typical 200-row EXPLAIN with 2-3 Memoization keys from different tables. 3. I need to make calculations On the opposite, the hit ratio, written instead, already known by analogy, already provides me with necessary cache efficacy data; no need to watch outside the node; it may be easily compared with the actual value. Am I wrong? Both approaches provide the data, but each one is more usable? I think we may ask more people, for example, Nikolay Samokhvalov, who, as I heard, works hard with explains. > > Once again, I'm not necessarily objecting to hit and evict ratios > being shown, I just want to know they're actually useful enough to > show and don't just bloat the EXPLAIN output needlessly. So far your > arguments aren't convincing me that they are. I'm -1 for this redundancy. -- regards, Andrei Lepikhov
В списке pgsql-hackers по дате отправления: