Re: new correlation metric
От | Jeff Davis |
---|---|
Тема | Re: new correlation metric |
Дата | |
Msg-id | 1225051712.5795.1.camel@localhost.localdomain обсуждение исходный текст |
Ответ на | Re: new correlation metric (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: new correlation metric
Re: new correlation metric |
Список | pgsql-hackers |
On Sun, 2008-10-26 at 12:44 -0400, Tom Lane wrote: > I wonder whether we ought to rethink the problem entirely. [...] What you say makes a lot of sense. We would have to take a sample of index leaf pages, but I think we could get a really useful number from it. For BTree, we can just read the values in order, and maintain 3 counts: * same_count: this TID points to the same page as the last one did * near_count: points to a page that is close enough that readahead or caching will help * far_count: points to a page where we don't have a reason to think readahead or caching will be a major factor Then, we could use a formula like: run_cost = min_IO_cost + (max_IO_cost - min_IO_cost)*(far_count/total_count) + small_factor*(max_IO_cost - min_IO_cost)*(near_count/total_count) small_factor should be the one minus the percentage of the near_count that we expect to find in cache (because it was recently read or due to readahead). I think from these numbers most of the concerns are taken into account. Heikki's almost-in-order case is solved because we could recognize that the pages are close enough to be in cache still. Interleaving might still cause a lower estimate, but it seems like any optimistic estimate we could make for that case is pretty risky. If the caching effects or readahead aren't as helpful as we expected, it would mean very bad performance. Of course, I agree that we need some empiral testing. > useful stats into someplace or other. We might need to invent some > other catalog besides pg_statistic if we want to represent per-index > properties like correlation. A minimal solution would be to add a > correlation column to pg_class or pg_index, but that doesn't scale well > if you imagine that different index AMs might want different sorts of > stats. Why can't we just use pg_statistic with the starelid set to the index oid? Regards,Jeff Davis
В списке pgsql-hackers по дате отправления: