Re: Thoughts on statistics for continuously advancing columns

Поиск
Список
Период
Сортировка
От Chetan Suttraway
Тема Re: Thoughts on statistics for continuously advancing columns
Дата
Msg-id d26e86811001050449y4bd30e4m1df9b16588d00ea1@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Thoughts on statistics for continuously advancing columns  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Thoughts on statistics for continuously advancing columns  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hi,

My suggestion is to keep two sets of histograms. One which is generated by running ANALYZE and
the other which is dynamically generated histograms using the entries from logging (that is done
in insert/update/delete operations).
I am not sure how difficult is it to read  such record details from logs.

Basically from the details mentioned here what i understand is that the table data (timestamp) is added
in incremental way, ie existing data is not modified to great extent and the new data is
merely appended to old data.
In this case, the only work for analyse/statistics generation is to merge the histograms of newly added records to old histograms.

if we can treat this case as similar to that of merging of histograms in case of joins involving 2 tables and generating the histograms for the cartesian (result) node, then all we need to do is somehow generate temporary histogram for the new set of records and merge them with the old histogram.
The information of interesting columns from the new records can be read from the logging section.
We must be already having the part of merging of histograms and I hope that it wont be very costly
to make these calls so as to effect planner.
(Further my opinion is to calculate this cost of histogram generation and use it in costing in some way)

Further we can put some threshold limit to make this merge happen automatically. Say if the temporary histograms reach some set threshold, only then these will be merged with the older histograms.

Please pass on your inputs.

Regards,
Chetan


On Wed, Dec 30, 2009 at 8:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Josh Berkus <josh@agliodbs.com> writes:
> My thoughts on dealing with this intelligently without a major change to
> statstics gathering went along these lines:

> 1. add columns to pg_statistic to hold estimates of upper and lower
> bounds growth between analyzes.

This seems like a fundamentally broken approach, first because "time
between analyzes" is not even approximately a constant, and second
because it assumes that we have a distance metric for all datatypes.
(Note that convert_to_scalar does not assume that it can measure
arbitrary distances, but only fractions *within* a histogram bucket;
and even that is pretty shaky.)

I don't have a better idea at the moment :-(

                       regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Markus Wanner
Дата:
Сообщение: Re: Testing with concurrent sessions
Следующее
От: Pavel Stehule
Дата:
Сообщение: Re: quoting psql varible as identifier