Re: Specifying attribute slot for storing/reading statistics
| From | Tomas Vondra |
|---|---|
| Subject | Re: Specifying attribute slot for storing/reading statistics |
| Date | |
| Msg-id | 20190910133035.fecpdiynqqvcszpj@development |
| In reply to | Re: Specifying attribute slot for storing/reading statistics (Esteban Zimanyi <ezimanyi@ulb.ac.be>) |
| Responses | Re: Specifying attribute slot for storing/reading statistics |
| List | pgsql-hackers |
Hi,
Please don't top-post. If you're not responding to parts of the e-mail,
then don't quote it.
On Fri, Sep 06, 2019 at 12:50:33PM +0200, Esteban Zimanyi wrote:
>Dear Tom
>
>Many thanks for your quick reply. Indeed, both solutions you proposed can be
>combined to solve all the problems. However, changes in the code are
>needed. Let me first elaborate on the solution based on combining
>stakind/staop; I will elaborate on adding a new kind identifier afterwards.
>
>In order to understand the setting, let me explain a little more about the
>different kinds of temporal types. As explained in my previous email, these
>are types whose values are composed of elements v@t, where v is a value of a
>PostgreSQL/PostGIS type (float or geometry) and t is a TimestampTz. There
>are four kinds of temporal types, depending on their duration:
>* Instant: Values of the form v@t. These are used, for example, to represent
>car accidents, as in Point(0 0)@2000-01-01 08:30.
>* InstantSet: A set of values {v1@t1, ..., vn@tn} where the values between
>the points are unknown. These are used, for example, to represent check-ins
>in FourSquare or RFID readings.
>* Sequence: A sequence of values [v1@t1, ..., vn@tn] where the values
>between two successive instants vi@ti and vj@tj are (linearly) interpolated.
>These are used, for example, to represent GPS tracks.
>* SequenceSet: A set of sequences {s1, ..., sn} with a temporal gap
>between them. These are used, for example, to represent GPS tracks where
>the signal was lost during a time period.
>
So these are 4 different data types (or classes of data types) that you
introduce in your extension? Or is that just a conceptual view and it's
stored in some other way (e.g. normalized in some way)?
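A minimal conceptual sketch of the four kinds described in the quoted text, written in Python for brevity. The class names, the `value_at` method, integer timestamps standing in for TimestampTz, and float base values are all illustrative assumptions. This does not claim to be how the extension actually stores these types, which is exactly what the question above asks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instant:
    value: float  # float base value; geometry omitted for brevity
    t: int        # stand-in for TimestampTz

    def value_at(self, t: int) -> Optional[float]:
        return self.value if t == self.t else None


@dataclass(frozen=True)
class InstantSet:
    instants: List[Instant]  # values between the instants are unknown

    def value_at(self, t: int) -> Optional[float]:
        for inst in self.instants:
            if inst.t == t:
                return inst.value
        return None


@dataclass(frozen=True)
class Sequence:
    instants: List[Instant]  # strictly increasing t; linear interpolation in between

    def value_at(self, t: int) -> Optional[float]:
        pts = self.instants
        if len(pts) == 1:
            return pts[0].value_at(t)
        for a, b in zip(pts, pts[1:]):
            if a.t <= t <= b.t:
                frac = (t - a.t) / (b.t - a.t)
                return a.value + frac * (b.value - a.value)
        return None


@dataclass(frozen=True)
class SequenceSet:
    sequences: List[Sequence]  # temporal gaps between consecutive sequences

    def value_at(self, t: int) -> Optional[float]:
        for seq in self.sequences:
            v = seq.value_at(t)
            if v is not None:
                return v
        return None  # t falls in a gap (or outside all sequences)


points = [Instant(10.0, 0), Instant(20.0, 100)]
print(InstantSet(points).value_at(50))  # None: unknown between instants
print(Sequence(points).value_at(50))    # 15.0: interpolated
```

The point of the sketch is the semantic difference between the kinds: an InstantSet is only defined at its instants, a Sequence interpolates between them, and a SequenceSet is undefined inside its gaps.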
>To compute the selectivity of temporal types, we assume that the time and
>space dimensions are independent, and thus we can reuse all the existing
>analyze and selectivity infrastructure in PostgreSQL/PostGIS. For the
>various durations, this amounts to:
>* Instant: Use the functions in analyze.c and selfuncs.c independently for
>the value and time dimensions.
>* InstantSet: Use the functions in array_typanalyze.c and array_selfuncs.c
>independently for the value and time dimensions.
>* Sequence and SequenceSet: To simplify, we do not take the gaps into
>account, and thus use the functions in rangetypes_typanalyze.c and
>rangetypes_selfuncs.c independently for the value and time dimensions.
>
OK.
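Under the independence assumption in the quoted text, the estimate for a predicate on both dimensions is the product of the per-dimension estimates, i.e. sel(p_value AND p_time) = sel(p_value) * sel(p_time). The sketch below records which core files the message proposes to reuse for each duration and combines two estimates. The names (`STATS_SOURCES`, `temporal_selectivity`) are hypothetical, and the clamping only mirrors the intent of the `CLAMP_PROBABILITY` macro in selfuncs.h rather than calling it.

```python
# Which core files the message proposes to reuse per duration
# (gaps are ignored for SequenceSet, as stated).
STATS_SOURCES = {
    "Instant": ("analyze.c", "selfuncs.c"),
    "InstantSet": ("array_typanalyze.c", "array_selfuncs.c"),
    "Sequence": ("rangetypes_typanalyze.c", "rangetypes_selfuncs.c"),
    "SequenceSet": ("rangetypes_typanalyze.c", "rangetypes_selfuncs.c"),
}


def clamp_probability(p: float) -> float:
    """Keep an estimate within [0, 1], like CLAMP_PROBABILITY in selfuncs.h."""
    return min(max(p, 0.0), 1.0)


def temporal_selectivity(duration: str, value_sel: float, time_sel: float) -> float:
    """Selectivity of '<value predicate> AND <time predicate>' under the
    independence assumption: the product of the per-dimension estimates."""
    if duration not in STATS_SOURCES:
        raise ValueError(f"unknown duration: {duration}")
    return clamp_probability(value_sel) * clamp_probability(time_sel)


print(temporal_selectivity("Sequence", 0.2, 0.5))  # 0.1
```

The product is only as good as the independence assumption: strongly correlated value and time predicates (e.g. a vehicle that is only in one area at certain hours) would be misestimated.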
>However, this requires that the analyze and selectivity functions in all
>the above files satisfy the following requirements:
>* Set the staop when computing statistics. For example, in
>rangetypes_typanalyze.c the staop is set for
>STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM but not for
>STATISTIC_KIND_BOUNDS_HISTOGRAM.
>* Always call get_attstatsslot with the operator Oid rather than with
>InvalidOid. For example, of the 17 calls to this function in selfuncs.c,
>only two pass an operator. This also requires passing the operator as an
>additional parameter to several functions; for example, the operator
>should be passed to the function ineq_histogram_selectivity in selfuncs.c.
>* Export several top-level functions that are currently static. For
>example, var_eq_const, ineq_histogram_selectivity, eqjoinsel_inner and
>several others in the file selfuncs.c should be exported.
>
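Why the operator matters: get_attstatsslot() in lsyscache.c returns the first pg_statistic slot whose stakind matches the requested kind and, unless the requested operator is InvalidOid, whose staop matches as well. The Python model below mimics only that matching rule. `StatSlot`, `find_stats_slot` and the operator OIDs are hypothetical placeholders, not the real C API or catalog values.

```python
from typing import List, NamedTuple, Optional

INVALID_OID = 0
STATISTIC_NUM_SLOTS = 5        # pg_statistic has stakind1..stakind5
STATISTIC_KIND_HISTOGRAM = 2   # core kind code from pg_statistic.h

# Hypothetical operator OIDs, for illustration only.
VALUE_LT_OP = 1001  # e.g. '<' on the value dimension
TIME_LT_OP = 1002   # e.g. '<' on timestamptz


class StatSlot(NamedTuple):
    stakind: int
    staop: int
    label: str  # stands in for the slot's values/numbers arrays


def find_stats_slot(slots: List[StatSlot], reqkind: int,
                    reqop: int) -> Optional[StatSlot]:
    """Model of get_attstatsslot()'s matching rule: first slot whose kind
    matches and whose operator matches too, unless reqop is InvalidOid."""
    for slot in slots[:STATISTIC_NUM_SLOTS]:
        if slot.stakind == reqkind and (reqop == INVALID_OID or slot.staop == reqop):
            return slot
    return None


# One histogram per dimension, both with the same stakind.
slots = [
    StatSlot(STATISTIC_KIND_HISTOGRAM, VALUE_LT_OP, "value histogram"),
    StatSlot(STATISTIC_KIND_HISTOGRAM, TIME_LT_OP, "time histogram"),
]
print(find_stats_slot(slots, STATISTIC_KIND_HISTOGRAM, INVALID_OID).label)  # value histogram
print(find_stats_slot(slots, STATISTIC_KIND_HISTOGRAM, TIME_LT_OP).label)   # time histogram
```

With InvalidOid the time histogram is never reached, because the value histogram of the same kind always matches first; only setting staop at analyze time and passing the operator at lookup time makes both slots addressable.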
>That would solve all the problems except for
>STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM, since in this case the staop will
>always be Float8LessOperator, regardless of whether we are computing
>lengths of value ranges or of tstzranges. This could be solved by using a
>different stakind for the value and time dimensions.
>
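The remaining ambiguity can be illustrated by treating (stakind, staop) as the lookup key. Both length histograms are sorted with the float8 `<` operator, so they share the same key; giving each dimension its own kind code separates them. The kind codes 10001 and 10002 and the operator OID below are hypothetical placeholders; only the value 6 for STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM is taken from pg_statistic.h.

```python
from typing import List, Tuple

STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM = 6  # core kind code from pg_statistic.h
FLOAT8_LT_OP = 1001             # placeholder standing in for Float8LessOperator
VALUE_LENGTH_HIST_KIND = 10001  # hypothetical extension-defined kind
TIME_LENGTH_HIST_KIND = 10002   # hypothetical extension-defined kind

Slot = Tuple[int, int, str]  # (stakind, staop, description)


def lookup(slots: List[Slot], reqkind: int, reqop: int) -> List[str]:
    """All slots matching both kind and operator; more than one result means
    the caller cannot tell them apart."""
    return [desc for kind, op, desc in slots if kind == reqkind and op == reqop]


# Both length histograms are sorted with float8 '<', so their keys collide.
same_kind = [
    (STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM, FLOAT8_LT_OP, "value-range lengths"),
    (STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM, FLOAT8_LT_OP, "tstzrange lengths"),
]
# A separate kind per dimension makes each slot addressable.
per_dimension = [
    (VALUE_LENGTH_HIST_KIND, FLOAT8_LT_OP, "value-range lengths"),
    (TIME_LENGTH_HIST_KIND, FLOAT8_LT_OP, "tstzrange lengths"),
]

print(lookup(same_kind, STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM, FLOAT8_LT_OP))
# ['value-range lengths', 'tstzrange lengths']
print(lookup(per_dimension, TIME_LENGTH_HIST_KIND, FLOAT8_LT_OP))
# ['tstzrange lengths']
```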
I don't think we're strongly against changing the code to allow this, as
long as it does not break existing extensions/code (unnecessarily).
>If you want, I can prepare a PR so that we can understand the implications
>of these changes. Please let me know.
>
I think having an actual patch to look at would be helpful.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services