Discussion: [RFC] ASOF Join


[RFC] ASOF Join

From:
Alexander Kuzmenkov
Date:
Hi hackers,

There was some interest in implementing ASOF joins in Postgres, see
e.g. this prototype patch by Konstantin Knizhnik:
https://www.postgresql.org/message-id/flat/bc494762-26bd-b100-e1f9-a97901ddad57%40postgrespro.ru
I'd like to discuss the possible ways of implementing it, if there is
still any interest.


Introduction

ASOF join is often used to work with time series data such as stock
quotes or IoT sensor readings. It is an interpolation where we want to
relate two different time series measured at different points in time.
For each value of the first time series, we take the most recent value
of the second.

Besides an inequality condition on timestamp, such a join can also have
equality conditions on other columns. For example, this query joins
two tables that contain bids and asks, finding the most recent ask
for each bid for a given financial instrument:

```sql
SELECT bids.ts timebid, bid, ask
FROM bids
ASOF JOIN asks ON bids.instrument = asks.instrument
AND asks.ts <= bids.ts;
```

Semantically, this is equivalent to the following correlated subquery:
```sql
SELECT bids.ts timebid, bid, ask
FROM bids,
    LATERAL (select * from asks
        WHERE asks.instrument = bids.instrument AND asks.ts <= bids.ts
        ORDER BY ts DESC LIMIT 1) t;
```
This form is useful for thinking about which optimizations we can
perform with an ASOF join, how it behaves with respect to other joins,
and so on.

QuestDB has some good docs on this with more examples:
https://questdb.io/docs/reference/sql/join/#asof-join


What Conditions Work with ASOF Join

Conditions for an ASOF join consist of one inequality condition (>=,
<=, etc.) and optionally a number of equality conditions. All these
conditions must be "mergejoinable" in PG terms -- they must belong to
a btree operator family, so there is a sort ordering that corresponds
to the condition and a merge join is possible. They must also support
hashing, because we'll probably need both sorting and hashing for the
implementation (see below). This holds for the usual data types like
numeric. It is natural to think of the inequality column as "time",
but technically it can be any column, even a string one, w/o changing
the algorithm.


Join variants

The purpose of ASOF join is interpolation of one time series to match
another, so it is natural to think of it as an INNER join. The outer
variants might be less useful. Technically, it is easy to extend it to
LEFT ASOF JOIN, where we would output nulls for the right hand columns
if we haven’t yet seen a match. RIGHT and FULL variants also make
sense, but the implementation may be impossible, depending on the
algorithm -- merge and hash joins can support these variants, but the
nested loop cannot.
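
For reference, the LEFT variant corresponds to the following LATERAL
form, sketched here with the bids/asks example from above:

```sql
-- Sketch of LEFT ASOF JOIN semantics in today's syntax: keep every bid,
-- and output NULLs for the ask columns when no earlier matching ask exists.
SELECT bids.ts AS timebid, bid, ask
FROM bids
LEFT JOIN LATERAL (
    SELECT ask FROM asks
    WHERE asks.instrument = bids.instrument AND asks.ts <= bids.ts
    ORDER BY asks.ts DESC LIMIT 1
) a ON true;
```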


Use in Combination with Normal Joins

The difference between an ASOF join and a normal join is that for the
inequality condition, it does not output all the rows that match it,
but only the most recent one. We can think of it as first performing a
normal join and then applying a filter that selects the latest right
hand row. Which row is the "latest" depends on the entire set of rows
that match the join conditions (same as with LIMIT). This means that
the result of an ASOF join may depend on the place in the join tree
where it is evaluated, because other joins may remove some rows. As
with outer joins, we must respect the user-specified join order for an
ASOF join. It is useful to think about pushing another join below an
ASOF join as pushing a join below a correlated subquery with LIMIT
(see above). This transformation might be correct in some cases, so we
might later think about adding join order optimizations for ASOF
joins.
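
To make the "normal join, then keep the latest right row" reading
concrete, here is a sketch in today's syntax; it assumes a hypothetical
primary key bids.id to identify each left row:

```sql
-- Sketch: perform the plain inner join, then keep only the latest ask
-- per bid row. bids.id is a hypothetical primary key, used only to group
-- the joined rows back by their originating bid.
SELECT DISTINCT ON (bids.id)
       bids.ts AS timebid, bid, ask
FROM bids
JOIN asks ON asks.instrument = bids.instrument
         AND asks.ts <= bids.ts
ORDER BY bids.id, asks.ts DESC;
```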


Proposed Syntax

An ASOF join is semantically distinct from a normal join on the same
conditions, so it requires separate grammar. An ASOF modifier plus
listing all the conditions in the ON clause looks like a good baseline:
`bids ASOF JOIN asks ON asks.timestamp <= bids.timestamp AND
asks.instrument = bids.instrument`


Algorithms

Let's see which algorithm we can use to perform an ASOF join if we
have a "<=" condition on timestamp and several "=" conditions on other
columns (equi-columns).

1. Hash on Equi-keys

This is what ClickHouse uses. It builds a hash table on the
equi-columns, then for each equi-key builds an array of timestamps,
sorted on demand. This requires bringing the entire right hand table
into memory, so it is not feasible for large tables.


2. Merge Join on (equi-keys, timestamp) Sorting

This is a natural extension of the merge join algorithm: instead of
returning all right hand rows that match on the timestamp condition,
it returns only the latest one. A drawback of this algorithm is that
the data must be sorted on timestamp last, so we can't reuse the
natural ordering of time series data encoded by a (timestamp) index.
We would have to sort both tables entirely in a different order, which
is prohibitively costly for large tables.

Another way is to create an index on (equi-keys, timestamp). This
would allow us to perform a merge ASOF join in linear time, but it has
several drawbacks. First, it requires maintaining an additional index,
which costs space and time (we need the (timestamp) index anyway).
Second, time series data is naturally ordered on timestamp, so even
w/o CLUSTER, locality in time translates somewhat into locality in
page space. Reading the table in (equi-keys, timestamp) order would
require essentially random access with frequent switching between
chunks, in contrast to reading in (timestamp) order, which reads from
a single chunk. So this algorithm is probably going to be less
performant than the one using (timestamp) sorting, described next.

The good part of this algorithm is that with a dedicated (equi-keys,
timestamp) index it requires constant memory, so it can still be
useful when the cardinality of equi-keys is high.
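
For the bids/asks example, the dedicated indexes would look like this
(a sketch; it assumes instrument is the only equi-column):

```sql
-- Sketch of the (equi-keys, timestamp) indexes this algorithm relies on.
CREATE INDEX ON asks (instrument, ts);
CREATE INDEX ON bids (instrument, ts);
```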


3. Merge-Hash on (timestamp) Sorting

If we sort first on timestamp, we can reuse the natural order of
time-series data, often encoded by the index on (timestamp). This
approach would allow us to process data in a streaming fashion, w/o
sorting everything again, which makes it feasible for really large
tables. Let's see what algorithm we can use to perform an ASOF join in
this case. Suppose we have the left and right input streams sorted on
(timestamp). We will need an additional data structure -- a hash table
indexed by the equi-keys. The algorithm is as follows:

a. For a given left row, advance the right table until right timestamp
> left timestamp.

b. While we advance the right table, put each right hand row into the
hash table indexed by the equi keys. Overwrite the previous row with
the same keys, if there was any.

c. We have finished advancing the right table. The hash table now
contains the most recent right hand row for every value of equi-keys.
Most recent because the right hand table is sorted by (timestamp).

d. For the left row, look up a right row that matches it by the equi
keys in the hash table. This is the right hand row that matches the
ASOF join conditions (equi-keys are equal, left timestamp >= right
timestamp, right timestamp is maximal for the given equi-keys). Output
the result.

e. Go to the next left row. The left table is also sorted on
(timestamp), so we won't need to rewind the right table, only to
advance it forward.

Given the sorted input paths, this algorithm runs in time linear in
the size of the tables. A drawback is that it requires memory
proportional to the cardinality of the equi-columns. A possible
optimization is to split the equi-key hash table into hot and cold
parts by LRU and dump the cold part to disk. This would help if each
equi-key only occurs within a small period of time.
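
To picture the invariant maintained by steps b-d: while we are
processing a left row with timestamp left_ts, the hash table holds
exactly the rows returned by this sketch query (:left_ts is just a
placeholder), so a prototype could be cross-checked against it:

```sql
-- Sketch of the hash table contents at the moment we process a left row
-- with timestamp :left_ts: the latest ask per instrument that is no later
-- than :left_ts.
SELECT DISTINCT ON (instrument) *
FROM asks
WHERE ts <= :left_ts
ORDER BY instrument, ts DESC;
```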


4. Nested Loop

An efficient nested loop plan has to have a fast right-side subplan,
such as an index lookup. Unfortunately, there seems to be no way to
efficiently perform a last-point lookup for given equi-keys if we only
have separate btree indexes on timestamp and on the equi-keys. The
nested loop plan could work if we have a composite (equi-keys,
timestamp) btree index.
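
For the bids/asks example, the plan would look roughly like this (a
sketch; it assumes instrument is the only equi-column):

```sql
-- Sketch: with a composite (equi-keys, timestamp) index, the inner subplan
-- becomes a single backward index scan per outer row.
CREATE INDEX ON asks (instrument, ts);

SELECT bids.ts AS timebid, bid, ask
FROM bids,
     LATERAL (SELECT ask FROM asks
              WHERE asks.instrument = bids.instrument
                AND asks.ts <= bids.ts
              ORDER BY asks.ts DESC LIMIT 1) a;
```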


Prototype Implementation

For a prototype, I'd go with #3 "merge-something with a hash table of
most recent rows for equi-keys", because it works for big tables and
can reuse the physical data ordering.


I'll be glad to hear your thoughts on this.


--
Alexander Kuzmenkov
Timescale



Re: [RFC] ASOF Join

From:
Ilya Anfimov
Date:
On Thu, Nov 18, 2021 at 05:11:16PM +0300, Alexander Kuzmenkov wrote:
> Hi hackers,
> 
> There was some interest in implementing ASOF joins in Postgres, see
> e.g. this prototype patch by Konstantin Knizhnik:
> https://www.postgresql.org/message-id/flat/bc494762-26bd-b100-e1f9-a97901ddad57%40postgrespro.ru
> I'd like to discuss the possible ways of implementing it, if there is
> still any interest.
> 
> 
> Introduction
> 
> ASOF join is often used to work with time series data such as stock
> quotes or IoT sensors. It is an interpolation where we want to relate
> two different time series measured at different points in time. For
> each value of the first time series, we take the most recent value of
> the second.

DISCLAIMER: I am seeing this for the first time and I don't have a
good understanding of the PostgreSQL development practices.

 But at a first glance the syntax looks like pure evil. I see that
this is somewhat common as of now (ClickHouse, the QuestDB you
referenced, QuasarDB) -- but it's bad anyway. And not really a
standard.

 This introduces a new keyword to the ridiculous list of keywords.
 The syntax loosely defines what preference it wants, by extracting
some vague set of ordering operators from the join condition.
 Also, only one ASOF value is allowed (sometimes there may be more
than one).

 Perhaps, if we've got to have some syntax, then something like
 ORDER BY ... LIMIT
in joins, just before the ON join_condition or USING(), could be
much better.
 This allows us to:
   1) Easily separate the plain conditions from the preference ones.
No deep analysis of a condition expression is necessary, and no need
to keep another list of all the possible ranking functions and
operators.
   2) Have an ordering preference over many ranking conditions, like
ORDER BY secondary_field IS NOT NULL DESC, time_difference,
reliability.
   3) Have more than one row returned for a joined table.
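
 For the bids/asks example from the proposal, that would look roughly
like this (only a sketch of the grammar, nothing that parses today):

```sql
-- Sketch of the ORDER BY ... LIMIT join grammar (not valid syntax today).
SELECT bids.ts AS timebid, bid, ask
FROM bids
JOIN asks ORDER BY asks.ts DESC LIMIT 1
    ON asks.instrument = bids.instrument AND asks.ts <= bids.ts;
```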


 But anyway, this looks like just syntactic sugar. LATERAL JOINs
should logically work just fine. Any optimisation should deal with the
LATERAL syntax style anyway.




Re: [RFC] ASOF Join

From:
Todd Hubers
Date:
>But anyway, this looks like just syntactic sugar. LATERAL JOINs
>should logically work just fine. Any optimisation should deal with the
>LATERAL syntax style anyway.

Agreed.

However, if a rewrite is implemented, then what ASOF maps to becomes encoded in PostgreSQL itself. Anyone who wants to use ASOF can use it, and anyone who wants to use LATERAL directly can do so as well.



--
Todd Hubers

Re: [RFC] ASOF Join

From:
Alexander Kuzmenkov
Date:
On 21.11.2021 07:53, Ilya Anfimov wrote:
> DISCLAIMER: I am seeing this for the first time and I don't have a
> good understanding of the PostgreSQL development practices.

> pure evil
> ridiculous
No worries, at least you got the etiquette just right.


There are two points in your mail that I'd like to discuss. First, the ASOF grammar being bad because it's implicit. I
do agree on the general idea that explicit is better UX than implicit, especially when we're talking about SQL, where
you spend half the time battling the query planner already. However, in the grammar I proposed it's unambiguous which
conditions are ASOF and which are not -- all inequalities are ASOF, all equalities are not, and there can be no other
kinds of conditions for this type of join. It can also support any number of ASOF conditions. Which grammar exactly do
you suggest? Maybe something like this:

asks JOIN bids ON asks.instrument = bids.instrument ASOF asks.timestamp <= bids.timestamp

This still does require a keyword.


Second, you say that we must first optimize the corresponding LATERAL. I was thinking about this as well, but _that_ is
what's not explicit. I'm not sure if this optimization would have any value outside of optimizing ASOF joins. We might
give better UX if we embrace the fact that we're doing an ASOF join and allow the user to state this explicitly and get
an efficient and predictable plan, or an error, instead of trying to guess this from the rewritten queries and silently
falling back to an inefficient plan for cryptic reasons.


--
Alexander Kuzmenkov
Timescale




Re: [RFC] ASOF Join

From:
Ilya Anfimov
Date:
On Mon, Nov 22, 2021 at 03:44:37PM +0300, Alexander Kuzmenkov wrote:
> On 21.11.2021 07:53, Ilya Anfimov wrote:
> > DISCLAIMER: I am seeing this for the first time and I don't have a
> > good understanding of the PostgreSQL development practices.
> 
> > pure evil
> > ridiculous
> No worries, at least you got the etiquette just right.
> 
> 
> There are two points in your mail that I'd like to discuss.
> First, the ASOF grammar being bad because it's implicit. I do
> agree on the general idea that explicit is better UX than implicit,
> especially when we're talking about SQL, where you spend half
> the time battling the query planner already. However, in the
> grammar I proposed it's unambiguous which conditions are ASOF and
> which are not -- all inequalities are ASOF, all equalities are

 I see at least two operators in postgres that implement ordering
while not being <= ( ~<=~ for byte-by-byte text comparison, and *<=
for internal record comparison), and four cases that are literally
<= but don't implement ordering -- box, lseg, path and circle are
compared by length, with fuzzy floating-point comparison.

 Are you sure an implementor and a programmer will easily decide
what is just a boolean test and what is an ordering?

 What's worse, the preference among values doesn't have a lot in
common with the filters you want on them. Take your example of time
matching: another reasonable business case is to match the nearest
time point in either direction, within a reasonable time limit.
 Like timea BETWEEN timeb - '1s' AND timeb + '1s',
 and to choose something like min(@(timea-timeb)) among them (*we
strangely don't have an absolute value operator on interval, but
I think you've got the point*).
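
 With today's syntax that nearest-match case would be roughly like
this (a sketch; greatest() of the two signed differences stands in
for the missing abs(interval)):

```sql
-- Sketch: nearest ask in either direction, within 1 second of the bid.
SELECT b.ts, b.bid, a.ask
FROM bids b,
     LATERAL (SELECT ask FROM asks a2
              WHERE a2.instrument = b.instrument
                AND a2.ts BETWEEN b.ts - interval '1s' AND b.ts + interval '1s'
              ORDER BY greatest(a2.ts - b.ts, b.ts - a2.ts)
              LIMIT 1) a;
```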

> not, and there can be no other kinds of conditions for this  type
> of join. It can also support any number of ASOF conditions. Which
> grammar exactly do you suggest? Maybe something like this:




> 
> asks JOIN bids ON asks.instrument = bids.instrument ASOF asks.timestamp <= bids.timestamp

 I suggest JOIN bids ORDER BY asks.timestamp DESC LIMIT 1
                 ON asks.instrument = bids.instrument AND asks.timestamp <= bids.timestamp

 LIMIT 1 could also be implied.





Re: [RFC] ASOF Join

From:
Chapman Flack
Date:
On 11/23/21 02:29, Ilya Anfimov wrote:
> (*We
> strangely don't have an absolute value operator on interval,  but
> I think you've got the point*).

Although tangential to the topic, that might be because a PG interval
is a triple of independently-signed months/days/seconds components.
An interval like '1 month -31 days +12:00:00' is positive or negative
depending on the absolute date you apply it to, so what its absolute
value should be isn't clear in isolation.

That's also why there's no everywhere-defined conversion from PG interval
to XML xs:duration or to java.time.Duration, etc.</tangent>

Regards,
-Chap



Re: [RFC] ASOF Join

From:
Isaac Morland
Date:
On Tue, 23 Nov 2021 at 09:44, Chapman Flack <chap@anastigmatix.net> wrote:
On 11/23/21 02:29, Ilya Anfimov wrote:
> (*We
> strangely don't have an absolute value operator on interval,  but
> I think you've got the point*).

Although tangential to the topic, that might be because a PG interval
is a triple of independently-signed months/days/seconds components.
An interval like '1 month -31 days +12:00:00' is positive or negative
depending on the absolute date you apply it to, so what its absolute
value should be isn't clear in isolation.

Umm, it's definitely negative:

odyssey=> select '1 month -31 days +12:00:00'::interval < '0 months'::interval;
 ?column?
----------
 t
(1 row)

It's just that due to the complexities of our calendar/time systems, adding it to a timestamp can move the timestamp in either direction:

odyssey=> select '2021-02-01'::timestamp + '1 month -31 days +12:00:00'::interval;
      ?column?      
---------------------
 2021-01-29 12:00:00
(1 row)

odyssey=> select '2021-03-01'::timestamp + '1 month -31 days +12:00:00'::interval;
      ?column?      
---------------------
 2021-03-01 12:00:00
(1 row)

I'm working on a patch to add abs(interval) so I noticed this. There are lots of oddities, including lots of intervals which compare equal to 0 but which can change a timestamp when added to it, but as presently designed, this particular interval compares as negative.
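
For example, here is the kind of oddity I mean (a quick sketch; the
expected results in the comments are from memory, so worth re-running):

```sql
-- This interval compares equal to zero (the comparison assumes 30-day
-- months), yet adding it to a timestamp moves the timestamp.
SELECT '1 month -30 days'::interval = '0'::interval;             -- t
SELECT '2021-02-01'::timestamp + '1 month -30 days'::interval;   -- 2021-01-30 00:00:00
```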

Re: [RFC] ASOF Join

From:
Chapman Flack
Date:
On 11/23/21 10:41, Isaac Morland wrote:
> Umm, it's definitely negative:
> 
> odyssey=> select '1 month -31 days +12:00:00'::interval < '0
> months'::interval;
> ----------
>  t

Well, what you've shown here is that it's "negative" according to
an arbitrary total ordering imposed in interval_cmp_value for the purpose
of making it indexable in a btree ...

> It's just that due to the complexities of our calendar/time systems, adding
> it to a timestamp can move the timestamp in either direction:

... and this is just another way of saying that said arbitrary choice of
btree ordering can't be used to tell you whether the interval is
semantically positive or negative. (Of course, for a great many intervals,
the two answers will be the same, but they're still answers to different
questions.)

> I'm working on a patch to add abs(interval) so I noticed this. There are
> lots of oddities, including lots of intervals which compare equal to 0 but
> which can change a timestamp when added to it, but as presently designed,
> this particular interval compares as negative.

It's no use—it's oddities all the way down. You can shove them off to one
side of the desk or the other depending on your intentions of the moment,
but they're always there. If you want to put intervals in a btree, you
can define a total ordering where all days are 24 hours and all months
are 30 days, and then there are no oddities in your btree, they're just
everywhere else. Or you can compare your unknown interval to a known
one like '0 months' and say you know whether it's "negative", you just
don't know whether it moves a real date forward or back. Or you can see
what it does to a real date, but not know whether it would precede or
follow some other interval in a btree.

Regards,
-Chap