Обсуждение: Question about RUM-index

Поиск

Список

Период

Сортировка

Question about RUM-index

От

Andreas Joseph Krogh

Дата:

15 июня 2016 г., 10:56:20

Hi.

First; Is this the correct forum to ask questions about the Postgres Pro's new RUM-index?

If not, please point me to the right forum.

I'm trying to do this (done with GIN):

create extension if not exists btree_gin;
drop table if EXISTS delivery;

create table delivery(    id BIGSERIAL primary key,    fts_all TSVECTOR not null,    folder_id BIGINT NOT NULL,    sent TIMESTAMP not null,    message varchar not null
);


create index gin_idx on delivery using GIN(fts_all, folder_id);

CREATE OR REPLACE FUNCTION update_delivery_tsvector_tf() RETURNS TRIGGER AS $$
BEGIN
    NEW.fts_all = to_tsvector('simple', NEW.message);
    return NEW;
END;
$$ LANGUAGE PLPGSQL;

CREATE TRIGGER update_delivery_tsvector_t BEFORE INSERT OR UPDATE ON delivery
FOR EACH ROW EXECUTE PROCEDURE update_delivery_tsvector_tf();

insert into delivery(folder_id, sent, message)
values (1, '2015-01-01', 'Yes hit four')    , (1, '2014-01-01', 'Hi man')    , (2, '2013-01-01', 'Hi man')    , (2, '2013-01-01', 'fish')
;

analyze delivery;

set ENABLE_SEQSCAN to off;

explain analyze SELECT del.id                    , del.sent                FROM delivery del                WHERE 1 = 1
                      AND del.fts_all @@ to_tsquery('simple', 'hi:*')                      AND del.folder_id = ANY(ARRAY[2,3]::BIGINT[])
                ORDER BY  del.sent DESC LIMIT 101 OFFSET 0;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Limit (cost=3.63..3.63 rows=1 width=16) (actual time=0.025..0.025 rows=1 loops=1)
  -> Sort (cost=3.63..3.63 rows=1 width=16) (actual time=0.024..0.024 rows=1 loops=1)
        Sort Key: sent DESC
        Sort Method: quicksort Memory: 25kB
        -> Bitmap Heap Scan on delivery del (cost=2.40..3.62 rows=1 width=16) (actual time=0.019..0.019 rows=1 loops=1)
              Recheck Cond: ((fts_all @@ '''hi'':*'::tsquery) AND (folder_id = ANY ('{2,3}'::bigint[])))
              Heap Blocks: exact=1
              -> Bitmap Index Scan on gin_idx (cost=0.00..2.40 rows=1 width=0) (actual time=0.015..0.015 rows=1 loops=1)
                    Index Cond: ((fts_all @@ '''hi'':*'::tsquery) AND (folder_id = ANY ('{2,3}'::bigint[])))
Planning time: 0.153 ms
Execution time: 0.047 ms
(11 rows)

Note that GIN does almost what I want, except use the index when sorting by "sent"-timestamp.

So I wonder if RUM can do any better?

What I don't understand is how to have "folder_id" as part of the RUM-index so that I can search in an array of folders using the index, AND have the whole result sorted by "sent"-timestamp also using the RUM-index.

In the (limited) documentation sorting using timestamp is done like this:

ORDER BY sent <-> '2000-01-01'::TIMESTAMP

which I don't understand; Why must one specify a value here, and how does that value affect the result?

Thanks.

Andreas Joseph Krogh

CTO / Partner - Visena AS

Mobile: +47 909 56 963

andreas@visena.com

www.visena.com

Вложения

msg-1550244-3.png

Re: Question about RUM-index

От

"David G. Johnston"

Дата:

15 июня 2016 г., 13:27:40

On Wed, Jun 15, 2016 at 6:56 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:

Hi.

First; Is this the correct forum to ask questions about the Postgres Pro's new RUM-index?

-general is fairly broad in scope and the authors do monitor here as far as I am aware. I'm not sure this is strictly related to that extension though, anyway.

If not, please point me to the right forum.

I'm trying to do this (done with GIN):

create extension if not exists btree_gin;
drop table if EXISTS delivery;

create table delivery(   id BIGSERIAL primary key,   fts_all TSVECTOR not null,   folder_id BIGINT NOT NULL,   sent TIMESTAMP not null,   message varchar not null
);


create index gin_idx on delivery using GIN(fts_all, folder_id);

In general I'm not sure what you are so focused on multi-column indexes through this email.

CREATE OR REPLACE FUNCTION update_delivery_tsvector_tf() RETURNS TRIGGER AS $$
BEGIN
    NEW.fts_all = to_tsvector('simple', NEW.message);
    return NEW;
END;
$$ LANGUAGE PLPGSQL;

CREATE TRIGGER update_delivery_tsvector_t BEFORE INSERT OR UPDATE ON delivery
FOR EACH ROW EXECUTE PROCEDURE update_delivery_tsvector_tf();

insert into delivery(folder_id, sent, message)
values (1, '2015-01-01', 'Yes hit four')   , (1, '2014-01-01', 'Hi man')   , (2, '2013-01-01', 'Hi man')   , (2, '2013-01-01', 'fish')
;

analyze delivery;

set ENABLE_SEQSCAN to off;

explain analyze SELECT del.id                   , del.sent               FROM delivery del               WHERE 1 = 1
                      AND del.fts_all @@ to_tsquery('simple', 'hi:*')                     AND del.folder_id = ANY(ARRAY[2,3]::BIGINT[])
                ORDER BY  del.sent DESC LIMIT 101 OFFSET 0;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Limit (cost=3.63..3.63 rows=1 width=16) (actual time=0.025..0.025 rows=1 loops=1)
  -> Sort (cost=3.63..3.63 rows=1 width=16) (actual time=0.024..0.024 rows=1 loops=1)
        Sort Key: sent DESC
        Sort Method: quicksort Memory: 25kB
        -> Bitmap Heap Scan on delivery del (cost=2.40..3.62 rows=1 width=16) (actual time=0.019..0.019 rows=1 loops=1)
              Recheck Cond: ((fts_all @@ '''hi'':*'::tsquery) AND (folder_id = ANY ('{2,3}'::bigint[])))
              Heap Blocks: exact=1
              -> Bitmap Index Scan on gin_idx (cost=0.00..2.40 rows=1 width=0) (actual time=0.015..0.015 rows=1 loops=1)
                    Index Cond: ((fts_all @@ '''hi'':*'::tsquery) AND (folder_id = ANY ('{2,3}'::bigint[])))
Planning time: 0.153 ms
Execution time: 0.047 ms
(11 rows)

Note that GIN does almost what I want, except use the index when sorting by "sent"-timestamp.

What index? You don't have "sent" in the one CREATE INDEX statement you specified above.

So I wonder if RUM can do any better?
What I don't understand is how to have "folder_id" as part of the RUM-index so that I can search in an array of folders using the index, AND have the whole result sorted by "sent"-timestamp also using the RUM-index.

A bit out of my experience here...but see comment on multi-column indexes above.

In the (limited) documentation sorting using timestamp is done like this:

ORDER BY sent <-> '2000-01-01'::TIMESTAMP

which I don't understand; Why must one specify a value here, and how does that value affect the result?

You don't (I think...); <ORDER BY sent> is perfectly valid, it will use the "<=, and >=" operators in the btree operator family to perform the ordering...see comment regarding multi-column indexes. Even if you stick sent into such an index it would not (I guess it could...but your where clause doesn't filter on it so doing so seem counter-productive) be a leading column so it is unlikely that the index would be helpful in supplying tuples to the sort node in order - the sort would still have work to perform.

Maybe its simple ignorance but AFAIK the sorting node never consults an index to perform its work. Where indexes come into to play is the node under the sort is able to supply its tuples in sorted order then the sort node will have nothing to do.

https://www.postgresql.org/docs/9.6/static/indexes-ordering.html

"""

In addition to simply finding the rows to be returned by a query, an index may be able to deliver them in a specific sorted order. This allows a query's ORDER BY specification to be honored without a separate sorting step. Of the index types currently supported by PostgreSQL, only B-tree can produce sorted output — the other index types return matching rows in an unspecified, implementation-dependent order.

"""

HTH

David J.

Re: Question about RUM-index

От

Andreas Joseph Krogh

Дата:

15 июня 2016 г., 16:15:56

På onsdag 15. juni 2016 kl. 15:27:32, skrev David G. Johnston <david.g.johnston@gmail.com>:

On Wed, Jun 15, 2016 at 6:56 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
Hi.

First; Is this the correct forum to ask questions about the Postgres Pro's new RUM-index?

-general is fairly broad in scope and the authors do monitor here as far as I am aware. I'm not sure this is strictly related to that extension though, anyway.
If not, please point me to the right forum.

I'm trying to do this (done with GIN):
create extension if not exists btree_gin;
drop table if EXISTS delivery;

create table delivery(    id BIGSERIAL primary key,    fts_all TSVECTOR not null,    folder_id BIGINT NOT NULL,    sent TIMESTAMP not null,    message varchar not null
);


create index gin_idx on delivery using GIN(fts_all, folder_id);
In general I'm not sure what you are so focused on multi-column indexes through this email.
[snip]
What index? You don't have "sent" in the one CREATE INDEX statement you specified above.

So I wonder if RUM can do any better?
What I don't understand is how to have "folder_id" as part of the RUM-index so that I can search in an array of folders using the index, AND have the whole result sorted by "sent"-timestamp also using the RUM-index.

A bit out of my experience here...but see comment on multi-column indexes above.

In the (limited) documentation sorting using timestamp is done like this:

ORDER BY sent <-> '2000-01-01'::TIMESTAMP

which I don't understand; Why must one specify a value here, and how does that value affect the result?

You don't (I think...); <ORDER BY sent> is perfectly valid, it will use the "<=, and >=" operators in the btree operator family to perform the ordering...see comment regarding multi-column indexes. Even if you stick sent into such an index it would not (I guess it could...but your where clause doesn't filter on it so doing so seem counter-productive) be a leading column so it is unlikely that the index would be helpful in supplying tuples to the sort node in order - the sort would still have work to perform.

Maybe its simple ignorance but AFAIK the sorting node never consults an index to perform its work. Where indexes come into to play is the node under the sort is able to supply its tuples in sorted order then the sort node will have nothing to do.

https://www.postgresql.org/docs/9.6/static/indexes-ordering.html

"""
In addition to simply finding the rows to be returned by a query, an index may be able to deliver them in a specific sorted order. This allows a query's ORDER BY specification to be honored without a separate sorting step. Of the index types currently supported by PostgreSQL, only B-tree can produce sorted output — the other index types return matching rows in an unspecified, implementation-dependent order.
"""
HTH

David J.

I'm focused on using *one* index for this query for this to be as efficient as possible. I'm searching thru millions of records and the user expects "millisecond-response", we're building an web-based email-system and people expect search to be fast, as GMail and other services are. No index-type can do this yet but RUM is on its way and seems promising.

The reason I don't have "sent" in the (GIN) index is that I'm only using it for sorting and GIN cannot be used for sorting, which is why I'm looking into using the RUM-index; To hopefully get the best of GIN (able to index BIGINT-arrays, JSONB and tsvector) and be able to sort on timestamp (which RUM is able to do).

Based on your answer it doesn't seem you are very familiar with the RUM-extention, so much of your reply doesn't really seem relevant.

Andreas Joseph Krogh

CTO / Partner - Visena AS

Mobile: +47 909 56 963

andreas@visena.com

www.visena.com

Вложения

msg-1550248-3.png

Re: Question about RUM-index

От

Jeff Janes

Дата:

15 июня 2016 г., 22:50:57

On Wed, Jun 15, 2016 at 3:56 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:

Hi.

First; Is this the correct forum to ask questions about the Postgres Pro's new RUM-index?

If not, please point me to the right forum.

I think that https://github.com/postgrespro/rum/issues might be the best forum.

Note that GIN does almost what I want, except use the index when sorting by "sent"-timestamp.

So I wonder if RUM can do any better?
What I don't understand is how to have "folder_id" as part of the RUM-index so that I can search in an array of folders using the index, AND have the whole result sorted by "sent"-timestamp also using the RUM-index.

I think you would have to implement an operator for integers for RUM much like btree_gin does for GIN. Sorry don't know how to do that, except to say look in the RUM code to see how it does it for time-stamps.

In the (limited) documentation sorting using timestamp is done like this:

ORDER BY sent <-> '2000-01-01'::TIMESTAMP

which I don't understand; Why must one specify a value here, and how does that value affect the result?

This is essentially identical to ORDER BY ABS(sent - '2000-01-01'::TIMESTAMP); except it can use the index.

So maybe pick a constant outside the range of possible values, and use that as one argument to <->.

Cheers,

Jeff

Re: Question about RUM-index

От

Andreas Joseph Krogh

Дата:

17 июня 2016 г., 06:33:07

På torsdag 16. juni 2016 kl. 00:50:45, skrev Jeff Janes <jeff.janes@gmail.com>:

On Wed, Jun 15, 2016 at 3:56 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
Hi.

First; Is this the correct forum to ask questions about the Postgres Pro's new RUM-index?

If not, please point me to the right forum.

I think that https://github.com/postgrespro/rum/issues might be the best forum.

Oleg and friends; Should we use GitHub-issues as forum (one issue per question/thread?), pgsql-general or something else?

Note that GIN does almost what I want, except use the index when sorting by "sent"-timestamp.

So I wonder if RUM can do any better?
What I don't understand is how to have "folder_id" as part of the RUM-index so that I can search in an array of folders using the index, AND have the whole result sorted by "sent"-timestamp also using the RUM-index.

I think you would have to implement an operator for integers for RUM much like btree_gin does for GIN. Sorry don't know how to do that, except to say look in the RUM code to see how it does it for time-stamps.

In the (limited) documentation sorting using timestamp is done like this:

ORDER BY sent <-> '2000-01-01'::TIMESTAMP

which I don't understand; Why must one specify a value here, and how does that value affect the result?

This is essentially identical to ORDER BY ABS(sent - '2000-01-01'::TIMESTAMP); except it can use the index.

So maybe pick a constant outside the range of possible values, and use that as one argument to <->.

This should be unnecessary and hidden from the user. Maybe some "ORDER BY rum_timestamp(sent)" or something could abstract away stuff to make it much clearer to the user?

Andreas Joseph Krogh

CTO / Partner - Visena AS

Mobile: +47 909 56 963

andreas@visena.com

www.visena.com

Вложения

msg-1550251-3.png

Re: Question about RUM-index

От

Oleg Bartunov

Дата:

17 июня 2016 г., 11:10:58

On Fri, Jun 17, 2016 at 9:32 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:

På torsdag 16. juni 2016 kl. 00:50:45, skrev Jeff Janes <jeff.janes@gmail.com>:
On Wed, Jun 15, 2016 at 3:56 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
Hi.

First; Is this the correct forum to ask questions about the Postgres Pro's new RUM-index?

If not, please point me to the right forum.

I think that https://github.com/postgrespro/rum/issues might be the best forum.

Oleg and friends; Should we use GitHub-issues as forum (one issue per question/thread?), pgsql-general or something else?

Andreas,

we are hardly working on our internal version of rum and will open it after resolving some issues. I think the best place to discuss it is -hackers.

Note that GIN does almost what I want, except use the index when sorting by "sent"-timestamp.

So I wonder if RUM can do any better?
What I don't understand is how to have "folder_id" as part of the RUM-index so that I can search in an array of folders using the index, AND have the whole result sorted by "sent"-timestamp also using the RUM-index.

I think you would have to implement an operator for integers for RUM much like btree_gin does for GIN. Sorry don't know how to do that, except to say look in the RUM code to see how it does it for time-stamps.

In the (limited) documentation sorting using timestamp is done like this:

ORDER BY sent <-> '2000-01-01'::TIMESTAMP

which I don't understand; Why must one specify a value here, and how does that value affect the result?

This is essentially identical to ORDER BY ABS(sent - '2000-01-01'::TIMESTAMP); except it can use the index.

So maybe pick a constant outside the range of possible values, and use that as one argument to <->.

This should be unnecessary and hidden from the user. Maybe some "ORDER BY rum_timestamp(sent)" or something could abstract away stuff to make it much clearer to the user?
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas@visena.com
www.visena.com

Вложения

msg-1550254-3.png

Re: Question about RUM-index

От

Oleg Bartunov

Дата:

17 июня 2016 г., 11:54:04

On Fri, Jun 17, 2016 at 2:10 PM, Oleg Bartunov <obartunov@gmail.com> wrote:

On Fri, Jun 17, 2016 at 9:32 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
På torsdag 16. juni 2016 kl. 00:50:45, skrev Jeff Janes <jeff.janes@gmail.com>:
On Wed, Jun 15, 2016 at 3:56 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
Hi.

First; Is this the correct forum to ask questions about the Postgres Pro's new RUM-index?

If not, please point me to the right forum.

I think that https://github.com/postgrespro/rum/issues might be the best forum.

Oleg and friends; Should we use GitHub-issues as forum (one issue per question/thread?), pgsql-general or something else?

Andreas,

we are hardly working on our internal version of rum and will open it after resolving some issues. I think the best place to discuss it is -hackers.

Ah, as someone corrected me, we are working hard !

Note that GIN does almost what I want, except use the index when sorting by "sent"-timestamp.

So I wonder if RUM can do any better?
What I don't understand is how to have "folder_id" as part of the RUM-index so that I can search in an array of folders using the index, AND have the whole result sorted by "sent"-timestamp also using the RUM-index.

I think you would have to implement an operator for integers for RUM much like btree_gin does for GIN. Sorry don't know how to do that, except to say look in the RUM code to see how it does it for time-stamps.

In the (limited) documentation sorting using timestamp is done like this:

ORDER BY sent <-> '2000-01-01'::TIMESTAMP

which I don't understand; Why must one specify a value here, and how does that value affect the result?

This is essentially identical to ORDER BY ABS(sent - '2000-01-01'::TIMESTAMP); except it can use the index.

So maybe pick a constant outside the range of possible values, and use that as one argument to <->.

This should be unnecessary and hidden from the user. Maybe some "ORDER BY rum_timestamp(sent)" or something could abstract away stuff to make it much clearer to the user?
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas@visena.com
www.visena.com

Вложения

msg-1550257-3.png

Re: Question about RUM-index

От

Andreas Joseph Krogh

Дата:

17 июня 2016 г., 12:52:46

På fredag 17. juni 2016 kl. 13:53:34, skrev Oleg Bartunov <obartunov@gmail.com>:

On Fri, Jun 17, 2016 at 2:10 PM, Oleg Bartunov <obartunov@gmail.com> wrote:

On Fri, Jun 17, 2016 at 9:32 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
På torsdag 16. juni 2016 kl. 00:50:45, skrev Jeff Janes <jeff.janes@gmail.com>:
On Wed, Jun 15, 2016 at 3:56 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
Hi.

First; Is this the correct forum to ask questions about the Postgres Pro's new RUM-index?

If not, please point me to the right forum.

I think that https://github.com/postgrespro/rum/issues might be the best forum.

Oleg and friends; Should we use GitHub-issues as forum (one issue per question/thread?), pgsql-general or something else?

Andreas,

we are hardly working on our internal version of rum and will open it after resolving some issues. I think the best place to discuss it is -hackers.

Ah, as someone corrected me, we are working hard !

He he, I figured that was what you meant:-)

Ok, so basically - will RUM-index support the same indexing-properties as GIN (being able to index tsvector, BIGINT-arrays, JSONB etc.) and be able to use index for sorting on ie. timestamp, tsrank or some BIGINT-column?

Like my example, will it be possible to issue a query like this:

SELECT del.id                    , del.sent                FROM delivery del                WHERE 1 = 1
                      AND del.fts_all @@ to_tsquery('simple', 'hi:*')                      AND del.folder_id = ANY(ARRAY[2,3]::BIGINT[])
                ORDER BY  del.sent DESC LIMIT 101 OFFSET 0;

and have it use one RUM-index so the whole operation, including sorting, is as efficient as possible?

Thanks.

Andreas Joseph Krogh

CTO / Partner - Visena AS

Mobile: +47 909 56 963

andreas@visena.com

www.visena.com

Вложения

msg-1550263-3.png

Re: Question about RUM-index

От

Oleg Bartunov

Дата:

17 июня 2016 г., 13:47:34

On Fri, Jun 17, 2016 at 3:52 PM, Andreas Joseph Krogh <andreas@visena.com> wrote:

På fredag 17. juni 2016 kl. 13:53:34, skrev Oleg Bartunov <obartunov@gmail.com>:

On Fri, Jun 17, 2016 at 2:10 PM, Oleg Bartunov <obartunov@gmail.com> wrote:

On Fri, Jun 17, 2016 at 9:32 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
På torsdag 16. juni 2016 kl. 00:50:45, skrev Jeff Janes <jeff.janes@gmail.com>:
On Wed, Jun 15, 2016 at 3:56 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
Hi.

First; Is this the correct forum to ask questions about the Postgres Pro's new RUM-index?

If not, please point me to the right forum.

I think that https://github.com/postgrespro/rum/issues might be the best forum.

Oleg and friends; Should we use GitHub-issues as forum (one issue per question/thread?), pgsql-general or something else?

Andreas,

we are hardly working on our internal version of rum and will open it after resolving some issues. I think the best place to discuss it is -hackers.

Ah, as someone corrected me, we are working hard !

He he, I figured that was what you meant:-)

Ok, so basically - will RUM-index support the same indexing-properties as GIN (being able to index tsvector, BIGINT-arrays, JSONB etc.) and be able to use index for sorting on ie. timestamp, tsrank or some BIGINT-column?

Like my example, will it be possible to issue a query like this:

SELECT del.id , del.sent FROM delivery del WHERE 1 = 1 AND del.fts_all @@ to_tsquery('simple', 'hi:*') AND del.folder_id = ANY(ARRAY[2,3]::BIGINT[]) ORDER BY del.sent DESC LIMIT 101 OFFSET 0;

and have it use one RUM-index so the whole operation, including sorting, is as efficient as possible?

we have implementation for timestamp. One need to write opclass to deal with arrays, it shouldn't be difficult.

Thanks.

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas@visena.com
www.visena.com

Вложения

msg-1550266-3.png

Re: Question about RUM-index

От

Andreas Joseph Krogh

Дата:

17 июня 2016 г., 13:51:02

På fredag 17. juni 2016 kl. 15:47:08, skrev Oleg Bartunov <obartunov@gmail.com>:

On Fri, Jun 17, 2016 at 3:52 PM, Andreas Joseph Krogh <andreas@visena.com> wrote:
På fredag 17. juni 2016 kl. 13:53:34, skrev Oleg Bartunov <obartunov@gmail.com>:

On Fri, Jun 17, 2016 at 2:10 PM, Oleg Bartunov <obartunov@gmail.com> wrote:

On Fri, Jun 17, 2016 at 9:32 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
På torsdag 16. juni 2016 kl. 00:50:45, skrev Jeff Janes <jeff.janes@gmail.com>:
On Wed, Jun 15, 2016 at 3:56 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
Hi.

First; Is this the correct forum to ask questions about the Postgres Pro's new RUM-index?

If not, please point me to the right forum.

I think that https://github.com/postgrespro/rum/issues might be the best forum.

Oleg and friends; Should we use GitHub-issues as forum (one issue per question/thread?), pgsql-general or something else?

Andreas,

we are hardly working on our internal version of rum and will open it after resolving some issues. I think the best place to discuss it is -hackers.

Ah, as someone corrected me, we are working hard !

He he, I figured that was what you meant:-)

Ok, so basically - will RUM-index support the same indexing-properties as GIN (being able to index tsvector, BIGINT-arrays, JSONB etc.) and be able to use index for sorting on ie. timestamp, tsrank or some BIGINT-column?

Like my example, will it be possible to issue a query like this:
SELECT del.id                    , del.sent                FROM delivery del                WHERE 1 = 1
                      AND del.fts_all @@ to_tsquery('simple', 'hi:*')                      AND del.folder_id = ANY(ARRAY[2,3]::BIGINT[])
                ORDER BY  del.sent DESC LIMIT 101 OFFSET 0;
and have it use one RUM-index so the whole operation, including sorting, is as efficient as possible?
we have implementation for timestamp. One need to write opclass to deal with arrays, it shouldn't be difficult.

Will the opclass dealing with bigint-arrays also handle the single-element case, that is only one bigint?

Andreas Joseph Krogh

CTO / Partner - Visena AS

Mobile: +47 909 56 963

andreas@visena.com

www.visena.com

Вложения

msg-1550269-3.png

Re: Question about RUM-index

От

Oleg Bartunov

Дата:

17 июня 2016 г., 14:07:58

On Fri, Jun 17, 2016 at 4:50 PM, Andreas Joseph Krogh <andreas@visena.com> wrote:

På fredag 17. juni 2016 kl. 15:47:08, skrev Oleg Bartunov <obartunov@gmail.com>:
On Fri, Jun 17, 2016 at 3:52 PM, Andreas Joseph Krogh <andreas@visena.com> wrote:
På fredag 17. juni 2016 kl. 13:53:34, skrev Oleg Bartunov <obartunov@gmail.com>:

On Fri, Jun 17, 2016 at 2:10 PM, Oleg Bartunov <obartunov@gmail.com> wrote:

On Fri, Jun 17, 2016 at 9:32 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
På torsdag 16. juni 2016 kl. 00:50:45, skrev Jeff Janes <jeff.janes@gmail.com>:
On Wed, Jun 15, 2016 at 3:56 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
Hi.

First; Is this the correct forum to ask questions about the Postgres Pro's new RUM-index?

If not, please point me to the right forum.

I think that https://github.com/postgrespro/rum/issues might be the best forum.

Oleg and friends; Should we use GitHub-issues as forum (one issue per question/thread?), pgsql-general or something else?

Andreas,

we are hardly working on our internal version of rum and will open it after resolving some issues. I think the best place to discuss it is -hackers.

Ah, as someone corrected me, we are working hard !

He he, I figured that was what you meant:-)

Ok, so basically - will RUM-index support the same indexing-properties as GIN (being able to index tsvector, BIGINT-arrays, JSONB etc.) and be able to use index for sorting on ie. timestamp, tsrank or some BIGINT-column?

Like my example, will it be possible to issue a query like this:
SELECT del.id                    , del.sent                FROM delivery del                WHERE 1 = 1
                      AND del.fts_all @@ to_tsquery('simple', 'hi:*')                      AND del.folder_id = ANY(ARRAY[2,3]::BIGINT[])
                ORDER BY  del.sent DESC LIMIT 101 OFFSET 0;
and have it use one RUM-index so the whole operation, including sorting, is as efficient as possible?
we have implementation for timestamp. One need to write opclass to deal with arrays, it shouldn't be difficult.
Will the opclass dealing with bigint-arrays also handle the single-element case, that is only one bigint?

this is up to opclass author.

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas@visena.com
www.visena.com

Вложения

msg-1550272-3.png

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Question about RUM-index

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения