Обсуждение: Efficiently query for the most recent record for a given user

Поиск

Список

Период

Сортировка

Efficiently query for the most recent record for a given user

От

Robert DiFalco

Дата:

07 августа 2013 г., 21:12:57

Let's say I have a table something like this:

create table call_activity (

id int8 not null,

called timestamp,

user_id int8 not null,

primary key (id)

foreign key (user_id) references my_users

)

I want to get the last call_activity record for a single user. Is there ANY way to efficiently retrieve the last record for a specified user_id, or do I need to de-normalize and update a table with a single row for each user each time a new call_activity record is inserted? I know I how to do the query without the summary table (subquery or GROUP BY with MAX) but that seems like it will never perform well for large data sets. Or am I full of beans and it should perform just fine for a huge data set as long as I have an index on "called"?

Thanks in advance!

Re: Efficiently query for the most recent record for a given user

От

Claudio Freire

Дата:

07 августа 2013 г., 21:19:38

On Wed, Aug 7, 2013 at 3:12 PM, Robert DiFalco <robert.difalco@gmail.com> wrote:
> Let's say I have a table something like this:
>
>    create table call_activity (
>         id int8 not null,
>         called timestamp,
>         user_id int8 not null,
>         primary key (id)
>         foreign key (user_id) references my_users
>     )
>
>
> I want to get the last call_activity record for a single user.  Is there ANY
> way to efficiently retrieve the last record for a specified user_id, or do I
> need to de-normalize and update a table with a single row for each user each
> time a new call_activity record is inserted? I know I how to do the query
> without the summary table (subquery or GROUP BY with MAX) but that seems
> like it will never perform well for large data sets. Or am I full of beans
> and it should perform just fine for a huge data set as long as I have an
> index on "called"?


Create an index over (user_id, called desc), and do

select * from call_activity where user_id = blarg order by called desc limit 1

Re: Efficiently query for the most recent record for a given user

От

Tom Lane

Дата:

07 августа 2013 г., 21:34:21

Claudio Freire <klaussfreire@gmail.com> writes:
> On Wed, Aug 7, 2013 at 3:12 PM, Robert DiFalco <robert.difalco@gmail.com> wrote:
>> I want to get the last call_activity record for a single user.

> Create an index over (user_id, called desc), and do
> select * from call_activity where user_id = blarg order by called desc limit 1

Note that there's no particular need to specify "desc" in the index
definition.  This same index can support searches in either direction
on the "called" column.

            regards, tom lane

Re: Efficiently query for the most recent record for a given user

От

Igor Neyman

Дата:

07 августа 2013 г., 21:35:13

> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-
> performance-owner@postgresql.org] On Behalf Of Claudio Freire
> Sent: Wednesday, August 07, 2013 2:20 PM
> To: Robert DiFalco
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Efficiently query for the most recent record for a
> given user
>
> On Wed, Aug 7, 2013 at 3:12 PM, Robert DiFalco <robert.difalco@gmail.com>
> wrote:
> > Let's say I have a table something like this:
> >
> >    create table call_activity (
> >         id int8 not null,
> >         called timestamp,
> >         user_id int8 not null,
> >         primary key (id)
> >         foreign key (user_id) references my_users
> >     )
> >
> >
> > I want to get the last call_activity record for a single user.  Is
> > there ANY way to efficiently retrieve the last record for a specified
> > user_id, or do I need to de-normalize and update a table with a single
> > row for each user each time a new call_activity record is inserted? I
> > know I how to do the query without the summary table (subquery or
> > GROUP BY with MAX) but that seems like it will never perform well for
> > large data sets. Or am I full of beans and it should perform just fine
> > for a huge data set as long as I have an index on "called"?
>
>
> Create an index over (user_id, called desc), and do
>
> select * from call_activity where user_id = blarg order by called desc limit 1
>

And most recent call for every user:

SELECT id, user_id, MAX(called) OVER (PARTITION BY user_id) FROM call_activity;

Regards,
Igor Neyman

Re: Efficiently query for the most recent record for a given user

От

Robert DiFalco

Дата:

07 августа 2013 г., 21:39:12

Thanks guys!

On Wed, Aug 7, 2013 at 11:35 AM, Igor Neyman <ineyman@perceptron.com> wrote:

> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-
> performance-owner@postgresql.org] On Behalf Of Claudio Freire
> Sent: Wednesday, August 07, 2013 2:20 PM
> To: Robert DiFalco
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Efficiently query for the most recent record for a
> given user
>
> On Wed, Aug 7, 2013 at 3:12 PM, Robert DiFalco <robert.difalco@gmail.com>
> wrote:
> > Let's say I have a table something like this:
> >
> > create table call_activity (
> > id int8 not null,
> > called timestamp,
> > user_id int8 not null,
> > primary key (id)
> > foreign key (user_id) references my_users
> > )
> >
> >
> > I want to get the last call_activity record for a single user. Is
> > there ANY way to efficiently retrieve the last record for a specified
> > user_id, or do I need to de-normalize and update a table with a single
> > row for each user each time a new call_activity record is inserted? I
> > know I how to do the query without the summary table (subquery or
> > GROUP BY with MAX) but that seems like it will never perform well for
> > large data sets. Or am I full of beans and it should perform just fine
> > for a huge data set as long as I have an index on "called"?
>
>
> Create an index over (user_id, called desc), and do
>
> select * from call_activity where user_id = blarg order by called desc limit 1
>

And most recent call for every user:

SELECT id, user_id, MAX(called) OVER (PARTITION BY user_id) FROM call_activity;

Regards,
Igor Neyman

Re: Efficiently query for the most recent record for a given user

От

Claudio Freire

Дата:

07 августа 2013 г., 21:39:52

On Wed, Aug 7, 2013 at 3:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>> On Wed, Aug 7, 2013 at 3:12 PM, Robert DiFalco <robert.difalco@gmail.com> wrote:
>>> I want to get the last call_activity record for a single user.
>
>> Create an index over (user_id, called desc), and do
>> select * from call_activity where user_id = blarg order by called desc limit 1
>
> Note that there's no particular need to specify "desc" in the index
> definition.  This same index can support searches in either direction
> on the "called" column.

Yeah, but it's faster if it's in the same direction, because the
kernel read-ahead code detects sequential reads, whereas it doesn't
when it goes backwards. The difference can be up to a factor of 10 for
long index scans.

Though... true... for a limit 1... it wouldn't matter that much. But
it's become habit to match index sort order by now.

Re: Efficiently query for the most recent record for a given user

От

Tom Lane

Дата:

07 августа 2013 г., 22:04:14

Claudio Freire <klaussfreire@gmail.com> writes:
> On Wed, Aug 7, 2013 at 3:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Note that there's no particular need to specify "desc" in the index
>> definition.  This same index can support searches in either direction
>> on the "called" column.

> Yeah, but it's faster if it's in the same direction, because the
> kernel read-ahead code detects sequential reads, whereas it doesn't
> when it goes backwards. The difference can be up to a factor of 10 for
> long index scans.

Color me skeptical.  Index searches are seldom purely sequential block
accesses.  Maybe if you had a freshly built index that'd never yet
suffered any inserts/updates, but in practice any advantage would
disappear very quickly after a few index page splits.

> Though... true... for a limit 1... it wouldn't matter that much.

That's the other point.

            regards, tom lane

Re: Efficiently query for the most recent record for a given user

От

Alvaro Herrera

Дата:

07 августа 2013 г., 22:05:59

Claudio Freire escribió:
> On Wed, Aug 7, 2013 at 3:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> > Note that there's no particular need to specify "desc" in the index
> > definition.  This same index can support searches in either direction
> > on the "called" column.
>
> Yeah, but it's faster if it's in the same direction, because the
> kernel read-ahead code detects sequential reads, whereas it doesn't
> when it goes backwards. The difference can be up to a factor of 10 for
> long index scans.

That might be true when an index is new, but as it grows, the leaf pages
are not going to be sequential anymore.  And this doesn't much apply for
an equality lookup anyway, does it?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Efficiently query for the most recent record for a given user

От

Claudio Freire

Дата:

07 августа 2013 г., 22:13:52

On Wed, Aug 7, 2013 at 4:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, but it's faster if it's in the same direction, because the
>> kernel read-ahead code detects sequential reads, whereas it doesn't
>> when it goes backwards. The difference can be up to a factor of 10 for
>> long index scans.
>
> Color me skeptical.  Index searches are seldom purely sequential block
> accesses.  Maybe if you had a freshly built index that'd never yet
> suffered any inserts/updates, but in practice any advantage would
> disappear very quickly after a few index page splits.

Maybe.

I've tested on pgbench test databases, which I'm not sure whether
they're freshly built indexes or incrementally built ones, and it
applies there (in fact backward index-only scans was one of the
workloads the read-ahead patch improved the most).

Re: Efficiently query for the most recent record for a given user

От

Claudio Freire

Дата:

08 августа 2013 г., 23:09:01

On Thu, Aug 8, 2013 at 5:01 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Wed, Aug 7, 2013 at 4:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Yeah, but it's faster if it's in the same direction, because the
>>>> kernel read-ahead code detects sequential reads, whereas it doesn't
>>>> when it goes backwards. The difference can be up to a factor of 10 for
>>>> long index scans.
>>>
>>> Color me skeptical.  Index searches are seldom purely sequential block
>>> accesses.  Maybe if you had a freshly built index that'd never yet
>>> suffered any inserts/updates, but in practice any advantage would
>>> disappear very quickly after a few index page splits.
>>
>> Maybe.
>>
>> I've tested on pgbench test databases, which I'm not sure whether
>> they're freshly built indexes or incrementally built ones, and it
>> applies there (in fact backward index-only scans was one of the
>> workloads the read-ahead patch improved the most).
>
> It's been a while, but when I was touching the btree code for the
> SSI implementation I thought I saw something about a reverse scan
> needing to visit the parent page in cases where a forward scan
> doesn't, due to the locking techniques used in btree.  I don't know
> how significant those extra trips up and down the tree are, but
> they must cost *something*.

From my benchmarks at the time (with pgbench), they seldom ever
happen, so even if they cost a lot, they don't add up to much.

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Efficiently query for the most recent record for a given user