Обсуждение: Last event per user

Поиск
Список
Период
Сортировка

Last event per user

От
Luís Roberto Weck
Дата:
Hey guys,

So I have two tables: users and events. It is very common for my 
application to request the last user event.

Usually, what I'll do is get the user, and then SELECT * from events 
WHERE user_id = :user order by timestamp_inc desc LIMIT 1.

I have a big problem, however:

My app uses a ORM for SQL execution and generation and it cant create 
subselects at all. The Ideal solution for me would be a view which has 
all the users last events.

I tried:

creating a view (last_user_event_1) on "SELECT DISTINCT ON (user_id) * 
FROM events ORDER BY user_id, timestamp_inc DESC" and another one 
(last_user_event_2) which is a view on users with a lateral join on the 
last event.

Running the query with lateral join by itself is very fast, and exactly 
what I need. It usually runs < 1ms. The one with "distinct on (user_id)" 
takes around 20ms to complete which is just too slow for my needs.

My problem is that when I run a query JOINing users with 
last_user_event_2, it takes about 2 seconds:

This is the explain output from joining users with "last_user_event_2":

https://explain.depesz.com/s/oyEp

And this is with "last_user_event_1":

https://explain.depesz.com/s/hWwF

Any help would be greatly appreciated.



Re: Last event per user

От
Michael Lewis
Дата:
The obfuscation makes it difficult to guess at the query you are writing and the schema you are using. Can you provide any additional information without revealing sensitive info?

1) Do you have an index on ( user_id ASC, timestamp_inc DESC ) ?
2) Sub-queries can't be re-written inline by the optimizer when there is an aggregate inside the subquery, and I think DISTINCT ON would behave the same. So, that might explain the significant change in behavior when the lateral is used. I am guessing at how you wrote the two versions of the view though.

Obviously not best design, but you could insert events as "is_latest" and update any prior events for that user via trigger as is_latest = false.

Re: Last event per user

От
Luís Roberto Weck
Дата:

The obfuscation makes it difficult to guess at the query you are writing and the schema you are using. Can you provide any additional information without revealing sensitive info?

 
1) Do you have an index on ( user_id ASC, timestamp_inc DESC ) ?
2) Sub-queries can't be re-written inline by the optimizer when there is an aggregate inside the subquery, and I think DISTINCT ON would behave the same. So, that might explain the significant change in behavior when the lateral is used. I am guessing at how you wrote the two versions of the view though.
 
Obviously not best design, but you could insert events as "is_latest" and update any prior events for that user via trigger as is_latest = false.

Thanks for the reply!

the schema is basically this (simplified):

table users (user_id,user_group,user_name)

table events (user_id,user_group,event_id,timestamp_inc,event_description)

Views:

"last_user_event_2"

SELECT e.*

   FROM users u

        JOIN LATERAL (SELECT *

                        FROM events

                       WHERE user_id = u.user_id

                         AND user_group = u.user_group 

                       ORDER BY timestamp_inc DESC

                       LIMIT 1 ) e ON TRUE


"last_user_event_1"


SELECT DISTINCT ON (user_id)

       *

  FROM events

 ORDER BY user_id, timestamp_inc DESC

 

The query itself  is:


SELECT *

  FROM users u

       JOIN last_user_event_(1|2) e USING (user_id,user_group)



This explain plan: https://explain.depesz.com/s/oyEp is what Postgres uses with "last_user_event_2" and https://explain.depesz.com/s/hWwF, "last_user_event_1"


I do have a btree index on user_id,user_group,timestamp_inc DESC.




Re: Last event per user

От
Michael Lewis
Дата:
It seems like it should be-
SELECT * FROM users u JOIN last_user_event_1 e USING (user_id,user_group);
--OR--
SELECT * FROM last_user_event_2 e;

for them to produce the same result set, since the last_user_event_2 already (could) have users info in it very simply by select * instead of e.* in that view definition.

Are there other important joins/where/order by/limits that would be on this "main query" that is just SELECT * FROM ____ right now which you have dropped to try to simplify the example?

Re:

От
Luís Roberto Weck
Дата:

It seems like it should be-

SELECT * FROM users u JOIN last_user_event_1 e USING (user_id,user_group);
--OR--
SELECT * FROM last_user_event_2 e;
 
for them to produce the same result set, since the last_user_event_2 already (could) have users info in it very simply by select * instead of e.* in that view definition.
 
Are there other important joins/where/order by/limits that would be on this "main query" that is just SELECT * FROM ____ right now which you have dropped to try to simplify the example?

You're right about the queries, I made a mistake.

Yes, I'm going to filter them by user_id and user_group, possibly (but not likely) using LIMIT 1. In the explain examples I am using user_id = 1272897 and user_group = 19117

Re:

От
Michael Lewis
Дата:
If you modify last_user_event_2 to select user and event info in the view, and just put there where clause directly on the view which is not joined to anything, instead of on the "extra copy" of the users table like you were showing previously, I would expect that the performance should be excellent.

Re: Last event per user

От
Luís Roberto Weck
Дата:

If you modify last_user_event_2 to select user and event info in the view, and just put there where clause directly on the view which is not joined to anything, instead of on the "extra copy" of the users table like you were showing previously, I would expect that the performance should be excellent.

But I need user_id and user_group to be outside of the view definition. user_id and user_group are dynamic values, as in, I need to call this query multiple times for different user_ids and user_groups .

Re: Last event per user

От
Michael Lewis
Дата:
On Mon, Aug 12, 2019 at 5:03 PM Luís Roberto Weck <luisroberto@siscobra.com.br> wrote:

If you modify last_user_event_2 to select user and event info in the view, and just put there where clause directly on the view which is not joined to anything, instead of on the "extra copy" of the users table like you were showing previously, I would expect that the performance should be excellent.

But I need user_id and user_group to be outside of the view definition. user_id and user_group are dynamic values, as in, I need to call this query multiple times for different user_ids and user_groups .


I don't follow. Perhaps there is something within the limitations of the ORM layer that I am not expecting. If you have this view-

"last_user_event_2"

SELECT u.*, e.*

   FROM users u

        JOIN LATERAL (SELECT *

                        FROM events

                       WHERE user_id = u.user_id

                         AND user_group = u.user_group 

                       ORDER BY timestamp_inc DESC

                       LIMIT 1 ) e ON TRUE


And you execute a query like this-
SELECT * FROM last_user_event_2 e WHERE  user_id = 1272897 and user_group = 19117;


Then I would expect very good performance.

Re: Last event per user

От
Luís Roberto Weck
Дата:
On Mon, Aug 12, 2019 at 5:03 PM Luís Roberto Weck <luisroberto@siscobra.com.br> wrote:

If you modify last_user_event_2 to select user and event info in the view, and just put there where clause directly on the view which is not joined to anything, instead of on the "extra copy" of the users table like you were showing previously, I would expect that the performance should be excellent.

But I need user_id and user_group to be outside of the view definition. user_id and user_group are dynamic values, as in, I need to call this query multiple times for different user_ids and user_groups .


I don't follow. Perhaps there is something within the limitations of the ORM layer that I am not expecting. If you have this view-

"last_user_event_2"

SELECT u.*, e.*

   FROM users u

        JOIN LATERAL (SELECT *

                        FROM events

                       WHERE user_id = u.user_id

                         AND user_group = u.user_group 

                       ORDER BY timestamp_inc DESC

                       LIMIT 1 ) e ON TRUE


And you execute a query like this-
SELECT * FROM last_user_event_2 e WHERE  user_id = 1272897 and user_group = 19117;


Then I would expect very good performance.

You're right, thanks! I just had to do a little adjustment on the lateral join. Since both users and events have user_id and user_group, PostgreSQL complains that I can't have more than one column with the same name. I fixed it by changing the LATERAL condition from "ON TRUE" to "USING (user_id,user_group)" (which I didn't even knew I could do).