Обсуждение: Last event per user
Hey guys, So I have two tables: users and events. It is very common for my application to request the last user event. Usually, what I'll do is get the user, and then SELECT * from events WHERE user_id = :user order by timestamp_inc desc LIMIT 1. I have a big problem, however: My app uses a ORM for SQL execution and generation and it cant create subselects at all. The Ideal solution for me would be a view which has all the users last events. I tried: creating a view (last_user_event_1) on "SELECT DISTINCT ON (user_id) * FROM events ORDER BY user_id, timestamp_inc DESC" and another one (last_user_event_2) which is a view on users with a lateral join on the last event. Running the query with lateral join by itself is very fast, and exactly what I need. It usually runs < 1ms. The one with "distinct on (user_id)" takes around 20ms to complete which is just too slow for my needs. My problem is that when I run a query JOINing users with last_user_event_2, it takes about 2 seconds: This is the explain output from joining users with "last_user_event_2": https://explain.depesz.com/s/oyEp And this is with "last_user_event_1": https://explain.depesz.com/s/hWwF Any help would be greatly appreciated.
The obfuscation makes it difficult to guess at the query you are writing and the schema you are using. Can you provide any additional information without revealing sensitive info?
1) Do you have an index on ( user_id ASC, timestamp_inc DESC ) ?2) Sub-queries can't be re-written inline by the optimizer when there is an aggregate inside the subquery, and I think DISTINCT ON would behave the same. So, that might explain the significant change in behavior when the lateral is used. I am guessing at how you wrote the two versions of the view though.Obviously not best design, but you could insert events as "is_latest" and update any prior events for that user via trigger as is_latest = false.
Thanks for the reply!
the schema is basically this (simplified):
table users (user_id,user_group,user_name)
table events (user_id,user_group,event_id,timestamp_inc,event_description)
Views:
"last_user_event_2"
SELECT e.*
FROM users u
JOIN LATERAL (SELECT *
FROM events
WHERE user_id = u.user_id
AND user_group = u.user_group
ORDER BY timestamp_inc DESC
LIMIT 1 ) e ON TRUE
"last_user_event_1"
SELECT DISTINCT ON (user_id)
*
FROM events
ORDER BY user_id, timestamp_inc DESC
The query itself is:
SELECT *
FROM users u
JOIN last_user_event_(1|2) e USING (user_id,user_group)
This explain plan: https://explain.depesz.com/s/oyEp is what Postgres uses with "last_user_event_2" and https://explain.depesz.com/s/hWwF, "last_user_event_1"
I do have a btree index on user_id,user_group,timestamp_inc DESC.
			
		
It seems like it should be-
SELECT * FROM users u JOIN last_user_event_1 e USING (user_id,user_group);--OR--SELECT * FROM last_user_event_2 e;for them to produce the same result set, since the last_user_event_2 already (could) have users info in it very simply by select * instead of e.* in that view definition.Are there other important joins/where/order by/limits that would be on this "main query" that is just SELECT * FROM ____ right now which you have dropped to try to simplify the example?
You're right about the queries, I made a mistake.
Yes, I'm going to filter them by user_id and user_group, possibly (but not likely) using LIMIT 1. In the explain examples I am using user_id = 1272897 and user_group = 19117
If you modify last_user_event_2 to select user and event info in the view, and just put there where clause directly on the view which is not joined to anything, instead of on the "extra copy" of the users table like you were showing previously, I would expect that the performance should be excellent.
But I need user_id and user_group to be outside of the view definition. user_id and user_group are dynamic values, as in, I need to call this query multiple times for different user_ids and user_groups .
If you modify last_user_event_2 to select user and event info in the view, and just put there where clause directly on the view which is not joined to anything, instead of on the "extra copy" of the users table like you were showing previously, I would expect that the performance should be excellent.
But I need user_id and user_group to be outside of the view definition. user_id and user_group are dynamic values, as in, I need to call this query multiple times for different user_ids and user_groups .
"last_user_event_2"
SELECT u.*, e.*
FROM users u
JOIN LATERAL (SELECT *
FROM events
WHERE user_id = u.user_id
AND user_group = u.user_group
ORDER BY timestamp_inc DESC
LIMIT 1 ) e ON TRUE
And you execute a query like this-
SELECT * FROM last_user_event_2 e WHERE  user_id = 1272897 and user_group = 19117;
On Mon, Aug 12, 2019 at 5:03 PM Luís Roberto Weck <luisroberto@siscobra.com.br> wrote:If you modify last_user_event_2 to select user and event info in the view, and just put there where clause directly on the view which is not joined to anything, instead of on the "extra copy" of the users table like you were showing previously, I would expect that the performance should be excellent.
But I need user_id and user_group to be outside of the view definition. user_id and user_group are dynamic values, as in, I need to call this query multiple times for different user_ids and user_groups .
I don't follow. Perhaps there is something within the limitations of the ORM layer that I am not expecting. If you have this view-"last_user_event_2"
SELECT u.*, e.*
FROM users u
JOIN LATERAL (SELECT *
FROM events
WHERE user_id = u.user_id
AND user_group = u.user_group
ORDER BY timestamp_inc DESC
LIMIT 1 ) e ON TRUE
And you execute a query like this-
SELECT * FROM last_user_event_2 e WHERE user_id = 1272897 and user_group = 19117;Then I would expect very good performance.
You're right, thanks! I just had to do a little adjustment on the lateral join. Since both users and events have user_id and user_group, PostgreSQL complains that I can't have more than one column with the same name. I fixed it by changing the LATERAL condition from "ON TRUE" to "USING (user_id,user_group)" (which I didn't even knew I could do).