Re: Efficiently searching for the most recent rows where a columnmatches any result from a different query

Поиск
Список
Период
Сортировка
От Nandakumar M
Тема Re: Efficiently searching for the most recent rows where a columnmatches any result from a different query
Дата
Msg-id CANcFUu4GJ3eOJCDeL6yna2ZToismk6zi+XdcvHXMmKFATe=_SA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Efficiently searching for the most recent rows where a columnmatches any result from a different query  (<mkslaf@keemail.me>)
Ответы Re: Efficiently searching for the most recent rows where a columnmatches any result from a different query  (Nandakumar M <m.nanda92@gmail.com>)
Список pgsql-performance
Hi,

I myself am new to performance tuning queries. But, from what you have
said it looks like Postgres has to go through all the posts using the
backward index scan and find out whether their author is amongst the
user's friends list.

Since the number of friends is arbitrary for any user, even if a user
has few friends (or no friends at all), the stats will not reflect
this and so the planner cannot take advantage of this to directly
fetch the posts from the small set of friends.

My suggestion (which involves changing the schema and query) is to
have a last_post_id or last_posted_time column in user table, find the
last 10 friends who have posted first and then use it to find the last
10 posts. Something like,

select * from posts where posts.author_id in (select id from users
where id in (select friend_id from user_friend where user_id = 1) and
last_posted_time is not null order by last_posted_time desc limit 10);

I am not sure if this is the best way to solve this. If there are
better solutions I would be happy to learn the same.

Regards
Nanda

On Thu, Feb 15, 2018 at 5:48 PM,  <mkslaf@keemail.me> wrote:
>
> Hello Hellmuth,
>
> Thank you for your response.
>
> I've uploaded the query plan for the first query (user_id=2) here:
> https://gist.github.com/anonymous/6d251b277ef71f8977b03cab91fedccd
> The query plan for the second query (user_id=1) can be found here:
> https://gist.github.com/anonymous/32ed485b40cce2651ddc52661f3e7f7b
>
> Just like in the original queries, posts_user_id_id_index is not used.
>
> Kind regards,
> Milo
>
> 13. Feb 2018 22:13 by hivs77@gmail.com:
>
> Hello:
>
>
> EXPLAIN (ANALYZE, BUFFERS)
> select * from (
> SELECT posts.id, users.name, posts.content
> FROM posts JOIN users ON posts.user_id = users.id
> WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE user_id =
> 1)
>
> ORDER BY posts.id DESC
> ) as a
> ORDER BY a.id DESC
> LIMIT 10;
>
> ------
>
>
> EXPLAIN (ANALYZE, BUFFERS)
> select * from (
> SELECT posts.id, users.name, posts.content
> FROM posts JOIN users ON posts.user_id = users.id
> WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE user_id =
> 2)
>
> ORDER BY posts.id DESC
> ) as a
> ORDER BY a.id DESC
> LIMIT 10;
>
> 2018-02-13 8:28 GMT-05:00 <mkslaf@keemail.me>:
>>
>> Hello,
>>
>> I have the following schema:
>>
>>     CREATE TABLE users (
>>         id   BIGSERIAL PRIMARY KEY,
>>         name TEXT      NOT NULL UNIQUE
>>     );
>>
>>     CREATE TABLE friends (
>>         user_id        BIGINT NOT NULL REFERENCES users,
>>         friend_user_id BIGINT NOT NULL REFERENCES users,
>>         UNIQUE (user_id, friend_user_id)
>>     );
>>
>>     CREATE TABLE posts (
>>         id      BIGSERIAL PRIMARY KEY,
>>         user_id BIGINT    NOT NULL REFERENCES users,
>>         content TEXT      NOT NULL
>>     );
>>     CREATE INDEX posts_user_id_id_index ON posts(user_id, id);
>>
>> Each user can unilaterally follow any number of friends. The posts table
>> has a large number of rows and is rapidly growing.
>>
>> My goal is to retrieve the 10 most recent posts of a user's friends. This
>> query gives the correct result, but is inefficient:
>>
>>     SELECT posts.id, users.name, posts.content
>>     FROM posts JOIN users ON posts.user_id = users.id
>>     WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE
>> user_id = 1)
>>     ORDER BY posts.id DESC LIMIT 10;
>>
>> If the user's friends have recently posted, the query is still reasonably
>> fast (https://explain.depesz.com/s/6ykR). But if the user's friends haven't
>> recently posted or the user has no friends, it quickly deteriorates
>> (https://explain.depesz.com/s/OnoG).
>>
>> If I match only a single post author (e.g. WHERE posts.user_id = 5),
>> Postgres uses the index posts_user_id_id_index. But if I use IN, the index
>> doesn't appear to be used at all.
>>
>> How can I get these results more efficiently?
>>
>> I've uploaded the schema and the queries I've tried to dbfiddle at
>> http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=cf1489b7f6d53c3fe0b55ed7ccbad1f0.
>> The output of "SELECT version()" is "PostgreSQL 9.6.5 on
>> x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit" for
>> me.
>>
>> Thank you in advance for any insights, pointers or suggestions you are
>> able to give me.
>>
>> Regards,
>> Milo
>
>
>
>
> --
> Cordialmente,
>
> Ing. Hellmuth I. Vargas S.
> Esp. Telemática y Negocios por Internet
> Oracle Database 10g Administrator Certified Associate
> EnterpriseDB Certified PostgreSQL 9.3 Associate
>


В списке pgsql-performance по дате отправления:

Предыдущее
От: Tatsuo Ishii
Дата:
Сообщение: Re: pgpool 2 rotate logs
Следующее
От: Nandakumar M
Дата:
Сообщение: Re: Efficiently searching for the most recent rows where a columnmatches any result from a different query