Re: newsfeed type query

Поиск
Список
Период
Сортировка
От Jonathan Vanasco
Тема Re: newsfeed type query
Дата
Msg-id 2E228BD0-0EE0-46AD-BFBF-14E5B630A169@2xlp.com
обсуждение исходный текст
Ответ на Re: newsfeed type query  (Ladislav Lenart <lenartlad@volny.cz>)
Ответы Re: newsfeed type query  (Ladislav Lenart <lenartlad@volny.cz>)
Re: newsfeed type query  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Список pgsql-general
On Apr 29, 2015, at 12:25 PM, Ladislav Lenart wrote:

> Could you please explain to me the error(s) in my reasoning?

Let me just flip your list in reverse... and add in some elements (marked with a *):

posting  ts  context
p60      60  friend
p55      55 friend*
p54      54 friend*
p50      50  group
p50      49  group*
p50      49  group*
p40      40  friend
p30      30  group
p20      20  friend
p10      10  group
p0        0  friend

With the 2 limited subqueries, the results would be:
    60F, 55F, 50G, 49G

But the "most recent" data is
    50F, 55F, 54F, 50G

So we end up showing 49 which is less relevant than 54.

In some situations this isn't much of an issue, but in others it is detrimental.
For example, one of my "feeds" contains a distribution of events according-to-type that is very uneven.  While "friend"
and"group" might be relatively close in time to one another, "system" or other events may be months old -- and that
oldercontent gets pulled in with this style of query.   

If you need to paginate the data and select the next 10 overall items, it gets even more complicated.

IIRC, the best mix of performance and "product" that I've found is do something like this:

    SELECT * FROM (
        SELECT a,b,c FROM table_a ORDER BY TIMESTAMP DESC LIMIT 10000;
        UNION
        SELECT a,b,c FROM table_b ORDER BY TIMESTAMP DESC LIMIT 10000;
    ) as unioned
     order by unioned TIMESTAMP DESC LIMIT 100 OFFSET 0;

by creating an artificial limit on the inner queries, you can save postgres from doing a lot of intensive i/o work and
memoryusage (like a lot) 
then, joining a few lists and sorting 20k (or even 100k) items is really cheap.
the downside is that you effectively limit the 'relevancy' of the query to whatever the inner limit is (ie, 10000 --
notthe combined total of 20000), but that number can be arbitrarily high enough that it is irrelevant while still
showingthe right amount of content for people. 







В списке pgsql-general по дате отправления:

Предыдущее
От: Ladislav Lenart
Дата:
Сообщение: Re: newsfeed type query
Следующее
От: John McKown
Дата:
Сообщение: New column modifier?